Javascript Spelling corrector

I have written a Javascript spellchecker/corrector based on Peter Norvig’s Python 21-liner.

Click here to view my code. It is still not perfect. Comments, bug reports welcome.

Thanks to V for speeding this up.

Updated : Suggestions made by Dante (see comments) have been incorporated and the new code has been uploaded.

4 Responses to “Javascript Spelling corrector”

  1. Dante Says:

    Some pointers …

    1)Wordexists function should be applied to resultset before

    returning the resultset in edits1.Thats most probably affecting

    the performance of edits2.

    2)Rewriting train(features)
    for(f in features)
    {
    if(model[features[f]])
    model[features[f]]+=1;
    else
    model[features[f]]=1;
    }

    3)I dont think its a particularly good idea to import the corpus

    as I hope this is a part of web application. The simplest

    brown corpus is about 7 MB. Instead you can import the

    words and their frequency that have been already

    preprocessed. Here is the frequency list from brown corpus

    that I have developed for my other projects

    http://www.mediafire.com/?sharekey=5cacc0942f2ed052d2db6fb9a8902bda

  2. Intensity ! Says:

    Hi Dante

    Yes, (2) is an optimization that should be done. i will change the code and re-upload.
    (3) is a good idea. Perhaps train method can be executed on the server side and the pre-processed corpus can be used on the client (browser).

    Regarding (1), though it is true that calling wordexists on the result set of edits1 will improve the performance, I am not sure, if it will eliminate certain possible states from the overall results. Is there a way to ascertain that the reduced result set of edits1 would still provide the same set that Known_edits2() does when edits1 is not reduced (using wordexists). Maybe I will test it with brown corpus and analyze the impact of a reduced edits1.

    Also, thanks for the pre-processed brown corpus file.

  3. Dante Says:

    Hi Intensity!
    On thinking again I get why the reduced resultset should not
    be returned in edits1. Lets take ‘we’ as the input. the
    transition we->wre->were would not happen if wordexists
    was applied in edits1.There are other consequences as well.

    Why not try Levenshtein Distance Algorithm

    for each word in corpus
    if(len[word]-len[input] is not between -2 and 2)
    continue;
    find edit distance
    if(edit distance > 2)
    continue;
    result[word]=1
    output word with max probability

  4. Intensity ! Says:

    Will try and post the results.

    Just a thought … if Levenshtein would anyway account for edit distances <=2, then there is no need to call the edits1( ) function anymore.

    ok … Implemented LD, there is tremendous improvement in performance. Thanks for the pointers Dante.
    I have re-uploaded the new source. I have retained both approaches for comparison.

Leave a Reply