<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Javascript Spelling corrector</title>
	<atom:link href="http://nullpointers.wordpress.com/2008/08/28/javascript-spellchecker/feed/" rel="self" type="application/rss+xml" />
	<link>http://nullpointers.wordpress.com/2008/08/28/javascript-spellchecker/</link>
	<description>a tech rag-picker's bin</description>
	<lastBuildDate>Thu, 06 Aug 2009 09:33:04 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Intensity !</title>
		<link>http://nullpointers.wordpress.com/2008/08/28/javascript-spellchecker/#comment-46</link>
		<dc:creator>Intensity !</dc:creator>
		<pubDate>Mon, 01 Sep 2008 08:04:42 +0000</pubDate>
		<guid isPermaLink="false">http://nullpointers.wordpress.com/?p=178#comment-46</guid>
		<description>Will try to post the results.

Just a thought ... if the Levenshtein distance already accounts for edit distances &lt;= 2, then there is no need to call the edits1() function anymore.

OK ... I implemented LD (Levenshtein distance), and there is a &lt;strong&gt;tremendous improvement in performance&lt;/strong&gt;. Thanks for the pointers, Dante.
I have re-uploaded the new source. I have retained both approaches for comparison.</description>
		<content:encoded><![CDATA[<p>Will try to post the results.</p>
<p>Just a thought &#8230; if the Levenshtein distance already accounts for edit distances &lt;= 2, then there is no need to call the edits1() function anymore.</p>
<p>OK &#8230; I implemented LD (Levenshtein distance), and there is a <strong>tremendous improvement in performance</strong>. Thanks for the pointers, Dante.<br />
I have re-uploaded the new source. I have retained both approaches for comparison.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dante</title>
		<link>http://nullpointers.wordpress.com/2008/08/28/javascript-spellchecker/#comment-45</link>
		<dc:creator>Dante</dc:creator>
		<pubDate>Sun, 31 Aug 2008 16:15:15 +0000</pubDate>
		<guid isPermaLink="false">http://nullpointers.wordpress.com/?p=178#comment-45</guid>
		<description>Hi Intensity!
On second thought, I see why the reduced result set should not be returned from edits1. Let&#039;s take &#039;we&#039; as the input: the transition we-&gt;wre-&gt;were would not happen if wordexists were applied in edits1, because the intermediate word &#039;wre&#039; is not in the dictionary. There are other consequences as well.

Why not try the Levenshtein distance algorithm instead?

for each word in corpus
	if (len(word) - len(input) is not between -2 and 2)
		continue;
	find the edit distance between word and input
	if (edit distance &gt; 2)
		continue;
	result[word] = 1
output the word in result with max probability</description>
		<content:encoded><![CDATA[<p>Hi Intensity!<br />
On second thought, I see why the reduced result set should not be returned from edits1. Let&#8217;s take &#8216;we&#8217; as the input: the transition we-&gt;wre-&gt;were would not happen if wordexists were applied in edits1, because the intermediate word &#8216;wre&#8217; is not in the dictionary. There are other consequences as well.</p>
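<p>A minimal sketch of this point in JavaScript, assuming Norvig-style edits1 and known_edits2 functions (deletes, transposes, replaces and inserts over a-z) and a model object mapping word to count. The camelCase names, the <code>filtered</code> flag and the two-word dictionary are hypothetical, chosen only to show what pruning edits1 with wordexists does:</p>

```javascript
// Norvig-style candidate generation (assumed shape of edits1/known_edits2).
// The model is assumed to be a plain object mapping word -> count.
const ALPHABET = "abcdefghijklmnopqrstuvwxyz";

// All strings one edit away: deletes, transposes, replaces, inserts.
function edits1(word) {
  const out = [];
  for (let i = 0; i <= word.length; i++) {
    const head = word.slice(0, i);
    const tail = word.slice(i);
    if (tail.length > 0) out.push(head + tail.slice(1));                      // delete
    if (tail.length > 1) out.push(head + tail[1] + tail[0] + tail.slice(2)); // transpose
    for (const c of ALPHABET) {
      if (tail.length > 0) out.push(head + c + tail.slice(1));               // replace
      out.push(head + c + tail);                                             // insert
    }
  }
  return out;
}

function wordExists(model, w) {
  return Object.prototype.hasOwnProperty.call(model, w);
}

// Known words two edits away. With filtered = true (hypothetical flag), the
// edits1 results are first pruned to known words, as suggested earlier.
function knownEdits2(model, word, filtered) {
  let first = edits1(word);
  if (filtered) first = first.filter((w) => wordExists(model, w));
  const found = new Set();
  for (const e1 of first) {
    for (const e2 of edits1(e1)) {
      if (wordExists(model, e2)) found.add(e2);
    }
  }
  return found;
}

const model = { we: 10, were: 5 }; // hypothetical two-word dictionary
console.log(knownEdits2(model, "we", false).has("were")); // true
console.log(knownEdits2(model, "we", true).has("were"));  // false
```

<p>Unfiltered, "were" is reached through non-words such as "wre" or "wer". Once edits1 is pruned to known words, only "we" survives the first step and "were" can no longer be reached, which is exactly the loss of states in question. Running both variants over a real vocabulary and diffing the sets is one way to measure how often this matters.</p>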
<p>Why not try the Levenshtein distance algorithm instead?</p>
<pre>for each word in corpus
	if (len(word) - len(input) is not between -2 and 2)
		continue;
	find the edit distance between word and input
	if (edit distance &gt; 2)
		continue;
	result[word] = 1
output the word in result with max probability</pre>
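<p>The outline above could be sketched in JavaScript roughly as follows, assuming the model is a plain object mapping word to count. levenshtein and correctByDistance are hypothetical names, and the early return for a word that is already known is an addition that the outline does not contain:</p>

```javascript
// Sketch of the Levenshtein-based search outlined above.
// Assumes model is a plain object mapping word -> count.

// Classic dynamic-programming edit distance (insert, delete, substitute).
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur.push(Math.min(
        prev[j] + 1,        // delete a[i - 1]
        cur[j - 1] + 1,     // insert b[j - 1]
        prev[j - 1] + cost  // substitute (or match)
      ));
    }
    prev = cur;
  }
  return prev[b.length];
}

function correctByDistance(model, input, maxDist = 2) {
  const has = (w) => Object.prototype.hasOwnProperty.call(model, w);
  if (has(input)) return input; // addition: keep words that are already known
  let best = input;
  let bestCount = 0;
  for (const word of Object.keys(model)) {
    if (Math.abs(word.length - input.length) > maxDist) continue; // cheap length filter
    if (levenshtein(word, input) > maxDist) continue;
    if (model[word] > bestCount) {
      best = word;
      bestCount = model[word];
    }
  }
  return best; // the input itself if nothing is close enough
}

const vocab = { the: 50, then: 10, we: 8, were: 5 }; // hypothetical counts
console.log(correctByDistance(vocab, "teh")); // the
```

<p>One caveat on dropping edits1 entirely: plain Levenshtein distance counts a transposition as two edits ("teh" to "the" is distance 2), whereas edits1 counts it as one, so a word that needs a transposition plus one more change can have Levenshtein distance 3 and fall outside the cutoff. The Damerau-Levenshtein variant counts adjacent transpositions as a single edit if that matters. This approach also scans the whole vocabulary on every lookup, which the length filter only partly offsets.</p>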
]]></content:encoded>
	</item>
	<item>
		<title>By: Intensity !</title>
		<link>http://nullpointers.wordpress.com/2008/08/28/javascript-spellchecker/#comment-44</link>
		<dc:creator>Intensity !</dc:creator>
		<pubDate>Sun, 31 Aug 2008 09:11:21 +0000</pubDate>
		<guid isPermaLink="false">http://nullpointers.wordpress.com/?p=178#comment-44</guid>
		<description>Hi Dante

Yes, (2) is an optimization that should be done. I will change the code and re-upload.
(3) is a good idea. Perhaps the train method can run on the server side, and the pre-processed corpus can be used on the client (browser).

Regarding (1), although calling wordexists on the result set of edits1 will improve performance, I am not sure whether it will eliminate certain possible states from the overall results. Is there a way to ascertain that Known_edits2() returns the same set when edits1 is reduced (using wordexists) as when it is not? Maybe I will test it with the Brown corpus and analyze the impact of a reduced edits1.

Also, thanks for the pre-processed Brown corpus file.</description>
		<content:encoded><![CDATA[<p>Hi Dante</p>
<p>Yes, (2) is an optimization that should be done. I will change the code and re-upload.<br />
(3) is a good idea. Perhaps the train method can run on the server side, and the pre-processed corpus can be used on the client (browser).</p>
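<p>For the server-side idea, a minimal sketch (assuming a JavaScript runtime such as Node.js on the server and a made-up output format of one word, a tab and its count per line) that counts words once so the browser only downloads a word/count list instead of the raw corpus. buildFrequencyList is a hypothetical name:</p>

```javascript
// Hypothetical server-side preprocessing: count each lowercase a-z word once
// and emit "word TAB count" lines (an assumed format), most frequent first.
function buildFrequencyList(corpusText) {
  const counts = new Map();
  const words = corpusText.toLowerCase().match(/[a-z]+/g) || [];
  for (const w of words) {
    counts.set(w, (counts.get(w) || 0) + 1);
  }
  // Sort by count descending, then alphabetically, so the output is deterministic.
  return [...counts.entries()]
    .sort((x, y) => y[1] - x[1] || (x[0] < y[0] ? -1 : 1))
    .map(([w, n]) => w + "\t" + n)
    .join("\n");
}

console.log(buildFrequencyList("The cat saw the dog. THE END"));
```

<p>The list only needs to be rebuilt when the corpus changes, so the cost of tokenizing the full corpus is paid once on the server rather than in every browser.</p>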
<p>Regarding (1), although calling wordexists on the result set of edits1 will improve performance, I am not sure whether it will eliminate certain possible states from the overall results. Is there a way to ascertain that Known_edits2() returns the same set when edits1 is reduced (using wordexists) as when it is not? Maybe I will test it with the Brown corpus and analyze the impact of a reduced edits1.</p>
<p>Also, thanks for the pre-processed Brown corpus file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dante</title>
		<link>http://nullpointers.wordpress.com/2008/08/28/javascript-spellchecker/#comment-43</link>
		<dc:creator>Dante</dc:creator>
		<pubDate>Sun, 31 Aug 2008 07:03:15 +0000</pubDate>
		<guid isPermaLink="false">http://nullpointers.wordpress.com/?p=178#comment-43</guid>
		<description>Some pointers ...

1) The wordexists function should be applied to the result set before it is returned from edits1. That is most probably what is affecting the performance of edits2.

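2) Rewriting train(features):
	for (var i = 0; i &lt; features.length; i++)
	{
		var w = features[i];
		// hasOwnProperty ignores inherited keys such as "constructor"
		if (model.hasOwnProperty(w))
			model[w] += 1;
		else
			model[w] = 1;
	}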

3) I don&#039;t think it&#039;s a particularly good idea to import the corpus, as I presume this is part of a web application. The simplest Brown corpus is about 7 MB. Instead, you can import the words and their frequencies, already preprocessed. Here is the frequency list from the Brown corpus that I have developed for my other projects:

http://www.mediafire.com/?sharekey=5cacc0942f2ed052d2db6fb9a8902bda</description>
		<content:encoded><![CDATA[<p>Some pointers &#8230; </p>
<p>1) The wordexists function should be applied to the result set before it is returned from edits1. That is most probably what is affecting the performance of edits2.</p>
<p>2) Rewriting train(features):</p>
<pre>	for (var i = 0; i &lt; features.length; i++)
	{
		var w = features[i];
		// hasOwnProperty ignores inherited keys such as "constructor"
		if (model.hasOwnProperty(w))
			model[w] += 1;
		else
			model[w] = 1;
	}</pre>
<p>3) I don&#8217;t think it&#8217;s a particularly good idea to import the corpus, as I presume this is part of a web application. The simplest Brown corpus is about 7 MB. Instead, you can import the words and their frequencies, already preprocessed. Here is the frequency list from the Brown corpus that I have developed for my other projects:</p>
<p><a href="http://www.mediafire.com/?sharekey=5cacc0942f2ed052d2db6fb9a8902bda" rel="nofollow">http://www.mediafire.com/?sharekey=5cacc0942f2ed052d2db6fb9a8902bda</a></p>
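<p>On the client side, a minimal sketch of loading such a pre-processed list into the model, assuming one word and one count per line separated by whitespace (the actual layout of the linked file is not described here, so check it first). parseFrequencyList is a hypothetical name, and fetching the text (for example with XMLHttpRequest) is left out:</p>

```javascript
// Hypothetical loader: parses lines of the form "word, whitespace, count"
// (an assumed format) into a word -> count model for the corrector.
function parseFrequencyList(text) {
  // A prototype-less object has no inherited keys like "constructor",
  // so every key in it really comes from the list.
  const model = Object.create(null);
  for (const line of text.split(/\r?\n/)) {
    const parts = line.trim().split(/\s+/);
    if (parts.length !== 2) continue;         // skip blank or malformed lines
    const count = parseInt(parts[1], 10);
    if (isNaN(count)) continue;               // skip lines without a numeric count
    const word = parts[0].toLowerCase();
    model[word] = (model[word] || 0) + count; // merge case variants of a word
  }
  return model;
}
```

<p>Loading counts this way replaces the train step in the browser: the model is ready as soon as the (much smaller) list has been downloaded and parsed.</p>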
]]></content:encoded>
	</item>
</channel>
</rss>
