What is the algorithm class for identifying spelling errors without the use of a dictionary?

Question

Please consider the following situation: in a segment of text, there are five occurrences of the string "slnFile" and one occurrence of "snlFile". The latter is a misspelling, but note that the former is not a word in any proper dictionary ("slnFile" is a variable name indicating a "Visual Studio solution file", which makes sense only to the author of the text segment).

I can think of a simple spell-checking implementation myself: find all word pairs in the text segment whose spellings differ by one character, and flag any word in such a pair that has a frequency count of 1 as suspect. (I know this is not a perfect solution.)

My question: what is the name for the class of algorithms that deal with this problem?

necromancer · Accepted Answer · 2013-09-10 21:33:33Z


Calculate the Damerau-Levenshtein distance between all pairs of words in the vocabulary. Flag those words that occur very infrequently and have a particularly small distance to a word that occurs frequently.
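As a rough illustration of the idea above, here is a minimal Python sketch. It uses the optimal string alignment variant of Damerau-Levenshtein (adjacent transpositions count as one edit) rather than the unrestricted distance. The function names, the tokenizing regex, and the thresholds `max_count`, `min_common`, and `max_distance` are all assumptions chosen for the example, not part of the original answer, and would need tuning for real text.

```python
import re
from collections import Counter


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "nl" <-> "ln"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def suspect_words(text, max_count=1, min_common=3, max_distance=1):
    """Map each rare word to a frequent word it closely resembles.

    The thresholds are illustrative assumptions, not recommended values.
    """
    counts = Counter(re.findall(r"[A-Za-z_]\w*", text))
    rare = [w for w, c in counts.items() if c <= max_count]
    common = [w for w, c in counts.items() if c >= min_common]
    suspects = {}
    for r in rare:
        for c in common:
            if osa_distance(r, c) <= max_distance:
                suspects[r] = c  # keep the first close frequent word found
                break
    return suspects
```

With the text from the question, `suspect_words("slnFile " * 5 + "snlFile")` should report "snlFile" as a likely misspelling of "slnFile", since swapping the adjacent "n" and "l" is a single edit.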


3 Comments

Sabuncu Over a year ago

+1 Thank you, this is what I needed. The Wikipedia article is full of useful links. Will accept as answer as soon as SO lets me.

necromancer Over a year ago

Glad that it was helpful!

Jim Mischel Over a year ago

+1. A refinement would be to do a frequency count on all the words, and then only do the distance calculation for infrequent words (for example, with 200 distinct words of which 3 occur infrequently, you do approximately 600 distance comparisons rather than roughly 20,000). There's probably no need to compare two frequently occurring words.
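The arithmetic behind that estimate can be checked with a short sketch. The helper name `comparison_counts` is hypothetical, and it assumes the "200 words" figure refers to distinct words in the vocabulary.

```python
from math import comb


def comparison_counts(vocab_size: int, rare_count: int) -> tuple[int, int]:
    """Return (all-pairs comparisons, rare-versus-every-other-word comparisons)."""
    all_pairs = comb(vocab_size, 2)               # every unordered pair of distinct words
    rare_vs_rest = rare_count * (vocab_size - 1)  # each rare word against all other words
    return all_pairs, rare_vs_rest


all_pairs, rare_only = comparison_counts(200, 3)
print(all_pairs, rare_only)
```

For 200 distinct words and 3 rare ones this gives 19,900 pairs versus 597 comparisons, in line with the "20,000" and "600" approximations in the comment.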
