Lance Pollard

What do you do with the hash code after running a word through the Double Metaphone algorithm used in fuzzy text search?

I found this Python code implementing Double Metaphone. It is basically a series of if statements handling each letter, though I'm not yet sure what decisions went into each branch. The output seems to be two hashes made up mostly of consonants: a primary code and an optional secondary code.

result = doublemetaphone('Jose')
self.assertEqual(result, ('HS', ''))
result = doublemetaphone('cambrillo')
self.assertEqual(result, ('KMPRL', 'KMPR'))
result = doublemetaphone('otto')
self.assertEqual(result, ('AT', ''))
result = doublemetaphone('aubrey')
self.assertEqual(result, ('APR', ''))
result = doublemetaphone('maurice')
self.assertEqual(result, ('MRS', ''))
result = doublemetaphone('auto')
self.assertEqual(result, ('AT', ''))
result = doublemetaphone('maisey')
self.assertEqual(result, ('MS', ''))
result = doublemetaphone('catherine')
self.assertEqual(result, ('K0RN', 'KTRN'))

My question now is: what are you supposed to do with these hashes? Say I have 1 million words and convert them all to hashes. Do I put the hashes into some sort of trie and do a basic trie lookup? If so, roughly what goes on there? If not, what is actually done with the hashes to make the Metaphone algorithm work on large dictionary datasets?
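To make the question concrete, here is the simplest thing I can picture: a plain dictionary from code to the set of words that produce it, instead of a trie. This is only a sketch under my own assumptions, not necessarily how real fuzzy-search systems do it. The names `CODES`, `build_index`, and `lookup` are made up for illustration, and the codes are hard-coded from the test cases above so the snippet runs without any Double Metaphone library.

```python
from collections import defaultdict

# (primary, secondary) codes copied from the test cases above.
# Hard-coded here so the sketch does not depend on a metaphone library.
CODES = {
    "jose": ("HS", ""),
    "cambrillo": ("KMPRL", "KMPR"),
    "otto": ("AT", ""),
    "aubrey": ("APR", ""),
    "maurice": ("MRS", ""),
    "auto": ("AT", ""),
    "maisey": ("MS", ""),
    "catherine": ("K0RN", "KTRN"),
}


def build_index(codes):
    """Map each non-empty code to the set of words that produce it."""
    index = defaultdict(set)
    for word, pair in codes.items():
        for code in pair:
            if code:  # skip the empty secondary code
                index[code].add(word)
    return index


def lookup(index, query_codes):
    """Return all indexed words sharing any non-empty code with the query."""
    matches = set()
    for code in query_codes:
        if code:
            # .get avoids creating new keys in the defaultdict
            matches |= index.get(code, set())
    return sorted(matches)


index = build_index(CODES)
print(lookup(index, ("AT", "")))    # ['auto', 'otto']
print(lookup(index, ("KTRN", "")))  # ['catherine']
```

With this layout, 'otto' and 'auto' land in the same bucket because both hash to 'AT', and 'catherine' is reachable through either of its two codes. Is an exact-match lookup like this what is normally done, or is something more involved needed?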
