
This project provides Java implementations of the most commonly used algorithms and auxiliary utilities for fuzzy (similarity) string search in large dictionaries.

  • Levenshtein Distance (with cutoff and prefix version)
  • Damerau-Levenshtein Distance (with cutoff and prefix version)
  • Extension (Spell-checker) Method
  • N-Gram Method (with some modifications)
  • Signature Hash Method
  • Bitap (Shift-Or with Wu-Manber modifications)
  • Burkhard-Keller (BK) Trees
  • Skip algorithm
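
As an illustration of the first item in the list, a minimal sketch of Levenshtein distance with a cutoff might look like the following. It is not taken from the project's source: the class name LevenshteinSketch, the method name distance, and the convention of returning -1 once the cutoff is exceeded are all assumptions made for this example.

```java
public final class LevenshteinSketch {

    private LevenshteinSketch() {}

    /**
     * Hypothetical helper: returns the Levenshtein distance between a and b,
     * or -1 if that distance is greater than maxDistance (the cutoff).
     */
    public static int distance(CharSequence a, CharSequence b, int maxDistance) {
        int n = a.length();
        int m = b.length();

        // The length difference is a lower bound on the distance.
        if (Math.abs(n - m) > maxDistance) {
            return -1;
        }

        // Two rows of the dynamic-programming table are enough.
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // distance from the empty prefix of a
        }

        for (int i = 1; i <= n; i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= m; j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Row minima never decrease, so the final value cannot
            // fall back under the cutoff once a whole row exceeds it.
            if (rowMin > maxDistance) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m] <= maxDistance ? prev[m] : -1;
    }
}
```

For example, LevenshteinSketch.distance("kitten", "sitting", 3) returns 3, while the same call with a cutoff of 2 returns -1. The early exit is what makes the cutoff variant attractive for dictionary search, where most candidate words are far from the query and can be rejected after a few rows.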

All implementations aim for simplicity and clarity, so that the workings of each algorithm are easy to follow.

Related articles are at http://ntz-develop.blogspot.com/
