fuzzy-search-tools


Java tools for fuzzy string search in text and dictionaries

This project collects Java implementations of the most commonly used algorithms and auxiliary utilities for fuzzy (similarity) string search in large dictionaries.

  • Levenshtein Distance (with cutoff and prefix version)
  • Damerau-Levenshtein Distance (with cutoff and prefix version)
  • Extension (Spell-checker) Method
  • N-Gram Method (with some modifications)
  • Signature Hash Method
  • Bitap (Shift-Or with Wu-Manber modifications)
  • Burkhard-Keller (BK) Trees
  • Skip algorithm
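
To illustrate the first item in the list above, here is a minimal sketch of Levenshtein distance with a cutoff. It is not taken from the project sources: the class name `BoundedLevenshtein`, the method `distance`, and the convention of returning -1 once the cutoff is exceeded are all assumptions made for this example.

```java
final class BoundedLevenshtein {

    /**
     * Returns the Levenshtein distance between a and b,
     * or -1 if that distance is greater than maxDistance.
     * (Hypothetical helper, not the project's API.)
     */
    static int distance(String a, String b, int maxDistance) {
        // The distance is at least the length difference,
        // so a large difference rules out a match immediately.
        if (Math.abs(a.length() - b.length()) > maxDistance) {
            return -1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insertion, deletion
                        prev[j - 1] + cost);                    // match or substitution
                rowMin = Math.min(rowMin, curr[j]);
            }
            // Values in later rows never drop below the current row minimum,
            // so the search can stop as soon as the whole row exceeds the cutoff.
            if (rowMin > maxDistance) {
                return -1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int result = prev[b.length()];
        return result <= maxDistance ? result : -1;
    }
}
```

For example, `BoundedLevenshtein.distance("kitten", "sitting", 3)` returns 3, while the same call with a cutoff of 2 returns -1. The early exit is what makes the cutoff version useful when scanning a large dictionary, since most candidates are rejected after a few rows.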

All implementations aim for simplicity and clarity, so that the way each algorithm works is easy to follow.
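
As an example of how compact such an implementation can be, the following is a rough sketch of a Burkhard-Keller (BK) tree over plain Levenshtein distance. It is written for this page and does not reproduce the project's code: the class name `BkTreeSketch` and the methods `add`, `search`, and `levenshtein` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class BkTreeSketch {
    private final String word;
    // Each child is keyed by its distance to this node's word.
    private final Map<Integer, BkTreeSketch> children = new HashMap<>();

    BkTreeSketch(String word) {
        this.word = word;
    }

    void add(String other) {
        int d = levenshtein(word, other);
        if (d == 0) {
            return; // word is already stored
        }
        BkTreeSketch child = children.get(d);
        if (child == null) {
            children.put(d, new BkTreeSketch(other));
        } else {
            child.add(other);
        }
    }

    /** Returns all stored words within maxDistance of query. */
    List<String> search(String query, int maxDistance) {
        List<String> out = new ArrayList<>();
        collect(query, maxDistance, out);
        return out;
    }

    private void collect(String query, int maxDistance, List<String> out) {
        int d = levenshtein(word, query);
        if (d <= maxDistance) {
            out.add(word);
        }
        // By the triangle inequality, matches can only live in subtrees
        // whose edge distance lies in [d - maxDistance, d + maxDistance].
        for (int k = Math.max(1, d - maxDistance); k <= d + maxDistance; k++) {
            BkTreeSketch child = children.get(k);
            if (child != null) {
                child.collect(query, maxDistance, out);
            }
        }
    }

    /** Plain two-row Levenshtein distance, without cutoff. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

A tree rooted at "book" that also holds "books", "boo", "boon", "cook", and "cake" would return "book" and "boo" for `search("bok", 1)`. The pruning step relies on the distance being a true metric, which holds for Levenshtein distance.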

Related articles are at http://ntz-develop.blogspot.com/

You can check out the sources from the SVN repository at http://code.google.com/p/fuzzy-search-tools/source/browse/ or download a source snapshot from http://code.google.com/p/fuzzy-search-tools/downloads/list

Project Information

  • License: GNU GPL v3
  • 30 stars
  • svn-based source control

Labels:
Algorithm Search Fuzzy Levenstein NGram SignatureHash Damerau Similarity Bitap ShiftOr Java Spellcheck BKTree