
fuzzy-search-tools
This project provides Java implementations of the most commonly used algorithms and auxiliary utilities for fuzzy (similarity) string search in large dictionaries:
- Levenshtein Distance (with cutoff and prefix version)
- Damerau-Levenshtein Distance (with cutoff and prefix version)
- Extension (Spell-checker) Method
- N-Gram Method (with some modifications)
- Signature Hash Method
- Bitap (Shift-Or with Wu-Manber modifications)
- Burkhard-Keller (BK) Trees
- Skip algorithm
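To illustrate the first item in the list, here is a minimal sketch of Levenshtein distance with a cutoff. It is not taken from the project sources: the class name `BoundedLevenshtein`, the method `distance`, and the convention of returning -1 when the cutoff is exceeded are assumptions made for this example. The idea is that the minimum value in a row of the dynamic programming table never decreases from one row to the next, so the computation can stop as soon as a whole row exceeds the cutoff.

```java
/** Illustrative sketch only; names are hypothetical and not taken from the project. */
final class BoundedLevenshtein {

    /**
     * Returns the Levenshtein distance between a and b if it is at most
     * maxDistance, otherwise returns -1 (an assumed convention).
     */
    static int distance(String a, String b, int maxDistance) {
        // If the lengths differ by more than the cutoff, no match is possible.
        if (Math.abs(a.length() - b.length()) > maxDistance) {
            return -1;
        }
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            int rowMin = current[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                current[j] = Math.min(
                        Math.min(current[j - 1] + 1, previous[j] + 1),
                        previous[j - 1] + cost);
                rowMin = Math.min(rowMin, current[j]);
            }
            // Row minima never decrease, so later rows cannot get back under the cutoff.
            if (rowMin > maxDistance) {
                return -1;
            }
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }
        int result = previous[b.length()];
        return result <= maxDistance ? result : -1;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting", 3)); // prints 3
        System.out.println(distance("kitten", "sitting", 2)); // prints -1
    }
}
```

When scanning a large dictionary with a small cutoff, most candidates are rejected by the length check or after a few rows, which is where the saving over the plain algorithm comes from.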
All implementations aim for simplicity and clarity, so that the way each algorithm works is easy to follow.
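As an example of how compact such an implementation can be, a Burkhard-Keller (BK) tree from the list above can be sketched as follows. This is an independent illustration, not the project's code: the class `BkTreeSketch` and its methods `add`, `search`, and `levenshtein` are hypothetical names. Each child edge is labelled with the distance between the child's word and its parent's word, and the triangle inequality lets the search skip every subtree whose edge label lies outside the range [d - maxDistance, d + maxDistance], where d is the distance from the query to the current node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical and not taken from the project. */
final class BkTreeSketch {

    private static final class Node {
        final String word;
        // Edge label = distance between this node's word and the child's word.
        final Map<Integer, Node> children = new HashMap<>();
        Node(String word) { this.word = word; }
    }

    private Node root;

    void add(String word) {
        if (root == null) {
            root = new Node(word);
            return;
        }
        Node node = root;
        while (true) {
            int d = levenshtein(word, node.word);
            if (d == 0) {
                return; // word is already in the tree
            }
            Node child = node.children.get(d);
            if (child == null) {
                node.children.put(d, new Node(word));
                return;
            }
            node = child; // descend along the edge with the same distance
        }
    }

    /** Returns all stored words within maxDistance of the query (order unspecified). */
    List<String> search(String query, int maxDistance) {
        List<String> result = new ArrayList<>();
        if (root == null) {
            return result;
        }
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            int d = levenshtein(query, node.word);
            if (d <= maxDistance) {
                result.add(node.word);
            }
            // Triangle inequality: only these edges can lead to a match.
            for (Map.Entry<Integer, Node> e : node.children.entrySet()) {
                int edge = e.getKey();
                if (edge >= d - maxDistance && edge <= d + maxDistance) {
                    stack.push(e.getValue());
                }
            }
        }
        return result;
    }

    /** Plain two-row Levenshtein distance, used as the tree's metric. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        BkTreeSketch tree = new BkTreeSketch();
        for (String w : new String[] {"book", "books", "cake", "boo", "cape", "cart", "boon", "cook"}) {
            tree.add(w);
        }
        System.out.println(tree.search("caqe", 1)); // contains "cake" and "cape"
    }
}
```

The pruning is only valid because Levenshtein distance is a metric; a similarity measure that violates the triangle inequality would need a different index.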
Related articles are at http://ntz-develop.blogspot.com/
You can check out the sources from the svn repository at http://code.google.com/p/fuzzy-search-tools/source/browse/ or download a source snapshot at http://code.google.com/p/fuzzy-search-tools/downloads/list
Project Information
- License: GNU GPL v3
- 30 stars
- svn-based source control
Labels: Algorithm, Search, Fuzzy, Levenstein, NGram, SignatureHash, Damerau, Similarity, Bitap, ShiftOr, Java, Spellcheck, BKTree