My favorites | Sign in
Project Home Downloads Wiki Issues Source
Project Information
Members
Featured
Downloads
Links

This is a set of C++ classes implementing various recursive n-gram hashing techniques, also called rolling hashing (http://en.wikipedia.org/wiki/Rolling_hash), including:

  • Randomized Karp-Rabin (sometimes called Rabin-Karp)
  • Hashing by Cyclic Polynomials (also known as Buzhash)
  • Hashing by Irreducible Polynomials

Code sample:

	const uint n(3);//n-grams
        const uint L(7); // you need 7 bits
	CyclicHash hf(n,L );
        for(uint32 k = 0; k<n;++k) {
	  chartype c = ... ; // grab some character
	  hf.eat(c); // feed it to the hasher
	}
        while(...) { // go over your string
                chartype c = ... ;// point to the next character
                chartype out = ...; // character we want to forget
		hf.update(out,c); // update hash value
                hf.hashvalue; // at all times, this contains the hash value
        }
        

Github:

Please see https://github.com/lemire/rollinghashcpp for the Github repository.

Reference:

  • Daniel Lemire, Owen Kaser: Recursive n-gram hashing is pairwise independent, at best, Computer Speech & Language, Volume 24, Issue 4, October 2010, Pages 698-710 arXiv:0705.4676 http://arxiv.org/abs/0705.4676
Powered by Google Project Hosting