Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MurmurHash3_x86_32 Algorithm Does Not match the one in Wikipedia #25

Open
GoogleCodeExporter opened this issue Mar 16, 2015 · 3 comments

Comments

@GoogleCodeExporter
Copy link

Your MurmurHash3 implementation uses the following tail code:

  switch(len & 3)
  {
  case 3: k1 ^= tail[2] << 16;
  case 2: k1 ^= tail[1] << 8;
  case 1: k1 ^= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };


But this does match the (otherwise identical) MurmurHash algorithm in Wikipedia:


    with any remainingBytesInKey
        remainingBytes <- SwapEndianOrderOf(remainingBytesInKey)
        // Note: Endian swapping is only necessary on big-endian machines.
        //       The purpose is to place the meaningful digits towards the low end of the value,
        //       so that these digits have the greatest potential to affect the low range digits
        //       in the subsequent multiplication.  Consider that locating the meaningful digits
        //       in the high range would produce a greater effect upon the high digits of the
        //       multiplication, and notably, that such high digits are likely to be discarded
        //       by the modulo arithmetic under overflow.  We don't want that.

        remainingBytes <- remainingBytes * c1
        remainingBytes <- (remainingBytes << r1) OR (remainingBytes >> (32 - r1))
        remainingBytes <- remainingBytes * c2

        hash <- hash XOR remainingBytes


To make the two versions match, you'd have to change the bitwise-xor in your 
implementation to bitwise-or, as follows:

  switch(len & 3)
  {
  case 3: k1 |= tail[2] << 16;
  case 2: k1 |= tail[1] << 8;
  case 1: k1 |= tail[0];
          k1 *= c1; k1 = ROTL32(k1,15); k1 *= c2; h1 ^= k1;
  };



I'm just wondering which implementation is correct.  I can't figure out where 
the canonical implementation of the algorithm is.


Original issue reported on code.google.com by Matthew....@gmail.com on 17 Aug 2013 at 7:07

@GoogleCodeExporter
Copy link
Author

Here's the Wikipedia link:
http://en.wikipedia.org/wiki/MurmurHash

Original comment by Matthew....@gmail.com on 17 Aug 2013 at 7:08

@GoogleCodeExporter
Copy link
Author

The one here is the canonical one. Wikipedia is always a copy of something 
else, and often out of date and/or incorrect.

Original comment by frank...@gmail.com on 3 Dec 2013 at 6:37

@GoogleCodeExporter
Copy link
Author

Since k1 starts of as 0, there is no difference in the result between XOR and 
OR.

To go even further, instead of OR-ing you can reverse the cases (1-2-3 instead 
of 3-2-1), and make each of them shift 8 to the right. All this switch does is 
make sure that the left-aligned bytes are shifted to the right and zero padded.

Original comment by ruud.alt...@gmail.com on 3 Feb 2015 at 2:07

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant