MurmurHash3
MurmurHash3 information and brief performance results
Introduction

(Note - this information is a bit stale, and refers to an earlier version of Murmur3. I'll rewrite it when I have time.)

MurmurHash3 is the successor to MurmurHash2. It comes in 3 variants - a 32-bit version that targets low latency for hash table use, and two 128-bit versions for generating unique identifiers for large blocks of data, one each for x86 and x64 platforms.

Details

MurmurHash3's mix functions are based on this snippet -

k *= c1;
k = rotl(k,r1);
k *= c2;

h ^= k;
h = rotl(h,r1);
h = h*m1+n1;

'k' is a block of the key, 'h' is a block of the hash state, and 'rN'/'mN'/'cN' are constants. For each block of the key, we pre-mix it using two constants and a rotate, xor it into the hash block, and then mix the hash block using a rotate and a multiply-add.

MurmurHash3's 32-bit finalizer is

h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;

and its 64-bit finalizer is

h ^= h >> 33;
h *= 0xff51afd7ed558ccd;
h ^= h >> 33;
h *= 0xc4ceb9fe1a85ec53;
h ^= h >> 33;

The constants for the finalizers were generated by a simple simulated-annealing algorithm, and both avalanche all bits of 'h' to within 0.25% bias.

Block inter-mixing

The 128-bit variants mix multiple blocks of key data in parallel. To ensure that all the intermediate hash blocks affect each other, MurmurHash3 does a few simple operations interleaved with the block mix.

The 64-bit, 2-block inter-mix is

h1 += h2;
h2 = _rotl64(h2,41);
h2 += h1;

The 32-bit, 4-block inter-mix is

h1 += h2; h1 += h3; h1 += h4;
h1 = _rotl(h1,17);
h2 += h1; h3 += h1; h4 += h1;

where 'hN' is one block of the hash value.

Bulk speed test, hashing an 8-byte-aligned 256k block

Results are from an Intel Core 2 Quad Q9650 running at 3.0 GHz, on a single core.
FNV_x86_32 - 554 mb/sec
FNV_x64_32 - 715 mb/sec
SuperFastHash_x86_32 - 1224 mb/sec (1)
SuperFastHash_x64_32 - 1311 mb/sec
Lookup3_x86_32 - 1234 mb/sec
Lookup3_x64_32 - 1265 mb/sec
MurmurHash2_x86_32 - 2577 mb/sec
MurmurHash2_x86_64 - 3352 mb/sec (2)
MurmurHash2_x64_64 - 2857 mb/sec
MurmurHash3_x86_32 - 3105 mb/sec
MurmurHash3_x86_128 - 2684 mb/sec
MurmurHash3_x64_128 - 5058 mb/sec (3)

(1) - SuperFastHash has very poor collision properties, which have been documented elsewhere.

(2) - MurmurHash2_x86_64 computes two 32-bit results in parallel and mixes them at the end, which is fast but means that collision resistance is only as good as a 32-bit hash. I suggest avoiding this variant.

(3) - That's about 1.68 bytes per cycle, or about 9.5 cycles per 16-byte chunk. The inner loop is 20 instructions long, so we're sustaining over 2 instructions per cycle. Hooray for modern platforms with fast 64-bit multipliers and superscalar architectures. :)
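For reference, the two finalizers described in the Details section transcribe directly into C. The sketch below uses exactly the constants listed there; the function names fmix32/fmix64 are illustrative (they match the names used in the reference code, but this is not the full hash, only the finalization step).

```c
#include <stdint.h>

/* 32-bit finalizer, constants as given in the Details section. */
uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

/* 64-bit finalizer, constants as given in the Details section. */
uint64_t fmix64(uint64_t h)
{
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}
```

Every step (xor-shift, multiply by an odd constant) is invertible, so both finalizers are bijections on their word size; they scramble bits without losing any.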
The macro rotl() appears to be part of the Visual C++ dialect. It would be nicer if you could provide code that compiles with any compiler (i.e., ISO standard).
Cross-platform support is in the works, but I'm a bit pressed for time - the platform-specific stuff should mostly be restricted to rotl and some timing / setup code.
gcc will turn an expression like:

(num << amount) | (num >> (32 - amount))
into a single "roll" instruction on x86-type processors (ok, sometimes it will actually use "rorl" with a count of (32 - amount), but same difference).
So you could use a portable macro like:
and that should work a treat on gcc (and perhaps other compilers).
p.s., note that only works if "num" is a 32-bit type; for gcc on x86_64 (and x86), that's true for int / unsigned int, but one might want to throw in a few casts to make it explicit.
(or even better, use an inline function, but I guess those aren't really portable in C...)
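Putting this thread's suggestions together, a portable rotate might look like the sketch below (function names are mine). Note it is only well-defined for shift amounts from 1 to 31 (or 1 to 63), since shifting by the full word width is undefined in C.

```c
#include <stdint.h>

/* Portable rotate-left; gcc and clang recognize this pattern and emit a
   single rotate instruction. Valid for amount in 1..31. */
static inline uint32_t rotl32(uint32_t num, unsigned amount)
{
    return (num << amount) | (num >> (32 - amount));
}

/* 64-bit version: the same pattern with 64 in place of 32 (amount in 1..63). */
static inline uint64_t rotl64(uint64_t num, unsigned amount)
{
    return (num << amount) | (num >> (64 - amount));
}
```

Using explicit uint32_t/uint64_t parameters sidesteps the "only works if num is a 32-bit type" caveat raised above.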
Would you mind kindly adding this fast SSE2-based hash to your list of tested hash functions? http://cessu.blogspot.com/2008/11/hashing-with-sse2-revisited-or-my-hash.html
And for rotl64, just use 64 where it says 32.
Though I'm using inline functions instead (LLVM compiler on OS X):

uint32_t inline rotl(uint32_t value, int8_t amount) {
  return (value << amount) | (value >> (32 - amount));
}

uint64_t inline rotl64(uint64_t value, int8_t amount) {
  return (value << amount) | (value >> (64 - amount));
}
To get this working on Android/NDK (GCC 4.4.0) I had to do the following:
Note that I have not yet done any real testing, so YMMV :)
This header file is quite useful for a portable rotl and rotl64:
https://www.cosic.esat.kuleuven.be/nessie/testvectors/test/portable.h
If anyone is interested, I've ported the murmur3 hash to MinGW GCC and it seems to be working.
What would be a good way to use the mix functions to hash 32/64-bit integers? It's certainly possible to simply use bmix32 twice plus fmix32 for a 64-bit int, but I guess some of those steps aren't really necessary then.
How do I generate 64-bit hash values? Looking at MurmurHash3_x64_128, it returns the 128-bit hash value in 2 uint64_t values, h1 and h2. So do I simply take either h1 or h2? Which one should I take?
@a.goed - Use the finalization mix functions fmix32/fmix64 - they do exactly what you want.
@lancer6 - Yep, just throw away whatever bits you don't use. All bits of the output should be equally well mixed.
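Following that suggestion, hashing a single integer reduces to calling the finalizer on it. A minimal sketch; the seed xor is my own assumption for illustration, not part of the original API:

```c
#include <stdint.h>

/* 64-bit finalizer from the page above. */
uint64_t fmix64(uint64_t h)
{
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    h *= 0xc4ceb9fe1a85ec53ULL;
    h ^= h >> 33;
    return h;
}

/* Hash a 64-bit integer key. Folding a seed in by xor is one simple
   option (hypothetical; choose your own seeding scheme). */
uint64_t hash_u64(uint64_t key, uint64_t seed)
{
    return fmix64(key ^ seed);
}
```

Because the finalizer is a bijection, distinct keys under the same seed can never collide, which is an attractive property for hashing fixed-width integers.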
I'm a bit confused about the results I'm getting. I got the latest trunk (r96) and ran smhasher on Murmur3C and Murmur3F on a Core i5 760 system, thinking the 64-bit Murmur3F should run faster.
Under the bulk speed test, Murmur3C averages 2363.28 MiB/sec while Murmur3F averages 452.15 MiB/sec. In all of the small key speed tests, Murmur3F also used roughly 3 times as many cycles per hash on average.
I'm just trying to figure out why my results are so different from those shown.
Jonathan - That's almost definitely due to your using a 32-bit compiler - Murmur3F needs to be compiled in x64 mode, otherwise the 64-bit math operations are done as function calls (verrrry slow). In Visual Studio this is done by creating a new build configuration, but I'm not sure how things will work with the new CMake setup.
-Austin
That was exactly the problem. I knew I was overlooking something simple.
Thanks!
What's the intended purpose of the seed param? I'm considering implementing a progressive MurmurHash3_x86_32 in pure ANSI C, in which case a similar effect to the seed can be achieved by appending it at the end as regular data just before the finalisation stage. This also allows generating multiple hashes with different "seeds" without reprocessing the entire byte stream.
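The idea above can be sketched as follows. This is a hypothetical illustration, not the real API: the constants are those of the released MurmurHash3_x86_32 (which may differ from the earlier version this page describes), the function names are made up, and tail-byte handling is glossed over.

```c
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* One body-mix round of MurmurHash3_x86_32 (released version's constants). */
static uint32_t mix_block(uint32_t h, uint32_t k)
{
    k *= 0xcc9e2d51; k = rotl32(k, 15); k *= 0x1b873593;
    h ^= k;
    h = rotl32(h, 13); h = h * 5 + 0xe6546b64;
    return h;
}

static uint32_t fmix32(uint32_t h)
{
    h ^= h >> 16; h *= 0x85ebca6b;
    h ^= h >> 13; h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

/* The trick described above: hash the message once, save the
   pre-finalization state, then fold each "seed" in as one extra data
   block before finalizing. The message is scanned only once no matter
   how many seeds are used. */
uint32_t finish_with_seed(uint32_t saved_state, uint32_t seed, uint32_t total_len)
{
    uint32_t h = mix_block(saved_state, seed);
    h ^= total_len + 4; /* count the appended 4-byte seed block in the length */
    return fmix32(h);
}
```

Note this produces different values than seeding the real function's initial state, so it is only "seed-like", as the comment says.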
Hello JSpren...@gmail.com,
can you send me your C implementation of MurmurHash3?
abhay180@yahoo.com
Hello,
Has somebody implemented MurmurHash3 in C? I want to take a look at the code.
Here's a C version of MurmurHash 3, with some handy API documentation and an example program:
https://github.com/PeterScott/murmur3
To make the .cpp file Borland C++ compatible:

#if defined(_MSC_VER)
...
// Other compilers
#else // defined(_MSC_VER)
#ifdef __BORLANDC__
#define FORCE_INLINE inline
#else
#define FORCE_INLINE __attribute__((always_inline))
#endif
...
#endif
To Peter Scott: Hello, I started using your C version in a personal project. Thank you very much! (You may contact me if you like: denis DOT spir AT gmail DOT com. I may also have some question.) Denis
Hi, why didn't you do something like the following for non-MSC compilers? Or is it that other compilers optimize it out, and Visual Studio requires the intrinsic? I was just wondering about the oddness.

#if defined(__i386__)
inline uint32_t rotl32 ( uint32_t x, int8_t r ) {
}
inline uint64_t rotl64 ( uint64_t x, int8_t r ) {
}
#else
.. old code ..
How do I get a binary like md5sum?
@khubert - Virtually all compilers are smart enough to convert the shifts and ors into a rotate instruction (you can verify this by disassembling the function in GDB or Visual Studio), so adding the assembly doesn't gain anything.
Hi guys,
how can I write an incremental version of MurmurHash3 x64_128 that combines hashing results into one final hash, keeping the final result collision-free?
Many Thanks
How about using a previous hash value to initialize h1 and h2 instead of using a seed? Will this work?
thanks
I am curious why the constants used in the body of x64_128 (n1 in the Details section of the wiki) are only 32 bits. They are also 32 bits for the x86 versions, so it makes me think they would be 64 bits for the x64 version.
Be careful with the macros/inline functions pasted above:
(NUM << AMOUNT) | (NUM >> (32 - AMOUNT))
(NUM << AMOUNT) | (NUM >> (64 - AMOUNT))
These won't work correctly if AMOUNT >= 32 or 64, respectively. The correct versions are:
(NUM << (AMOUNT & 31)) | (NUM >> (32 - (AMOUNT & 31)))
(NUM << (AMOUNT & 63)) | (NUM >> (64 - (AMOUNT & 63)))
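One further caveat worth noting: even the masked forms invoke undefined behaviour when AMOUNT is a multiple of 32 (or 64), because the right shift then becomes a shift by the full word width. A sketch (hypothetical names) of a rotate that is defined for every amount:

```c
#include <stdint.h>

/* Rotate-left defined for every amount, including 0 and multiples of 32:
   after masking, (32 - r) & 31 maps r == 0 to a right shift of 0, never 32. */
uint32_t rotl32_safe(uint32_t n, unsigned r)
{
    r &= 31;
    return (n << r) | (n >> ((32 - r) & 31));
}

uint64_t rotl64_safe(uint64_t n, unsigned r)
{
    r &= 63;
    return (n << r) | (n >> ((64 - r) & 63));
}
```

For MurmurHash3 itself this hardly matters, since all its rotate amounts are fixed nonzero constants, but it does matter for a general-purpose rotl macro.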
Hi guys,
I've just included an incremental implementation of MurmurHash3 in one of my projects and stumbled upon this: http://emboss.github.com/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/
I was just bitten by the fact that the block count values are all signed 32-bit ints, and this is a problem when hashing very large data (> 2 GB). Values should be changed to size_t:

const int nblocks = len / 16; // <--- should be const size_t

FORCE_INLINE uint64_t getblock64 ( const uint64_t * p, int i ) // <- 32-bit index should be size_t
Have you considered testing against tabulation hashing as well?
Aren't the 32-bit and 64-bit accesses (uint32_t k1 = getblock32(blocks,i);) breaking the strict-aliasing rule? Shouldn't memcpy be used instead (memcpy(&k1, data + i*4, 4);)?
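A sketch of the memcpy approach suggested here (hypothetical helper names; modern compilers lower a fixed-size memcpy to a single load, so there is normally no speed penalty):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Read one 32-bit block: no aliasing or alignment concerns, and size_t
   indexing also addresses the > 2 GB issue raised above. */
uint32_t getblock32_copy(const uint8_t *data, size_t i)
{
    uint32_t k;
    memcpy(&k, data + i * 4, sizeof k);
    return k;
}

uint64_t getblock64_copy(const uint8_t *data, size_t i)
{
    uint64_t k;
    memcpy(&k, data + i * 8, sizeof k);
    return k;
}
```

The result is in host byte order, exactly as dereferencing a cast pointer would be, so this is a drop-in replacement for the getblock pattern.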
If you need a fast Java implementation of Murmur3A or Murmur3F, check out my new open source project: https://github.com/greenrobot/greenrobot-common
Murmur3F does 3.7 GB/s using sun.misc.Unsafe, which is the fastest implementation known to me. For more information including performance benchmarks, have a look at https://github.com/greenrobot/greenrobot-common/blob/master/hash-functions.md