MurmurHash3  
MurmurHash3 information and brief performance results
Updated Apr 3, 2011 by aappl...@google.com

Introduction

(Note - this information is a bit stale, and refers to an earlier version of Murmur3. I'll rewrite it when I have time.)

MurmurHash3 is the successor to MurmurHash2. It comes in 3 variants - a 32-bit version that targets low latency for hash table use and two 128-bit versions for generating unique identifiers for large blocks of data, one each for x86 and x64 platforms.

Details

MurmurHash3's mix functions are based on this snippet -

k *= c1; 
k = rotl(k,r1); 
k *= c2;

h ^= k;

h = rotl(h,r1);	
h = h*m1+n1;

'k' is a block of the key, 'h' is a block of the hash state, and 'rN'/'mN'/'cN' are constants.

For each block of the key, we pre-mix it using two constants and a rotate, XOR it into the hash block, and then mix the hash block using a rotate and a multiply-add.
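As a concrete illustration, the per-block mix of the 32-bit variant can be written as a small function. The constants below are the ones in the current MurmurHash3_x86_32 source, which may differ from the earlier version this page describes, so treat this as a sketch rather than the production code:

#include <stdint.h>

static inline uint32_t rotl32 ( uint32_t x, int8_t r )
{
  return (x << r) | (x >> (32 - r));
}

// One block-mix step of the 32-bit variant:
// 'k' is one 32-bit block of the key, 'h' is the 32-bit hash state.
static inline uint32_t bmix32 ( uint32_t h, uint32_t k )
{
  k *= 0xcc9e2d51;          // c1: pre-mix the key block
  k = rotl32(k, 15);        // r1
  k *= 0x1b873593;          // c2

  h ^= k;                   // fold the block into the hash state
  h = rotl32(h, 13);        // rotate
  h = h * 5 + 0xe6546b64;   // multiply-add (m1, n1)
  return h;
}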

MurmurHash3's 32-bit finalizer is

h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;

and its 64-bit finalizer is

h ^= h >> 33;
h *= 0xff51afd7ed558ccd;
h ^= h >> 33;
h *= 0xc4ceb9fe1a85ec53;
h ^= h >> 33;

The constants for the finalizers were generated by a simple simulated-annealing algorithm, and both avalanche all bits of 'h' to within 0.25% bias.
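Wrapped as functions, the two finalizers look like this (a minimal restatement of the code above; the names fmix32/fmix64 match the ones used in the source):

#include <stdint.h>

// 32-bit finalization mix - forces all bits of a hash block to avalanche
static inline uint32_t fmix32 ( uint32_t h )
{
  h ^= h >> 16;
  h *= 0x85ebca6b;
  h ^= h >> 13;
  h *= 0xc2b2ae35;
  h ^= h >> 16;
  return h;
}

// 64-bit finalization mix
static inline uint64_t fmix64 ( uint64_t h )
{
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}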

Block inter-mixing

The 128-bit variants mix multiple blocks of key data in parallel. To ensure that all the intermediate hash blocks affect each other, MurmurHash3 does a few simple operations interleaved with the block mix -

The 64-bit, 2-block inter-mix is

h1 += h2;
h2 = _rotl64(h2,41);
h2 += h1;

The 32-bit, 4-block inter-mix is

h1 += h2;
h1 += h3;
h1 += h4;
h1 = _rotl(h1,17);
h2 += h1;
h3 += h1;
h4 += h1;

where 'hN' is one block of the hash value.
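As a rough sketch only (not the production code), one inner-loop step of the x64 128-bit variant might interleave the 2-block inter-mix with the per-block key mixing like this; C1/C2 and the key-mix rotation amounts are placeholders:

#include <stdint.h>

static inline uint64_t rotl64 ( uint64_t x, int8_t r )
{
  return (x << r) | (x >> (64 - r));
}

// Rough sketch: mix two 64-bit key blocks (k1, k2) into the two hash
// blocks (h1, h2), then inter-mix the hash blocks as described above.
static void mix_step_x64_128 ( uint64_t * h1, uint64_t * h2,
                               uint64_t k1, uint64_t k2 )
{
  const uint64_t C1 = 0x87c37b91114253d5ULL;  // placeholder constant
  const uint64_t C2 = 0x4cf5ad432745937fULL;  // placeholder constant

  k1 *= C1;  k1 = rotl64(k1, 31);  k1 *= C2;  *h1 ^= k1;
  k2 *= C2;  k2 = rotl64(k2, 33);  k2 *= C1;  *h2 ^= k2;

  *h1 += *h2;              // inter-mix: fold h2 into h1...
  *h2 = rotl64(*h2, 41);
  *h2 += *h1;              // ...and h1 back into h2
}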

Bulk speed test, hashing an 8-byte-aligned 256k block

Results are from an Intel Core 2 Quad Q9650 at 3.0 GHz, using a single core.

FNV_x86_32           -  554 MB/sec
FNV_x64_32           -  715 MB/sec
SuperFastHash_x86_32 - 1224 MB/sec (1)
SuperFastHash_x64_32 - 1311 MB/sec
Lookup3_x86_32       - 1234 MB/sec
Lookup3_x64_32       - 1265 MB/sec

MurmurHash2_x86_32   - 2577 MB/sec
MurmurHash2_x86_64   - 3352 MB/sec (2)

MurmurHash2_x64_64   - 2857 MB/sec

MurmurHash3_x86_32   - 3105 MB/sec
MurmurHash3_x86_128  - 2684 MB/sec
MurmurHash3_x64_128  - 5058 MB/sec (3)

(1) - SuperFastHash has very poor collision properties, which have been documented elsewhere.

(2) - MurmurHash2_x86_64 computes two 32-bit results in parallel and mixes them at the end, which is fast but means that collision resistance is only as good as a 32-bit hash. I suggest avoiding this variant.

(3) - That's about 1.68 bytes per cycle, or about 9.5 cycles per 16-byte chunk. The inner loop is 20 instructions long, so we're sustaining over 2 instructions per cycle. Hooray for modern platforms with fast 64-bit multipliers and superscalar architectures. :)
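(Checking the arithmetic: 5058 MB/sec divided by 3.0 GHz is roughly 1.68 bytes per cycle, and 16 bytes divided by 1.68 bytes/cycle is roughly 9.5 cycles per 16-byte chunk.)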

Comment by sebastia...@gmail.com, Nov 14, 2010

The macro rotl() appears to be part of the Visual C++ dialect. It would be nicer if you could provide code that compiles with any compiler (i.e., ISO standard).

Comment by project member tanj...@gmail.com, Nov 14, 2010

Cross-platform support is in the works, but I'm a bit pressed for time - the platform-specific stuff should mostly be restricted to rotl and some timing / setup code.

Comment by snogglethorpe, Nov 18, 2010

gcc will turn an expression like:

   (NUM << AMOUNT) | (NUM >> (32 - AMOUNT))

into a single "roll" instruction on x86-type processors (ok, sometimes it will actually use "rorl" with a count of (32 - amount), but same difference).

So you could use a portable macro like:

#define ROTL32(num,amount) (((num) << (amount)) | ((num) >> (32 - (amount))))

and that should work a treat on gcc (and perhaps other compilers).

Comment by snogglethorpe, Nov 18, 2010

P.S. - note that this only works if "num" is a 32-bit type; for gcc on x86_64 (and x86), that's true for int / unsigned int, but one might want to throw in a few casts to make it explicit.

(or even better, use an inline function, but I guess those aren't really portable in C...)
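For instance, a sketch with an explicit 32-bit type (static inline is standard from C99 onward; only valid for rotation amounts strictly between 0 and 31):

#include <stdint.h>

// Portable rotate-left on an explicit 32-bit type. Most compilers
// recognize this pattern and emit a single rotate instruction.
static inline uint32_t rotl32 ( uint32_t num, unsigned amount )
{
  return (num << amount) | (num >> (32 - amount));
}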

Comment by simo...@gmail.com, Dec 29, 2010

Would you mind kindly adding this fast SSE2-based hash to your list of tested hash functions? http://cessu.blogspot.com/2008/11/hashing-with-sse2-revisited-or-my-hash.html

Comment by mqu...@gmail.com, Jan 20, 2011

And instead of rotl64, just use 64 where it says 32.

Though I'm using inline functions instead (LLVM compiler on OS X):

inline uint32_t rotl ( uint32_t value, int8_t amount )
{
  return (value << amount) | (value >> (32 - amount));
}

inline uint64_t rotl64 ( uint64_t value, int8_t amount )
{
  return (value << amount) | (value >> (64 - amount));
}

Comment by nuskooler, Jan 27, 2011

To get this working on Android/NDK (GCC 4.4.0) I had to do the following:

  1. I used the ROTL macros from the comment above.
  2. I created a FORCE_INLINE define for where __forceinline was used:

     #if defined(_MSC_VER)
         #define FORCE_INLINE __forceinline
     #else
         #define FORCE_INLINE inline __attribute__((always_inline))
     #endif

Note that I have not yet done any real testing, so YMMV :)

Comment by fullung@gmail.com, Feb 7, 2011

This header file is quite useful for a portable rotl and rotl64:

https://www.cosic.esat.kuleuven.be/nessie/testvectors/test/portable.h

Comment by JSpren...@gmail.com, Mar 10, 2011

If anyone is interested, I've ported the MurmurHash3 code to MinGW GCC and it seems to be working.

Comment by a.goed...@googlemail.com, Mar 13, 2011

What would be a good way to use the mix functions to hash 32/64-bit integers? It's certainly possible to simply use bmix32 twice plus fmix32 for a 64-bit int, but I guess some of those steps wouldn't really be necessary then.

Comment by lancer6...@gmail.com, Mar 14, 2011

How do I generate 64-bit hash values? Looking at MurmurHash3_x64_128, it returns the 128-bit hash value in 2 uint64_t values, h1 and h2. So do I simply take either h1 or h2? Which one should I take?

Comment by project member tanj...@gmail.com, Mar 14, 2011

@a.goed - Use the finalization mix functions fmix32/fmix64 - they do exactly what you want.

@lancer6 - Yep, just throw away whatever bits you don't use. All bits of the output should be equally well mixed.
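For instance (a minimal sketch - it assumes the project's MurmurHash3.h header and simply repeats the fmix64 steps shown in the Details section above):

#include <stdint.h>
#include "MurmurHash3.h"   // declares MurmurHash3_x64_128

// Hash a single 64-bit integer using only the finalization mix (fmix64).
static uint64_t hash_u64 ( uint64_t h )
{
  h ^= h >> 33;
  h *= 0xff51afd7ed558ccdULL;
  h ^= h >> 33;
  h *= 0xc4ceb9fe1a85ec53ULL;
  h ^= h >> 33;
  return h;
}

// Get a 64-bit hash from the 128-bit output: take either half.
static uint64_t hash64 ( const void * key, int len, uint32_t seed )
{
  uint64_t out[2];
  MurmurHash3_x64_128(key, len, seed, out);
  return out[0];   // all output bits are equally well mixed
}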

Comment by jonathan...@gmail.com, Mar 27, 2011

I'm a bit confused about the results I'm getting. I got the latest trunk (r96) and ran SMHasher on Murmur3C and Murmur3F on a Core i5 760 system, thinking the 64-bit Murmur3F should run faster.

Under the bulk speed test, Murmur3C averages 2363.28 MiB/sec while Murmur3F averages 452.15 MiB/sec. In all of the small key speed tests, Murmur3F also used roughly 3 times as many cycles per hash on average.

I'm just trying to figure out why my results are so different from those shown.

Comment by project member tanj...@gmail.com, Mar 27, 2011

Jonathan - That's almost definitely due to your using a 32-bit compiler - Murmur3F needs to be compiled in x64 mode, otherwise the 64-bit math operations are done as function calls (verrrry slow). In Visual Studio this is done by creating a new build configuration, but I'm not sure how things will work with the new CMake setup.
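(For what it's worth: with GCC or Clang on a 64-bit machine, building with the -m64 flag - e.g. something like "g++ -O3 -m64 -c MurmurHash3.cpp" - is one way to get the 64-bit code paths; this is just an illustration, not the project's official build recipe.)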

-Austin

Comment by jonathan...@gmail.com, Mar 27, 2011

That was exactly the problem. I knew I was overlooking something simple.

Thanks!

Comment by shane.a....@gmail.com, May 10, 2011

What's the intended purpose of the seed param? I'm considering implementing a progressive MurmurHash3_x86_32 in pure ANSI C, in which case a similar effect to the seed can be achieved by appending it at the end as regular data just before the finalisation stage. This also allows generating multiple hashes with different "seeds" without reprocessing the entire byte stream.

Comment by abhay...@gmail.com, Jul 27, 2011

Hello JSpren...@gmail.com,

Can you send me your C implementation of MurmurHash3?

abhay180@yahoo.com

Comment by abhay...@gmail.com, Aug 1, 2011

Hello,

Has somebody implemented MurmurHash3 in C? I want to take a look at the code.

Comment by sketer...@gmail.com, Dec 10, 2011

Here's a C version of MurmurHash 3, with some handy API documentation and an example program:

https://github.com/PeterScott/murmur3

Comment by risingba...@gmail.com, Feb 6, 2012

To make the .cpp file Borland C++ compatible:

#if defined(_MSC_VER)

...

// Other compilers

#else // defined(_MSC_VER)

#ifdef __BORLANDC__
#define FORCE_INLINE inline
#else
#define FORCE_INLINE __attribute__((always_inline))
#endif

...

#endif

Comment by denis.s...@gmail.com, Apr 20, 2012

To Peter Scott: Hello, I started using your C version in a personal project. Thank you very much! (You may contact me if you like: denis DOT spir AT gmail DOT com. I may also have some questions.) Denis

Comment by khub...@gmail.com, Jun 7, 2012

Hi, why didn't you do something like the following for non-MSVC compilers? Or is it that other compilers optimize it out, and Visual Studio requires the intrinsic? I was just wondering about the oddness.

#if defined(__i386__) || defined(__x86_64__) || defined(_X86_)

inline uint32_t rotl32 ( uint32_t x, int8_t r )
{
  asm volatile("rol %b1, %0" : "+mr" (x) : "Ic" (r));
  return x;
}

inline uint64_t rotl64 ( uint64_t x, int8_t r )
{
  asm volatile("rol %b1, %0" : "+mr" (x) : "Jc" (r));
  return x;
}

#else
.. old code ..

Comment by kxiao.ti...@gmail.com, Jun 20, 2012

How do I get a binary such as md5sum?

Comment by project member aappl...@google.com, Jun 20, 2012

@khubert - Virtually all compilers are smart enough to convert the shifts and ORs into a rotate instruction (you can verify this by disassembling the function in GDB or Visual Studio), so adding the assembly doesn't gain anything.

Comment by gianluca...@gmail.com, Jul 2, 2012

Hi guys,

how can I write an incremental version of Murmur3 x64_128 that combines hashing results into one final hash, while keeping the final result collision-free?

Many Thanks

Comment by gianluca...@gmail.com, Jul 2, 2012

How about using a previous hash value to initialize h1 and h2 instead of using a seed? Will this work?

thanks

Comment by t.brando...@gmail.com, Aug 12, 2012

I am curious why the constants used in the body of x64_128 (n1 in the Details section of the wiki) are only 32 bits. They are also 32 bits for the x86 versions, which makes me think they would be 64 bits for the x64 version.

Comment by lpar...@gmail.com, Sep 12, 2012

Be careful with the macros/inline functions pasted above:

(NUM << AMOUNT) | (NUM >> (32 - AMOUNT))

(NUM << AMOUNT) | (NUM >> (64 - AMOUNT))

These won't work correctly if AMOUNT is 32 or more (or 64 or more, respectively). The correct versions are:

(NUM << (AMOUNT & 31)) | (NUM >> (32 - (AMOUNT & 31)))

(NUM << (AMOUNT & 63)) | (NUM >> (64 - (AMOUNT & 63)))

Comment by Ariel....@gmail.com, Dec 27, 2012

Hi guys,

I've just included an incremental implementation of MurmurHash3 in one of my projects and stumbled upon this: http://emboss.github.com/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/

Comment by brenda...@gmail.com, Mar 29, 2014

I was just bitten by the fact that the block count values are all signed 32-bit ints, and this is a problem when hashing very large data (> 2 GB). The values should be changed to size_t:

const int nblocks = len / 16; // <--- should be const size_t

FORCE_INLINE uint64_t getblock64 ( const uint64_t * p, int i ) // <--- the 32-bit index should be size_t
{
  return p[i];
}

Comment by lobais, Jul 10, 2014

Have you considered testing against tabulation hashing as well?

Comment by leclerc....@gmail.com, Oct 12, 2014

Aren't the 32-bit and 64-bit accesses (uint32_t k1 = getblock32(blocks,i);) breaking the strict-aliasing rule? Shouldn't memcpy be used instead (memcpy(&k1, data + i*4, 4);)?
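A strict-aliasing-safe read could look like this (a sketch only, not the project's code; the name getblock32_safe is made up here):

#include <stdint.h>
#include <string.h>

// Read the i-th 32-bit block through memcpy instead of a pointer cast,
// which avoids the strict-aliasing problem described above.
static inline uint32_t getblock32_safe ( const uint8_t * data, int i )
{
  uint32_t k;
  memcpy(&k, data + (size_t)i * 4, sizeof(k));
  return k;
}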

Comment by mar...@greenrobot.de, Nov 5, 2014

If you need a fast Java implementation of Murmur3A or Murmur3F, check out my new open source project: https://github.com/greenrobot/greenrobot-common

Murmur3F does 3.7 GB/s using sun.misc.Unsafe, which is the fastest implementation known to me. For more information including performance benchmarks, have a look at https://github.com/greenrobot/greenrobot-common/blob/master/hash-functions.md
