MurmurHash
{{Short description|Computer function}}
{{Use dmy dates|date=January 2021}}
MurmurHash is a non-cryptographic hash function suitable for general hash-based lookup.{{cite web |url=http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |title=Hadoop in Java |publisher=Hbase.apache.org |date=24 July 2011 |accessdate=13 January 2012 |url-status=dead |archiveurl=https://web.archive.org/web/20120112023407/http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |archivedate=12 January 2012}}
[http://materias.fi.uba.ar/7500/chouza-tesisingenieriainformatica.pdf Chouza et al].
{{cite web|url=http://www.inesc-id.pt/ficheiros/publicacoes/5453.pdf |title=Couceiro et al. |language=pt |accessdate=13 January 2012
|page= 14
}}
It was created by Austin Appleby in 2008{{cite web|author=Tanjent (tanjent) wrote,3 March 2008 13:31:00 |url=http://tanjent.livejournal.com/756623.html |title=MurmurHash first announcement |publisher=Tanjent.livejournal.com |accessdate=13 January 2012}} and, as of 8 January 2016,{{cite web|author=Austin Appleby|url=https://github.com/aappleby/smhasher|title=SMHasher|publisher=Github.com|accessdate=23 September 2024}} is hosted on GitHub along with its test suite named SMHasher. It also exists in a number of variants,{{cite web|url=http://simonhf.wordpress.com/2010/09/25/murmurhash160/ |title=MurmurHash2-160 |publisher=Simonhf.wordpress.com |date=25 September 2010 |accessdate=13 January 2012}} all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
Unlike cryptographic hash functions, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
Variants
=MurmurHash1=
The original MurmurHash was created as an attempt to make a faster function than Lookup3.{{cite web |title=MurmurHash1 |website=GitHub |url=https://github.com/aappleby/smhasher/wiki/MurmurHash1 |accessdate=12 January 2019}} Although successful, it had not been tested thoroughly and was not capable of providing 64-bit hashes as in Lookup3. Its design would be later built upon in MurmurHash2, combining a multiplicative hash (similar to the Fowler–Noll–Vo hash function) with an Xorshift.
=MurmurHash2=
MurmurHash2{{cite web|title=MurmurHash2 on Github|website=GitHub |url=https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp}} yields a 32- or 64-bit value. It comes in multiple variants, including some that allow incremental hashing and aligned or neutral versions.
- MurmurHash2 (32-bit, x86)—The original version; contains a flaw that weakens collision in some cases.{{cite web |title=MurmurHash2Flaw |website=GitHub |url=https://github.com/aappleby/smhasher/wiki/MurmurHash2Flaw |accessdate=15 January 2019}}
- MurmurHash2A (32-bit, x86)—A fixed variant using Merkle–Damgård construction. Slightly slower.
- CMurmurHash2A (32-bit, x86)—MurmurHash2A, but works incrementally.
- MurmurHashNeutral2 (32-bit, x86)—Slower, but endian- and alignment-neutral.
- MurmurHashAligned2 (32-bit, x86)—Slower, but does aligned reads (safer on some platforms).
- MurmurHash64A (64-bit, x64)—The original 64-bit version. Optimized for 64-bit arithmetic.
- MurmurHash64B (64-bit, x86)—A 64-bit version optimized for 32-bit platforms. It is not a true 64-bit hash due to insufficient mixing of the stripes.{{cite web |title=MurmurHash3 (see note on MurmurHash2_x86_64) |website=GitHub |url=https://github.com/aappleby/smhasher/wiki/MurmurHash3#bulk-speed-test-hashing-an-8-byte-aligned-256k-block |accessdate=15 January 2019}}
The person who originally found the flaw{{what|date=July 2021}} in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160.{{cite web |title=MurmurHash2_160 |date=25 September 2010 |url=https://simonhf.wordpress.com/2010/09/25/murmurhash160/ |accessdate=12 January 2019}}
=MurmurHash3=
The current version, completed April 3, 2011, is MurmurHash3,{{cite web|title=MurmurHash3 on Github|website=GitHub |url=https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp}}{{cite web | first = Adam | last = Horvath | url = http://blog.teamleadnet.com/2012/08/murmurhash3-ultra-fast-hash-algorithm.html | title = MurMurHash3, an ultra fast hash algorithm for C# / .NET | date = 10 August 2012 }} which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher, a hash function test suite.
Implementations
The canonical implementation is in C++, but there are efficient ports for a variety of popular languages, including Python,{{cite web|url=https://code.google.com/p/pyfasthash/ |title=pyfasthash in Python |accessdate=13 January 2012}} C,{{cite web|url=https://github.com/wolkykim/qlibc |title=C implementation in qLibc by Seungyoung Kim|website=GitHub }} Go,{{cite web|url=https://github.com/spaolacci/murmur3 |title=murmur3 in Go|website=GitHub }} C#,{{cite web|last=Landman |first=Davy |url=http://landman-code.blogspot.com/2009/02/c-superfasthash-and-murmurhash2.html |title=Davy Landman in C# |publisher=Landman-code.blogspot.com |accessdate=13 January 2012}} D,{{Cite web|url=https://dlang.org/phobos/std_digest_murmurhash.html|title=std.digest.murmurhash - D Programming Language|website=dlang.org|access-date=2016-11-05}} [https://github.com/tkaemming/lua-murmurhash3 Lua], Perl,{{cite web|url=http://metacpan.org/module/Digest::MurmurHash |title=Toru Maesaka in Perl |publisher=metacpan.org |accessdate=13 January 2012}} Ruby,{{cite web|author=Yuki Kurihara |url=https://github.com/ksss/digest-murmurhash |title=Digest::MurmurHash |publisher=GitHub.com |date=16 October 2014 |accessdate=18 March 2015}} Rust,{{Cite web|title = stusmall/murmur3|url = https://github.com/stusmall/murmur3|website = GitHub|accessdate = 2015-11-29}} PHP,{{cite web|url=https://github.com/lastguest/murmurhash-php |title=PHP userland implementation of MurmurHash3 |publisher=github.com |accessdate=18 December 2017}}{{cite web|url=https://php.watch/versions/8.1/MurmurHash3 |title=PHP 8.1 with MurmurHash3 support}} Common Lisp,{{cite web|url=https://bitbucket.org/tarballs_are_good/murmurhash3|title=tarballs_are_good / murmurhash3|accessdate=7 February 2015}} Haskell,{{cite web|url=http://hackage.haskell.org/package/murmur-hash |title=Haskell |publisher=Hackage.haskell.org |accessdate=13 January 2012}} Elm,{{cite web|url=https://package.elm-lang.org/packages/Skinney/murmur3/latest/ |title=Elm |publisher=package.elm-lang.org |accessdate=12 June 2019}} Clojure,{{cite web|url=https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Murmur3.java|title=Murmur3.java in Clojure source code on Github|publisher=clojure.org|accessdate=2014-03-11}} Scala,{{cite web|url=https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/hashing/MurmurHash3.scala|title=Scala standard library implementation|website=GitHub |date=26 September 2014}} Java,[https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/hash/Hashing.html Murmur3], part of Guava{{cite web|url=https://github.com/greenrobot/greenrobot-common/blob/master/hash-functions.md|title=Murmur3A and Murmur3F Java classes on Github|publisher=greenrobot|accessdate=5 November 2014}} Erlang,{{cite web|url=https://github.com/bipthelin/murmerl3|title=bipthelin/murmerl3|work=GitHub|accessdate=21 October 2015}} Swift,{{cite web|author=Daisuke T |url=https://github.com/daisuke-t-jp/MurmurHash-Swift |title=MurmurHash-Swift |publisher=GitHub.com |date=7 February 2019 |accessdate=10 February 2019}} Object Pascal,[https://github.com/Xor-el/HashLib4Pascal GitHub - Xor-el/HashLib4Pascal: Hashing for Modern Object Pascal] Kotlin,{{cite web|url=https://github.com/goncalossilva/kotlinx-murmurhash |title=goncalossilva/kotlinx-murmurhash |publisher=GitHub.com |date=10 December 2021 |accessdate=14 December 2021}} JavaScript,{{cite web|author=raycmorgan (owner) |url=https://gist.github.com/588423 |title=Javascript implementation by Ray Morgan |publisher=Gist.github.com |accessdate=13 January 2012}} and OCaml.{{cite web|author=INRIA |url=https://github.com/ocaml/ocaml/blob/trunk/runtime/hash.c |title=OCaml Source |publisher=GitHub.com}}
It has been adopted into a number of open-source projects, most notably libstdc++ (ver 4.6), nginx (ver 1.0.1),{{cite web|url=http://nginx.org/en/CHANGES |title=nginx |accessdate=13 January 2012}} Rubinius,{{cite web|url=https://github.com/rubinius/rubinius/commit/1d69526c484cc9435a7198e41b8995db6c3acf1a |title=Rubinius |website=GitHub |accessdate=29 February 2012}} libmemcached (the C driver for Memcached),{{cite web|url=http://libmemcached.org/libMemcached.html|title=libMemcached|work=libmemcached.org|accessdate=21 October 2015}} npm (nodejs package manager),{{cite web|url=https://github.com/npm/write-file-atomic/commit/22dd8759076fc8d0327d50693283060e479afafc |title=switch from MD5 to murmur|website=GitHub }} maatkit,{{cite web|url=https://code.google.com/p/maatkit/source/detail?r=3273 |title=maatkit |date=24 March 2009 |accessdate=13 January 2012}} Hadoop, Kyoto Cabinet,{{cite web|url=http://fallabs.com/kyotocabinet/spex.html |title=Kyoto Cabinet specification |publisher=Fallabs.com |date=4 March 2011 |accessdate=13 January 2012}} Cassandra,{{cite web|url=http://wiki.apache.org/cassandra/Partitioners|title=Partitioners|publisher=apache.org |date=2013-11-15|accessdate=2013-12-19}}{{cite web |url=https://www.youtube.com/watch?v=d7o6a75sfY0 |title=Introduction to Apache Cassandra™ + What's New in 4.0 by Patrick McFadin. DataStax Presents |work=YouTube |date=April 10, 2019}} Solr,{{cite web|url=http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/codecs/bloom/MurmurHash2.html|archive-url=https://web.archive.org/web/20150402190133/http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/codecs/bloom/MurmurHash2.html|url-status=usurped|archive-date=2 April 2015|title=Solr MurmurHash2 Javadoc|date=31 August 2022 }} vowpal wabbit,{{cite web|url=https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/explore/hash.h|title=hash.cc in vowpalwabbit source code|website=GitHub }} Elasticsearch,{{cite web|url=https://www.elastic.co/guide/en/elasticsearch/reference/2.0/breaking_20_crud_and_routing_changes.html#_routing_hash_function|title=Elasticsearch 2.0 - CRUD and routing changes}} Guava,{{cite web|url=https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/Hashing.java|title=Guava Hashing.java|website=GitHub }} Kafka,{{cite web|url=https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/BuiltInPartitioner.java|title=Kafka BuiltInPartitioner.java|website=GitHub }} and [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/vdo-integration RedHat Virtual Data Optimizer (VDO)].Virtual Data Optimizer [https://github.com/dm-vdo/kvdo source code]
Vulnerabilities
Hash functions can be vulnerable to collision attacks, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and Daniel J. Bernstein were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called HashDoS attacks.{{cite web|url=https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/|title=Breaking Murmur: Hash-flooding DoS Reloaded}} With the use of differential cryptanalysis, they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend using their own SipHash instead.
Algorithm
algorithm Murmur3_32 is
// Note: In this version, all arithmetic is performed with unsigned 32-bit integers.
// In the case of overflow, the result is reduced modulo {{math|232}}.
input: key, len, seed
c1 ← 0xcc9e2d51
c2 ← 0x1b873593
r1 ← 15
r2 ← 13
m ← 5
n ← 0xe6546b64
hash ← seed
for each fourByteChunk of key do
k ← fourByteChunk
k ← k × c1
k ← k ROL r1
k ← k × c2
hash ← hash XOR k
hash ← hash ROL r2
hash ← (hash × m) + n
with any remainingBytesInKey do
remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
// Note: Endian swapping is only necessary on big-endian machines.
// The purpose is to place the meaningful digits towards the low end of the value,
// so that these digits have the greatest potential to affect the low range digits
// in the subsequent multiplication. Consider that locating the meaningful digits
// in the high range would produce a greater effect upon the high digits of the
// multiplication, and notably, that such high digits are likely to be discarded
// by the modulo arithmetic under overflow. We don't want that.
remainingBytes ← remainingBytes × c1
remainingBytes ← remainingBytes ROL r1
remainingBytes ← remainingBytes × c2
hash ← hash XOR remainingBytes
hash ← hash XOR len
hash ← hash XOR (hash >> 16)
hash ← hash × 0x85ebca6b
hash ← hash XOR (hash >> 13)
hash ← hash × 0xc2b2ae35
hash ← hash XOR (hash >> 16)
A sample C implementation follows (for little-endian CPUs):
static inline uint32_t murmur_32_scramble(uint32_t k) {
k *= 0xcc9e2d51;
k = (k << 15) | (k >> 17);
k *= 0x1b873593;
return k;
}
uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed)
{
uint32_t h = seed;
uint32_t k;
/* Read in groups of 4. */
for (size_t i = len >> 2; i; i--) {
// Here is a source of differing results across endiannesses.
// A swap here has no effects on hash properties though.
memcpy(&k, key, sizeof(uint32_t));
key += sizeof(uint32_t);
h ^= murmur_32_scramble(k);
h = (h << 13) | (h >> 19);
h = h * 5 + 0xe6546b64;
}
/* Read the rest. */
k = 0;
for (size_t i = len & 3; i; i--) {
k <<= 8;
k |= key[i - 1];
}
// A swap is *not* necessary here because the preceding loop already
// places the low bytes in the low places according to whatever endianness
// we use. Swaps only apply when the memory is copied in a chunk.
h ^= murmur_32_scramble(k);
/* Finalize. */
h ^= len;
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
class="wikitable"
|+Tests !Test string !Seed value !Hash value (hexadecimal) !Hash value (decimal) |
|0x00000000
| | |
|0x00000001
| | |
|0xffffffff
| | |
test
| | | |
test
| | | |
Hello, world!
| | | |
Hello, world!
| | | |
The quick brown fox jumps over the lazy dog
| | | |
The quick brown fox jumps over the lazy dog
| | | |
See also
References
{{reflist|colwidth=30em}}
External links
- [https://github.com/aappleby/smhasher Official SMHasher site]
{{DEFAULTSORT:Murmurhash}}
Category:Hash function (non-cryptographic)