
runtime: profile and tune map code #3885

Closed
gopherbot opened this issue Jul 29, 2012 · 21 comments

@gopherbot

by yury.shulaev:

I did a trivial map benchmark using Go (gc), JavaScript (V8) and Python (Python3, PyPy)
and here are the results:

$ export GOMAXPROCS=""; go build map.go; time ./map

real    0m0.656s
user    0m0.652s
sys     0m0.000s

$ time node map.js

real    0m0.055s
user    0m0.040s
sys     0m0.012s

$ time python3 map.py

real    0m0.613s
user    0m0.592s
sys     0m0.016s

$ time pypy map.py

real    0m0.143s
user    0m0.124s
sys     0m0.016s

Looks like Go's map is 12 times slower than V8's and 4.5 times slower than PyPy's.
Granted, JavaScript objects can only use strings as keys, but Python's dict does not
have this limitation and it still manages to run in 0.143s.

Is it possible to adapt V8's map implementation to Go? By the way, it looks simpler:
https://github.com/v8/v8/blob/master/src/hashmap.h
vs
http://code.google.com/p/go/source/browse/src/pkg/runtime/hashmap.c


Code used in the benchmark:

map.go:

package main

func main() {
    a := make(map[int]int)

    for i := 0; i < 10000000; i++ {
        a[123] = 456
    }
}


map.js:

function main() {
    var a = {};

    for (var i = 0; i < 10000000; i++) {
        a[123] = 456;
    }
}

main();


map.py:

#!/usr/bin/env python3

def main():
    a = {}

    for i in range(10000000):
        a[123] = 456

if __name__ == '__main__':
    main()


Profiler output for 10 times more iterations:

Total: 772 samples
     385  49.9%  49.9%      504  65.3% hash_lookup
     106  13.7%  63.6%      106  13.7% runtime.memhash
      86  11.1%  74.7%       86  11.1% runtime.rnd
      60   7.8%  82.5%      618  80.1% runtime.mapaccess
      37   4.8%  87.3%       37   4.8% runtime.memcopy32
      33   4.3%  91.6%      765  99.1% main.main
      32   4.1%  95.7%      732  94.8% runtime.mapaccess1
      20   2.6%  98.3%       20   2.6% hash_indirect
      13   1.7% 100.0%       13   1.7% runtime.memequal32
       0   0.0% 100.0%      765  99.1% runtime.main
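
For reference, a CPU profile like the one above can be gathered with runtime/pprof. A minimal sketch follows, assuming the same loop as map.go; the -cpuprofile flag and file names here are illustrative, not taken from the original report.

// Sketch: collect a CPU profile of the benchmark loop using runtime/pprof.
// The flag name and output file are illustrative assumptions.
package main

import (
    "flag"
    "log"
    "os"
    "runtime/pprof"
)

var cpuprofile = flag.String("cpuprofile", "map.prof", "write CPU profile to this file")

func main() {
    flag.Parse()
    f, err := os.Create(*cpuprofile)
    if err != nil {
        log.Fatal(err)
    }
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()

    a := make(map[int]int)
    for i := 0; i < 100000000; i++ { // 10x the iterations, as in the profile above
        a[123] = 456
    }
}

The resulting file can then be inspected with go tool pprof.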


OS and hardware:

Linux 3.2.0-27-generic #43-Ubuntu SMP Fri Jul 6 14:25:57 UTC 2012 x86_64 x86_64 x86_64
GNU/Linux

go version go1 (which is 'apt-get install golang')

Intel Core i5, 4 GB RAM
@remyoudompheng
Contributor

Comment 1:

V8's hashmaps look pretty similar to Python's, except that Python's probing is linear
congruential. Gccgo uses an array with linked lists for collisions; it is also quite a
bit faster than the standard Go library.
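
For readers unfamiliar with that layout, here is a minimal sketch of a hash table using an array of buckets with linked lists for collisions (separate chaining). The types and hash function are illustrative assumptions, not gccgo's actual implementation.

package chained

// entry is a node in a bucket's linked list.
type entry struct {
    key, value int
    next       *entry
}

// chainedMap is an illustrative int-keyed hash table: an array of buckets
// with linked lists for collisions. It is a sketch of the general
// technique, not gccgo's data structure, and it never grows its bucket array.
type chainedMap struct {
    buckets []*entry
}

func newChainedMap(nbuckets int) *chainedMap {
    return &chainedMap{buckets: make([]*entry, nbuckets)}
}

// bucket maps a key to a bucket index with a simple multiplicative hash.
func (m *chainedMap) bucket(key int) int {
    h := uint(key) * 2654435761
    return int(h % uint(len(m.buckets)))
}

// Put inserts or updates key, walking the chain to check for an existing entry.
func (m *chainedMap) Put(key, value int) {
    i := m.bucket(key)
    for e := m.buckets[i]; e != nil; e = e.next {
        if e.key == key {
            e.value = value
            return
        }
    }
    m.buckets[i] = &entry{key: key, value: value, next: m.buckets[i]}
}

// Get walks the chain for the key's bucket.
func (m *chainedMap) Get(key int) (int, bool) {
    for e := m.buckets[m.bucket(key)]; e != nil; e = e.next {
        if e.key == key {
            return e.value, true
        }
    }
    return 0, false
}

A real implementation would also grow the bucket array as the load factor rises; that is omitted here for brevity.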

@gopherbot
Author

Comment 2 by yury.shulaev:

Yes, I tried gccgo but it was only 2 times faster than gc.

@gopherbot
Author

Comment 4 by yury.shulaev:

Well, my benchmark turned out not to be fair. V8's execution time was almost equal to
the empty-loop time; maybe it somehow eliminated the repeated assignments. I changed
"a[123] = 456" to "a[i] = i", so it can no longer be eliminated, and the results were a
little different:
V8 ~0.7s
PyPy ~1.1s
Go ~3.8s (5.4x V8, 3.4x PyPy)
A little better for Go now, but it still doesn't look optimal. And gccgo performed worse
this time.
I'm really not an expert here, but I didn't find any existing benchmarks of Go's map type.
It would be great if someone could write less synthetic benchmarks doing random reads and
writes and using various types as keys.
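
For illustration, a less synthetic benchmark along those lines might look like the sketch below, using the standard testing package in a file such as map_bench_test.go. The names, sizes and read/write mix are assumptions, not taken from the thread.

package mapbench

import (
    "math/rand"
    "strconv"
    "testing"
)

const size = 1 << 16 // number of distinct keys; an arbitrary choice

// BenchmarkRandomIntKeys mixes random writes and reads on an int-keyed map.
func BenchmarkRandomIntKeys(b *testing.B) {
    m := make(map[int]int, size)
    r := rand.New(rand.NewSource(1))
    for i := 0; i < b.N; i++ {
        k := r.Intn(size)
        if i%2 == 0 {
            m[k] = i // random write
        } else {
            _ = m[k] // random read
        }
    }
}

// BenchmarkRandomStringKeys does the same with string keys.
func BenchmarkRandomStringKeys(b *testing.B) {
    keys := make([]string, size)
    for i := range keys {
        keys[i] = strconv.Itoa(i)
    }
    m := make(map[string]int, size)
    r := rand.New(rand.NewSource(1))
    b.ResetTimer() // exclude key generation from the measurement
    for i := 0; i < b.N; i++ {
        k := keys[r.Intn(size)]
        if i%2 == 0 {
            m[k] = i
        } else {
            _ = m[k]
        }
    }
}

Running go test -bench=. would then compare randomized reads and writes for int and string keys.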

@gopherbot
Author

Comment 5 by yury.shulaev:

Found a dict benchmark code for Python (it uses pybench):
https://github.com/jonashaag/cpython/blob/master/Tools/pybench/Dict.py

@rsc
Contributor

rsc commented Sep 12, 2012

Comment 6:

The map algorithm is chosen to work well with even very very large maps. It would not
surprise me if Chrome and Python do not do as well in those cases.
It is possible that there is some low-hanging fruit, and it is also possible that small
maps should use a different layout entirely.

Labels changed: added priority-later, removed priority-triage.

Status changed to Accepted.

@rsc
Contributor

rsc commented Sep 12, 2012

Comment 7:

Labels changed: added go1.1maybe.

@gopherbot
Author

Comment 8 by yury.shulaev:

But in comment #4 I used the "a[i] = i" statement in the loop, so the final map
contained 10 million entries. Isn't that large enough?

@rsc
Contributor

rsc commented Sep 13, 2012

Comment 9:

That's only ~100 MB.

@rsc
Contributor

rsc commented Dec 10, 2012

Comment 11:

Labels changed: added size-l.

@randall77
Contributor

Comment 12:

This issue was updated by revision a5d4024.

Helps some by making hashing faster.  Go time drops from
0.65s to 0.51s.
R=rsc, r, bradfitz, remyoudompheng, khr, dsymonds, minux.ma, elias.naur
CC=golang-dev
https://golang.org/cl/7543043

@rsc
Contributor

rsc commented Mar 12, 2013

Comment 13:

[The time for maybe has passed.]

Labels changed: removed go1.1maybe.

@ugorji
Contributor

ugorji commented Mar 14, 2013

Comment 14:

I was seeing some bad results in my codebase from code that looked up information
in a map. I had to look up a mapping for a reflect.Type on every iteration of a loop,
and the map accesses increased my benchmark times significantly.
I did some analysis comparing against Node.JS maps, for both string and integer
keys/values, checking the time spent on inserts, deletes and gets for different sizes
of the map.
I've attached 4 files with my results:
- comparelangs.go
- comparelangs.js
- test.sh (for running with appropriate arguments)
- output.txt (output from a sample run)
Simple analysis results:
   Node.JS is significantly better than Go maps for insert and retrieval.
     Go map performance degrades at retrieval as the map size increases.
   Node.JS is terrible/unusable at delete once the size of the map grows.
   In summary, for the typical usage of small maps doing mostly retrievals,
     Go maps do not perform as well.

Attachments:

  1. comparelangs.go (1516 bytes)
  2. comparelangs.js (1532 bytes)
  3. test.sh (1072 bytes)
  4. output.txt (1960 bytes)
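
The attachments above are not reproduced in this thread. As a rough sketch of the kind of measurement described, and not the attached comparelangs.go, one might time each operation separately like this:

package main

import (
    "fmt"
    "strconv"
    "time"
)

// timeStringMap times inserts, gets and deletes of n string keys, roughly
// in the spirit of the analysis described above (not the attached file).
func timeStringMap(n int) {
    keys := make([]string, n)
    for i := range keys {
        keys[i] = strconv.Itoa(i)
    }
    m := make(map[string]int)

    start := time.Now()
    for i, k := range keys {
        m[k] = i
    }
    insert := time.Since(start)

    start = time.Now()
    for _, k := range keys {
        _ = m[k]
    }
    get := time.Since(start)

    start = time.Now()
    for _, k := range keys {
        delete(m, k)
    }
    del := time.Since(start)

    fmt.Printf("n=%d insert=%v get=%v delete=%v\n", n, insert, get, del)
}

func main() {
    for _, n := range []int{1000, 10000, 100000, 1000000} {
        timeStringMap(n)
    }
}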

@ugorji
Contributor

ugorji commented Mar 14, 2013

Comment 15:

I did some more tests with some more telling data.
I'm not sure how to interpret it; hopefully it makes sense to you guys and you can
use it as a data point.
In the test, I try to see which performs better: linear searching or map retrieval.
Clearly, map retrieval is much better for integer keys (and, I expect, for most of the
other numeric types in Go). It stayed constant whereas linear search time increased
(as expected).
However, for string keys, it seems map retrieval was always worse than linearly searching
a slice (in this test), even as we increased the size of the map or slice. Map access
time increased as n increased, even "averaging" more than the linear search in the slice.
I've attached 3 files:
- map_vs_slice_test.go (benchmark tests)
- test.sh (run the benchmark with increasing sizes of n from 2, 4, 8, ... 256)
- output.txt (sample run)
Just guessing, but could this be because the hash function is called on the string each
time? Is there a chance we could do what was done for reflect.Type and cache a hash
in the string? That way, we don't compute the hash every time the same string is
passed in.
As a data point, Java also caches the hash code in the String object to save on hash
computation time. For an immutable value, I think it makes sense.

Attachments:

  1. map_vs_slice_test.go (3663 bytes)
  2. test.sh (163 bytes)
  3. output.txt (1250 bytes)
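
A minimal sketch of the caching idea at the application level follows, assuming a hypothetical wrapper type and hash/fnv, since Go's runtime string hash is not exposed; the actual suggestion above is to cache the hash inside the runtime itself.

package hashcache

import "hash/fnv"

// hashedKey pairs an immutable string with a hash computed once, so the
// hash is not recomputed on every lookup. Illustrative only.
type hashedKey struct {
    s string
    h uint64
}

type hashedKeyValue struct {
    key   hashedKey
    value int
}

func newHashedKey(s string) hashedKey {
    f := fnv.New64a()
    f.Write([]byte(s))
    return hashedKey{s: s, h: f.Sum64()}
}

// insert appends to the bucket selected by the cached hash (no duplicate
// check, for brevity).
func insert(index map[uint64][]hashedKeyValue, k hashedKey, v int) {
    index[k.h] = append(index[k.h], hashedKeyValue{key: k, value: v})
}

// lookup uses the cached hash to pick a bucket and compares strings only
// within that bucket.
func lookup(index map[uint64][]hashedKeyValue, k hashedKey) (int, bool) {
    for _, kv := range index[k.h] {
        if kv.key.s == k.s {
            return kv.value, true
        }
    }
    return 0, false
}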

@rsc
Contributor

rsc commented Mar 15, 2013

Comment 16:

Re #15 the benchmark is not as telling as it might seem. The n strings being considered
in the search look like:
1
02
003
0004
00005
000006
0000007
00000008
000000009
0000000010
00000000011
000000000012
0000000000013
00000000000014
000000000000015
0000000000000016
As you increase the size of the set, you are also increasing the average length of the
strings, which also increases the average length of the strings being searched for. The
map lookup must of course hash the key in order to decide where in the table to look,
and that grows with this average length. So this format is pessimal for the map.

It is also optimal for the slice scan, because string comparison doesn't even look at the
string data unless the string lengths match. There is only one entry in the slice with
the right length, so the whole scan does on average n/2 length comparisons (very cheap)
and one length-n string comparison. In contrast, the map lookup does on average one
length-n string hash and one length-n string comparison. The length-n string hash and
the n/2 length comparisons are about equal in complexity, so the slice scan and the map
lookup take roughly the same amount of time.

Real data does not look like this.
If you change sstr to return strconv.Itoa, so that there are some strings with the same
length in the set of keys, the map quickly wins:
   n     scan      map
   2     22.2     59.0
   4     34.2     60.3
   8     50.4     63.2
  16     60.3     67.0
  32      100     67.8
  64      184     66.4
 128      277     64.7
 256      504     68.4
 512     1225     64.6
1024     2721     57.9
2048     3930     65.4
4096     9631     66.4
It's not worth doing more experiments on the current map implementation.  Keith is
working on a new one that may be ready for Go 1.1 or may not be, but either way the
current one's days are numbered.
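
To make the contrast concrete, here is a sketch of the two key generators being compared. The original sstr is not shown in this thread, so the zero-padded version below is reconstructed from the key listing above; the names are illustrative.

package keys

import (
    "fmt"
    "strconv"
)

// sstrPadded reconstructs the key pattern shown above ("1", "02", "003", ...):
// key i is zero-padded to width i, so every key has a unique length. A slice
// scan can reject almost every entry by length alone, while a map lookup
// still has to hash the whole (long) key.
func sstrPadded(i int) string {
    return fmt.Sprintf("%0*d", i, i)
}

// sstrItoa is the suggested variant: keys like "1", "2", ..., "10", "11"
// share lengths, so the length check no longer short-circuits the scan and
// the map's hashing cost stays small.
func sstrItoa(i int) string {
    return strconv.Itoa(i)
}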

Labels changed: added go1.1maybe.

Owner changed to @randall77.

Status changed to Started.

@ugorji
Contributor

ugorji commented Mar 15, 2013

Comment 17:

Thanks Russ. 
Re #15: I didn't take into consideration the fact that the string comparison would skip
checking the string data if the lengths didn't match. My oversight.
Thanks for the follow-up.

@bradfitz
Contributor

Comment 18:

Attaching a benchmark file.
$ hg id
f9e10e016342 
~/mapbench$ go test -bench=. 
 PASS
BenchmarkBigStr_0      10000        251329 ns/op
BenchmarkBigStr_4      10000        255585 ns/op
BenchmarkBigStr_8      10000        253203 ns/op
BenchmarkBigStr_16     10000        256934 ns/op
BenchmarkBigStr_32     10000        260725 ns/op
BenchmarkBigStr_64     10000        251684 ns/op
BenchmarkBigStr_512    10000        252046 ns/op
BenchmarkBigStr2_0     10000        251376 ns/op
BenchmarkBigStr2_4     10000        273891 ns/op
BenchmarkBigStr2_8     10000        252239 ns/op
BenchmarkBigStr2_16    10000        250466 ns/op
BenchmarkBigStr2_32    10000        250904 ns/op
BenchmarkBigStr2_64    10000        251077 ns/op
BenchmarkBigStr2_512       10000        252529 ns/op
BenchmarkSmallStr_0 50000000            37.5 ns/op
BenchmarkSmallStr_4 50000000            36.5 ns/op
BenchmarkSmallStr_8 50000000            37.2 ns/op
BenchmarkSmallStr_16    50000000            44.8 ns/op
BenchmarkSmallStr_32    50000000            36.3 ns/op
BenchmarkSmallStr_64    50000000            43.6 ns/op
BenchmarkSmallStr_512   50000000            41.7 ns/op
BenchmarkSmallStr_1024  50000000            42.5 ns/op
BenchmarkSmallStr_1M    50000000            52.5 ns/op
BenchmarkSmallStr2_0    50000000            37.4 ns/op
BenchmarkSmallStr2_4    50000000            37.2 ns/op
BenchmarkSmallStr2_8    50000000            42.7 ns/op
BenchmarkSmallStr2_16   50000000            45.8 ns/op
BenchmarkSmallStr2_32   50000000            40.7 ns/op
BenchmarkSmallStr2_64   50000000            40.5 ns/op
BenchmarkSmallStr2_512  50000000            37.3 ns/op
BenchmarkSmallStr2_1024 50000000            38.0 ns/op
BenchmarkSmallStr2_1M   50000000            46.9 ns/op
BenchmarkInt_0  50000000            32.9 ns/op
BenchmarkInt_4  50000000            33.4 ns/op
BenchmarkInt_8  50000000            37.2 ns/op
BenchmarkInt_16 50000000            33.0 ns/op
BenchmarkInt_32 50000000            39.7 ns/op
BenchmarkInt_64 50000000            48.3 ns/op
BenchmarkInt_512    50000000            36.7 ns/op
BenchmarkInt_1024   50000000            36.0 ns/op
BenchmarkInt_1M 50000000            41.9 ns/op
BenchmarkInt2_0 50000000            34.1 ns/op
BenchmarkInt2_4 50000000            35.9 ns/op
BenchmarkInt2_8 50000000            42.6 ns/op
BenchmarkInt2_16    50000000            34.0 ns/op
BenchmarkInt2_32    50000000            37.3 ns/op
BenchmarkInt2_64    50000000            39.9 ns/op
BenchmarkInt2_512   50000000            34.2 ns/op
BenchmarkInt2_1024  50000000            37.1 ns/op
BenchmarkInt2_1M    50000000            46.5 ns/op
BenchmarkPtr_32 50000000            41.7 ns/op
BenchmarkPtr2_32    50000000            33.3 ns/op

Attachments:

  1. map_test.go (5039 bytes)

@randall77
Contributor

Comment 19:

This issue was updated by revision 00224a3.

Go time drops from 0.51s to 0.34s.
R=r, rsc, m3b, dave, bradfitz, khr, ugorji, remyoudompheng
CC=golang-dev
https://golang.org/cl/7504044

@robpike
Contributor

robpike commented May 18, 2013

Comment 20:

Labels changed: added go1.2maybe, removed go1.1maybe.

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 21:

Is there anything more worth doing? I'm inclined to mark this Fixed.

@robpike
Contributor

robpike commented Aug 7, 2013

Comment 22:

Marking as fixed.

Status changed to Fixed.

@rsc rsc added this to the Go1.2 milestone Apr 14, 2015
@rsc rsc removed the go1.2maybe label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.