Updated Oct 04, 2009 by nathan.s...@gmail.com
Benchmarking  

Intro

This project started with a few blog posts and, with the help of many contributors, is now benchmarking much more than just protobuf and thrift. Thanks to all who looked at the code, contributed, made suggestions and pointed out bugs. Three major contributions are from cowtowncoder, who fixed the stax code, Chris Pettitt, who added the json code, and David Bernard, who contributed the xstream and java externalizable code. The charts below display the latest results. Note that the charts are scaled to best fit the results and might be misleading in some cases. If you wish to see the numbers, scroll down to the table at the end of the page. Overall we have benchmarks for protobuf, thrift, java, scala, a few implementations of stax, binaryxml, json, xstream, javolution, hessian, avro, sbinary, kryo and JSON Marshaller.

Numbers are not everything

Benchmarks can be very misleading. Different datasets will provide different results and sometimes the marginal performance boost is eclipsed by other features like forward and backward compatibility, cross language support, and more.

Charts

Total Time

Including creating an object, serializing and deserializing

Serialization Time

Serializing with a new object each time (object creation time not included)

Deserialization Time

For most serializers, the most expensive operation
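As a rough illustration of how the three timed phases relate, here is a minimal Java sketch that times object creation, serialization and deserialization separately and sums them into a total. It assumes built-in Java serialization as the serializer under test; the `PhaseTimer` class and the `Item` payload are hypothetical and are not the benchmark's actual harness.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sketch, not the project's harness: times object creation,
// serialization and deserialization separately, using built-in Java
// serialization as the serializer under test.
class PhaseTimer {
    // Hypothetical payload type used only for illustration.
    static final class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final int id;
        final String name;
        Item(int id, String name) { this.id = id; this.name = name; }
    }

    static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush(); // push buffered bytes into bos before reading it
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns average ns per iteration as {create, serialize, deserialize, total}.
    static double[] measure(int iterations) {
        long create = 0, ser = 0, deser = 0;
        Object last = null; // keep the result live so the work is not optimized away
        for (int i = 0; i < iterations; i++) {
            long t0 = System.nanoTime();
            Item item = new Item(i, "item-" + i);
            long t1 = System.nanoTime();
            byte[] bytes = serialize(item); // creation time is excluded from this phase
            long t2 = System.nanoTime();
            last = deserialize(bytes);
            long t3 = System.nanoTime();
            create += t1 - t0;
            ser += t2 - t1;
            deser += t3 - t2;
        }
        if (last == null) throw new IllegalStateException("no result");
        double c = (double) create / iterations;
        double s = (double) ser / iterations;
        double d = (double) deser / iterations;
        return new double[] {c, s, d, c + s + d};
    }

    public static void main(String[] args) {
        measure(10_000); // warmup so the JIT compiles the hot paths first
        double[] r = measure(10_000);
        System.out.printf("create %.1f ns, ser %.1f ns, deser %.1f ns, total %.1f ns%n",
                r[0], r[1], r[2], r[3]);
    }
}
```

Timing each phase with its own pair of `System.nanoTime()` calls keeps object creation out of the serialization figure, matching how the Serialization Time chart is described.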

Serialization Size

May vary a lot depending on the number of repetitions in lists, the use of variable-length number encoding (varints) in protobuf, strings vs. numerics, and more. An interesting point is that Scala and Java hold the names of the classes in the serialized form, i.e., longer class names mean a larger serialized form. In Scala it's worse, since the Scala compiler creates more implicit classes than Java does.
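A small sketch of the class-name effect for built-in Java serialization: two hypothetical classes with the same single `int` field but names of different length. Assuming standard `ObjectOutputStream` behavior, which writes the class name into the class descriptor in the stream, the larger output should differ by exactly the extra characters of the name. The class names here are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Sketch: built-in Java serialization writes each class name into the stream,
// so two classes with identical fields but names of different length produce
// serialized forms of different size.
class ClassNameSize {
    // Two hypothetical classes: same single int field, different name lengths.
    static final class P implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    static final class PersonWithAVeryLongDescriptiveClassName implements Serializable {
        private static final long serialVersionUID = 1L;
        int value = 42;
    }

    static int serializedSize(Serializable o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush(); // make sure all buffered bytes reach bos
            return bos.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int shortSize = serializedSize(new P());
        int longSize = serializedSize(new PersonWithAVeryLongDescriptiveClassName());
        int nameDiff = PersonWithAVeryLongDescriptiveClassName.class.getName().length()
                - P.class.getName().length();
        // The size difference should match the extra characters in the class name.
        System.out.println("short name: " + shortSize + " bytes, long name: " + longSize + " bytes");
        System.out.println("size difference: " + (longSize - shortSize)
                + ", name length difference: " + nameDiff);
    }
}
```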

Object Creation Time

Object creation time is not very meaningful, since creating an object takes about 100 ns on average. The surprise comes from protobuf, which takes a very long time to create an object. It's the only point in this set of benchmarks where it didn't perform as well as thrift. Scala (and, to a lesser extent, Java), on the other hand, is fast; it seems to be a good language for handling in-memory data structures, but when it comes to serialization you might want to check the alternatives.

Numbers

Times in nanoseconds, sizes in bytes

Serializer              ,   Object create,   Serialization, Deserialization,      Total Time, Serialized Size
avro-generic            ,      2060.87878,      2101.17100,      2558.25150,      6720.30128,        211
avro-specific           ,      1105.29553,      1557.05050,      1902.88300,      4565.22903,        211
protobuf                ,       221.58725,      3007.45900,      1998.95350,      5227.99975,        231
thrift                  ,       128.72947,      3381.54200,      3572.52200,      7082.79347,        353
hessian                 ,        77.04348,   1082814.74200,     34491.53600,   1117383.32148,        541
kryo                    ,        77.45473,      3570.39650,      2220.67450,      5868.52573,        236
kryo-optimized          ,        73.89616,      3599.40400,      2412.61550,      6085.91566,        217
java                    ,        77.19895,     10969.64300,     47893.20500,     58940.04695,        919
java (externalizable)   ,        78.31142,      4562.19350,     13157.85400,     17798.35892,        397
scala                   ,        59.99520,     27947.09350,    128664.98550,    156672.07420,       2024
json (jackson)          ,        77.37814,      2332.30950,      3608.25750,      6017.94514,        378
JsonMarshaller          ,        77.37377,      9855.64450,     17218.58350,     27151.60177,        348
stax/woodstox           ,        77.10098,      4266.49750,      8026.93400,     12370.53248,        475
stax/aalto              ,        75.12022,      3473.18900,      5525.17050,      9073.47972,        475
binaryxml/FI            ,        76.54575,      9568.22350,      9809.16300,     19453.93225,        300
xstream (xpp)           ,        77.42962,     79780.94700,    116464.72650,    196323.10312,        833
xstream (xpp with conv) ,        76.16364,     11972.95550,     22764.88450,     34814.00364,        361
xstream (stax)          ,        77.80239,     74171.23800,    117309.15500,    191558.19539,        871
xstream (stax with conv),        76.92650,      8711.69450,     14755.05200,     23543.67300,        399
javolution xmlformat    ,        74.24016,      4019.80100,      8764.34100,     12858.38216,        419
sbinary                 ,        58.54493,      3265.29500,      2214.90800,      5538.74793,        264

Comment by darugar, Mar 22, 2009

Great stuff. A couple of quick nit-pick items (I'd happily fix but I don't have wiki edit permissions):

- Would be good to indicate the axis on the graphs, if only to say whether longer bars are better or worse.
- Also might be useful to break up the graphs with a horizontal line or something; it's hard to tell whether the text goes with the graph above or below it.
- "Tree major contributions" => "Three major contributions"
- "May very a lot" => "May vary a lot"

Comment by tsaloranta, Mar 23, 2009

One minor comment (which I can help with): perhaps also show a simple sum of deser + ser, since oftentimes this is the overall time taken, either by one side of the conversation (server or client) or by both combined.

I have also noticed that results vary a lot between machines, with my older home workstation giving totally different numbers from my work machine.
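For reference, the Total Time column in the tables is the sum of the three measured phases (for example, 2060.87878 + 2101.17100 + 2558.25150 = 6720.30128 for avro-generic), so the ser + deser sum suggested here can be recovered by subtracting the object-creation time. Worked through for the protobuf row of the first table:

```latex
\begin{align*}
T_{\mathrm{total}} &= T_{\mathrm{create}} + T_{\mathrm{ser}} + T_{\mathrm{deser}} \\
T_{\mathrm{ser}} + T_{\mathrm{deser}} &= T_{\mathrm{total}} - T_{\mathrm{create}} \\
\text{protobuf: } 5227.99975 - 221.58725 &= 5006.41250 = 3007.45900 + 1998.95350 \ \text{ns}
\end{align*}
```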

Comment by david.bernard.31, Mar 24, 2009

I made some changes (for other results):

- optimise Xpp to generate compact XML instead of pretty/indented XML
- allow XStream to select the implementation of StAX (use Woodstox by default)
- fix the label of the xstream configuration

Comment by joselo, Mar 25, 2009

It would be nice if you compared against the optimize-for-speed version of protobuf, which is not the default.

Comment by david.bernard.31, Mar 26, 2009

Results from protobuf are with "optimize-for-speed".

Please use the mailing list/group.

Comment by tmnichols, Mar 27, 2009

I'd love to see how Hessian (http://hessian.caucho.com/) and Ice (http://www.zeroc.com/ice.html) stack up here. Awesome work guys.

Comment by tsaloranta, Mar 31, 2009

Joselo: "optimize-for-speed" is on. Without it, PB is much slower (for me, 3x).

tmnichols: I can add Hessian; it should be simple (I've done it before).

One minor change: I removed the 10x multiplier for object creation, since otherwise it skews results (you don't create 10 instances, and the serializer handles just one). I think it was originally used just to make the run time long enough.

Comment by tsaloranta, Mar 31, 2009

Due to the change in object creation, here are the numbers I get on my workstation, for reference (eishay can update the 'official' results too):

Serializer              ,   Object create,   Serialization, Deserialization,      Total Time, Serialized Size
protobuf                ,        59.19145,       537.94075,       294.13640,       891.26860,        217
thrift                  ,        52.96210,       494.54925,       522.01160,      1069.52295,        314
java                    ,        48.08995,      1736.61825,      7258.44035,      9043.14855,        845
java (externalizable)   ,        46.87810,       690.94225,      1870.71525,      2608.53560,        315
scala                   ,        45.36270,      4018.60430,     19122.84775,     23186.81475,       1950
stax/woodstox           ,        48.64915,       751.47950,      1018.75335,      1818.88200,        406
stax/aalto              ,        48.75460,       543.86225,       785.45220,      1378.06905,        406
binaryxml/FI            ,        48.73490,      1550.54480,      1537.80335,      3137.08305,        224
json (jackson)          ,        48.67715,       382.99400,       593.11015,      1024.78130,        310
xstream (xpp)           ,        48.57065,      9741.56200,     17444.81985,     27234.95250,        759
xstream (xpp with conv) ,        48.66520,      1704.66610,      5507.66675,      7260.99805,        287
xstream (stax)          ,        48.84445,      9799.39625,     15580.16955,     25428.41025,        797
xstream (stax with conv),        48.62815,      1440.40925,      2615.74715,      4104.78455,        325
javolution xmlformat    ,        48.96825,       535.32845,      1315.22985,      1899.52655,        345
sbinary                 ,        45.05305,       406.14195,       386.88480,       838.07980,        190

Comment by tsaloranta, Apr 01, 2009

Hmmh. I think Xstream(stax) has something funny going on, since serialized size differs from that of Xstream(xpp). Perhaps name aliasing is not done with stax? Otherwise sizes should be the same.

One more thing: it would be great if round-trip checking were done during warmup: start with an object, serialize it, deserialize it, and verify that the results match. This would ensure ser/deser are not broken; while serialized size gives some indication, it's not a reliable test.
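One way to implement the round-trip check suggested here, sketched under assumptions: the codec is passed in as a pair of functions, the sample type defines `equals`, and built-in Java serialization stands in for the serializer being benchmarked. `WarmupRoundTrip`, `Person` and `warmupWithCheck` are hypothetical names, not part of the project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of round-trip checking during warmup: every warmup
// iteration serializes and deserializes the sample and fails fast if the copy
// differs, so a broken serializer is caught before any timing starts.
class WarmupRoundTrip {
    // Hypothetical value type with equals/hashCode so copies can be compared.
    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Person)) return false;
            Person p = (Person) o;
            return age == p.age && Objects.equals(name, p.name);
        }
        @Override public int hashCode() { return Objects.hash(name, age); }
        @Override public String toString() { return "Person(" + name + ", " + age + ")"; }
    }

    // Warms up the codec and throws IllegalStateException on the first mismatch.
    static <T> void warmupWithCheck(T original, Function<T, byte[]> ser,
                                    Function<byte[], T> deser, int iterations) {
        for (int i = 0; i < iterations; i++) {
            T copy = deser.apply(ser.apply(original));
            if (!original.equals(copy)) {
                throw new IllegalStateException("round trip mismatch at iteration " + i
                        + ": expected " + original + " but got " + copy);
            }
        }
    }

    // Built-in Java serialization, used here as the codec under test.
    static byte[] javaSerialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object javaDeserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person sample = new Person("Alice", 30);
        warmupWithCheck(sample, WarmupRoundTrip::javaSerialize,
                data -> (Person) javaDeserialize(data), 1_000);
        System.out.println("round trip verified for " + sample);
    }
}
```

Passing the codec as functions keeps the check independent of any particular serializer, so the same warmup step could wrap each library in the benchmark.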

Comment by arnieg0001, Apr 06, 2009

For JVM-based tests, it'd be helpful if you mentioned which version of the JDK you used. There seems to be a big difference between 1.5 and 1.6.

Comment by eishay, Apr 08, 2009

Sorry for the late response; I didn't get notifications on the comments. In the future, please comment on the project mailing list.

@darugar thanks! fixed

@tsaloranta isn't it the "Total Time"?

@joselo we are using optimize for speed in this benchmark. The difference is very large; check out this post for more info: http://www.eishay.com/2008/11/protobuf-with-option-optimize-for-speed.html

@tmnichols contributions are happily accepted; shoot me an email or tweet @eishay if you have some time to work on it.

@tsaloranta very interesting, I'll check it out when I have some time. Protobuf now seems to be faster than json; I know some people who will be delighted :-)

@arnieg0001 we used Java 6, but note that the results are not absolute but a comparison. I guess (though I need to verify) that the JVM version should affect them proportionally.

Comment by opencoeli, Apr 16, 2009

Please also include memory usage. Very often it is possible to serialize 1 GB of object structures, but deserialization will fail with an OutOfMemoryError or a similar error.

Comment by eishay, Apr 16, 2009

That's a bit hard to do. Any suggestions?

Comment by tsaloranta, Apr 16, 2009

Memory usage measurement is non-trivial, but more importantly I am not sure I see the point. Why? Because most impls (and all fast ones) are effectively streaming/chunking, so the only things retained in memory are the business objects (result or input). And those should be about the same size regardless of the serializer.

That could expose some flaws or suboptimalities in implementations; perhaps that would be the point? Others would work up until the point where memory is used up by "legitimate" objects.
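For what a rough measurement could look like, here is a hypothetical sketch that samples used heap via `Runtime.totalMemory()` and `Runtime.freeMemory()` before and after deserializing a list with built-in Java serialization. As noted in this thread, the result mostly reflects the retained business objects; `System.gc()` is only a hint to the JVM, so the number is approximate, and transient peak usage (the OutOfMemoryError case) is not captured. `HeapEstimate` and the payload are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Rough, hypothetical sketch (not a precise profiler): estimates how much heap a
// deserialized result keeps alive by sampling used heap before and after
// deserialization. System.gc() is only a hint, so the figure is approximate,
// and it does not capture transient peak usage during parsing.
class HeapEstimate {
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static byte[] serialize(Object o) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> payload = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) payload.add("value-" + i);
        byte[] bytes = serialize(payload);
        payload = null; // drop the original so mostly the deserialized copy is counted

        System.gc();
        long before = usedHeapBytes();
        Object copy = deserialize(bytes);
        System.gc();
        long after = usedHeapBytes();

        System.out.println("serialized size: " + bytes.length + " bytes");
        System.out.println("approx. heap retained by the result: " + (after - before) + " bytes");
        System.out.println("elements: " + ((List<?>) copy).size()); // also keeps copy reachable
    }
}
```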

Comment by stinkyminky, Apr 17, 2009

The numbers are in nanoseconds, but I see the following in the code:

return iterationTime(delta) / 10d;

So, the numbers are scaled wrong???
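The issue raised here is one of normalization: dividing by a constant 10 is only correct if each timed iteration really performs the operation 10 times. A minimal sketch of consistent normalization, with hypothetical names (`TimingNormalization`, `nanosPerOperation`, `opsPerIteration`) that are not taken from the project's code:

```java
// Hypothetical sketch: convert a measured elapsed time into nanoseconds per
// operation. Divide by every repetition the timed loop actually performs, and
// by nothing else.
class TimingNormalization {
    static double nanosPerOperation(long elapsedNanos, int iterations, int opsPerIteration) {
        if (iterations <= 0 || opsPerIteration <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return (double) elapsedNanos / ((long) iterations * opsPerIteration);
    }

    public static void main(String[] args) {
        int iterations = 1000;
        long start = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            // One operation per iteration, so opsPerIteration = 1; an extra
            // constant "/ 10" here would understate the cost by a factor of 10.
            checksum += Integer.toString(i).hashCode();
        }
        long elapsed = System.nanoTime() - start;
        System.out.printf("%.1f ns/op (checksum %d)%n",
                nanosPerOperation(elapsed, iterations, 1), checksum);
    }
}
```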

Comment by eishay, Apr 18, 2009

Thanks @stinkyminky, you are right: there was a merge clobber which left behind some buggy code. I'm fixing it and loading the correct data with the latest results, including @tsaloranta's additional serializer code.

Comment by chad_wal...@yahoo.com, May 12, 2009

Eishay, thanks for putting these benchmarks together. I do some work on Thrift and I am looking into the benchmarks to understand some things that don't quite fit.

I posted a bit of a report and some follow-up to the thrift-dev mailing list.

http://mail-archives.apache.org/mod_mbox/incubator-thrift-dev/200905.mbox/%3CC62E9622.97C4C%25cwalter@microsoft.com%3E

You can see more follow-up on the rest of the mailing list archives there too -- just look for posts with the title "Report on thrift-protobuf-compare" in http://mail-archives.apache.org/mod_mbox/incubator-thrift-dev/200905.mbox/thread

The main upshot is that:

1. You should update to a more recent version of Thrift and use TCompactProtocol.
2. The API you have chosen around byte arrays puts Thrift at a bit of a disadvantage, since it doesn't support that directly. Also, the way that you serialize once and then deserialize 10000 times doesn't fit well with Thrift's transport models, which are designed more for real-world RPC cases.
3. By making some fairly minor modifications to Thrift and the benchmark program, I was able to get Thrift to perform at roughly the same level as protocol buffers, albeit with some modest room for improvement on deserialization that I am still investigating.
4. The biggest issue is that the dynamic serializations are not being tested in a manner that gives a true apples-to-apples comparison with Thrift, protocol buffers, and other statically generated systems. This makes Thrift and protocol buffers look like they perform no better or even worse than, say, JSON, but this is not an accurate reflection of the underlying realities, because the JSON serialization and deserialization implementations are not actually correct. You can see this by adding another person to the list in StdMediaSerializer::create().

I am happy to discuss further.

Let's chat more about this and see if there are ways to get an accurate picture of the benchmarks that puts each implementation in its best possible light without compromising correctness.

Cheers,

Chad

Comment by eishay, May 12, 2009

Hi Chad,

That's awesome feedback, thanks!

There are indeed a few things we need to fix: some are obvious, like the URLs in the Java std serializer, and some less so, like the way dynamic serializers handle lists (a good point, which I totally agree with). I also agree we should exploit the serializers as much as possible for best performance, but without being unfair to the other serializers. Since you invested so much in it, I hope you could help me a bit more. Could you please start a discussion in the project's group and cross-link it with the one on the thrift-dev list? As for fixing or enhancing the code, right now I am a bit overloaded, but I promise to work on it when I have time. Could you please file the appropriate bugs? Or even better, if you have the time and will take the oath of being fair to other serializers, I'll be happy to make you a committer.

As a side note, usage patterns and data sets may have a huge effect on any serializer's performance. Given that, code samples showing what the APIs look like and how to use them correctly might be more valuable than the numbers in the wiki.

Comment by chad_wal...@yahoo.com, May 12, 2009

Eishay, let me look into what is feasible in terms of time commitment and also code contribution. I can certainly file some bugs and propose some enhancements.

Comment by chad_wal...@yahoo.com, May 13, 2009

I added two issues. Issue #4 is much much more significant. The way things are currently is giving people the incorrect perception that, say, "json performs as well or better than protocol buffers and Thrift". However, the benchmarks only show this because the json implementation is not doing a proper serialization and deserialization. Just because you can write one object into a string and then read that string back to produce the same object does not mean that you have a proper implementation of serialization and deserialization -- you need to be able to do the same for a variety of similar objects and many of these implementations do not stand up under what should be non-material changes to the input object.
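Chad's point can be turned into a simple check: round-trip the same kind of object with different list lengths instead of one fixed sample. In this hypothetical sketch, `Media` stands in for the benchmark's media object, `naiveSerialize`/`naiveDeserialize` is a deliberately broken codec that only handles one person, and built-in Java serialization serves as a correct reference. None of these names come from the project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch: a codec that round-trips one sample object is not
// necessarily correct, so round-trip several variants (lists of 0..N persons).
class VariantRoundTrip {
    // Hypothetical stand-in for the benchmark's media object.
    static final class Media implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        final List<String> persons;
        Media(String title, List<String> persons) {
            this.title = title;
            this.persons = new ArrayList<>(persons);
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Media)) return false;
            Media m = (Media) o;
            return title.equals(m.title) && persons.equals(m.persons);
        }
        @Override public int hashCode() { return Objects.hash(title, persons); }
    }

    // Returns the person counts whose round trip fails (wrong result or exception).
    static List<Integer> failingSizes(Function<Media, byte[]> ser,
                                      Function<byte[], Media> deser, int maxPersons) {
        List<Integer> failures = new ArrayList<>();
        for (int n = 0; n <= maxPersons; n++) {
            List<String> persons = new ArrayList<>();
            for (int i = 0; i < n; i++) persons.add("person-" + i);
            Media original = new Media("title", persons);
            try {
                if (!original.equals(deser.apply(ser.apply(original)))) failures.add(n);
            } catch (RuntimeException e) {
                failures.add(n); // a crash also counts as a failed round trip
            }
        }
        return failures;
    }

    // Deliberately naive codec that only ever writes the first person.
    static byte[] naiveSerialize(Media m) {
        return (m.title + "|" + m.persons.get(0)).getBytes(StandardCharsets.UTF_8);
    }

    static Media naiveDeserialize(byte[] data) {
        String[] parts = new String(data, StandardCharsets.UTF_8).split("\\|");
        return new Media(parts[0], Collections.singletonList(parts[1]));
    }

    // Built-in Java serialization as a codec that handles any list length.
    static byte[] javaSerialize(Media m) {
        try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
             ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(m);
            oos.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Media javaDeserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Media) ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("naive codec fails for sizes: " + failingSizes(
                VariantRoundTrip::naiveSerialize, VariantRoundTrip::naiveDeserialize, 3));
        System.out.println("java codec fails for sizes: " + failingSizes(
                VariantRoundTrip::javaSerialize, VariantRoundTrip::javaDeserialize, 3));
    }
}
```

Running `main` should report failures at sizes 0, 2 and 3 for the naive codec and none for Java serialization. The one-person case passes, which is exactly why a single-sample check can miss this kind of bug.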

Comment by eishay, May 13, 2009

Thanks Chad!

Comment by tsaloranta, Jun 24, 2009

FWIW, I improved json and stax/xml serializers to resolve issue #4 regarding these serializers. Same could and should be done for other serializers that have the issue. For me the numbers didn't seem to change a lot, but it'd be good to run 'official' numbers using same setup as with earlier published numbers.

Comment by eishay, Jun 29, 2009

Updated the page with latest results (using Java6), added the new serializers.

Comment by maitai.truong, Jun 30, 2009

Hello,

Where can I download the source for those performance tests?

Thanks Tai

Comment by eishay, Aug 04, 2009

Added results for the JsonMarshaller serializer (by Pascal-Louis Perez): http://code.google.com/p/jsonmarshaller/

Comment by nathan.s...@gmail.com, Sep 28, 2009

Added Kryo results: http://code.google.com/p/kryo/

Updated the page to the latest results, which include some bug fixes important for an accurate comparison. Also sorted the charts from smallest to largest; since smaller is always better, this makes them easier to compare.
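Sorting rows by total time, smallest first, takes one `Comparator.comparingDouble` call. A minimal sketch, where the `SortByTotal` and `Row` types are hypothetical and the values are a few rows copied from the first results table on this page:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: sort benchmark rows by total time, smallest (best) first.
class SortByTotal {
    // Hypothetical row type: serializer name and total time in nanoseconds.
    static final class Row {
        final String name;
        final double totalNanos;
        Row(String name, double totalNanos) { this.name = name; this.totalNanos = totalNanos; }
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<Row> sortedByTotal(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparingDouble((Row r) -> r.totalNanos));
        return copy;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("avro-generic", 6720.30128));
        rows.add(new Row("avro-specific", 4565.22903));
        rows.add(new Row("protobuf", 5227.99975));
        rows.add(new Row("thrift", 7082.79347));
        rows.add(new Row("sbinary", 5538.74793));
        for (Row r : sortedByTotal(rows)) {
            System.out.printf("%-15s %12.5f%n", r.name, r.totalNanos);
        }
    }
}
```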


Hosted by Google Code