Benchmarking
Intro
This project started with a few blog posts and, with the help of many contributors, now benchmarks much more than just protobuf and thrift. Thanks to all who looked at the code, contributed, made suggestions and pointed out bugs. Three major contributions came from cowtowncoder, who fixed the stax code, Chris Pettitt, who added the json code, and David Bernard, who added xstream and java externalizable. The charts below display the latest results. Note that the charts are scaled to best fit the results, so they might be misleading in some cases. If you wish to see the exact numbers, scroll down to the table at the end of the page. Overall we have benchmarks for protobuf, thrift, java, scala, a few implementations of stax, binaryxml, json, xstream, javolution, hessian, avro, kryo, sbinary and JSON Marshaller.
Numbers are not everything
Benchmarks can be very misleading. Different datasets will produce different results, and sometimes a marginal performance boost is eclipsed by other features like forward and backward compatibility, cross-language support, and more.
Charts
Total Time
Including creating an object, serializing and deserializing.
Serialization Time
Serializing with a new object each time (object creation time not included).
Deserialization Time
Usually the most expensive operation.
Serialization Size
May vary a lot depending on the number of repetitions in lists, the use of number compacting in protobuf, strings vs. numerics and more. An interesting point is Scala and Java, which hold the names of the classes in the serialized form, i.e. longer class names mean a larger serialized form. In Scala it's worse, since the Scala compiler creates more implicit classes than Java.
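To see why class names inflate Java's serialized form, here is a minimal sketch using only the standard java.io serialization API. The classes P and PersonWithAVeryLongDescriptiveClassName are hypothetical stand-ins with identical fields; only their names differ, and ClassNameSizeDemo is an illustrative helper, not project code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical class with a one-letter name. */
final class P implements Serializable {
    private static final long serialVersionUID = 1L;
    int id = 42;
}

/** Same single field as P; only the class name is longer. */
final class PersonWithAVeryLongDescriptiveClassName implements Serializable {
    private static final long serialVersionUID = 1L;
    int id = 42;
}

final class ClassNameSizeDemo {
    /** Serializes with standard Java serialization and returns the raw bytes. */
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** True if the fully qualified class name appears verbatim in the serialized bytes. */
    static boolean containsClassName(byte[] data, Class<?> type) {
        // ISO-8859-1 maps each byte to one char, so an ASCII class name can be searched directly.
        return new String(data, StandardCharsets.ISO_8859_1).contains(type.getName());
    }

    public static void main(String[] args) {
        byte[] shortName = serialize(new P());
        byte[] longName = serialize(new PersonWithAVeryLongDescriptiveClassName());
        System.out.println("P: " + shortName.length + " bytes");
        System.out.println("PersonWithAVeryLongDescriptiveClassName: " + longName.length + " bytes");
        System.out.println("class name stored in stream: "
                + containsClassName(longName, PersonWithAVeryLongDescriptiveClassName.class));
    }
}
```

With identical fields and an explicit serialVersionUID, the two streams should differ essentially only by the extra characters of the class name.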
Object Creation Time
Object creation is not very meaningful, since for most serializers it takes under 100 nanoseconds to create an object. The surprise comes from protobuf, which takes relatively long to create an object (about 222 ns, versus about 129 ns for thrift). It's the only point in this set of benchmarks where it didn't perform as well as thrift. Scala (and, to a lesser extent, Java), on the other hand, is fast; it seems to be a good language for handling in-memory data structures, but when it comes to serialization you might want to check the alternatives.
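The chart descriptions above imply a particular measurement scheme. A minimal sketch of one way to implement it follows, assuming a hypothetical SimpleSerializer interface (not the project's actual API): creation is timed on its own, serialization is timed on fresh objects built before the timer starts, and deserialization decodes one serialized payload repeatedly. Times come from System.nanoTime() and are averaged per object; the UTF-8 string serializer is a toy stand-in for a real library.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical minimal serializer contract used only for this sketch. */
interface SimpleSerializer<T> {
    byte[] serialize(T value);
    T deserialize(byte[] bytes);
}

final class TimingSketch {
    // Results are written here so the JIT cannot discard the measured work as dead code.
    static volatile Object sink;

    /** Returns average nanoseconds per object as {create, serialize, deserialize, total}. */
    static <T> double[] measure(int iterations, Supplier<T> factory, SimpleSerializer<T> serializer) {
        // Object creation: time only the factory.
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = factory.get();
        }
        double create = (System.nanoTime() - start) / (double) iterations;

        // Serialization: a new object each time, created before the timer starts.
        List<T> fresh = new ArrayList<>(iterations);
        for (int i = 0; i < iterations; i++) {
            fresh.add(factory.get());
        }
        start = System.nanoTime();
        for (T value : fresh) {
            sink = serializer.serialize(value);
        }
        double serialize = (System.nanoTime() - start) / (double) iterations;

        // Deserialization: serialize once, then decode the same bytes repeatedly.
        byte[] bytes = serializer.serialize(factory.get());
        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = serializer.deserialize(bytes);
        }
        double deserialize = (System.nanoTime() - start) / (double) iterations;

        // Creation is counted once per object, not multiplied.
        return new double[] {create, serialize, deserialize, create + serialize + deserialize};
    }

    /** Toy UTF-8 string serializer standing in for a real library. */
    static SimpleSerializer<String> utf8() {
        return new SimpleSerializer<String>() {
            @Override
            public byte[] serialize(String value) {
                return value.getBytes(StandardCharsets.UTF_8);
            }

            @Override
            public String deserialize(byte[] bytes) {
                return new String(bytes, StandardCharsets.UTF_8);
            }
        };
    }

    public static void main(String[] args) {
        Supplier<String> factory = () -> new String("hello, benchmark");
        measure(100_000, factory, utf8()); // warm-up pass, results discarded
        double[] r = measure(100_000, factory, utf8());
        System.out.printf("create=%.1f ser=%.1f deser=%.1f total=%.1f (ns)%n", r[0], r[1], r[2], r[3]);
    }
}
```

The warm-up pass gives the JIT a chance to compile the hot loops before the measured pass; a more careful harness would also repeat runs and report the variance.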
Numbers
Times in nanoseconds, sizes in bytes.
| Serializer | Object create | Serialization | Deserialization | Total Time | Serialized Size |
|---|---|---|---|---|---|
| avro-generic | 2060.87878 | 2101.17100 | 2558.25150 | 6720.30128 | 211 |
| avro-specific | 1105.29553 | 1557.05050 | 1902.88300 | 4565.22903 | 211 |
| protobuf | 221.58725 | 3007.45900 | 1998.95350 | 5227.99975 | 231 |
| thrift | 128.72947 | 3381.54200 | 3572.52200 | 7082.79347 | 353 |
| hessian | 77.04348 | 1082814.74200 | 34491.53600 | 1117383.32148 | 541 |
| kryo | 77.45473 | 3570.39650 | 2220.67450 | 5868.52573 | 236 |
| kryo-optimized | 73.89616 | 3599.40400 | 2412.61550 | 6085.91566 | 217 |
| java | 77.19895 | 10969.64300 | 47893.20500 | 58940.04695 | 919 |
| java (externalizable) | 78.31142 | 4562.19350 | 13157.85400 | 17798.35892 | 397 |
| scala | 59.99520 | 27947.09350 | 128664.98550 | 156672.07420 | 2024 |
| json (jackson) | 77.37814 | 2332.30950 | 3608.25750 | 6017.94514 | 378 |
| JsonMarshaller | 77.37377 | 9855.64450 | 17218.58350 | 27151.60177 | 348 |
| stax/woodstox | 77.10098 | 4266.49750 | 8026.93400 | 12370.53248 | 475 |
| stax/aalto | 75.12022 | 3473.18900 | 5525.17050 | 9073.47972 | 475 |
| binaryxml/FI | 76.54575 | 9568.22350 | 9809.16300 | 19453.93225 | 300 |
| xstream (xpp) | 77.42962 | 79780.94700 | 116464.72650 | 196323.10312 | 833 |
| xstream (xpp with conv) | 76.16364 | 11972.95550 | 22764.88450 | 34814.00364 | 361 |
| xstream (stax) | 77.80239 | 74171.23800 | 117309.15500 | 191558.19539 | 871 |
| xstream (stax with conv) | 76.92650 | 8711.69450 | 14755.05200 | 23543.67300 | 399 |
| javolution xmlformat | 74.24016 | 4019.80100 | 8764.34100 | 12858.38216 | 419 |
| sbinary | 58.54493 | 3265.29500 | 2214.90800 | 5538.74793 | 264 |
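In this table, Total Time is the sum of the Object create, Serialization and Deserialization columns, and the ser + deser sum requested in the comments can be derived from the same data. A small sketch that checks this for four rows copied from the table and sorts them by total; ResultRow and ResultTable are hypothetical helper names, not part of the project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** One row of the results table (times in ns, size in bytes). */
final class ResultRow {
    final String name;
    final double create;
    final double serialize;
    final double deserialize;
    final double total;
    final int size;

    ResultRow(String name, double create, double serialize, double deserialize, double total, int size) {
        this.name = name;
        this.create = create;
        this.serialize = serialize;
        this.deserialize = deserialize;
        this.total = total;
        this.size = size;
    }

    /** Serialization plus deserialization only, without object creation. */
    double serDeser() {
        return serialize + deserialize;
    }

    /** Should reproduce the Total Time column. */
    double componentsSum() {
        return create + serialize + deserialize;
    }
}

final class ResultTable {
    /** A few rows copied verbatim from the table. */
    static List<ResultRow> sample() {
        List<ResultRow> rows = new ArrayList<>();
        rows.add(new ResultRow("protobuf", 221.58725, 3007.45900, 1998.95350, 5227.99975, 231));
        rows.add(new ResultRow("thrift", 128.72947, 3381.54200, 3572.52200, 7082.79347, 353));
        rows.add(new ResultRow("json (jackson)", 77.37814, 2332.30950, 3608.25750, 6017.94514, 378));
        rows.add(new ResultRow("sbinary", 58.54493, 3265.29500, 2214.90800, 5538.74793, 264));
        return rows;
    }

    /** Returns a copy sorted by Total Time, fastest first. */
    static List<ResultRow> sortedByTotal(List<ResultRow> rows) {
        List<ResultRow> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((ResultRow r) -> r.total));
        return sorted;
    }

    public static void main(String[] args) {
        for (ResultRow r : sortedByTotal(sample())) {
            System.out.printf("%-15s total=%.0f ser+deser=%.0f sum-check=%b%n",
                    r.name, r.total, r.serDeser(), Math.abs(r.componentsSum() - r.total) < 1e-6);
        }
    }
}
```

For these four rows the order by total is protobuf, sbinary, json (jackson), thrift, and each sum check holds.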
Great stuff. A couple of quick nit-pick items (I'd happily fix but I don't have wiki edit permissions):
- Would be good to indicate the axis on the graphs, if only to say whether longer bars are better or worse.
- Also might be useful to break up the graphs with a horizontal line or something; it's hard to tell whether the text goes with the graph above or below it.
- "Tree major contributions" => "Three major contributions"
- "May very a lot" => "May vary a lot"
One minor comment (which I can help with): perhaps also show a simple sum of deser + ser, since oftentimes this is the overall time taken, either by one side of the conversation (server or client) or by both combined.
I have also noticed that results vary a lot between different machines: my older home workstation gives totally different numbers from my work machine.
I made some changes (for other results):
- optimise Xpp to generate compactXml instead of Pretty/indented
- allow XStream to select the implementation of Stax (use Woodstox by default)
- fix label of xstream configuration
It would be nice if you compared against the optimize-for-speed version of protobuf, which is not the default.
The protobuf results are with "optimize-for-speed".
use the mailing-list/group
I'd love to see how Hessian (http://hessian.caucho.com/) and Ice (http://www.zeroc.com/ice.html) stack up here. Awesome work guys.
Joselo: "optimize-for-speed" is on. Without it, PB is much slower (for me, 3x).
tmnichols: I can add Hessian, should be simple (done it before)
One minor change: I removed the 10x multiplier for object creation; otherwise it skews the results (since you don't create 10 instances per serialization, just one). I think it was originally just used to make the run time long enough.
Due to the change in object creation, here are the numbers I get on my workstation, for reference (eishay can update the 'official' results too):
Hmmh. I think Xstream(stax) has something funny going on, since serialized size differs from that of Xstream(xpp). Perhaps name aliasing is not done with stax? Otherwise sizes should be the same.
One more thing: it would be great if roundtrip checking were done during warmup: start with an object, serialize, deserialize, and verify that the results match. This is to ensure ser/deser are not broken; while serialized size gives some indication, it's not a reliable test.
For Java VM-based tests, it'd be helpful if you mentioned which version of the JDK was used. There seems to be a big difference between 1.5 and 1.6.
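The warm-up round-trip check suggested above could look roughly like the following sketch. It uses standard java.io serialization as the serializer under test and a hypothetical SampleRecord class with a value-based equals(); roundTripMatches is an illustrative helper, not part of the project's code. The deliberately broken deserializer shows the kind of bug the check catches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical sample record standing in for the benchmark's real data model. */
final class SampleRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    final String title;
    final List<String> persons;

    SampleRecord(String title, List<String> persons) {
        this.title = title;
        this.persons = new ArrayList<>(persons);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SampleRecord)) return false;
        SampleRecord other = (SampleRecord) o;
        return title.equals(other.title) && persons.equals(other.persons);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, persons);
    }
}

final class RoundTripCheck {
    /** Serialize, deserialize and compare with equals(); true only if nothing was lost. */
    static <T> boolean roundTripMatches(T original, Function<T, byte[]> ser, Function<byte[], T> deser) {
        return original.equals(deser.apply(ser.apply(original)));
    }

    // Standard java.io serialization, used here as the serializer under test.
    static byte[] javaSerialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static SampleRecord javaDeserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SampleRecord) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SampleRecord sample = new SampleRecord("sample", Arrays.asList("Alice", "Bob"));
        boolean ok = roundTripMatches(sample, RoundTripCheck::javaSerialize, RoundTripCheck::javaDeserialize);
        // A deliberately broken deserializer that drops the persons list.
        boolean broken = roundTripMatches(sample, RoundTripCheck::javaSerialize,
                data -> new SampleRecord(javaDeserialize(data).title, new ArrayList<String>()));
        System.out.println("java serialization round trip ok: " + ok);
        System.out.println("broken deserializer detected: " + !broken);
    }
}
```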
Sorry for late response, didn't get notifications on the comments. In the future please comment on the project mailing list.
@darugar thanks! fixed
@tsaloranta isn't it the "Total Time" ?
@joselo we are using optimize for speed in this benchmark. The difference is very large, check out this post for more info: http://www.eishay.com/2008/11/protobuf-with-option-optimize-for-speed.html
@tmnichols contributions are happily accepted, shoot me an email or tweet @eishay if you have some time to work on it
@tsaloranta very interesting, I'll check it out when I have some time. protobuf now seems to be faster than json; I know some people who will be delighted :-)
@arnieg0001 we used java6, but note that the results are not absolute but a comparison. I guess (though I need to verify) that the JVM version should affect them proportionally.
Please also include memory usage. Very often it is possible to serialize 1 GB of object structures, but deserialization will fail with OutOfMemory or a similar exception.
That's a bit hard to do. Any suggestions?
Memory usage measurement is non-trivial, but more importantly I am not sure I see the point. Why? Because most impls (and all fast ones) are effectively streaming/chunking, so the only things retained in memory are the business objects (result or input), and those should have about the same size.
That could expose some flaws/suboptimalities in implementations; perhaps that would be the point? And others would work up until the point where memory is used by "legitimate" objects.
The number is in 'nanosecond' but I see the following in the code
So, the numbers are scaled wrong???
Thanks @stinkyminky, you are right, there was a merge clobber which left behind some buggy code. Fixing it and loading the correct data with the latest results, including @tsaloranta's additional serializer code.
Eishay, thanks for putting these benchmarks together. I do some work on Thrift and I am looking into the benchmarks to understand some things that don't quite fit.
I posted a bit of a report and some follow-up to the thrift-dev mailing list.
http://mail-archives.apache.org/mod_mbox/incubator-thrift-dev/200905.mbox/%3CC62E9622.97C4C%25cwalter@microsoft.com%3E
You can see more follow-up on the rest of the mailing list archives there too -- just look for posts with the title "Report on thrift-protobuf-compare" in http://mail-archives.apache.org/mod_mbox/incubator-thrift-dev/200905.mbox/thread
The main upshot is that:
1. You should update to a more recent version of Thrift and use TCompactProtocol.
2. The API you have chosen around byte[] puts Thrift at a bit of a disadvantage, since Thrift doesn't support that directly. Also, the way that you serialize once and then deserialize 10000 times doesn't fit well with Thrift's transport models, which are designed more for real-world RPC cases.
3. By making some fairly minor modifications to Thrift and the benchmark program, I was able to get Thrift to perform at roughly the same level as protocol buffers, albeit with some modest room for improvement on deserialization that I am still investigating.
4. The biggest issue is that the dynamic serializations are not being tested in a manner that gives a true apples-to-apples comparison with Thrift, protocol buffers, and other statically generated systems. This makes Thrift and protocol buffers look like they perform no better or even worse than, say, JSON, but this is not an accurate reflection of the underlying realities, because the JSON serialization and deserialization implementations are not actually correct. You can see this by adding another person to the list in StdMediaSerializer::create().
I am happy to discuss further.
Let's chat more about this and see if there are ways to get an accurate picture of the benchmarks that puts each implementation in its best possible light without compromising correctness.
Cheers,
Chad
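Part of why compact binary protocols such as Thrift's TCompactProtocol and protocol buffers produce small output is variable-length integer encoding: integers are written as base-128 varints, and ZigZag encoding (used by TCompactProtocol for integers and by protobuf's sint32 type) first maps small negative numbers to small unsigned ones. Below is a minimal stdlib-only sketch of both techniques; VarIntSketch is a hypothetical name, and this is not code from either library.

```java
import java.io.ByteArrayOutputStream;

final class VarIntSketch {
    /** ZigZag: maps signed to unsigned so small magnitudes stay small (0->0, -1->1, 1->2, -2->3). */
    static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** Inverse of zigZag. */
    static int unZigZag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Writes 7 bits per byte, least significant group first; the high bit means "more bytes follow". */
    static byte[] writeVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        int v = value;
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7; // unsigned shift, so negative ints terminate after 5 bytes
        }
        out.write(v);
        return out.toByteArray();
    }

    /** Reads one varint from the start of the array. */
    static int readVarInt(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated varint");
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 300, -1}) {
            System.out.printf("%d: plain varint %d bytes, zigzag varint %d bytes%n",
                    n, writeVarInt(n).length, writeVarInt(zigZag(n)).length);
        }
    }
}
```

For example, 300 encodes as the two bytes 0xAC 0x02, while -1 takes 5 bytes as a plain varint but a single byte after ZigZag.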
Hi Chad,
That's an awesome feedback, thanks!
There are indeed a few things we need to fix: some are obvious, like the URLs in the java std serializer, and some less so, like the way dynamic serializers handle lists (a good point, which I totally agree with). I also agree we should exploit the serializers as much as possible for best performance, but without being unfair to the other serializers. Since you invested so much in it, I hope you could help me a bit more. Could you please start a discussion in the project's group and cross-link it with the one on the thrift-dev list? As for fixing / enhancing the code, right now I am a bit overloaded, but I promise to work on it when I have time. Could you please file the appropriate bugs? Or even better, if you have the time and will take the oath of being fair to other serializers, I'll be happy to make you a committer.
As a side note, usage patterns and datasets may have a huge effect on any serializer's performance. Given that, code samples showing what the APIs look like and how to use them correctly might be more valuable than the numbers in the wiki.
Eishay, let me look into what is feasible in terms of time commitment and also code contribution. I can certainly file some bugs and propose some enhancements.
I added two issues. Issue #4 is much much more significant. The way things are currently is giving people the incorrect perception that, say, "json performs as well or better than protocol buffers and Thrift". However, the benchmarks only show this because the json implementation is not doing a proper serialization and deserialization. Just because you can write one object into a string and then read that string back to produce the same object does not mean that you have a proper implementation of serialization and deserialization -- you need to be able to do the same for a variety of similar objects and many of these implementations do not stand up under what should be non-material changes to the input object.
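The point about handling "a variety of similar objects" can be illustrated with a sketch: a codec that writes the element count before the elements decodes lists of any length, so adding another person does not break it. This uses the JDK's DataOutputStream/DataInputStream; PersonListCodec is a hypothetical name, and modeling persons as plain strings is an assumption made for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PersonListCodec {
    /** Writes the element count first, so the reader knows exactly how many entries follow. */
    static byte[] encode(List<String> persons) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(persons.size());
            for (String person : persons) {
                out.writeUTF(person); // 2-byte length prefix + modified UTF-8
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the count, then exactly that many entries. */
    static List<String> decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<String> persons = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                persons.add(in.readUTF());
            }
            return persons;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Checks the round trip for every variant, not just a single canned object. */
    static boolean roundTripsAll(List<List<String>> variants) {
        for (List<String> variant : variants) {
            if (!variant.equals(decode(encode(variant)))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> variants = Arrays.asList(
                Arrays.<String>asList(),
                Arrays.asList("Alice"),
                Arrays.asList("Alice", "Bob"),
                Arrays.asList("Alice", "Bob", "Carol"));
        System.out.println("all variants round-trip: " + roundTripsAll(variants));
    }
}
```

A codec hard-wired to the single canned object would pass a one-off check but fail on the longer or empty variants.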
Thanks Chad!
FWIW, I improved json and stax/xml serializers to resolve issue #4 regarding these serializers. Same could and should be done for other serializers that have the issue. For me the numbers didn't seem to change a lot, but it'd be good to run 'official' numbers using same setup as with earlier published numbers.
Updated the page with latest results (using Java6), added the new serializers.
Hello,
Where can I download the source for those performance tests?
Thanks Tai
Sure, go to: http://code.google.com/p/thrift-protobuf-compare/source/checkout
Added JsonMarshaller serializer (by Pascal-Louis Perez) results: http://code.google.com/p/jsonmarshaller/
Added Kryo results: http://code.google.com/p/kryo/
Updated the page to the latest results, which include some bug fixes important for an accurate comparison. Also sorted the charts from smallest to largest; since smaller is always better, this makes them easier to compare.