Updated Oct 20, 2009 by collinw
Labels: Featured
Release2009Q3  
Details on the 2009Q3 release

Unladen Swallow 2009Q3

Unladen Swallow 2009Q3 is the second release of Unladen Swallow to use LLVM for native code generation, and the first to use runtime feedback for optimization. To obtain the 2009Q3 release, run

svn checkout http://unladen-swallow.googlecode.com/svn/branches/release-2009Q3-maint unladen-2009Q3

The Unladen Swallow team does not recommend wide adoption of the 2009Q3 release. This is intended as a checkpoint of our progress, a milestone on the long path to our eventual performance goals. Note that Unladen Swallow tracks LLVM trunk fairly closely, and will not build against LLVM 2.5 or 2.6.

Highlights:

  • Unladen Swallow 2009Q3 uses up to 90% less memory than the 2009Q2 release (put another way, peak memory in 2009Q2 was up to 930% higher than in 2009Q3).
  • Execution performance has improved by 15-70%, depending on benchmark.
  • Unladen Swallow 2009Q3 integrates with gdb 7.0 to better support debugging of JIT-compiled code.
  • Unladen Swallow 2009Q3 integrates with OProfile 0.9.4 and later to provide seamless profiling across Python and C code, if configured with --with-oprofile=<oprofile-prefix>.
  • Many bugs and restrictions in LLVM's JIT have been fixed. In particular, the 2009Q2 limitation of 16MB of machine code has been lifted.
  • Unladen Swallow 2009Q3 passes the tests for all the third-party tools and libraries listed on the Testing page. Significantly for many projects, this includes compatibility with Twisted, Django, NumPy and Swig.

Lowlights:

  • LLVM's JIT and other infrastructure needed more work than was expected. As a result, we did not have time to improve performance as much as we would have liked.
  • Memory usage is still 2-3x that of Python 2.6.1. However, there is more overhead that can be eliminated for the 2009Q4 release.

Memory Usage

In the memory benchmarks, we compared the fastest configuration for Q3 against the fastest configuration for Q2. The Q2 configuration is the same as what was reported in Release2009Q2.

2009Q2 vs 2009Q3
slowspitfire:
$ ./perf.py -r -b slowspitfire --args "-j always," --track_memory ../q2/python ../q3/python
Mem max: 212344.000 -> 96884.000: 119.17% smaller
Usage over time: http://tinyurl.com/yfy3w3p

ai:
$ ./perf.py -r -b ai --args "-j always," --track_memory ../q2/python ../q3/python
Mem max: 95012.000 -> 14020.000: 577.69% smaller
Usage over time: http://tinyurl.com/yz7v4xj

slowpickle:
$ ./perf.py -r -b slowpickle --args "-j always," --track_memory ../q2/python ../q3/python
Mem max: 96876.000 -> 18996.000: 409.98% smaller
Usage over time: http://tinyurl.com/yf4a3sj

slowunpickle:
$ ./perf.py -r -b slowunpickle --args "-j always," --track_memory ../q2/python ../q3/python
Mem max: 96876.000 -> 14076.000: 588.24% smaller
Usage over time: http://tinyurl.com/yfzv2mn

django:
$ ./perf.py -r -b django --args "-j always," --track_memory ../q2/python ../q3/python
Mem max: 159064.000 -> 27160.000: 485.66% smaller
Usage over time: http://tinyurl.com/ykdmdml

rietveld:
$ ./perf.py -r -b rietveld --args "-j always," --track_memory ../q2/python ../q3/python
Mem max: 575116.000 -> 55952.000: 927.87% smaller
Usage over time: http://tinyurl.com/yf3rcbb
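
The "% smaller" figures above exceed 100% because they appear to be measured against the new (smaller) value rather than the old one. The following is a minimal sketch of that arithmetic; the formula is inferred from the reported numbers, not taken from perf.py's source, and the function names are hypothetical.

```python
def percent_smaller_as_reported(old, new):
    """Difference relative to the NEW value (assumed to match the logs above)."""
    return (old - new) / new * 100.0


def percent_reduction(old, new):
    """Difference relative to the OLD value; this can never exceed 100%."""
    return (old - new) / old * 100.0


# "Mem max" figures copied from the 2009Q2 vs 2009Q3 results above.
mem_max = {
    "slowspitfire": (212344, 96884),
    "ai": (95012, 14020),
    "slowpickle": (96876, 18996),
    "slowunpickle": (96876, 14076),
    "django": (159064, 27160),
    "rietveld": (575116, 55952),
}

for name, (q2, q3) in mem_max.items():
    reported = percent_smaller_as_reported(q2, q3)
    reduction = percent_reduction(q2, q3)
    print(f"{name}: reported {reported:.2f}% smaller, "
          f"i.e. a {reduction:.2f}% reduction from Q2")
```

Under this reading, the rietveld result of 927.87% corresponds to a reduction of roughly 90% relative to 2009Q2.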

GDB Support

The Unladen Swallow team added support to gdb 7.0 that allows JIT compilers to emit DWARF debugging information so that gdb can function properly in the presence of JIT-compiled code. This interface should be sufficiently generic that any JIT compiler can take advantage of it.

Example backtrace before:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaaabdfbd10 (LWP 25476)]
0x00002aaaabe7d1a8 in ?? ()
(gdb) bt
#0  0x00002aaaabe7d1a8 in ?? ()
#1  0x0000000000000003 in ?? ()
#2  0x0000000000000004 in ?? ()
#3  0x00032aaaabe7cfd0 in ?? ()
#4  0x00002aaaabe7d12c in ?? ()
#5  0x00022aaa00000003 in ?? ()
#6  0x00002aaaabe7d0aa in ?? ()
#7  0x01000002abe7cff0 in ?? ()
#8  0x00002aaaabe7d02c in ?? ()
#9  0x0100000000000001 in ?? ()
#10 0x00000000014388e0 in ?? ()
#11 0x00007fff00000001 in ?? ()
#12 0x0000000000b870a2 in llvm::JIT::runFunction (this=0x1405b70,
F=0x14024e0, ArgValues=@0x7fffffffe050)
   at /home/rnk/llvm-gdb/lib/ExecutionEngine/JIT/JIT.cpp:395
#13 0x0000000000baa4c5 in llvm::ExecutionEngine::runFunctionAsMain
(this=0x1405b70, Fn=0x14024e0, argv=@0x13f06f8, envp=0x7fffffffe3b0)
   at /home/rnk/llvm-gdb/lib/ExecutionEngine/ExecutionEngine.cpp:377
#14 0x00000000007ebd52 in main (argc=2, argv=0x7fffffffe398,
envp=0x7fffffffe3b0) at /home/rnk/llvm-gdb/tools/lli/lli.cpp:208

And a backtrace after this patch:

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaabe7d1a8 in baz ()
(gdb) bt
#0  0x00002aaaabe7d1a8 in baz ()
#1  0x00002aaaabe7d12c in bar ()
#2  0x00002aaaabe7d0aa in foo ()
#3  0x00002aaaabe7d02c in main ()
#4  0x0000000000b870a2 in llvm::JIT::runFunction (this=0x1405b70,
F=0x14024e0, ArgValues=...)
   at /home/rnk/llvm-gdb/lib/ExecutionEngine/JIT/JIT.cpp:395
#5  0x0000000000baa4c5 in llvm::ExecutionEngine::runFunctionAsMain
(this=0x1405b70, Fn=0x14024e0, argv=..., envp=0x7fffffffe3c0)
   at /home/rnk/llvm-gdb/lib/ExecutionEngine/ExecutionEngine.cpp:377
#6  0x00000000007ebd52 in main (argc=2, argv=0x7fffffffe3a8,
envp=0x7fffffffe3c0) at /home/rnk/llvm-gdb/tools/lli/lli.cpp:208

So much nicer.

See http://llvm.org/docs/DebuggingJITedCode.html for more details. Thanks to our intern, Reid Kleckner, for doing the heavy lifting on this feature!

Benchmarks

2009Q3 uses a more sophisticated system for determining which functions to compile than did 2009Q2. Accordingly, we no longer use Unladen Swallow's -j always option when benchmarking 2009Q3.
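
The benchmark commands in this section still pass --args "-j always," with a trailing comma. One plausible reading, assumed here rather than confirmed from perf.py's source, is that the text before the comma goes to the first (2009Q2) interpreter and the text after it to the second (2009Q3) interpreter, so that only 2009Q2 runs with -j always. The helper below is a hypothetical sketch of that convention, not perf.py's actual code.

```python
def split_interpreter_args(args_option):
    """Split an --args value into (baseline_args, changed_args).

    Assumed convention: "BASE,CHANGED" gives each interpreter its own
    space-separated arguments; a value without a comma applies to both.
    """
    parts = args_option.split(",")
    if len(parts) > 2:
        raise ValueError("expected at most one comma in --args")
    baseline = parts[0].split()
    changed = parts[1].split() if len(parts) == 2 else list(baseline)
    return baseline, changed


baseline_args, changed_args = split_interpreter_args("-j always,")
print("2009Q2 args:", baseline_args)
print("2009Q3 args:", changed_args)
```

With the value used in the commands here, the second interpreter would receive no extra arguments, which is consistent with the statement that -j always is no longer used for 2009Q3.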

Benchmarking was done on an Intel Core 2 Duo 6600 @ 2.40GHz with 4GB RAM and a 32-bit userspace.

2009Q2 vs 2009Q3

slowspitfire:
$ ./perf.py -r -b slowspitfire --args "-j always," ../q2/python ../q3/python
Min: 0.690717 -> 0.622342: 10.99% faster
Avg: 0.692846 -> 0.624929: 10.87% faster
Significant (t=165.901211, a=0.95)
Stddev: 0.00348 -> 0.00215: 62.23% smaller

ai:
$ ./perf.py -r -b ai --args "-j always," ../q2/python ../q3/python
Min: 0.525973 -> 0.459890: 14.37% faster
Avg: 0.529790 -> 0.464647: 14.02% faster
Significant (t=69.943861, a=0.95)
Stddev: 0.00238 -> 0.00900: 73.55% larger

slowpickle:
$ ./perf.py -r -b slowpickle --args "-j always," ../q2/python ../q3/python
Min: 0.732290 -> 0.597355: 22.59% faster
Avg: 0.733397 -> 0.615644: 19.13% faster
Significant (t=13.096018, a=0.95)
Stddev: 0.00208 -> 0.08989: 97.68% larger

slowunpickle:
$ ./perf.py -r -b slowunpickle --args "-j always," ../q2/python ../q3/python
Min: 0.314137 -> 0.264590: 18.73% faster
Avg: 0.314825 -> 0.276463: 13.88% faster
Significant (t=9.762778, a=0.95)
Stddev: 0.00100 -> 0.03928: 97.45% larger

django:
$ ./perf.py -r -b django --args "-j always," ../q2/python ../q3/python
Min: 1.095181 -> 0.946080: 15.76% faster
Avg: 1.096714 -> 0.949940: 15.45% faster
Significant (t=315.826693, a=0.95)
Stddev: 0.00088 -> 0.00456: 80.82% larger

rietveld:
$ ./perf.py -r -b rietveld --args "-j always," ../q2/python ../q3/python
Min: 0.578493 -> 0.516558: 11.99% faster
Avg: 0.583965 -> 0.619006: 5.66% slower
Significant (t=-2.009135, a=0.95)
Stddev: 0.00804 -> 0.17422: 95.39% larger

call_simple:
$ ./perf.py -r -b call_simple --args "-j always," ../q2/python ../q3/python
Min: 1.618273 -> 0.908331: 78.16% faster
Avg: 1.632256 -> 0.924890: 76.48% faster
Significant (t=433.008411, a=0.95)
Stddev: 0.00847 -> 0.01397: 39.38% larger
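
Each result above includes a line such as "Significant (t=165.901211, a=0.95)". Assuming the t value is a two-sample Student's t statistic computed with a pooled variance (an assumption about perf.py, not something stated on this page), the calculation can be sketched as follows. The timings are made-up illustration data, not samples from these runs, and the function names are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def pooled_variance(a, b):
    """Pooled sample variance of two samples (assumes similar variances)."""
    mean_a, mean_b = mean(a), mean(b)
    sum_sq = (sum((x - mean_a) ** 2 for x in a)
              + sum((x - mean_b) ** 2 for x in b))
    return sum_sq / (len(a) + len(b) - 2)


def t_score(a, b):
    """Two-sample Student's t statistic for mean(a) - mean(b)."""
    std_err = math.sqrt(pooled_variance(a, b) * (1.0 / len(a) + 1.0 / len(b)))
    return (mean(a) - mean(b)) / std_err


# Hypothetical timings in seconds; NOT taken from the benchmark runs above.
q2_times = [1.0, 1.1, 0.9, 1.0]
q3_times = [0.8, 0.9, 0.7, 0.8]

# Two-tailed critical value at 95% confidence for 6 degrees of freedom,
# taken from a standard t table.
CRITICAL_T_DF6 = 2.447

t = t_score(q2_times, q3_times)
print(f"t = {t:.4f}, significant: {abs(t) > CRITICAL_T_DF6}")
```

With baseline minus changed as the numerator, a positive t means the second interpreter was faster on average. That matches the sign pattern above, where only the rietveld average (5.66% slower) has a negative t.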


Comment by kalle.happonen, Nov 11, 2009

Seems like nice progress! Definitely an interesting project to follow as a Python coder. On the other hand, one could remark about the comparisons. It's hard for memory usage to be over 100% smaller than what it's compared to, but over 100% larger works...

Comment by nixarn, Nov 11, 2009

Awesome - keep up the good work!

Comment by volshebnyi, Nov 11, 2009

Really great improvement since Q2, waiting for optimized regex in Q4. Thanks for new backtrace! =)

Comment by doug.farrell, Nov 13, 2009

I'm curious whether the release of the Go language by Google will impact the Unladen Swallow project. Personally, I hope not. :)

Comment by lobais, Nov 17

#kalle.happonen yes, I thought the same. Probably they mean that Q2 uses 930% more than Q3, giving a reduction of 89%. Still, a very nice accomplishment.

#doug.farrell I doubt that either Go or Noop will impact Unladen Swallow too much. I think they are both rather experimental, fun projects, whereas the swallow is very serious.

