Updated Oct 20, 2009 by collinw
Release2009Q2  
Details on the 2009Q2 release

Unladen Swallow 2009Q2

Unladen Swallow 2009Q2 is the first release of Unladen Swallow to use LLVM for native code generation. To obtain the 2009Q2 release, run

svn checkout http://unladen-swallow.googlecode.com/svn/branches/release-2009Q2-maint unladen-2009Q2

The Unladen Swallow team does not recommend wide adoption of the 2009Q2 release. This is intended as a checkpoint of our progress, a milestone on the long path to our eventual performance goals. 2009Q2 can compile all pure-Python code to correct native machine code, but is intended to set the stage for more significant performance improvements in the 2009Q3 release that will take advantage of the LLVM-based compiler infrastructure built in Q2.

Highlights:

  • Unladen Swallow 2009Q2 uses LLVM to compile hot functions (anything called more than 10000 times) to machine code. A -j always command-line option is available to force all functions through LLVM.
  • Unladen Swallow 2009Q2 starts up faster than 2009Q1.
  • A number of buggy corner cases in the 2009Q1 version of cPickle have been fixed.
  • Unladen Swallow 2009Q2 passes the tests for all the third-party tools and libraries listed on the Testing page. Significantly for many projects, this includes compatibility with Twisted, Django, NumPy and Swig.
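
As a rough illustration of the call-count heuristic in the first highlight, here is a minimal Python sketch. It is not Unladen Swallow's implementation, which lives inside the interpreter itself. HotFunction, compile_fn and the always argument are hypothetical names made up for this example; only the 10000-call threshold comes from the notes above.

```python
HOT_THRESHOLD = 10000  # "called more than 10000 times", per the highlights above


class HotFunction:
    """Wraps a function and swaps in a 'compiled' version once it is hot.

    Illustrative only: compile_fn is a hypothetical stand-in for a native
    code generator, not an actual Unladen Swallow or LLVM API.
    """

    def __init__(self, func, compile_fn, always=False):
        self.func = func              # the ordinary (interpreted) path
        self.compile_fn = compile_fn  # assumed: takes a function, returns a callable
        self.compiled = None
        self.calls = 0
        if always:
            # Mimics the spirit of "-j always": compile before the first call.
            self.compiled = compile_fn(func)

    def __call__(self, *args, **kwargs):
        if self.compiled is not None:
            return self.compiled(*args, **kwargs)
        self.calls += 1
        if self.calls > HOT_THRESHOLD:
            # The function has now been called more than 10000 times.
            self.compiled = self.compile_fn(self.func)
            return self.compiled(*args, **kwargs)
        return self.func(*args, **kwargs)
```

With this sketch, a wrapped function stays on the ordinary path for its first 10000 calls and is handed to compile_fn on call 10001, while always=True skips the counting entirely.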

Lowlights:

  • Memory usage has increased by 10x. We have thus far spent no time improving this; lowering memory usage is a goal for the 2009Q3 release.
  • LLVM's JIT memory manager is limited to 16MB of native code. This is not a problem in practice, but interferes with regrtest.py runs. This is being fixed upstream in LLVM, and the resulting patch will be backported to the 2009Q2 release branch.

Benchmarks

2009Q2 uses a very simple function to determine whether to compile a given function to machine code. Accordingly, we use Unladen Swallow's -j always flag to force all functions through LLVM, which gives us a more accurate picture of how our native code generation facility is performing.

Benchmarking was done on an Intel Core 2 Duo 6600 @ 2.40GHz with 4GB RAM.

2009Q1 vs 2009Q2
(32-bit; gcc 4.0.3; perf.py -r --args ",-j always -O2")

ai:
Min: 0.490245 -> 0.477799: 2.60% faster
Avg: 0.492445 -> 0.481081: 2.36% faster
Significant (t=42.490318, a=0.95)
Stddev: 0.00075 -> 0.00257: 70.92% larger

django:
Min: 1.097285 -> 1.031586: 6.37% faster
Avg: 1.099378 -> 1.034914: 6.23% faster
Significant (t=191.190350, a=0.95)
Stddev: 0.00142 -> 0.00306: 53.49% larger

slowpickle:
Min: 0.735551 -> 0.652740: 12.69% faster
Avg: 0.737914 -> 0.653076: 12.99% faster
Significant (t=258.803262, a=0.95)
Stddev: 0.00327 -> 0.00023: 1320.90% smaller

slowspitfire:
Min: 0.788618 -> 0.663307: 18.89% faster
Avg: 0.790304 -> 0.665141: 18.82% faster
Significant (t=338.460137, a=0.95)
Stddev: 0.00294 -> 0.00224: 31.54% smaller

slowunpickle:
Min: 0.317278 -> 0.279072: 13.69% faster
Avg: 0.318411 -> 0.280351: 13.58% faster
Significant (t=174.904639, a=0.95)
Stddev: 0.00088 -> 0.00199: 55.74% larger

2009Q1 vs 2009Q2
(32-bit; gcc 4.0.3; perf.py -r)

normal_startup:
Min: 0.378594 -> 0.294137: 28.71% faster
Avg: 0.400236 -> 0.306967: 30.38% faster
Significant (t=22.105565, a=0.95)
Stddev: 0.00915 -> 0.04119: 77.80% larger
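
The percentage figures in these listings are computed relative to the new value, (new - old) / new, which a team member confirms in the comments on this page. The sketch below contrasts that convention with the more common relative-to-old one. The function names are made up for this example, taking the absolute value is an assumption so that "faster" and "larger" both come out positive, and this is not the code perf.py uses.

```python
def percent_vs_new(old, new):
    """Change as a percentage of the NEW value: |new - old| / new."""
    return abs(new - old) / new * 100.0


def percent_vs_old(old, new):
    """Change as a percentage of the OLD value: |new - old| / old."""
    return abs(new - old) / old * 100.0
```

For the django Min timings (1.097285 -> 1.031586), percent_vs_new gives about 6.37%, matching the listing. For the rounded django Stddev values (0.00142 -> 0.00306), percent_vs_old gives about 115%, which is the figure a commenter below expected to see.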


Comment by ChuanTong.Huang, Jul 14, 2009

Bumping this...

Comment by lacker, Jul 14, 2009

LGTM++

Comment by python23, Jul 14, 2009

good work!

Comment by dangwy, Jul 14, 2009

Yeah, I like this!

Comment by dangwy, Jul 14, 2009

Why doesn't the checkout URL exist?!

Comment by willie.tw, Jul 14, 2009

Awesome work!! Thank you guys. :D

Comment by wangyanguang, Jul 14, 2009

Best news! I like Python very much.

Comment by pank7yardbird, Jul 14, 2009

memory 10x? I really hope that can be fixed...

Comment by suraci.alex, Jul 14, 2009

Congratulations!

Comment by ilya.sandler, Jul 14, 2009

> Memory usage has increased by 10x.

Where has this increase happened? Application code storage (e.g. JIT overhead, tracing data, etc)? Or application data (data structures used by application itself) storage?

Comment by ruanjiayuan, Jul 14, 2009

Good Job

Comment by tomhsx, Jul 14, 2009

awesome work keep it up.

Comment by pcdinh, Jul 14, 2009

Great news

Comment by peng2006, Jul 14, 2009

wonderful job

Comment by goalzz85, Jul 14, 2009

Brother Chun is a real man, an iron-blooded true hero.

Comment by pamad05, Jul 15, 2009

Awesome guys!

Comment by David.Gaarenstroom, Jul 15, 2009

I tried the Q2 release and I must say it is a bit disappointing to me. Some code runs marginally faster than CPython but most runs a bit slower. But it runs just fine on LLVM, and that was the most important achievement AFAIK...

But you are not seriously using gcc-4.0.3 for comparison, are you?

Comment by xuhanf, Jul 15, 2009

good job!

Comment by juanjux, Jul 15, 2009

Excellent work. As a Python programmer since 1.5.x versions and Django developer, I'm waiting anxiously for the Q3 release with that 5x performance improvement!

Comment by f...@elastic.org, Jul 15, 2009

Do I understand correctly that the JIT is giving you up to 30% better performance? I hope it becomes less underwhelming in the future.

Comment by miguel.filipe, Jul 15, 2009

Why did you guys use gcc-4.0.3 (over 3 years old) instead of a more recent (and much improved, it seems) gcc-4.4 or gcc-4.3?

Comment by Lucian.B...@gmail.com, Jul 15, 2009

10x memory usage is a serious deal-breaker. Fix that ASAP. Underwhelming speedups aren't that much of an issue, for now.

Right now, PyPy is a much better prospect for real world usage, even though it only supports ctypes for C modules.

Comment by derago, Jul 15, 2009

http://newcode.bastart.eu.org:8000/ cherrypy + sqlalchemy + genshi + babel + unladen swallow :)

Comment by derago, Jul 15, 2009

ps: my site is far from production quality obviously...

Comment by drawkbox, Jul 15, 2009

wow now on to python 3! :)

Comment by turian, Jul 15, 2009

What about benchmarks wrt the original python implementation?

Comment by xielingsen, Jul 15, 2009

Good job! Great work!

Comment by locke23rus, Jul 16, 2009

derago,

error 404 on page http://newcode.bastart.eu.org:8000/forumfrm =-)

Comment by derago, Jul 16, 2009

Yes, the 404 is because I use the CherryPy standalone server for the unladen-swallow version of the site. The /forum URL is normally mapped by a bit of Apache magic. The site seems to be ~20% faster, but currently I can't compare it because one site runs with mod_python and the other with the CherryPy server. I will do a benchmark with unladen swallow and python 2.6 with the standalone server today.

Comment by kirillkh, Jul 16, 2009

> Stddev: 0.00142 -> 0.00306: 53.49% larger

I think you're counting the percentage wrong. It should be 115% larger. When counting percentage, the original value should be used as reference, not the new value.

Comment by bryanpieper, Jul 17, 2009

Looking forward to the positive impact of this project on the python community in the near future!

Comment by karthikeyan.m, Jul 17, 2009

Nice work. How do you handle a method which gets called only once, but which has a long-running loop that does a lot of work? Do you have any mechanism similar to the OSR mechanism in Hotspot? If not, do you see any problems implementing this with LLVM?

Comment by reid.kleckner, Jul 18, 2009

To a bunch of the above comments:

  • No, we don't have on stack replacement. We'll have to look at how Hotspot does that, but that's a long way away.
  • We're aware the JIT gives up to 30% improvement. These benchmarks are not speedups that users will see, but rather they verify to us that we have not created regressions. This quarter was about correctness and profiling, next quarter is optimization.
  • Percentages are intentionally measured as (new - old) / new, as described on the Benchmarks page.
  • We're looking into the memory usage problem. Mostly memory usage is only a problem when using -j always because it compiles every function and module body and then it holds onto the LLVM IR. It's probably safe for us to throw some of that away.

Thanks for the questions!

Comment by huangyy, Jul 18, 2009

Great!

Comment by BlueWater0121, Jul 19, 2009

Looks pretty good! ^_^

Comment by yarshure, Jul 20, 2009

Looks pretty good! ^_^

Comment by springv, Jul 30, 2009

Really looking forward to it!!

Comment by ro...@reportlab.com, Aug 05, 2009

What's the correct way to post feedback (or is there a mailing list)?

I'm trying to build unladen-2009Q2 on a 32-bit 386 FreeBSD 7.1-RELEASE system. Running

./configure --prefix=/home/rptlab/UNLADEN/ --enable-unicode=ucs2

works fine, but make then fails with

"Makefile", line 80: Missing dependency operator
"Makefile", line 82: Need an operator
"Makefile", line 84: Need an operator

make: fatal errors encountered -- cannot continue

I'm guessing that might be due to makefile peculiarities (although the python-2.5 makefile works out of the box).

With GNU make, things go fine at first, and then I see

llvm[1]: ***** Completed Release-Asserts Build
gmake[1]: Leaving directory `/usr/home/rptlab/devel/unladen-2009Q2/Util/llvm'
g++ -pthread -c -fno-strict-aliasing -fwrapv -Wall -Wstrict-prototypes -DNDEBUG -g -O3  -I. -IInclude -I. -I./Include   -DPy_BUILD_CORE -o Modules/python.o ./Modules/python.c
cc1plus: warning: command line option "-Wstrict-prototypes" is valid for C/ObjC but not for C++
./Modules/python.c: In function 'int main(int, char**)':
./Modules/python.c:20: error: 'fpgetmask' was not declared in this scope
./Modules/python.c:21: error: 'fpsetmask' was not declared in this scope
gmake: *** [Modules/python.o] Error 1

Any way to find out what's going wrong? Should I try and install a port of llvm and then reconfigure to use a built in llvm?

Comment by xielingsen, Aug 17, 2009

Good job! Looking forward to further improvements.

Comment by greatpet, Aug 24, 2009

Not impressed. On my machine the following Fibonacci code finishes in 0.22s with Psyco, but Unladen2009Q2 takes 3.8s, about the same as standard CPython, 17 times slower than Psyco.

def fib(x):
    if x==0 or x==1:
        return 1
    else:
        return fib(x-2)+fib(x-1)

print fib(33)

Comment by nagy.attila, Aug 31, 2009

For the fpgetmask problem, change floatingpoint.h in Modules/python.c to ieeefp.h. Patch here: http://code.google.com/p/unladen-swallow/issues/detail?id=77&colspec=ID%20Type%20Status%20Priority%20Release%20Owner%20Summary

Comment by goldmaneye, Oct 01, 2009

Great progress!

@greatpet, you might have missed this comment from above:

> We're aware the JIT gives up to 30% improvement. These benchmarks are not speedups that users will see, but rather they verify to us that we have not created regressions. This quarter was about correctness and profiling, next quarter is optimization.

Comment by vsapre80, Oct 13, 2009

Any news on the Q3 release of UnladenSwallow?

