Unladen Swallow 2009Q2

Unladen Swallow 2009Q2 is the first release of Unladen Swallow to use LLVM for native code generation. To obtain the 2009Q2 release, run

    svn checkout http://unladen-swallow.googlecode.com/svn/branches/release-2009Q2-maint unladen-2009Q2

The Unladen Swallow team does not recommend wide adoption of the 2009Q2 release. It is intended as a checkpoint of our progress, a milestone on the long path to our eventual performance goals. 2009Q2 can compile all pure-Python code to correct native machine code, but its main purpose is to set the stage for more significant performance improvements in the 2009Q3 release, which will take advantage of the LLVM-based compiler infrastructure built in Q2.

Highlights:
- Unladen Swallow 2009Q2 uses LLVM to compile hot functions (anything called more than 10000 times) to machine code. A -j always command-line option is available to force all functions through LLVM.
- Unladen Swallow 2009Q2 starts up faster than 2009Q1.
- A number of buggy corner cases in the 2009Q1 version of cPickle have been fixed.
- Unladen Swallow 2009Q2 passes the tests for all the third-party tools and libraries listed on the Testing page. Significantly for many projects, this includes compatibility with Twisted, Django, NumPy and Swig.
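The hotness rule in the first highlight can be pictured with a small Python model. This is only an illustrative sketch, not Unladen Swallow's actual implementation: the names HOT_THRESHOLD, hot_counter, calls and is_hot are hypothetical, and the real call counting lives inside the interpreter rather than in a decorator.

```python
import functools

# Threshold taken from the release notes: "called more than 10000 times".
HOT_THRESHOLD = 10000


def hot_counter(func):
    """Count calls to func and flag it as hot once it passes the threshold.

    Hypothetical helper for illustration only; it compiles nothing.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        if not wrapper.is_hot and wrapper.calls > HOT_THRESHOLD:
            # A real JIT would hand the function to the compiler here.
            wrapper.is_hot = True
        return func(*args, **kwargs)

    wrapper.calls = 0
    wrapper.is_hot = False
    return wrapper


@hot_counter
def add(a, b):
    return a + b


for _ in range(HOT_THRESHOLD):
    add(1, 2)
print(add.is_hot)  # still cold after exactly HOT_THRESHOLD calls
add(1, 2)
print(add.is_hot)  # flagged hot on the next call
```

In this model, forcing every function through the compiler (what -j always does in the release) would correspond to treating every function as hot from its first call.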
Lowlights:
- Memory usage has increased by 10x. We have thus far spent no time improving this; lowering memory usage is a goal for the 2009Q3 release.
- LLVM's JIT memory manager is limited to 16MB of native code. This is not a problem in practice, but it interferes with regrtest.py runs. This is being fixed upstream in LLVM, and the resulting patch will be backported to the 2009Q2 release branch.
Benchmarks

2009Q2 uses a very simple function to determine whether to compile a given function to machine code. Accordingly, we use Unladen Swallow's -j always flag to force all functions through LLVM, which gives us a more accurate picture of how our native code generation facility is performing.

Benchmarking was done on an Intel Core 2 Duo 6600 @ 2.40GHz with 4GB RAM.

2009Q1 vs 2009Q2 (32-bit; gcc 4.0.3; perf.py -r --args ",-j always -O2")

ai:
Min: 0.490245 -> 0.477799: 2.60% faster
Avg: 0.492445 -> 0.481081: 2.36% faster
Significant (t=42.490318, a=0.95)
Stddev: 0.00075 -> 0.00257: 70.92% larger

django:
Min: 1.097285 -> 1.031586: 6.37% faster
Avg: 1.099378 -> 1.034914: 6.23% faster
Significant (t=191.190350, a=0.95)
Stddev: 0.00142 -> 0.00306: 53.49% larger

slowpickle:
Min: 0.735551 -> 0.652740: 12.69% faster
Avg: 0.737914 -> 0.653076: 12.99% faster
Significant (t=258.803262, a=0.95)
Stddev: 0.00327 -> 0.00023: 1320.90% smaller

slowspitfire:
Min: 0.788618 -> 0.663307: 18.89% faster
Avg: 0.790304 -> 0.665141: 18.82% faster
Significant (t=338.460137, a=0.95)
Stddev: 0.00294 -> 0.00224: 31.54% smaller

slowunpickle:
Min: 0.317278 -> 0.279072: 13.69% faster
Avg: 0.318411 -> 0.280351: 13.58% faster
Significant (t=174.904639, a=0.95)
Stddev: 0.00088 -> 0.00199: 55.74% larger

2009Q1 vs 2009Q2 (32-bit; gcc 4.0.3; perf.py -r)

normal_startup:
Min: 0.378594 -> 0.294137: 28.71% faster
Avg: 0.400236 -> 0.306967: 30.38% faster
Significant (t=22.105565, a=0.95)
Stddev: 0.00915 -> 0.04119: 77.80% larger
Bumping this...
LGTM++
good work!
Yeah, I like this!
Why doesn't the checkout URL exist?!
Awesome work!! Thank you guys. :D
Best message! I really like Python.
memory 10x? I really hope that can be fixed...
Congratulations!
> Memory usage has increased by 10x.
Where has this increase happened? Application code storage (e.g. JIT overhead, tracing data, etc)? Or application data (data structures used by application itself) storage?
Good Job
Awesome work, keep it up.
Great news
wonderful job
Brother Chun is a real man, a true iron-blooded hero.
Awesome guys!
I tried the Q2 release and I must say it is a bit disappointing to me. Some code runs marginally faster than CPython, but most runs a bit slower. But it runs just fine on LLVM, and that was the most important achievement AFAIK...
But you are not seriously using gcc-4.0.3 for comparison, are you?
good job!
Excellent work. As a Python programmer since 1.5.x versions and Django developer, I'm waiting anxiously for the Q3 release with that 5x performance improvement!
Do I understand correctly that the JIT is giving you up to 30% better performance? I hope it becomes less underwhelming in the future.
Why did you guys use gcc-4.0.3 (over 3 years old) instead of a more recent (and much improved, it seems) gcc-4.4 or gcc-4.3?
10x memory usage is a serious deal-breaker. Fix that ASAP. Underwhelming speedups aren't that much of an issue, for now.
Right now, PyPy is a much better prospect for real-world usage, even though it only supports ctypes for C modules.
http://newcode.bastart.eu.org:8000/ cherrypy + sqlalchemy + genshi + babel + unladen swallow :)
ps: my site is far from production quality obviously...
wow now on to python 3! :)
What about benchmarks wrt the original python implementation?
Good job! Great work!
error 404 on page http://newcode.bastart.eu.org:8000/forumfrm =-)
Yes, the 404 is because I use the CherryPy standalone server for the unladen-swallow version of the site. The /forum URL is normally mapped by a bit of Apache magic. The site seems to be ~20% faster, but currently I can't compare them because one site runs with mod_python and the other with the CherryPy server. I will do a benchmark with unladen swallow and python 2.6 with the standalone server today.
> Stddev: 0.00142 -> 0.00306: 53.49% larger
I think you're counting the percentage wrong. It should be 115% larger. When counting percentage, the original value should be used as reference, not the new value.
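The two conventions in this comment can be compared directly. The following is a minimal sketch using the figures as rounded on this page; the function names are made up for illustration and this is not perf.py's code. Dividing by the original value gives roughly 115%, while dividing by the new value gives roughly 53.6%, close to the reported 53.49% (the small gap is presumably because the report was computed from unrounded measurements). The "faster" figures on this page also appear consistent with dividing by the new value.

```python
def pct_relative_to_old(old, new):
    """Percent change measured against the original value."""
    return abs(new - old) / old * 100.0


def pct_relative_to_new(old, new):
    """Percent change measured against the new value."""
    return abs(new - old) / new * 100.0


# Rounded django Stddev figures quoted in the comment above.
old_stddev, new_stddev = 0.00142, 0.00306
print("vs. old value: %.2f%%" % pct_relative_to_old(old_stddev, new_stddev))
print("vs. new value: %.2f%%" % pct_relative_to_new(old_stddev, new_stddev))

# The ai benchmark's Min timings, for comparison with its reported "2.60% faster".
old_min, new_min = 0.490245, 0.477799
print("ai Min vs. new value: %.2f%%" % pct_relative_to_new(old_min, new_min))
```

Both conventions are in common use; what matters when reading the tables is knowing which denominator was chosen.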
Looking forward to the positive impact of this project on the python community in the near future!
Nice work. How do you handle a method that gets called only once but has a long-running loop that does a lot of work? Do you have any mechanism similar to the OSR mechanism in HotSpot? If not, do you see any problems implementing this with LLVM?
To a bunch of the above comments:
Thanks for the questions!
That's great.
Looks pretty good! ^_^
Really looking forward to it!!
What's the correct way to post feedback (or is there a mailing list)?
I'm trying to build unladen-2009Q2 on a 32-bit 386 FreeBSD 7.1-RELEASE system. I tried
make
which fails with
make: fatal errors encountered -- cannot continue
I'm guessing that might be due to makefile peculiarities (although the python-2.5 makefile works out of the box)
With GNU make I see things going fine at first and then
Any way to find out what's going wrong? Should I try and install a port of llvm and then reconfigure to use a built in llvm?
Good job! Looking forward to further improvements.
Not impressed. On my machine the following Fibonacci code finishes in 0.22s with Psyco, but Unladen2009Q2 takes 3.8s, about the same as standard CPython and 17 times slower than Psyco.
def fib(x):
print fib(33)
For the fpgetmask problem, change floatingpoint.h in Modules/python.c to ieeefp.h. Patch here: http://code.google.com/p/unladen-swallow/issues/detail?id=77&colspec=ID%20Type%20Status%20Priority%20Release%20Owner%20Summary
Great progress!
@greatpet, you might have missed this comment from above:
We're aware the JIT gives up to 30% improvement. These benchmarks are not speedups that users will see, but rather they verify to us that we have not created regressions. This quarter was about correctness and profiling, next quarter is optimization.
Any news on the Q3 release of UnladenSwallow?