The current (as of r862) hotness heuristic is flagging lots of trivial functions as hot. These should be inlined into their callers.
Data from slowpickle:
python performance/bm_pickle.py -n 10 --profile pickle
17576425 function calls (16332114 primitive calls) in 17.017 CPU seconds
Ordered by: internal time
   ncalls        tottime  percall  cumtime  percall  filename:lineno(function)
1248312/12003    4.575    0.000    16.801   0.001    pickle.py:269(save)
  588147         2.020    0.000     3.266   0.000    pickle.py:227(memoize)
  336084         1.556    0.000     2.115   0.000    pickle.py:1260(encode_long)
  528132         1.492    0.000     4.853   0.000    pickle.py:480(save_string)
   16004         1.175    0.000    14.736   0.001    pickle.py:655(_batch_setitems)
  336084         0.901    0.000     3.296   0.000    pickle.py:461(save_long)
 1924481         0.604    0.000     0.604   0.000    {method 'write' of 'cStringIO.StringO' objects}
 2224556         0.582    0.000     0.582   0.000    {method 'get' of 'dict' objects}
 1788447         0.573    0.000     0.573   0.000    {chr}
  588147         0.567    0.000     0.747   0.000    pickle.py:250(put)
 2436609         0.399    0.000     0.399   0.000    {id}
 1952488         0.376    0.000     0.376   0.000    {len}
   16004/12003   0.344    0.000    12.386   0.001    pickle.py:608(_batch_appends)
  288072         0.279    0.000     0.374   0.000    pickle.py:260(get)
 1248312         0.250    0.000     0.250   0.000    pickle.py:333(persistent_id)
Heuristic results: 17 code objects deemed hot
pickle.py:333 (persistent_id)      12483120
pickle.py:269 (save)               12483120
pickle.py:227 (memoize)             5881470
pickle.py:250 (put)                 5881470
pickle.py:480 (save_string)         5281320
pickle.py:461 (save_long)           3360840
pickle.py:1260 (encode_long)        3360840
pickle.py:260 (get)                 2880720
pickle.py:437 (save_int)             360090
pickle.py:655 (_batch_setitems)      247073
pickle.py:608 (_batch_appends)       226143
pickle.py:591 (save_list)            160040
pickle.py:640 (save_dict)            160040
pickle.py:532 (save_tuple)           131794
pickle.py:220 (dump)                 120030
pickle.py:173 (__init__)             120030
pickle.py:1364 (dumps)               120030
The heuristic weights persistent_id() heavily because of its sheer number of calls, even though all those calls account for only 0.25 seconds of the 17 seconds of total running time; the same goes for get() and put().
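For context, the hotness numbers above are consistent with a simple counter model: roughly 10 points per call plus 1 point per loop backedge, with a fixed threshold deciding "hot" (e.g. persistent_id: 1248312 calls * 10 = 12483120). Below is a minimal sketch of that model; the threshold value is assumed for illustration and is not taken from the tree.

    HOTNESS_THRESHOLD = 100000  # assumed value, for illustration only

    def update_hotness(code, event, counters):
        """Toy model of a call/backedge-counting hotness heuristic.

        Each call adds 10 points and each loop backedge adds 1; a code
        object is deemed hot once its counter crosses a fixed threshold.
        A trivial leaf like persistent_id(), called 1.2M times, therefore
        scores ~12.5M points despite contributing only 0.25s of runtime.
        """
        counters[code] = counters.get(code, 0) + (10 if event == "call" else 1)
        return counters[code] >= HOTNESS_THRESHOLD

    # Usage sketch: persistent_id crosses the threshold after ~10k calls,
    # long before it has burned any meaningful amount of time.
    counters = {}
    for _ in range(1248312):
        update_hotness("pickle.py:333 (persistent_id)", "call", counters)

Under such a model, a one-line leaf called a million times will always out-score the loop-heavy function that actually burns the time.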
Django has a similar problem:
python performance/bm_django.py -n 10 --profile
17628021 function calls (17624409 primitive calls) in 23.681 CPU seconds
Ordered by: internal time
  ncalls     tottime  percall  cumtime  percall  filename:lineno(function)
 1895436     2.961    0.000     4.072   0.000    encoding.py:37(force_unicode)
 7853604     2.897    0.000     2.897   0.000    {isinstance}
 1812/12     2.476    0.001    23.652   1.971    defaulttags.py:115(render)
  271812     1.652    0.000     9.782   0.000    __init__.py:533(resolve)
  270000     1.638    0.000     3.032   0.000    defaultfilters.py:375(escape)
  270000     1.266    0.000    17.483   0.000    __init__.py:800(render)
  270000     1.258    0.000     5.929   0.000    defaultfilters.py:34(_dec)
 1824/12     1.254    0.001    23.678   1.973    __init__.py:764(render)
  270000     1.163    0.000     5.350   0.000    functional.py:246(wrapper)
 1350000     1.104    0.000     1.104   0.000    {method 'replace' of 'unicode' objects}
  270000     1.078    0.000     3.943   0.000    html.py:30(escape)
  271824     0.885    0.000     1.334   0.000    safestring.py:89(mark_safe)
  270000     0.882    0.000     1.394   0.000    safestring.py:104(mark_for_escaping)
  540000     0.670    0.000     0.670   0.000    {getattr}
  271812     0.552    0.000     0.904   0.000    __init__.py:690(_resolve_lookup)
  271812     0.396    0.000     1.300   0.000    __init__.py:672(resolve)
  273612     0.353    0.000     0.353   0.000    context.py:38(__getitem__)
 1630846     0.322    0.000     0.322   0.000    {method 'append' of 'list' objects}
  271812     0.264    0.000     0.264   0.000    {hasattr}
  543624     0.253    0.000     0.253   0.000    __init__.py:790(render)
  273612     0.182    0.000     0.182   0.000    context.py:34(__setitem__)
Heuristic results: 16 code objects deemed hot
django/utils/encoding.py:37 (force_unicode)          18954370
django/template/__init__.py:790 (render)              5436240
django/template/context.py:38 (__getitem__)           2745266
django/template/context.py:34 (__setitem__)           2736120
django/template/__init__.py:690 (_resolve_lookup)     2727211
django/template/__init__.py:533 (resolve)             2727154
django/utils/safestring.py:89 (mark_safe)             2718240
django/template/__init__.py:672 (resolve)             2718120
django/utils/functional.py:246 (wrapper)              2709091
django/template/__init__.py:800 (render)              2700000
django/utils/safestring.py:104 (mark_for_escaping)    2700000
django/utils/html.py:30 (escape)                      2700000
django/template/defaultfilters.py:34 (_dec)           2700000
django/template/defaultfilters.py:375 (escape)        2700000
django/template/__init__.py:764 (render)               117246
django/template/defaulttags.py:115 (render)            117120
The __init__.py:790 (render) function should not be optimized independently, nor should context.py's __getitem__() and __setitem__() methods.
The proper way to optimize these leaf functions is to inline them into their parents. To get the maximum benefit we need a way to create frames lazily, possibly as part of PyTraceback_Here() (or co-located machinery). As long as tracebacks contain all the right frames, users of sys._getframe() get what they deserve if they're looking up the call stack and making assertions about it.
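For illustration, here is roughly what inlining one of these trivial leaves amounts to at the source level. The guard strategy and names below are a sketch under assumed conditions, not the actual compiler machinery:

    class Pickler(object):
        def persistent_id(self, obj):
            return None                    # trivial leaf: ~1.2M calls, 0.25s total

        def save_pers(self, pid):
            pass                           # stub standing in for the real method

        # Before: every save() pays for a full call (and frame) just to get None back.
        def save(self, obj):
            pid = self.persistent_id(obj)
            if pid is not None:
                self.save_pers(pid)
                return
            # ... rest of save() elided ...

        # After: the callee's body is pasted into the caller behind a cheap guard;
        # an instance of a subclass that overrides persistent_id() falls back to
        # the real call instead of the inlined body.
        def save_inlined(self, obj):
            if type(self) is Pickler:      # guard: default persistent_id, not overridden
                pid = None                 # inlined body: no call, no new frame
            else:
                pid = self.persistent_id(obj)
            if pid is not None:
                self.save_pers(pid)
                return
            # ... rest of save() elided ...

With lazy frame creation, the guarded fast path would not materialize a frame for the inlined callee at all unless a traceback is actually taken inside it.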
Comment #1
Posted on Jan 6, 2010 by Helpful Elephant

(No comment was entered for this change.)
Comment #2
Posted on Jan 12, 2010 by Helpful Elephant

The html5lib benchmark also has a lot of opportunities for inlining.
Comment #3
Posted on Jan 19, 2010 by Helpful Elephant

Joerg Blank has started on this.
Comment #4
Posted on Jan 28, 2010 by Happy Elephant

This is now blocked by Issue 41.
Running html5lib with inlining enabled generates some callsites which become polymorphic and cause a lot of bails.
Comment #5
Posted on Jan 28, 2010 by Helpful Elephant

Can you elaborate on this? How does the inlining make callsites more polymorphic? I assume you're talking about callsites in the callee (the function being inlined)? What kind of callsites? Which optimization is causing the bails?
Comment #6
Posted on Jan 28, 2010 by Happy Elephant

Please excuse my bad wording.
A callsite is regarded as monomorphic, and the called function subsequently gets inlined. Later the same site becomes polymorphic and causes 100k bails in a single round (up from about 5k from cold branches).
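To make the failure mode concrete, here is a hedged sketch of a callsite specialized on feedback that said it was monomorphic; the class and counter names are hypothetical, not the actual JIT code:

    class GuardedCallSite(object):
        """Hypothetical model of a callsite specialized while it looked monomorphic."""

        def __init__(self, expected_callee):
            self.expected_callee = expected_callee   # the single callee seen during profiling
            self.bails = 0

        def call(self, actual_callee, *args):
            if actual_callee is self.expected_callee:
                # Fast path: stands in for the inlined copy of the callee's body.
                return actual_callee(*args)
            # Guard failure ("bail"): the site has turned polymorphic, so the
            # optimized code is abandoned and the generic path runs instead.
            self.bails += 1
            return actual_callee(*args)

    # Usage sketch: once a second callee shows up, every such call bails.
    site = GuardedCallSite(expected_callee=len)
    site.call(len, "abc")        # fast path
    site.call(sum, [1, 2, 3])    # bail; html5lib hits ~100k of these in a single round

Once the recorded callee stops matching, every execution of the site pays the guard-failure cost instead of taking the inlined fast path.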
Status: Started
Labels:
Type-Enhancement
Priority-Medium
Performance