My favorites | Sign in
Project Home Downloads Wiki Issues Source
Search
for
Gprof2Dot  
Convert profiling output to a dot graph.
Featured, Script, Python
Updated Oct 25, 2011 by Jose.R.F...@gmail.com

About

This is a Python script to convert the output from many profilers into a dot graph.

It has the following features:

If you want an interactive viewer for gprof2dot output graphs, check xdot.

Example

This is the result from the example data in the Linux Gazette article with the default settings:

Changelog

  • 2011-10-23: linux perf support (w/ Mark Hills assistance)
  • 2011-02-13: Handle multiple spaces in callgrind (by Suzev.Kirill,  issue 51 )
  • 2010-12-12: Handle empty/recent callgrind output ( issue #34 )
  • 2010-12-12: Java HPROF support (by Russell Power,  issue #41 )
  • 2010-08-27: Basic xperf support.
  • 2010-01-31: Initial callgrind support.
  • 2009-12-07: Skew colorization curve to show more/less detail (by Cris Ewing,  issue #26 ).
  • 2009-10-21: Support sysprof format.
  • 2009-10-20: Support Very Sleepy format.
  • 2009-08-09: Support hostshot profile format file (from tocer.deng,  issue #20 ).
  • 2009-03-04: Support AQtime xml format.
  • 2009-03-04: Support gprof indices > 10000 (by jkuzeja).
  • 2009-01-15: Add MacOSX Shark output parser; use rounded corners (contributed by Tomas Carnecky).
  • 2008-09-18: More complete theming. Black and white color theme.
  • 2008-06-29: Added an heuristic to propagate time inside cycles. This allows to determine the critical path for highly recursive code, such as the linux kernel code seen in issue 6.
  • 2008-04-08: Accept " " in oprofile profiles (contributed by Jakob Bornecrantz).
  • 2008-01-31: Handle oprofile output with profile specifications.
  • 2008-01-25: Propagate the self time spent in cycle member functions, which was being ignored yielding wrong results (mainly when pos-processing oprofile data).
  • 2007-12-20: Allow more than a single input file for pstats (contributed by Daniele Varrazzo).
  • 2007-11-18: Parse oprofile callgraph output (more detail here).
  • 2007-09-28: Get the total time not from the granularity time, but from the longest function call.
  • 2007-07-14: Added the ability to read output generated by the python profilers.
  • 2007-05-30: Handle the output produced by gprof with the static call graph option.
  • 2007-05-16: Strip template parameters from function names; add command options to control this behavior.
  • 2007-04-17: Fix bug parsing cycle entries, which have slightly different syntax than regular entries, and therefore are now parsed separately.
  • 2007-04-02: Consistent handling of cycles.
  • 2007-04-02: Handle output produced by non-GNU gprof.
  • 2007-03-30: Initial import.

Requirements

Windows users

Debian/Ubuntu users

  • Run:
  • apt-get install python graphviz

Download

Documentation

Usage

Usage: 
	gprof2dot.py [options] [file] ...

Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -o FILE, --output=FILE
                        output filename [stdout]
  -n PERCENTAGE, --node-thres=PERCENTAGE
                        eliminate nodes below this threshold [default: 0.5]
  -e PERCENTAGE, --edge-thres=PERCENTAGE
                        eliminate edges below this threshold [default: 0.1]
  -f FORMAT, --format=FORMAT
                        profile format: prof, callgrind, oprofile, hprof,
                        sysprof, shark, sleepy, aqtime, pstats, or xperf
                        [default: prof]
  -c THEME, --colormap=THEME
                        color map: color, pink, gray, or bw [default: color]
  -s, --strip           strip function parameters, template parameters, and
                        const modifiers from demangled C++ function names
  -w, --wrap            wrap function names
  --skew=THEME_SKEW     skew the colorization curve.  Values < 1.0 give more
                        variety to lower percentages.  Value > 1.0 give less
                        variety to lower percentages

Examples

gprof

/path/to/your/executable arg1 arg2
gprof path/to/your/executable | gprof2dot.py | dot -Tpng -o output.png

linux perf

perf record -g /path/to/your/executable
perf script | gprof2dot.py -f perf | dot -Tpng -o output.png

oprofile

opcontrol --callgraph=16
opcontrol --start 
/path/to/your/executable arg1 arg2
opcontrol --stop
opcontrol --dump
opreport -cgf | gprof2dot.py -f oprofile | dot -Tpng -o output.png

python profile

python -m profile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png

python cProfile (formerly known as lsprof)

python -m cProfile -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png

python hotshot profiler

The hotshot profiler does not include a main function. Download this hotshotmain.py script instead.

hotshotmain.py -o output.pstats path/to/your/script arg1 arg2
gprof2dot.py -f pstats output.pstats | dot -Tpng -o output.png

xperf

If you're not familiar with xperf then read this excellent article first. Then do:

  • Start xperf as
  • xperf -on Latency -stackwalk profile
  • Run your application.
  • Save the data.
  • xperf -d output.etl
  • Start the visualizer:
  • xperf output.etl
  • Select an area of interest on the CPU sampling graph, right-click, and select Summary Table.
  • In the Columns menu, make sure the Stack column is enabled and visible.
  • Right click in a row and choose Export Full Table, and save to output.csv.
  • Then invoke gprof2dot as
  • gprof2dot.py -f xperf output.csv | dot -Tpng -o output.png

Output

A node in the output graph represents a function and has the following layout:

+------------------------------+
|        function name         |
| total time % ( self time % ) |
|         total calls          |
+------------------------------+

where:

  • total time % is the percentage of the running time spent in this function and all its children;
  • self time % is the percentage of the running time spent in this function alone;
  • total calls is the total number of times this function was called (including recursive calls).

An edge represents the calls between two functions and has the following layout:

           total time %
              calls
parent --------------------> children

Where:

  • total time % is the percentage of the running time transfered from the children to this parent (if available);
  • calls is the number of calls the parent function called the children.

Note that in recursive cycles, the total time % in the node is the same for the whole functions in the cycle, and there is no total time % figure in the edges inside the cycle, since such figure would make no sense.

The color of the nodes and edges varies according to the total time % value. In the default temperature-like color-map, functions where most time is spent (hot-spots) are marked as saturated red, and functions where little time is spent are marked as dark blue. Note that functions where negligible or no time is spent do not appear in the graph by default.

Frequently Asked Questions

How can I generate a call graph from gprof output?

By default gprof2dot.py generates a partial call graph, excluding nodes and edges with little or no impact in the total computation time. If you want the full call graph then set a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as:

gprof2dot.py -n0 -e0

For an even more complete call graph, also run gprof with the -c / --static-call-graph, which identifies by statical analysis of the binary machine code other functions that could have been called, but never were.

The node labels are too wide. How can I narrow them?

The node labels can get very wide when profiling C++ code, due to inclusion of scope, function arguments, and template arguments in demangled C++ function names.

If you do not need function and template arguments information, then pass the -s / --strip option to strip them.

If you want to keep all that information, or if the labels are still too wide, then you can pass the -w / --wrap, to wrap the labels. Note that because dot does not wrap labels automatically the label margins will not be perfectly aligned.

Why there is no output, or it is all in the same color?

Likely, the total execution time is too short, so there is not enough precision in the profile to determine where time is being spent.

You can still force displaying the whole graph by setting a zero threshold for nodes and edges via the -n / --node-thres and -e / --edge-thres options, as:

gprof2dot.py -n0 -e0

But to get meaningful results you will need to find a way to run the program for a longer time period, or run gprof with multiple profiles. See also:

Why don't the percentages add up?

You likely have an execution time too short, causing the round-off errors to be large.

See question above for ways to increase execution time.

Which options should I pass to gcc when compiling for profiling?

Options which are essential to produce output suitable are:

  • -pg : generate profiling instrumentation code (only for gprof)
  • -g : produce debugging information
  • -fno-omit-frame-pointer : use the frame pointer (frame pointer usage is disabled by default in some architectures like x86_64 and for some optimization levels; it is impossible to walk the call stack without it)

You want the code you are profiling to be as close as possible as the code that you will be releasing. So you should include all options that you use in your release code, typically:

  • -O2 : optimizations that do not involve a space-speed tradeoff
  • -DNDEBUG : disable debugging code in the standard library (such as the assert macro)

However, due to the profiling mechanism used by gprof (and other profilers), many of the optimizations performed by gcc interfere with the accuracy/granularity of the profiling. You should pass these options to disable those particular optimizations:

  • -fno-inline-functions : do not inline functions into their parents (otherwise the time spent on these functions will be attributed to the caller)
  • -fno-inline-functions-called-once : similar to above
  • -fno-optimize-sibling-calls : do not optimize sibling and tail recursive calls (otherwise tail calls may be attributed to the parent function)

If the granularity is still too low, you may pass these options to achieve finer granularity:

  • -fno-default-inline : do not make member functions inline by default merely because they are defined inside the class scope
  • -fno-inline : do not pay attention to the inline keyword
Note however that with these last options the timings of functions called many times will be distorted due to the function call overhead. This is particularly true for typical C++ code which expects that these optimizations to be done for decent performance.

See the full list of gcc optimization options for more information.

Links

Profiling tools

Profile visualization tools

Call-graph generation tools

  • pycallgraph -- a call graph generator for Python programs.
Comment by fbise...@gmail.com, Nov 12, 2007

Great job José. Thanks for making it available to all of us!

Comment by ncalexan...@gmail.com, Apr 3, 2008

I just wanted to let you know that this script just saved me hours of deciphering the Python profiler's output. Thanks!

Comment by leandr...@gmail.com, May 13, 2008

excellent work!! thanks a lot!

Comment by b...@users.sf.net, Dec 6, 2008

Excellent python profiling result viewer. I'm using inkscape as an SVG visualization tool (couldn't find easily a dot viewer on Windows).

Comment by jonasdey...@gmail.com, Feb 10, 2009

It's just what I was looking for. Thanks.

Comment by jkuz...@gmail.com, Mar 2, 2009

Very nice. There's a small problem when there are more than 10000 functions. At least some versions of gprof have a bug where the initial function index loses the closing bracket for indices > 10000. I just added a '?' to the initial closing bracket in the cg_primary_re and cg_cycle_header_re regexps to fix this.

Here is an example of the gprof bug:

-----------------------------------------------
                0.00    0.00    3778/3778        timing::ClockDef::~ClockDef() [3176]
[9999]   0.0    0.00    0.00    3778         timing::ClockEdge::~ClockEdge() [9999]
-----------------------------------------------
                0.00    0.00      14/3657        timing::BaseTranList::handleUserTransitions(clkdb::SmartPointerTP<circuit::Port>, bool) [836]
                0.00    0.00    3643/3657        timing::AllClockDefs::skipClockPropDueToInactiveClock(clkdb::SmartPointerTP<timing::ClockDef>) [2467]
[10000   0.0    0.00    0.00    3657         timing::ClockDef::getMasterSourceClock() const [10000]
-----------------------------------------------
                0.00    0.00    3503/3503        circuit::name::FindNameBaseTraverser::addFoundName(clkdb::SmartPointerTP<circuit::name::NameIFC>, clkdb::SmartPointerTP<clkdb::PersistentObject>) [2984]
[10001   0.0    0.00    0.00    3503         circuit::name::FindNameContainer<clkdb::ObjectContainer>::push_back(clkdb::SmartPointerTP<clkdb::PersistentObject>) [10001]
-----------------------------------------------
Comment by project member Jose.R.F...@gmail.com, Mar 4, 2009

jkuzeja,

Commited. Thanks.

Comment by eker...@gmail.com, May 20, 2009

Very nice, thanks!

/Johan Eker

Comment by jon.x.h...@gmail.com, May 22, 2009

It would be very nice to have an option to colour nodes according to exclusive time rather than inclusive - having the main red hotspot "main" for every profile is not always helpful

Comment by thekswen...@gmail.com, Jun 22, 2009

Really quality tools man...

thanks.

Comment by rajat.go...@gmail.com, Jul 19, 2009

This is a life saver....great job.

Comment by geoffery...@gmail.com, Jul 27, 2009

This is an excellent program. Thank you very much for your hard work.

Comment by ChengHuige@gmail.com, Nov 15, 2009

Amazing, I finally find the tool I needed:) Is it possible to let user specify the function to output? I think it would be better if user can specify one function so to see the process flow of a specific function. For example,one program has 2 steps one is compressing and the following is decompressing,so the user can choose to draw only the work flow of decompressing.

Comment by ChengHuige@gmail.com, Nov 15, 2009

Comment by jon.x.hill, May 22, 2009 It would be very nice to have an option to colour nodes according to exclusive time rather than inclusive - having the main red hotspot "main" for every profile is not always helpful

I think you can achive this by simply copy it to another one say,gprof2dot2.py and then goto line 1898,modify it as below actually changing PRUNE_RATION to TIME_RATIO, hope the author will give one option for this:)

try:
weight = functionTIME_RATIO?
except UndefinedEvent?:
weight = 0.0

label = '\n'.join(labels) self.node(function.id,
label = label, color = self.color(theme.node_bgcolor(weight)), fontcolor = self.color(theme.node_fgcolor(weight)), fontsize = "%.2f" % theme.node_fontsize(weight),
)
for call in function.calls.itervalues():
callee = profile.functions[call.callee_id]
labels = for event in TOTAL_TIME_RATIO, CALLS:
if event in call.events:
label = event.format(callevent?) labels.append(label)
try:
weight = callTIME_RATIO?
except UndefinedEvent?:
try:
weight = calleeTIME_RATIO?
except UndefinedEvent?:
weight = 0.0
Comment by valentin...@gmail.com, Nov 29, 2009

Is there any chance for valgrind callgrind support? These graphs are much nicer than some of KCacheGrind's.

Comment by adol...@gmail.com, Dec 2, 2009

All,

Has any one ever tried generating dot file / call graph using the output from Shark? I m trying to use "gprof2dot" to make call graphs using the profiling info from Shark. I can't figure that out yet.

I tried this below with no luck:

$ ./gprof2dot.py -f shark shark_report1 | dot -Tpng -o output.png

I get the errors:

#-------------------------------------

Traceback (most recent call last): File "./gprof2dot.py", line 2192, in <module> Main().main() File "./gprof2dot.py", line 2117, in main self.profile = parser.parse() File "./gprof2dot.py", line 1348, in parse raise ParseError?('failed to parse', line) main.ParseError?: failed to parse: '+ 79.0%, thread_start, libSystem.B.dylib'

#------------------------------------- Thanks, Ad.

Comment by adol...@gmail.com, Dec 5, 2009

All,

I found a solution to my question above. Tom, provided me with a patch. The line #1339, on the gprof2dot.py script should be changed to:

match = re.compile(r'(?P<prefix>[|+\- ])(?P<samples>\d+), (?P<symbol>[^,]+), (?P<image>.)')

i.e. for it to work with all Apple Shark outputs.

Also mke sure your Shark output e.g. shark_report1 above uses "samples counts" NOT "percentages".

I hope this helps,

Adolph,

Comment by adol...@gmail.com, Dec 5, 2009

Hi,

I tried using the gprof2dot script with my attached output and I got an error below.

i.e. I did this on Fedora:

$.my_exec param1 param2 $ gprof my_exec | gprof2dot.py | dot -Tpng -o output.png

I got:

Error: <stdin>:1: syntax error near line 1 context: >>> Command <<< not found.

Any ideas on what I did wrong? I am running Fedora 12 virtualized on my OS X (leopard). Any ideas will be appreciated,

Ad.

Comment by adol...@gmail.com, Dec 6, 2009

All,

When I run it as below, I do get some graph output, I wonder if this is a correct graph. I am using -pg option in my CFLAGS and LDFLAGS options.

i.e. I did:

$ gprof my_exec > gprof_out.txt $ ./gprof2dot.py -f prof gprof_out.txt | dot -Tpng -o goutput.png This gives me a viewable goutput.png graph but I am wondering if there could be a "prof vs. gprof" confusion in my commandline, that makes this illegitimate.

Thanks, Ad.

Comment by lian...@gmail.com, Apr 1, 2010

would you please update the documentation for processing sysprof results? Thanks!

Comment by stefan.s...@gmail.com, Jun 13, 2010

Hi,

thanks for your work! its very useful.

Here is a patch with an color theme for printing (black text; nodes are not filled). I thought it might be of some use.

--- a/gprof2dot.py	2010-06-13 23:52:16.370554491 +0200
+++ b/gprof2dot.py	2010-06-14 00:05:12.055554359 +0200
@@ -2135,6 +2135,8 @@
             mincolor = (0.0, 0.0, 0.0),
             maxcolor = (0.0, 0.0, 1.0),
             fontname = "Arial",
+            fontcolor = "white",
+			nodestyle = "filled",
             minfontsize = 10.0,
             maxfontsize = 10.0,
             minpenwidth = 0.5,
@@ -2145,6 +2147,8 @@
         self.mincolor = mincolor
         self.maxcolor = maxcolor
         self.fontname = fontname
+        self.fontcolor = fontcolor
+        self.nodestyle = nodestyle
         self.minfontsize = minfontsize
         self.maxfontsize = maxfontsize
         self.minpenwidth = minpenwidth
@@ -2158,6 +2162,9 @@
     def graph_fontname(self):
         return self.fontname
 
+    def graph_fontcolor(self):
+        return self.fontcolor
+
     def graph_fontsize(self):
         return self.minfontsize
 
@@ -2165,11 +2172,17 @@
         return self.color(weight)
 
     def node_fgcolor(self, weight):
-        return self.graph_bgcolor()
+        if self.nodestyle == "filled":
+            return self.graph_bgcolor()
+        else:
+            return self.color(weight)
 
     def node_fontsize(self, weight):
         return self.fontsize(weight)
 
+    def node_style(self):
+        return self.nodestyle
+
     def edge_color(self, weight):
         return self.color(weight)
 
@@ -2272,6 +2285,16 @@
     maxpenwidth = 8.0,
 )
 
+WB_COLORMAP = Theme(
+    minfontsize = 18.0,
+    maxfontsize = 30.0,
+	fontcolor = "black",
+	nodestyle = "solid",
+    mincolor = (0.0, 0.0, 0.0), # black
+    maxcolor = (0.0, 0.0, 0.0), # black
+    minpenwidth = 0.1,
+    maxpenwidth = 8.0,
+)
 
 class DotWriter:
     """Writer for the DOT language.
@@ -2288,9 +2311,11 @@
         self.begin_graph()
 
         fontname = theme.graph_fontname()
+        fontcolor = theme.graph_fontcolor()
+        nodestyle = theme.node_style()
 
         self.attr('graph', fontname=fontname, ranksep=0.25, nodesep=0.125)
-        self.attr('node', fontname=fontname, shape="box", style="filled", fontcolor="white", width=0, height=0)
+        self.attr('node', fontname=fontname, shape="box", style=nodestyle, fontcolor=fontcolor, width=0, height=0)
         self.attr('edge', fontname=fontname)
 
         for function in profile.functions.itervalues():
@@ -2434,6 +2459,7 @@
             "pink": PINK_COLORMAP,
             "gray": GRAY_COLORMAP,
             "bw": BW_COLORMAP,
+            "wb": WB_COLORMAP,
     }
 
     def main(self):
@@ -2461,9 +2487,9 @@
             help="profile format: prof, callgrind, oprofile, sysprof, shark, sleepy, aqtime, pstats, or xperf [default: %default]")
         parser.add_option(
             '-c', '--colormap',
-            type="choice", choices=('color', 'pink', 'gray', 'bw'),
+            type="choice", choices=('color', 'pink', 'gray', 'wb', 'bw'),
             dest="theme", default="color",
-            help="color map: color, pink, gray, or bw [default: %default]")
+            help="color map: color, pink, gray, wb or bw [default: %default]")
         parser.add_option(
             '-s', '--strip',
             action="store_true",
Comment by andrea.b...@gmail.com, Aug 11, 2010

I added the following line to the PstatsParser?.parse method in order to print summaries information, in particular the cpu time. Maybe maybe it could be useful to someone else...

print >> sys.stderr, "%s function calls" % (self.stats.total_calls), if self.stats.total_calls != self.stats.prim_calls:
print >> sys.stderr, "(%d primitive calls)" % self.stats.prim_calls,
print >> sys.stderr, "in %.3f CPU seconds" % self.stats.total_tt

  1. elf.stats.print_stats()
  2. elf.stats.print_callees()
etc...

It could be nice to include summaries information in the output image.

Comment by Graymond...@gmail.com, Sep 7, 2010

I'm getting the "unexpected end of file" error with "gprof myapp.exe | gprof2dot.py | dot -Tpng -o output.png".

Everything works fine except gprof2dot.py. Some help would be appreciated as there are is probably no other profiler support for gcc on Win.

Comment by Graymond...@gmail.com, Sep 7, 2010

Ok, seems like I got over it by using "gprof2dot.py gprof.txt -f prof | dot -Tpng -o output.png" where gprof.txt is output of "gprof myapp.exe".

Comment by muravjov...@gmail.com, Oct 4, 2010

Thank you very much for the tool. I tried it for Python and it is great; hope it handle oProfile output the same good.

Comment by jacky...@gmail.com, Oct 14, 2010

a wonderful tool

Comment by MikeDunl...@gmail.com, Mar 29, 2011

Nice job of programming, but this kind of gprof-style stuff gives me a stomach ache, because I use the random-pause method of locating "bottlenecks". Gprof floods you with data, of which 95+% is irrelevant. For example, I worked through that gmon.out data and simulated 10 random stack samples. After sorting them, they look like this:

main ...
main read_all_makefiles ...
main read_all_makefiles eval_makefile eval ...
main read_all_makefiles eval_makefile eval ...
main read_all_makefiles eval_makefile eval ...
main read_all_makefiles eval_makefile eval ...
main update_goal_chain ...
main update_goal_chain try_implicit_rule ...
main update_goal_chain try_implicit_rule pattern_search ...
main update_goal_chain try_implicit_rule pattern_search file_impossible_p find_directory ...

The ... is because I couldn't follow the gmon.out data far enough to simulate a full stack trace. Any routine on X% of these samples is responsible for about X% of the time. Examples:

eval_makefile 40%
eval 40%
try_implicit_rule 30%
pattern_search 20%

What's more the exact lines of code responsible for the time are also on the samples, not just functions. This a small amount of data, and it is rich in information about what code is responsible for the time. It doesn't spend a lot of clutter on stuff that doesn't matter. What's more, gprof is blind to I/O, so it is skewed to CPU-only time, not wall time.

Comment by project member Jose.R.F...@gmail.com, Oct 25, 2011

MikeDunlavey?,

This is a tool to help visualize "gprof-style stuff". I never intended to advocate nor evangelize gprof (which I actually stopped many years ago), any particular profiling tool, or even a particular profiling technique. This tool, like any tool, it is merely is a mean to an end, and is not implicitly good or bad. If this tool doesn't fit your needs then you can simply be kind to your stomach and not use it.

Yes, the results will be blind to I/O if the profiler is blind to I/O. But this is not the case with all profilers (e.g., Very Sleepy will capture I/O times when profiling a single thread). Tools like xperf and linux perf can actually put a maginifying glass on events such as I/O or context switches.

Yes, gprof and similar profilers will flood you with a lot of data. But guess what -- reducing the data and higlighting the most interesting bits is precisely the point of this tool. Actually, if you look at the output, and follow the hot red color down you'll get precisely the call path that "random-pause" most frequently would give (assuming load is CPU based or the tool captures I/O times) if there is one. If there is not single hot path, you still perceive informatiom without doing many "random-pauses" (which, if done too often, end up being nothing more than a poor man sampling profiler).

One thing which you hint at, and that I agree that could be done better here is to take in account the full stack traces. Unlike old gprof, new tools such as linux perf and Very Sleepy capture the full stack traces, however gprof2dot ends up ignoring the high order correlations, which ends up biasing (actually more like tends to averaging out) the cumulative values.

There is some challenge though, as stack traces are some times truncated, so a mixed approach will be necessary. Also, this only makes a significant difference when there are dispatch functions, i.e., single functions with many callers and many callees, that dispatch to a particular callee depending on argument input.

Comment by spruc...@gmail.com, Dec 5, 2011

hi, i tested gprof on c++ code, but in the call graph are value for % time--self--children = ZERO, do you know where is problem ? for example: index % time self children called name

9 quicksort(int, int, int) 2?
0.00 0.00 1/1 main 2248?
2? 0.0 0.00 0.00 1+9 quicksort(int, int, int) 2?
9 quicksort(int, int, int) 2?

thx. V.


Sign in to add a comment
Powered by Google Project Hosting