Update in 2012-06-11
An example to use KGTP analyze application performance bottleneck
Overview
After KGTP (http://code.google.com/p/kgtp/) 20120319, it include a script hotcode.py that can show the hottest code line in the Linux kernel or user space program through parse and record the pc address in the interrupt handler. This document is an example that use it analyze application performance bottleneck.
This example need KGTP 20120406 or later version.
Introduce of performance issue
KGTP include a script getmod.py. If a GDB that is debugging an Linux Kernel call this script, it will auto load the debug info of the Linux kernel modules that this Linux kernel is inserted. Then GDB can debug these modules better than before. This script not only can use with the GDB that connects with KGTP, but also can use with the GDB that connects with other things such as QEMU. (Please goto http://code.google.com/p/kgtp/wiki/HOWTO#How_to_use_getmod.py'>http://code.google.com/p/kgtp/wiki/HOWTO#How_to_use_getmod.py get more info about this script.)
This script have problem is the execuiton speed is not very fast. This document will introduce use KGTP analyze this issue.
Analysis process and result
#Insert KGTP
sudo insmod gtp.ko
#Open a GDB to be analyzed
sudo gdb ./vmlinux
#Open another terminal and get the pid of this GDB
ps -e | grep gdb
14632 pts/1 00:00:00 gdb
#Call hotcode.py. Just input the pid, for other query just input enter.
sudo gdb ./vmlinux -x ~/kernel/svn/trunk/add-ons/hotcode.py
Which part of code you want trace? [(b)oth], (u)ser, (k)ernel:
Please input the pid of task that you want to trace - one per line.
If not set any task, will trace all code in the Linux kernel.
task pid (use empty to stop pid input):14632
Please input the debug binary of task if you want to change it [/usr/local/bin/gdb]:
task pid (use empty to stop pid input):
Which way you want to output hotcode info when ctrl-c? [(h)tml], (t)ty:
Which file you want to save the html output? [./hotcode.html]:
Show line number (0 meas all)? [20]:
Remote debugging using /sys/kernel/debug/gtp
0x0000000000000000 in irq_stack_union ()
Do you load the symbol from LKM? (y)es, [(n)o]:
#In the GDB part connect to KGTP and call getmod.py
target remote /sys/kernel/debug/gtp
source ~/kernel/svn/trunk/getmod.py
#After getmod.py stop, go to hotcode.py and press ctrl-c.
Then you can get hotcode.html in the current directory. I post it to the http://teawater.users.sourceforge.net/hotcode/normal.html'>http://teawater.users.sourceforge.net/hotcode/normal.html.
Open this file with browser, you can see the table that show the hotcode info of getmod.py.
The number after "User space hotcode list" (37716) and "Kernel space hotcode list" (903) is the times of user space code and kernel space code got interrupt. From this number, we can get that user code spend more time.
From "Hot functions list" of "User space hotcode list", we can see that function "/usr/local/bin/gdb:strcmp_iw" (12776) and function "/usr/local/bin/gdb:lookup_partial_symbol" (9059) is hottest function. If need, click the function name can get the detailed info of this function. Check the source tree can get that both of this functions is around symbol parser. This is a bottleneck of this script.
Solution and resule
The old version of KGTP doesn't have the function that get the call trace of user space task. But
We can get the stack dump because we have get the hot function name:
#0 strcmp_iw (string1=0x7f4685d5611c "pgd_t", string2=0x3fed870 "attrs") at ../../src/gdb/utils.c:2627
#1 0x000000000063ebaa in lookup_partial_symbol (pst=<optimized out>, name=<optimized out>, global=<optimized out>, domain=VAR_DOMAIN)
at ../../src/gdb/psymtab.c:749
#2 0x00000000006400af in lookup_symbol_aux_psymtabs (objfile=<optimized out>, block_index=1, name=0x4041d80 "attrs", domain=VAR_DOMAIN)
at ../../src/gdb/psymtab.c:510
#3 0x000000000063af95 in lookup_symbol_aux_quick (objfile=0x2f28d90, kind=1, name=0x4041d80 "attrs", domain=VAR_DOMAIN)
at ../../src/gdb/symtab.c:1522
#4 0x000000000063b183 in lookup_static_symbol_aux (name=0x4041d80 "attrs", domain=VAR_DOMAIN) at ../../src/gdb/symtab.c:1336
#5 0x000000000063b53b in lookup_symbol_aux (is_a_field_of_this=0x0, language=<optimized out>, domain=VAR_DOMAIN, block=0x0,
name=0x4041d80 "attrs") at ../../src/gdb/symtab.c:1316
#6 lookup_symbol_in_language (name=<optimized out>, block=0x0, domain=VAR_DOMAIN, lang=<optimized out>, is_a_field_of_this=0x0)
at ../../src/gdb/symtab.c:1200
#7 0x00000000005e125f in classify_name (block=0x0) at ../../src/gdb/c-exp.y:2438
#8 0x00000000005e3483 in c_lex () at ../../src/gdb/c-exp.y:2563
Function classify_name will classify the type of a name. getmod.py will not parse the symbol most of time. So function lookup_symbol will return NULL most of time. So I add a hash table in function lookup_symbol. If the lookup_symbol will return NULL, add arguments of it to the hash table. In the begin of function lookup_symbol, if the arguments is same with a hash table entry, return NULL directly. You can get this temp patch in http://teawater.users.sourceforge.net/hotcode/speedup.txt'>http://teawater.users.sourceforge.net/hotcode/speedup.txt.
Before this change, I tried to call getmod.py in a Linux kernel that is inserted 67 modules, the time record is:
239.275789976
242.414674997
234.347954035
237.436943054
244.659180164
Aftet the change, I got:
110.593796968
109.023880005
107.977679014
108.905900002
108.117362976
The speed increase is about 54%.
You can get new hotcode.html from http://teawater.users.sourceforge.net/hotcode/speedup.html'>http://teawater.users.sourceforge.net/hotcode/speedup.html. strcmp_iw and lookup_partial_symbol is not hot like before.
The advantages of hotcode.py
From prev analysis process, you can see that hotcode.py can get the hotcode line info of an application (support more than one application) and don't need rebuild application with some special lib or something.
And it can show the hotcode info of both user space code and kernel space code.
Like other KGTP function, hotcode.py can dynamic start and stop trace the application and don't need stop the application. So it is friendly with online service or something like it.
And when simple GDB with hotcode.py, the time record is:
255.329950094
252.469471931
255.546139002
252.384520054
255.049283028
The speed decrease is about 6%.
一个用KGTP分析应用程序性能瓶颈的例子
概述
从KGTP (http://code.google.com/p/kgtp/'>http://code.google.com/p/kgtp/) 20120319开始,其包含了一个通过记录并分析中断处理时候的取得的PC值从而得到Linux kernel或者用户层程序的热点代码。本篇文档就是一个使用它分析程序性能瓶颈的一个例子。
这个例子需要KGTP 20120406 或者以上的版本。
性能问题描述
KGTP包含了一个脚本getmod.py。如果一个正在调试Linux kernel的GDB调用这个脚本,其将自动装载所有插入这个Linux kernel的内核模块的调试信息,这样GDB可以更好的对这些模块进行调试。这个脚本既能在连接到KGTP上的GDB上使用,也可以和连接上其他东西例如QEMU的GDB一起用。(请到 http://code.google.com/p/kgtp/wiki/HOWTO#How_to_use_getmod.py'>http://code.google.com/p/kgtp/wiki/HOWTO#How_to_use_getmod.py 去访问关于这个脚本的更多信息)
这个脚本有一个问题就是执行速度很不理想,这介绍使用KGTP分析这个问题。
分析过程和结果
#插入KGTP模块
sudo insmod gtp.ko
#打开要分析的GDB
sudo gdb ./vmlinux
#打开另一个终端并取得这个GDB的pid
ps -e | grep gdb
14632 pts/1 00:00:00 gdb
#调用hotcode.py。只需要输入PID,其他选项直接回车即可。
sudo gdb ./vmlinux -x ~/kernel/svn/trunk/add-ons/hotcode.py
Which part of code you want trace? [(b)oth], (u)ser, (k)ernel:
Please input the pid of task that you want to trace - one per line.
If not set any task, will trace all code in the Linux kernel.
task pid (use empty to stop pid input):14632
Please input the debug binary of task if you want to change it [/usr/local/bin/gdb]:
task pid (use empty to stop pid input):
Which way you want to output hotcode info when ctrl-c? [(h)tml], (t)ty:
Which file you want to save the html output? [./hotcode.html]:
Show line number (0 meas all)? [20]:
Remote debugging using /sys/kernel/debug/gtp
0x0000000000000000 in irq_stack_union ()
Do you load the symbol from LKM? (y)es, [(n)o]:
#在GDB这边连接KGTP然后调用getmod.py
target remote /sys/kernel/debug/gtp
source ~/kernel/svn/trunk/getmod.py
#getmod.py结束后,到hotcode.py那边输入ctrl-c。
于是可以在当前命令取得hotcode.html,我将其发布在http://teawater.users.sourceforge.net/hotcode/normal.html'>http://teawater.users.sourceforge.net/hotcode/normal.html。
用浏览器打开这个文件,我们可以看到getmod.py的代码热点信息。
"User space hotcode list" (37716) 和 "Kernel space hotcode list" (903) 后面的数字是用户空间代码和内核空间代码被中断的次数。从这些数字,我们可以知道执行用户空间代码花去了更多的时间。
从"User space hotcode list"的"Hot functions list",我们可以看到函数"/usr/local/bin/gdb:strcmp_iw" (12776)和函数"/usr/local/bin/gdb:lookup_partial_symbol" (9059)是最热的函数。如果需要,我们还可以点函数名来详细更详细的信息。查一下代码我们就可以知道这两个函数大部分的调用都和符号分析有关。符号分析是这个脚本执行速度的一个瓶颈。
解决方案和结果
因为我们已经有了函数名,所以我们可以取得stack dump:
#0 strcmp_iw (string1=0x7f4685d5611c "pgd_t", string2=0x3fed870 "attrs") at ../../src/gdb/utils.c:2627
#1 0x000000000063ebaa in lookup_partial_symbol (pst=<optimized out>, name=<optimized out>, global=<optimized out>, domain=VAR_DOMAIN)
at ../../src/gdb/psymtab.c:749
#2 0x00000000006400af in lookup_symbol_aux_psymtabs (objfile=<optimized out>, block_index=1, name=0x4041d80 "attrs", domain=VAR_DOMAIN)
at ../../src/gdb/psymtab.c:510
#3 0x000000000063af95 in lookup_symbol_aux_quick (objfile=0x2f28d90, kind=1, name=0x4041d80 "attrs", domain=VAR_DOMAIN)
at ../../src/gdb/symtab.c:1522
#4 0x000000000063b183 in lookup_static_symbol_aux (name=0x4041d80 "attrs", domain=VAR_DOMAIN) at ../../src/gdb/symtab.c:1336
#5 0x000000000063b53b in lookup_symbol_aux (is_a_field_of_this=0x0, language=<optimized out>, domain=VAR_DOMAIN, block=0x0,
name=0x4041d80 "attrs") at ../../src/gdb/symtab.c:1316
#6 lookup_symbol_in_language (name=<optimized out>, block=0x0, domain=VAR_DOMAIN, lang=<optimized out>, is_a_field_of_this=0x0)
at ../../src/gdb/symtab.c:1200
#7 0x00000000005e125f in classify_name (block=0x0) at ../../src/gdb/c-exp.y:2438
#8 0x00000000005e3483 in c_lex () at ../../src/gdb/c-exp.y:2563
函数classify_name将返回一个名字的类型,而大部分时候getmod.py分析的名字都不是一个符号,所以函数lookup_symbol大部分时候返回的都是NULL。所以我在函数lookup_symbol中加了一个哈希表。如果lookup_symbol返回NULL,就将函数参数加入哈希表。在这个函数开始的地方检查哈希表,如果参数和一个哈希表项相同,则直接返回NULL。可以到 http://teawater.users.sourceforge.net/hotcode/speedup.txt'>http://teawater.users.sourceforge.net/hotcode/speedup.txt 取得这个临时PATCH。
在修改之前,我在一个插入了67个模块的Linux kernel上调用getmod.py,时间记录是:
239.275789976
242.414674997
234.347954035
237.436943054
244.659180164
修改之后:
110.593796968
109.023880005
107.977679014
108.905900002
108.117362976
速度提高大约是54%。
你可以从 http://teawater.users.sourceforge.net/hotcode/speedup.html'>http://teawater.users.sourceforge.net/hotcode/speedup.html 得到新的htocode.html。可以看到 strcmp_iw 和 lookup_partial_symbol 不像之前那么热了。
hotcode.py的优势
从前面的分析我们可以知道hotcode.py可以取得一个应用程序(也支持多个应用程序)的热代码信息而不需要重新编译这个程序连接特殊库什么的。
而且hotcode.py可以同时显示用户空间代码和内核空间代码的信息。
和其他KGTP的功能一样,hotcode.py的trace是可以动态的开始和结束而且不需要停止应用程序。所以其对例如线上服务的程序会更友好。
普通GDB和hotcode.py一起使用的时间记录是:
255.329950094
252.469471931
255.546139002
252.384520054
255.049283028
速度降低大约6%。