Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

google profiler hangs in rhel5 and rhel6, but works in rhel4 #645

Closed
alk opened this issue Aug 23, 2015 · 26 comments
Closed

google profiler hangs in rhel5 and rhel6, but works in rhel4 #645

alk opened this issue Aug 23, 2015 · 26 comments

Comments

@alk
Copy link
Contributor

alk commented Aug 23, 2015

Originally reported on Google Code with ID 642

What steps will reproduce the problem?
1.Write a code which has system calls 
2.Run in any rhel5 or rhel6 platform
3.Hangs abruptly in any system call. The hanging happens more predominantly for little
large testcases.

What is the expected output? What do you see instead?
Completion of the run, but it hangs in middle ..
stack 39372
Thread 4 (Thread 0x2b5992f95700 (LWP 39373)):
#0  0x00000038e260b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000023a122e in k34ba3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x2b5993196700 (LWP 39374)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000023a1142 in a3467n3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x2b5994972700 (LWP 39379)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000024f07c1 in ?? ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b5991b6fee0 (LWP 39372)):
#0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
#3  0x0000000000d16e92 in hdlSetCheetahArgs() ()
#4  0x0000000000d227e2 in hdlParserMain ()
#5  0x0000000000d9a92a in CHECKRULES(registeredNameList*, SgWorkspace*, int) ()
#6  0x0000000000d924dc in sgCompareNDesigns(registeredNameList*) ()
#7  0x0000000000d8455c in sgGoCheckRules ()
#8  0x0000000000a95543 in runGoal() ()


What version of the product are you using? On what operating system?

google profiler version 2.2.1 compiled on gcc 4.8.1 on 2.6.18-92.el5 platform. 
Please provide any additional information below.
The hang is more consistent in 2.6.18-92.el5 platform and works smoothly without hang
in 2.6.9-5.EL platform.


Reported by rajdeep.babu on 2014-08-18 12:34:17

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

strace -cp PID shows..

strace -cp 39372
Process 39372 attached - interrupt to quit
^CProcess 39372 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    6.818870      162354        42        42 clone
  0.00    0.000048           0      1290           rt_sigprocmask
  0.00    0.000024           1        43           rt_sigreturn
  0.00    0.000010           0       215           msync
------ ----------- ----------- --------- --------- ----------------
100.00    6.818952                  1590        42 total

Reported by rajdeep.babu on 2014-08-18 12:35:38

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

In this backtraces I on only see something waiting on result of system call. And no
signs of anything stuck in gperftools.

There's still slight chance that child process forked by system is stuck somewhere.
So getting backtraces from child might help.

Also helpful might be short reproduction program that you could share. And also helpful
might be if you could test if older versions (2.0 and 2.1) are affected or not.

Reported by alkondratenko on 2014-08-18 15:55:23

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

The pstree for similar process shows the following:
pstree -p 61012
check.Linux4(61012)???{check.Linux4}(61014)
                    ??{check.Linux4}(61015)
                    ??{check.Linux4}(61049)


Now if you do pstack for 61012, it shows all the process thread...
pstack 61012
Thread 4 (Thread 0x2ba7eb993700 (LWP 61014)):
#0  0x00000038e260b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000023a122e in k34ba3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x2ba7ebb94700 (LWP 61015)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000023a1142 in a3467n3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x2ba7ed370700 (LWP 61049)):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000024f07c1 in ?? ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2ba7ea56dee0 (LWP 61012)):
#0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
#3  0x0000000000d16e92 in hdlSetCheetahArgs() ()
#4  0x0000000000d227e2 in hdlParserMain ()
#5  0x0000000000d9a92a in CHECKRULES(registeredNameList*, SgWorkspace*, int) ()
#6  0x0000000000d924dc in sgCompareNDesigns(registeredNameList*) ()
#7  0x0000000000d8455c in sgGoCheckRules ()
#8  0x0000000000a95543 in runGoal() ()
#9  0x0000000000a95eea in spyglass_main ()
#10 0x0000000000a96497 in spyglass_main_reentr ()
#11 0x0000000000a93e94 in main ()


Also the strace for the same shows:
strace -cp 61012
Process 61012 attached - interrupt to quit
^CProcess 61012 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    7.416872      200456        37        37 clone
  0.00    0.000041           0      1140           rt_sigprocmask
  0.00    0.000011           0        38           rt_sigreturn
  0.00    0.000000           0        16           msync
  0.00    0.000000           0         4           futex
------ ----------- ----------- --------- --------- ----------------
100.00    7.416924                  1235        37 total

SO some issue related to cloning. The process gets stuck all the time exactly in the
same place. The same issue is happening in 2.0 and 2.1 for sure and exactly at the
same place.
Also I am sorry that I could not attach the test program, the reason being it is in
NDA, plus not only the program, I have to send you a real design also, which is not
possible. So if you can throw some light on the basis of the stack trace, it would
be excellent.

Reported by rajdeep.babu on 2014-08-19 08:05:06

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

I will need detailed backtraces from child process to help.

Reported by alkondratenko on 2014-08-19 14:54:41

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

I checked all the child process.. All of them are showing exactly the same back trace.

Thread 1 (process 8127):
#0  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000023a1142 in a3467n3 ()
#2  0x00000038e26079d1 in start_thread () from /lib64/libpthread.so.0
#3  0x00000038e1ee8b6d in clone () from /lib64/libc.so.6

And strace shows:
strace -cp 8127
Process 8127 attached - interrupt to quit
^CProcess 8127 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  -nan    0.000000           0         2         2 rt_sigreturn
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                     2         2 total

Reported by rajdeep.babu on 2014-08-20 07:46:08

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

Let me first note that those strace outputs are not telling me anything at all.

Backtraces do.

If that's indeed child process and not a thread (which actually seems to be the case),
then it's not something caused by gperftools.

Double check on your end that backtrace above is not thread of your main process but
a child process forked in do_system.

Reported by alkondratenko on 2014-08-20 18:19:53

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

Another thing to investigate is what exactly do_system is doing. It might be useful
do post disasm of it as seen by gdb

Reported by alkondratenko on 2014-08-20 18:23:30

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

This is for sure that those 3 process are threads and not forked child. Also do you
mean I run gdb on the main thread, what output you need?

Reported by rajdeep.babu on 2014-08-21 12:44:18

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

Let me elaborate on my thinking a bit.

system function is supposed to fork child and wait for it. In your backtraces there
are no obvious signs of anything being blocked, but there is active system(3) call
in one of threads. Therefore my assumption is that system might have forked child and
is waiting for it. And child might be stuck.

Do ps aux --forest to see if your process indeed has child. And if there's child attach
to that child via gdb and get me backtraces from it.

Hopefully it'll help.

Reported by alkondratenko on 2014-08-22 05:35:29

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

I checked with ps aux --forest, the checker process does not have a forked child. This
are threads created by OS.
Also gdb on the process gives me something like this:

0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install expat-2.0.1-11.el6_2.x86_64 fontconfig-2.8.0-3.el6.x86_64
glib2-2.26.1-3.el6.x86_64 glibc-2.12-1.132.el6.x86_64 libICE-1.0.6-1.el6.x86_64 libSM-1.2.1-2.el6.x86_64
libX11-1.5.0-4.el6.x86_64 libXau-1.0.6-4.el6.x86_64 libXext-1.3.1-2.el6.x86_64 libXrender-0.9.7-2.el6.x86_64
libunwind-1.1-2.el6.x86_64 libuuid-2.17.2-12.14.el6.x86_64 libxcb-1.8.1-1.el6.x86_64
nss-softokn-freebl-3.14.3-9.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
(gdb) up
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) up
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
(gdb) up
#3  0x0000000000d16e86 in hdlAbcArgs() () at src/hdlAbc.c:2543
2543       veSetSpyschDirPath(SG_CLOM_CNTR->sgSpyDirName);

Reported by rajdeep.babu on 2014-08-22 06:43:23

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

In that case I can conclude that either do_system starts spinning somewhere or that
it's simply your app calls system in loop.

In order to investigate the former, please do disasm of do_system via gdb.

Reported by alkondratenko on 2014-08-22 14:56:04

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

when I do 
gdb) info line do_system
It does not show up starting and ending adddress , it just show up one address.

(gdb) info line do_system
No line number information available for address 0x38e1e3e790 <do_system>
(gdb) disasm 0x38e1e3e790
Undefined command: "disasm".  Try "help".
(gdb) disas 0x38e1e3e790
Dump of assembler code for function do_system:
   0x00000038e1e3e790 <+0>:     push   %r12
   0x00000038e1e3e792 <+2>:     xor    %eax,%eax
   0x00000038e1e3e794 <+4>:     mov    %rdi,%r12
   0x00000038e1e3e797 <+7>:     mov    $0x10,%ecx
   0x00000038e1e3e79c <+12>:    mov    $0x1,%esi
   0x00000038e1e3e7a1 <+17>:    push   %rbp
   0x00000038e1e3e7a2 <+18>:    push   %rbx
   0x00000038e1e3e7a3 <+19>:    sub    $0x170,%rsp
   0x00000038e1e3e7aa <+26>:    lea    0x8(%rsp),%rdx
   0x00000038e1e3e7af <+31>:    movq   $0x1,(%rsp)
   0x00000038e1e3e7b7 <+39>:    movl   $0x0,0x88(%rsp)
   0x00000038e1e3e7c2 <+50>:    mov    %rsp,%rbx
   0x00000038e1e3e7c5 <+53>:    mov    %rdx,%rdi
   0x00000038e1e3e7c8 <+56>:    rep stos %rax,%es:(%rdi)
   0x00000038e1e3e7cb <+59>:    cmpl   $0x0,0x354ede(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e7d2 <+66>:    je     0x38e1e3e7e4 <do_system+84>
   0x00000038e1e3e7d4 <+68>:    lock cmpxchg %esi,0x3514c0(%rip)        # 0x38e218fc9c
<lock>
   0x00000038e1e3e7dc <+76>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7e2 <+82>:    jmp    0x38e1e3e7f1 <do_system+97>
   0x00000038e1e3e7e4 <+84>:    cmpxchg %esi,0x3514b1(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e7eb <+91>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7f1 <+97>:    mov    0x3514a1(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e7f7 <+103>:   lea    0x1(%rax),%edx
   0x00000038e1e3e7fa <+106>:   test   %eax,%eax
   0x00000038e1e3e7fc <+108>:   mov    %edx,0x351496(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e802 <+114>:   jne    0x38e1e3e83c <do_system+172>
   0x00000038e1e3e804 <+116>:   lea    0x351355(%rip),%rdx        # 0x38e218fb60 <intr>
   0x00000038e1e3e80b <+123>:   mov    %rsp,%rsi
   0x00000038e1e3e80e <+126>:   mov    $0x2,%edi
   0x00000038e1e3e813 <+131>:   callq  0x38e1e32bb0 <sigaction>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e818 <+136>:   test   %eax,%eax
   0x00000038e1e3e81a <+138>:   js     0x38e1e3ea6a <do_system+730>
   0x00000038e1e3e820 <+144>:   lea    0x3513d9(%rip),%rdx        # 0x38e218fc00 <quit>
   0x00000038e1e3e827 <+151>:   mov    %rsp,%rsi
   0x00000038e1e3e82a <+154>:   mov    $0x3,%edi
   0x00000038e1e3e82f <+159>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e834 <+164>:   test   %eax,%eax
   0x00000038e1e3e836 <+166>:   js     0x38e1e3ea9f <do_system+783>
   0x00000038e1e3e83c <+172>:   cmpl   $0x0,0x354e6d(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e843 <+179>:   je     0x38e1e3e854 <do_system+196>
   0x00000038e1e3e845 <+181>:   lock decl 0x351450(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e84c <+188>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e852 <+194>:   jmp    0x38e1e3e860 <do_system+208>
   0x00000038e1e3e854 <+196>:   decl   0x351442(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e85a <+202>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e860 <+208>:   lea    0xa0(%rsp),%rbp
   0x00000038e1e3e868 <+216>:   lea    0x8(%rbx),%rsi
   0x00000038e1e3e86c <+220>:   xor    %edi,%edi
   0x00000038e1e3e86e <+222>:   orq    $0x10000,0x8(%rsp)
   0x00000038e1e3e877 <+231>:   mov    %rbp,%rdx
   0x00000038e1e3e87a <+234>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e87f <+239>:   test   %eax,%eax
   0x00000038e1e3e881 <+241>:   js     0x38e1e3eac8 <do_system+824>
   0x00000038e1e3e887 <+247>:   mov    0x354c53(%rip),%ebx        # 0x38e21934e0 <__libc_pthread_functions_init>
   0x00000038e1e3e88d <+253>:   test   %ebx,%ebx
   0x00000038e1e3e88f <+255>:   jne    0x38e1e3ea38 <do_system+680>
   0x00000038e1e3e895 <+261>:   lea    0x3c4(%rip),%rax        # 0x38e1e3ec60 <cancel_handler>
   0x00000038e1e3e89c <+268>:   mov    %rax,0x140(%rsp)
   0x00000038e1e3e8a4 <+276>:   lea    0x168(%rsp),%rax
   0x00000038e1e3e8ac <+284>:   mov    %rax,0x148(%rsp)
   0x00000038e1e3e8b4 <+292>:   lea    0x168(%rsp),%rdx
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e8bc <+300>:   xor    %esi,%esi
   0x00000038e1e3e8be <+302>:   mov    $0x100011,%edi
   0x00000038e1e3e8c3 <+307>:   mov    $0x38,%eax
   0x00000038e1e3e8c8 <+312>:   syscall
=> 0x00000038e1e3e8ca <+314>:   cmp    $0xfffffffffffff000,%rax
   0x00000038e1e3e8d0 <+320>:   ja     0x38e1e3ebcc <do_system+1084>
   0x00000038e1e3e8d6 <+326>:   cmp    $0x0,%eax
   0x00000038e1e3e8d9 <+329>:   mov    %eax,%edi
   0x00000038e1e3e8db <+331>:   mov    %eax,0x168(%rsp)
   0x00000038e1e3e8e2 <+338>:   je     0x38e1e3eb2b <do_system+923>
   0x00000038e1e3e8e8 <+344>:   lea    0x16c(%rsp),%r12
   0x00000038e1e3e8f0 <+352>:   jge    0x38e1e3e8ff <do_system+367>
   0x00000038e1e3e8f2 <+354>:   jmp    0x38e1e3e92b <do_system+411>
   0x00000038e1e3e8f4 <+356>:   nopl   0x0(%rax)
   0x00000038e1e3e8f8 <+360>:   mov    0x168(%rsp),%edi
   0x00000038e1e3e8ff <+367>:   xor    %edx,%edx
   0x00000038e1e3e901 <+369>:   mov    %r12,%rsi
   0x00000038e1e3e904 <+372>:   callq  0x38e1eac8a0 <waitpid>
   0x00000038e1e3e909 <+377>:   cltq
   0x00000038e1e3e90b <+379>:   cmp    $0xffffffffffffffff,%rax
   0x00000038e1e3e90f <+383>:   jne    0x38e1e3e91e <do_system+398>
   0x00000038e1e3e911 <+385>:   mov    0x34f690(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3e918 <+392>:   cmpl   $0x4,%fs:(%rdx)
   0x00000038e1e3e91c <+396>:   je     0x38e1e3e8f8 <do_system+360>
   0x00000038e1e3e91e <+398>:   movslq 0x168(%rsp),%rdx
   0x00000038e1e3e926 <+406>:   cmp    %rdx,%rax
   0x00000038e1e3e929 <+409>:   je     0x38e1e3e936 <do_system+422>
   0x00000038e1e3e92b <+411>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e936 <+422>:   test   %ebx,%ebx
   0x00000038e1e3e938 <+424>:   jne    0x38e1e3ea10 <do_system+640>
   0x00000038e1e3e93e <+430>:   mov    $0x1,%esi
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e943 <+435>:   xor    %eax,%eax
   0x00000038e1e3e945 <+437>:   cmpl   $0x0,0x354d64(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e94c <+444>:   je     0x38e1e3e95e <do_system+462>
   0x00000038e1e3e94e <+446>:   lock cmpxchg %esi,0x351346(%rip)        # 0x38e218fc9c
<lock>
   0x00000038e1e3e956 <+454>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e95c <+460>:   jmp    0x38e1e3e96b <do_system+475>
   0x00000038e1e3e95e <+462>:   cmpxchg %esi,0x351337(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e965 <+469>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e96b <+475>:   mov    0x351327(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e971 <+481>:   sub    $0x1,%eax
   0x00000038e1e3e974 <+484>:   test   %eax,%eax
   0x00000038e1e3e976 <+486>:   mov    %eax,0x35131c(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e97c <+492>:   jne    0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e97e <+494>:   lea    0x3511db(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3e985 <+501>:   xor    %edx,%edx
   0x00000038e1e3e987 <+503>:   mov    $0x2,%edi
   0x00000038e1e3e98c <+508>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e991 <+513>:   lea    0x351268(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3e998 <+520>:   xor    %edx,%edx
   0x00000038e1e3e99a <+522>:   mov    $0x3,%edi
   0x00000038e1e3e99f <+527>:   mov    %eax,%ebx
   0x00000038e1e3e9a1 <+529>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e9a6 <+534>:   test   %eax,%eax
   0x00000038e1e3e9a8 <+536>:   jne    0x38e1e3e9ae <do_system+542>
   0x00000038e1e3e9aa <+538>:   test   %ebx,%ebx
   0x00000038e1e3e9ac <+540>:   je     0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e9ae <+542>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e9b9 <+553>:   cmpl   $0x0,0x354cf0(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e9c0 <+560>:   je     0x38e1e3e9d1 <do_system+577>
   0x00000038e1e3e9c2 <+562>:   lock decl 0x3512d3(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9c9 <+569>:   jne    0x38e1e3ed7e <_L_unlock_213>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e9cf <+575>:   jmp    0x38e1e3e9dd <do_system+589>
   0x00000038e1e3e9d1 <+577>:   decl   0x3512c5(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9d7 <+583>:   jne    0x38e1e3ed7e <_L_unlock_213>
   0x00000038e1e3e9dd <+589>:   mov    0x16c(%rsp),%eax
   0x00000038e1e3e9e4 <+596>:   add    $0x170,%rsp
   0x00000038e1e3e9eb <+603>:   pop    %rbx
   0x00000038e1e3e9ec <+604>:   pop    %rbp
   0x00000038e1e3e9ed <+605>:   pop    %r12
   0x00000038e1e3e9ef <+607>:   retq
   0x00000038e1e3e9f0 <+608>:   xor    %edx,%edx
   0x00000038e1e3e9f2 <+610>:   mov    %rbp,%rsi
   0x00000038e1e3e9f5 <+613>:   mov    $0x2,%edi
   0x00000038e1e3e9fa <+618>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e9ff <+623>:   test   %eax,%eax
   0x00000038e1e3ea01 <+625>:   je     0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea03 <+627>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3ea0e <+638>:   jmp    0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea10 <+640>:   mov    0x354c61(%rip),%rax        # 0x38e2193678 <__libc_pthread_functions+376>
   0x00000038e1e3ea17 <+647>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea1f <+655>:   xor    %esi,%esi
   0x00000038e1e3ea21 <+657>:   ror    $0x11,%rax
   0x00000038e1e3ea25 <+661>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea2e <+670>:   callq  *%rax
   0x00000038e1e3ea30 <+672>:   jmpq   0x38e1e3e93e <do_system+430>
   0x00000038e1e3ea35 <+677>:   nopl   (%rax)
   0x00000038e1e3ea38 <+680>:   mov    0x354c31(%rip),%rax        # 0x38e2193670 <__libc_pthread_functions+368>
   0x00000038e1e3ea3f <+687>:   lea    0x168(%rsp),%rdx
   0x00000038e1e3ea47 <+695>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea4f <+703>:   ror    $0x11,%rax
   0x00000038e1e3ea53 <+707>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea5c <+716>:   lea    0x1fd(%rip),%rsi        # 0x38e1e3ec60 <cancel_handler>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3ea63 <+723>:   callq  *%rax
   0x00000038e1e3ea65 <+725>:   jmpq   0x38e1e3e8b4 <do_system+292>
   0x00000038e1e3ea6a <+730>:   subl   $0x1,0x351227(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ea71 <+737>:   cmpl   $0x0,0x354c38(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ea78 <+744>:   je     0x38e1e3ea89 <do_system+761>
   0x00000038e1e3ea7a <+746>:   lock decl 0x35121b(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea81 <+753>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea87 <+759>:   jmp    0x38e1e3ea95 <do_system+773>
   0x00000038e1e3ea89 <+761>:   decl   0x35120d(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea8f <+767>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea95 <+773>:   mov    $0xffffffff,%eax
   0x00000038e1e3ea9a <+778>:   jmpq   0x38e1e3e9e4 <do_system+596>
   0x00000038e1e3ea9f <+783>:   mov    0x34f502(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eaa6 <+790>:   subl   $0x1,0x3511eb(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eaad <+797>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eab0 <+800>:   lea    0x3510a9(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eab7 <+807>:   xor    %edx,%edx
   0x00000038e1e3eab9 <+809>:   mov    $0x2,%edi
   0x00000038e1e3eabe <+814>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eac3 <+819>:   mov    %ebp,%fs:(%rbx)
   0x00000038e1e3eac6 <+822>:   jmp    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eac8 <+824>:   mov    $0x1,%esi
   0x00000038e1e3eacd <+829>:   xor    %eax,%eax
   0x00000038e1e3eacf <+831>:   cmpl   $0x0,0x354bda(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ead6 <+838>:   je     0x38e1e3eae8 <do_system+856>
   0x00000038e1e3ead8 <+840>:   lock cmpxchg %esi,0x3511bc(%rip)        # 0x38e218fc9c
<lock>
   0x00000038e1e3eae0 <+848>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eae6 <+854>:   jmp    0x38e1e3eaf5 <do_system+869>
   0x00000038e1e3eae8 <+856>:   cmpxchg %esi,0x3511ad(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3eaef <+863>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eaf5 <+869>:   mov    0x35119d(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3eafb <+875>:   sub    $0x1,%eax
   0x00000038e1e3eafe <+878>:   test   %eax,%eax
   0x00000038e1e3eb00 <+880>:   mov    %eax,0x351192(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eb06 <+886>:   jne    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eb0c <+892>:   mov    0x34f495(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eb13 <+899>:   lea    0x3510e6(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb1a <+906>:   xor    %edx,%edx
   0x00000038e1e3eb1c <+908>:   mov    $0x3,%edi
   0x00000038e1e3eb21 <+913>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eb24 <+916>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb29 <+921>:   jmp    0x38e1e3eab0 <do_system+800>
   0x00000038e1e3eb2b <+923>:   lea    0x117256(%rip),%rax        # 0x38e1f55d88
   0x00000038e1e3eb32 <+930>:   lea    0x351027(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eb39 <+937>:   xor    %edx,%edx
   0x00000038e1e3eb3b <+939>:   mov    $0x2,%dil
   0x00000038e1e3eb3e <+942>:   mov    %r12,0x130(%rsp)
   0x00000038e1e3eb46 <+950>:   movq   $0x0,0x138(%rsp)
   0x00000038e1e3eb52 <+962>:   mov    %rax,0x120(%rsp)
   0x00000038e1e3eb5a <+970>:   lea    0x11721f(%rip),%rax        # 0x38e1f55d80
   0x00000038e1e3eb61 <+977>:   mov    %rax,0x128(%rsp)
   0x00000038e1e3eb69 <+985>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb6e <+990>:   lea    0x35108b(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb75 <+997>:   xor    %edx,%edx
   0x00000038e1e3eb77 <+999>:   mov    $0x3,%edi
   0x00000038e1e3eb7c <+1004>:  callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb81 <+1009>:  xor    %edx,%edx
   0x00000038e1e3eb83 <+1011>:  mov    %rbp,%rsi
   0x00000038e1e3eb86 <+1014>:  mov    $0x2,%edi
   0x00000038e1e3eb8b <+1019>:  callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3eb90 <+1024>:  mov    0x34f3b1(%rip),%rax        # 0x38e218df48
   0x00000038e1e3eb97 <+1031>:  lea    0x1171e5(%rip),%rdi        # 0x38e1f55d83
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3eb9e <+1038>:  lea    0x120(%rsp),%rsi
   0x00000038e1e3eba6 <+1046>:  movl   $0x0,0x3510ec(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ebb0 <+1056>:  movl   $0x0,0x3510de(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ebba <+1066>:  mov    (%rax),%rdx
   0x00000038e1e3ebbd <+1069>:  callq  0x38e1ead040 <execve>
   0x00000038e1e3ebc2 <+1074>:  mov    $0x7f,%edi
   0x00000038e1e3ebc7 <+1079>:  callq  0x38e1eacff0 <_exit>
   0x00000038e1e3ebcc <+1084>:  mov    0x34f3d5(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3ebd3 <+1091>:  neg    %eax
   0x00000038e1e3ebd5 <+1093>:  movl   $0xffffffff,0x168(%rsp)
   0x00000038e1e3ebe0 <+1104>:  mov    %eax,%fs:(%rdx)
   0x00000038e1e3ebe3 <+1107>:  jmpq   0x38e1e3e92b <do_system+411>
End of assembler dump.

interestingly, the hang state only comes when I set the environment variable CPUPROFILE
to a text output file. If I do not set this varaibale the runs completes without any
error.

Reported by rajdeep.babu on 2014-08-25 09:38:08

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

I think my above assembly dump is not correct, because the memory address seems not
correct. SO I am just giving you the diasm of the present state

gdb> disassemble do_system
Dump of assembler code for function do_system:
   0x00000038e1e3e790 <+0>:     push   %r12
   0x00000038e1e3e792 <+2>:     xor    %eax,%eax
   0x00000038e1e3e794 <+4>:     mov    %rdi,%r12
   0x00000038e1e3e797 <+7>:     mov    $0x10,%ecx
   0x00000038e1e3e79c <+12>:    mov    $0x1,%esi
   0x00000038e1e3e7a1 <+17>:    push   %rbp
   0x00000038e1e3e7a2 <+18>:    push   %rbx
   0x00000038e1e3e7a3 <+19>:    sub    $0x170,%rsp
   0x00000038e1e3e7aa <+26>:    lea    0x8(%rsp),%rdx
   0x00000038e1e3e7af <+31>:    movq   $0x1,(%rsp)
   0x00000038e1e3e7b7 <+39>:    movl   $0x0,0x88(%rsp)
   0x00000038e1e3e7c2 <+50>:    mov    %rsp,%rbx
   0x00000038e1e3e7c5 <+53>:    mov    %rdx,%rdi
   0x00000038e1e3e7c8 <+56>:    rep stos %rax,%es:(%rdi)
   0x00000038e1e3e7cb <+59>:    cmpl   $0x0,0x354ede(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e7d2 <+66>:    je     0x38e1e3e7e4 <do_system+84>
   0x00000038e1e3e7d4 <+68>:    lock cmpxchg %esi,0x3514c0(%rip)        # 0x38e218fc9c
<lock>
   0x00000038e1e3e7dc <+76>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7e2 <+82>:    jmp    0x38e1e3e7f1 <do_system+97>
   0x00000038e1e3e7e4 <+84>:    cmpxchg %esi,0x3514b1(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e7eb <+91>:    jne    0x38e1e3ed21 <_L_lock_24>
   0x00000038e1e3e7f1 <+97>:    mov    0x3514a1(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e7f7 <+103>:   lea    0x1(%rax),%edx
   0x00000038e1e3e7fa <+106>:   test   %eax,%eax
   0x00000038e1e3e7fc <+108>:   mov    %edx,0x351496(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e802 <+114>:   jne    0x38e1e3e83c <do_system+172>
   0x00000038e1e3e804 <+116>:   lea    0x351355(%rip),%rdx        # 0x38e218fb60 <intr>
   0x00000038e1e3e80b <+123>:   mov    %rsp,%rsi
   0x00000038e1e3e80e <+126>:   mov    $0x2,%edi
   0x00000038e1e3e813 <+131>:   callq  0x38e1e32bb0 <sigaction>
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e818 <+136>:   test   %eax,%eax
   0x00000038e1e3e81a <+138>:   js     0x38e1e3ea6a <do_system+730>
   0x00000038e1e3e820 <+144>:   lea    0x3513d9(%rip),%rdx        # 0x38e218fc00 <quit>
   0x00000038e1e3e827 <+151>:   mov    %rsp,%rsi
   0x00000038e1e3e82a <+154>:   mov    $0x3,%edi
   0x00000038e1e3e82f <+159>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e834 <+164>:   test   %eax,%eax
   0x00000038e1e3e836 <+166>:   js     0x38e1e3ea9f <do_system+783>
   0x00000038e1e3e83c <+172>:   cmpl   $0x0,0x354e6d(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e843 <+179>:   je     0x38e1e3e854 <do_system+196>
   0x00000038e1e3e845 <+181>:   lock decl 0x351450(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e84c <+188>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e852 <+194>:   jmp    0x38e1e3e860 <do_system+208>
   0x00000038e1e3e854 <+196>:   decl   0x351442(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e85a <+202>:   jne    0x38e1e3ed40 <_L_unlock_49>
   0x00000038e1e3e860 <+208>:   lea    0xa0(%rsp),%rbp
   0x00000038e1e3e868 <+216>:   lea    0x8(%rbx),%rsi
   0x00000038e1e3e86c <+220>:   xor    %edi,%edi
   0x00000038e1e3e86e <+222>:   orq    $0x10000,0x8(%rsp)
   0x00000038e1e3e877 <+231>:   mov    %rbp,%rdx
   0x00000038e1e3e87a <+234>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e87f <+239>:   test   %eax,%eax
   0x00000038e1e3e881 <+241>:   js     0x38e1e3eac8 <do_system+824>
   0x00000038e1e3e887 <+247>:   mov    0x354c53(%rip),%ebx        # 0x38e21934e0 <__libc_pthread_functions_init>
   0x00000038e1e3e88d <+253>:   test   %ebx,%ebx
   0x00000038e1e3e88f <+255>:   jne    0x38e1e3ea38 <do_system+680>
   0x00000038e1e3e895 <+261>:   lea    0x3c4(%rip),%rax        # 0x38e1e3ec60 <cancel_handler>
   0x00000038e1e3e89c <+268>:   mov    %rax,0x140(%rsp)
   0x00000038e1e3e8a4 <+276>:   lea    0x168(%rsp),%rax
   0x00000038e1e3e8ac <+284>:   mov    %rax,0x148(%rsp)
   0x00000038e1e3e8b4 <+292>:   lea    0x168(%rsp),%rdx
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e8bc <+300>:   xor    %esi,%esi
   0x00000038e1e3e8be <+302>:   mov    $0x100011,%edi
   0x00000038e1e3e8c3 <+307>:   mov    $0x38,%eax
   0x00000038e1e3e8c8 <+312>:   syscall
=> 0x00000038e1e3e8ca <+314>:   cmp    $0xfffffffffffff000,%rax
   0x00000038e1e3e8d0 <+320>:   ja     0x38e1e3ebcc <do_system+1084>
   0x00000038e1e3e8d6 <+326>:   cmp    $0x0,%eax
   0x00000038e1e3e8d9 <+329>:   mov    %eax,%edi
   0x00000038e1e3e8db <+331>:   mov    %eax,0x168(%rsp)
   0x00000038e1e3e8e2 <+338>:   je     0x38e1e3eb2b <do_system+923>
   0x00000038e1e3e8e8 <+344>:   lea    0x16c(%rsp),%r12
   0x00000038e1e3e8f0 <+352>:   jge    0x38e1e3e8ff <do_system+367>
   0x00000038e1e3e8f2 <+354>:   jmp    0x38e1e3e92b <do_system+411>
   0x00000038e1e3e8f4 <+356>:   nopl   0x0(%rax)
   0x00000038e1e3e8f8 <+360>:   mov    0x168(%rsp),%edi
   0x00000038e1e3e8ff <+367>:   xor    %edx,%edx
   0x00000038e1e3e901 <+369>:   mov    %r12,%rsi
   0x00000038e1e3e904 <+372>:   callq  0x38e1eac8a0 <waitpid>
   0x00000038e1e3e909 <+377>:   cltq
   0x00000038e1e3e90b <+379>:   cmp    $0xffffffffffffffff,%rax
   0x00000038e1e3e90f <+383>:   jne    0x38e1e3e91e <do_system+398>
   0x00000038e1e3e911 <+385>:   mov    0x34f690(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3e918 <+392>:   cmpl   $0x4,%fs:(%rdx)
   0x00000038e1e3e91c <+396>:   je     0x38e1e3e8f8 <do_system+360>
   0x00000038e1e3e91e <+398>:   movslq 0x168(%rsp),%rdx
   0x00000038e1e3e926 <+406>:   cmp    %rdx,%rax
   0x00000038e1e3e929 <+409>:   je     0x38e1e3e936 <do_system+422>
   0x00000038e1e3e92b <+411>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e936 <+422>:   test   %ebx,%ebx
   0x00000038e1e3e938 <+424>:   jne    0x38e1e3ea10 <do_system+640>
   0x00000038e1e3e93e <+430>:   mov    $0x1,%esi
---Type <return> to continue, or q <return> to quit---
   0x00000038e1e3e943 <+435>:   xor    %eax,%eax
   0x00000038e1e3e945 <+437>:   cmpl   $0x0,0x354d64(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e94c <+444>:   je     0x38e1e3e95e <do_system+462>
   0x00000038e1e3e94e <+446>:   lock cmpxchg %esi,0x351346(%rip)        # 0x38e218fc9c
<lock>
   0x00000038e1e3e956 <+454>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e95c <+460>:   jmp    0x38e1e3e96b <do_system+475>
   0x00000038e1e3e95e <+462>:   cmpxchg %esi,0x351337(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e965 <+469>:   jne    0x38e1e3ed5f <_L_lock_179>
   0x00000038e1e3e96b <+475>:   mov    0x351327(%rip),%eax        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e971 <+481>:   sub    $0x1,%eax
   0x00000038e1e3e974 <+484>:   test   %eax,%eax
   0x00000038e1e3e976 <+486>:   mov    %eax,0x35131c(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3e97c <+492>:   jne    0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e97e <+494>:   lea    0x3511db(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3e985 <+501>:   xor    %edx,%edx
   0x00000038e1e3e987 <+503>:   mov    $0x2,%edi
   0x00000038e1e3e98c <+508>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e991 <+513>:   lea    0x351268(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3e998 <+520>:   xor    %edx,%edx
   0x00000038e1e3e99a <+522>:   mov    $0x3,%edi
   0x00000038e1e3e99f <+527>:   mov    %eax,%ebx
   0x00000038e1e3e9a1 <+529>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3e9a6 <+534>:   test   %eax,%eax
   0x00000038e1e3e9a8 <+536>:   jne    0x38e1e3e9ae <do_system+542>
   0x00000038e1e3e9aa <+538>:   test   %ebx,%ebx
   0x00000038e1e3e9ac <+540>:   je     0x38e1e3e9f0 <do_system+608>
   0x00000038e1e3e9ae <+542>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3e9b9 <+553>:   cmpl   $0x0,0x354cf0(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3e9c0 <+560>:   je     0x38e1e3e9d1 <do_system+577>
   0x00000038e1e3e9c2 <+562>:   lock decl 0x3512d3(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9c9 <+569>:   jne    0x38e1e3ed7e <_L_unlock_213>
   0x00000038e1e3e9cf <+575>:   jmp    0x38e1e3e9dd <do_system+589>
   0x00000038e1e3e9d1 <+577>:   decl   0x3512c5(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3e9d7 <+583>:   jne    0x38e1e3ed7e <_L_unlock_213>
   0x00000038e1e3e9dd <+589>:   mov    0x16c(%rsp),%eax
   0x00000038e1e3e9e4 <+596>:   add    $0x170,%rsp
   0x00000038e1e3e9eb <+603>:   pop    %rbx
   0x00000038e1e3e9ec <+604>:   pop    %rbp
   0x00000038e1e3e9ed <+605>:   pop    %r12
   0x00000038e1e3e9ef <+607>:   retq
   0x00000038e1e3e9f0 <+608>:   xor    %edx,%edx
   0x00000038e1e3e9f2 <+610>:   mov    %rbp,%rsi
   0x00000038e1e3e9f5 <+613>:   mov    $0x2,%edi
   0x00000038e1e3e9fa <+618>:   callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3e9ff <+623>:   test   %eax,%eax
   0x00000038e1e3ea01 <+625>:   je     0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea03 <+627>:   movl   $0xffffffff,0x16c(%rsp)
   0x00000038e1e3ea0e <+638>:   jmp    0x38e1e3e9b9 <do_system+553>
   0x00000038e1e3ea10 <+640>:   mov    0x354c61(%rip),%rax        # 0x38e2193678 <__libc_pthread_functions+376>
   0x00000038e1e3ea17 <+647>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea1f <+655>:   xor    %esi,%esi
   0x00000038e1e3ea21 <+657>:   ror    $0x11,%rax
   0x00000038e1e3ea25 <+661>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea2e <+670>:   callq  *%rax
   0x00000038e1e3ea30 <+672>:   jmpq   0x38e1e3e93e <do_system+430>
   0x00000038e1e3ea35 <+677>:   nopl   (%rax)
   0x00000038e1e3ea38 <+680>:   mov    0x354c31(%rip),%rax        # 0x38e2193670 <__libc_pthread_functions+368>
   0x00000038e1e3ea3f <+687>:   lea    0x168(%rsp),%rdx
   0x00000038e1e3ea47 <+695>:   lea    0x140(%rsp),%rdi
   0x00000038e1e3ea4f <+703>:   ror    $0x11,%rax
   0x00000038e1e3ea53 <+707>:   xor    %fs:0x30,%rax
   0x00000038e1e3ea5c <+716>:   lea    0x1fd(%rip),%rsi        # 0x38e1e3ec60 <cancel_handler>
   0x00000038e1e3ea63 <+723>:   callq  *%rax
   0x00000038e1e3ea65 <+725>:   jmpq   0x38e1e3e8b4 <do_system+292>
   0x00000038e1e3ea6a <+730>:   subl   $0x1,0x351227(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ea71 <+737>:   cmpl   $0x0,0x354c38(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ea78 <+744>:   je     0x38e1e3ea89 <do_system+761>
   0x00000038e1e3ea7a <+746>:   lock decl 0x35121b(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea81 <+753>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea87 <+759>:   jmp    0x38e1e3ea95 <do_system+773>
   0x00000038e1e3ea89 <+761>:   decl   0x35120d(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ea8f <+767>:   jne    0x38e1e3ed9d <_L_unlock_320>
   0x00000038e1e3ea95 <+773>:   mov    $0xffffffff,%eax
   0x00000038e1e3ea9a <+778>:   jmpq   0x38e1e3e9e4 <do_system+596>
   0x00000038e1e3ea9f <+783>:   mov    0x34f502(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eaa6 <+790>:   subl   $0x1,0x3511eb(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eaad <+797>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eab0 <+800>:   lea    0x3510a9(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eab7 <+807>:   xor    %edx,%edx
   0x00000038e1e3eab9 <+809>:   mov    $0x2,%edi
   0x00000038e1e3eabe <+814>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eac3 <+819>:   mov    %ebp,%fs:(%rbx)
   0x00000038e1e3eac6 <+822>:   jmp    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eac8 <+824>:   mov    $0x1,%esi
   0x00000038e1e3eacd <+829>:   xor    %eax,%eax
   0x00000038e1e3eacf <+831>:   cmpl   $0x0,0x354bda(%rip)        # 0x38e21936b0 <__libc_multiple_threads>
   0x00000038e1e3ead6 <+838>:   je     0x38e1e3eae8 <do_system+856>
   0x00000038e1e3ead8 <+840>:   lock cmpxchg %esi,0x3511bc(%rip)        # 0x38e218fc9c
<lock>
   0x00000038e1e3eae0 <+848>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eae6 <+854>:   jmp    0x38e1e3eaf5 <do_system+869>
   0x00000038e1e3eae8 <+856>:   cmpxchg %esi,0x3511ad(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3eaef <+863>:   jne    0x38e1e3edbc <_L_lock_354>
   0x00000038e1e3eaf5 <+869>:   mov    0x35119d(%rip),%eax        # 0x38e218fc98 <sa_refcntr>

   0x00000038e1e3eafb <+875>:   sub    $0x1,%eax
   0x00000038e1e3eafe <+878>:   test   %eax,%eax
   0x00000038e1e3eb00 <+880>:   mov    %eax,0x351192(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3eb06 <+886>:   jne    0x38e1e3ea71 <do_system+737>
   0x00000038e1e3eb0c <+892>:   mov    0x34f495(%rip),%rbx        # 0x38e218dfa8
   0x00000038e1e3eb13 <+899>:   lea    0x3510e6(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb1a <+906>:   xor    %edx,%edx
   0x00000038e1e3eb1c <+908>:   mov    $0x3,%edi
   0x00000038e1e3eb21 <+913>:   mov    %fs:(%rbx),%ebp
   0x00000038e1e3eb24 <+916>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb29 <+921>:   jmp    0x38e1e3eab0 <do_system+800>
   0x00000038e1e3eb2b <+923>:   lea    0x117256(%rip),%rax        # 0x38e1f55d88
   0x00000038e1e3eb32 <+930>:   lea    0x351027(%rip),%rsi        # 0x38e218fb60 <intr>
   0x00000038e1e3eb39 <+937>:   xor    %edx,%edx
   0x00000038e1e3eb3b <+939>:   mov    $0x2,%dil
   0x00000038e1e3eb3e <+942>:   mov    %r12,0x130(%rsp)
   0x00000038e1e3eb46 <+950>:   movq   $0x0,0x138(%rsp)
   0x00000038e1e3eb52 <+962>:   mov    %rax,0x120(%rsp)
   0x00000038e1e3eb5a <+970>:   lea    0x11721f(%rip),%rax        # 0x38e1f55d80
   0x00000038e1e3eb61 <+977>:   mov    %rax,0x128(%rsp)
   0x00000038e1e3eb69 <+985>:   callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb6e <+990>:   lea    0x35108b(%rip),%rsi        # 0x38e218fc00 <quit>
   0x00000038e1e3eb75 <+997>:   xor    %edx,%edx
   0x00000038e1e3eb77 <+999>:   mov    $0x3,%edi
   0x00000038e1e3eb7c <+1004>:  callq  0x38e1e32bb0 <sigaction>
   0x00000038e1e3eb81 <+1009>:  xor    %edx,%edx
   0x00000038e1e3eb83 <+1011>:  mov    %rbp,%rsi
   0x00000038e1e3eb86 <+1014>:  mov    $0x2,%edi
   0x00000038e1e3eb8b <+1019>:  callq  0x38e1e32bd0 <sigprocmask>
   0x00000038e1e3eb90 <+1024>:  mov    0x34f3b1(%rip),%rax        # 0x38e218df48
   0x00000038e1e3eb97 <+1031>:  lea    0x1171e5(%rip),%rdi        # 0x38e1f55d83
   0x00000038e1e3eb9e <+1038>:  lea    0x120(%rsp),%rsi
   0x00000038e1e3eba6 <+1046>:  movl   $0x0,0x3510ec(%rip)        # 0x38e218fc9c <lock>
   0x00000038e1e3ebb0 <+1056>:  movl   $0x0,0x3510de(%rip)        # 0x38e218fc98 <sa_refcntr>
   0x00000038e1e3ebba <+1066>:  mov    (%rax),%rdx
   0x00000038e1e3ebbd <+1069>:  callq  0x38e1ead040 <execve>
   0x00000038e1e3ebc2 <+1074>:  mov    $0x7f,%edi
   0x00000038e1e3ebc7 <+1079>:  callq  0x38e1eacff0 <_exit>
   0x00000038e1e3ebcc <+1084>:  mov    0x34f3d5(%rip),%rdx        # 0x38e218dfa8
   0x00000038e1e3ebd3 <+1091>:  neg    %eax
   0x00000038e1e3ebd5 <+1093>:  movl   $0xffffffff,0x168(%rsp)
   0x00000038e1e3ebe0 <+1104>:  mov    %eax,%fs:(%rdx)
   0x00000038e1e3ebe3 <+1107>:  jmpq   0x38e1e3e92b <do_system+411>
End of assembler dump.

Reported by rajdeep.babu on 2014-08-25 10:45:18

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

So system is not blocked anywhere but indeed calls clone. You can see above that rip
is next instruction after syscall instructions. And syscall number is "clone".

Here is my recommendation. Try sending "finish" command to gdb after switching to main
thread that does system. I expect it to return. It would be interesting to know what
it returns.

My guess is that it returns failure and your program might be retrying it.

Reported by alkondratenko on 2014-08-25 16:46:54

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

I have tried to finish it from one or two steps above but no results. Not sure if I
am doing the right way.
gdb) up
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) up
#2  0x000000000213a666 in VeMacroUsageDump::setDumpDirPath(char const*) ()
(gdb) down
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) down
#0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6
(gdb) up
#1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6
(gdb) finish
Run till exit from #1  0x00000038e1e3ec40 in system () from /lib64/libc.so.6



Not sure if I am in the main thread or not. Appreciate if you correct me.

Thanks
-Rajdeep

Reported by rajdeep.babu on 2014-08-26 04:29:02

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

There are 3 threads and I tried to finish all of them but everytime it switches to thread
1 and hangs

(gdb) info threads
  3 Thread 0x2b5bd614a700 (LWP 36400)  0x00000038e260b5bc in pthread_cond_wait@@GLIBC_2.3.2
() from /lib64/libpthread.so.0
  2 Thread 0x2b5bd634b700 (LWP 36401)  0x00000038e260b98e in pthread_cond_timedwait@@GLIBC_2.3.2
() from /lib64/libpthread.so.0
* 1 Thread 0x2b5bd4d24ee0 (LWP 36399)  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6

finish
Run till exit from #0  0x00000038e1e3e8ca in do_system () from /lib64/libc.so.6

It hangs


Reported by rajdeep.babu on 2014-08-26 04:37:03

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

One interesting thing I noticed.
over the code base we have used system() call in numerous places, but it only hangs
with profiler when we use "\rm -rf " with system(). All other system call are not creating
this hang issue.

Reported by rajdeep.babu on 2014-08-27 10:08:28

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

Another query..
While runninf pprof, how do I specify my .so to load.

I have a binary a.out, which needs to load .so which gets created in the /tmp directory
while ruuning a.out
if I just do 
pprof -gv a.out gprof.txt

Then it does not load the .so in /tmp as a result there are many unresolved symbols.
I checked an option called --add-libs, but how can add multiple .so 

Reported by rajdeep.babu on 2014-08-28 10:29:14

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

Do you have any updates on this?

Thanks

Reported by rajdeep.babu on 2014-09-01 12:52:01

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

Most likely we're missing something. Here's what we can do to find out what is really
going on.

* reproduce this again.

* attach gdb and switch to thread that is apparently stuck in system

* do disas to see where it's stuck

* do fin to check again if call to system completes

* if it's stuck stop it again (Ctrl-C)

* switch to do_system thread again and do disas to check where it is stuck

* do fin again

* check by running ps aux --forest if there's child process (there should be one).

Post results of _all_ steps.

Reported by alkondratenko on 2014-09-06 23:03:27

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

In last week I commented all the systems calls with \rm -rf command in it, and now there
is no hang. For reproducing I need to create another build. Will get back once I have
it.

Reported by rajdeep.babu on 2014-09-08 08:04:16

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

I have my own profiler and happen to have same issue. My OS is RHEL6. Eventually I found
When I stop sampling by SIGPROF, then the hang will disappear. And as you mentioned,
my hang occur when I call system to "mv" a file.

It's more like an OS bug.

Reported by limijd on 2014-11-05 08:41:23

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

>> It's more like an OS bug.

This is certainly not excluded yet. But so far I saw no evidence at all that it's OS
bug.

Reported by alkondratenko on 2014-11-05 20:09:16

@alk
Copy link
Contributor Author

alk commented Aug 23, 2015

Closing due to inactivity. Please reopen once requested information is available.

Reported by alkondratenko on 2014-11-29 07:20:28

  • Status changed: CannotReproduce

@alk alk closed this as completed Aug 23, 2015
@peng-wang-cn
Copy link

Still seeing this on RHEL 7.4, with gperftools 2.8. This time it is stuck on system("mkdir ..."), when CPU profiler is enabled. Any updates?

@alk
Copy link
Contributor Author

alk commented Dec 19, 2020

Thanks for update. Please file fresh ticket and ideally with some means to reproduce. Also if you're able to attach backtraces from all threads involved that would help a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants