sideline thread has no chance to clean up #297

Open
derekbruening opened this issue Nov 27, 2014 · 7 comments

Comments

@derekbruening
Contributor

From derek.br...@gmail.com on April 23, 2010 14:49:38

In debug build, DR synchs with and terminates our sideline thread prior to
calling our exit event, so we have no chance to clean up its memory. DR then
asserts about leaks if the sideline thread had thread-local memory. DR
either needs to call the client exit event even sooner, before the
synchall, or should add support for a sideline exit routine.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=297

@derekbruening
Contributor Author

From bruen...@google.com on October 02, 2014 10:55:17

Labels: GoodContrib

zhaoqin added a commit that referenced this issue Apr 7, 2017
…exit

We split the synch-all threads operation into two synch operations:
- pre-exit synch: synchronize all app threads and ignore client threads
- post-exit synch: synchronize all threads.
The pre-exit synch is called before all client thread exit and process exit
events.  The post-exit synch is called after all the client exit events.
By doing so, the sideline client thread could have a chance to perform
graceful exit from the process exit event.

Fixes #297
zhaoqin added a commit that referenced this issue Apr 7, 2017
We split the synch-all threads operation into two synch operations:

- pre-exit synch: synchronize all app threads and ignore client threads
- post-exit synch: synchronize all threads.
The pre-exit synch is called before all client thread exit and process exit
events. The post-exit synch is called after all the client exit events.
By doing so, the sideline client thread could have a chance to perform
graceful exit from the process exit event.

Fixes #297
zhaoqin added a commit that referenced this issue Apr 12, 2017
We split the synch-all threads operation into two synch operations:

- pre-exit synch: synchronize all app threads and ignore client threads
- post-exit synch: synchronize all threads.
The pre-exit synch is called before all client thread exit and process exit
events. The post-exit synch is called after all the client exit events.
By doing so, the sideline client thread could have a chance to perform
graceful exit from the process exit event.

Fixes #297
zhaoqin added a commit that referenced this issue Apr 12, 2017
…on on Unix

We split the synch-all threads operation into two synch operations:
- pre-exit synch: synchronize all app threads and ignore client threads
- post-exit synch: synchronize all threads.
The pre-exit synch is called before all client thread exit and process exit
events. The post-exit synch is called after all the client exit events.
By doing so, the sideline client thread could have a chance to perform
graceful exit from the process exit event.

Fixes #297
zhaoqin added a commit that referenced this issue Apr 14, 2017
…ion on Unix

We split the synch-all threads operation into two synch operations:
- pre-exit synch: synchronize all app threads and ignore client sideline threads
- post-exit synch: synchronize all threads.
The pre-exit synch is called before all app thread exit and process exit
events. The post-exit synch is called after all the app exit events.
By doing so, the sideline thread could be notified and have a chance to exit
gracefully before the app process exit event.

Fixes #297
@zhaoqin
Contributor

zhaoqin commented Apr 20, 2017

Reopening this issue. If we delay the client thread exit and the client thread exits during the first synch-all, there is a race condition: the threads array may hold a stale pointer, and a SIGSEGV may occur.

@zhaoqin zhaoqin reopened this Apr 20, 2017
@egrimley
Contributor

This Travis failure may be relevant: https://travis-ci.org/DynamoRIO/dynamorio/jobs/224281671

@zhaoqin
Contributor

zhaoqin commented Apr 21, 2017 via email

@zhaoqin
Contributor

zhaoqin commented Apr 22, 2017

A hang:

pre-DR init
<Starting application /home/zhaoqin/Workspace/DynamoRIO/builds/build_x64_dbg.git/suite/tests/bin/api.static_sideline (14609)>
<unable to determine lib path for cross-arch execve>
<Initial options = -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
in dr_client_main
client thread 0 is alive
client thread 1 is alive
client thread 2 is alive
client thread 3 is alive
pre-DR start
<Attached to 5 threads in application /home/zhaoqin/Workspace/DynamoRIO/builds/build_x64_dbg.git/suite/tests/bin/api.static_sideline (14609)>
<Detaching from application /home/zhaoqin/Workspace/DynamoRIO/builds/build_x64_dbg.git/suite/tests/bin/api.static_sideline (14609)>
<Detaching from process, entering final cleanup>
Saw some bb events
post-DR detach
re-attach attempt
<Starting application /home/zhaoqin/Workspace/DynamoRIO/builds/build_x64_dbg.git/suite/tests/bin/api.static_sideline (14609)>
<Initial options = -stack_size 56K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
in dr_client_main
client thread 0 is alive
client thread 1 is alive
client thread 2 is alive
client thread 3 is alive
<Detaching from application /home/zhaoqin/Workspace/DynamoRIO/builds/build_x64_dbg.git/suite/tests/bin/api.static_sideline (14609)>
<Detaching from process, entering final cleanup>

In GDB

(gdb) info threads
  Id   Target Id         Frame 
* 1    Thread 0x7fbe6fd1d740 (LWP 14609) "api.static_side" (Exiting) 0x00000000004bf614 in add_to_free_list (dcontext=0xffffffffffffffff, cache=0x4b5ad048, unit=0x4b5ad428, f=0x4b616d18, 
    start_pc=0x4b5dff28 "\030maK", size=28) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fcache.c:3001
(gdb) where
#0  syscall_ready () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/arch/x86/x86_shared.asm:180
#1  0x0000000000003921 in ?? ()
#2  0x00000000006f1018 in ksynch_wait (futex=0x9ea840 <change_linking_lock>, mustbe=1) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/unix/ksynch_linux.c:120
#3  0x00000000006cb655 in mutex_wait_contended_lock (lock=0x9ea840 <change_linking_lock>) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/unix/os.c:9288
#4  0x00000000004fd472 in mutex_lock (lock=0x9ea840 <change_linking_lock>) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/utils.c:891
#5  0x00000000004fda4e in acquire_recursive_lock (lock=0x9ea840 <change_linking_lock>) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/utils.c:1008
#6  0x00000000004bf614 in add_to_free_list (dcontext=0xffffffffffffffff, cache=0x4b5ad048, unit=0x4b5ad428, f=0x4b616d18, start_pc=0x4b5dff28 "\030maK", size=28)
    at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fcache.c:3001
#7  0x00000000004b5290 in fifo_prepend_empty (dcontext=0xffffffffffffffff, cache=0x4b5ad048, unit=0x4b5ad428, f=0x4b616d18, start_pc=0x4b5dff28 "\030maK", size=28)
    at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fcache.c:2280
#8  0x00000000004ca210 in fcache_remove_fragment (dcontext=0xffffffffffffffff, f=0x4b616d18) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fcache.c:3665
#9  0x000000000048c250 in fragment_delete (dcontext=0xffffffffffffffff, f=0x4b616d18, actions=135) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:3103
#10 0x0000000000484eb2 in hashtable_fragment_reset (dcontext=0xffffffffffffffff, table=0x4b5be9f8) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:1192
#11 0x0000000000485f0a in fragment_reset_free () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:1550
#12 0x00000000004861fa in fragment_exit () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:1617
#13 0x000000000047224d in dynamo_shared_exit (toexit=0x4b5bd288) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/dynamo.c:974
#14 0x00000000005ead7e in detach_on_permanent_stack (internal=true, do_cleanup=true) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/synch.c:2108
#15 0x0000000000474ff6 in dr_app_stop_and_cleanup () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/dynamo.c:2737
#16 0x000000000040be5c in main (argc=1, argv=0x7ffe864ff998) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/suite/tests/api/static_sideline.c:177

(gdb) p change_linking_lock
$23 = {lock = {lock_requests = 1, contended_event = -1, name = 0x71f390 "change_linking_lock(recursive)@/home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/link.c:108", rank = 17, owner = 14625, 
    owning_dcontext = 0x4b626340, prev_owned_lock = 0x9eb3a0 <bb_building_lock>, count_times_acquired = 441, count_times_contended = 3, count_times_spin_pause = 0, max_contended_requests = 0, 
    count_times_spin_only = 1, prev_process_lock = 0x4b5bee70, next_process_lock = 0x4b5ad908, callstack = {0x0, 0x0, 0x0, 0x0}, app_lock = false, deleted = true}, owner = 14625, count = 1}

So it is likely that a thread was killed while holding a lock, and the main thread hangs while trying to acquire that lock.

@zhaoqin
Contributor

zhaoqin commented Apr 22, 2017

However, this happens in fragment_exit(), which runs before instrument_exit(), so no other thread should have been terminated by the main thread yet. How could thread 14625 exit without releasing the lock?

@zhaoqin
Contributor

zhaoqin commented Apr 23, 2017

Is it possible that in the first synch-all operation, although we did not terminate the threads, we somehow corrupted their state? Or did the exiting (app/client?) thread exit without cleanup?

I have seen three similar hangs on the same lock, so the other thread's exit likely has something to do with this lock.

(gdb) info thread
  Id   Target Id         Frame 
* 1    Thread 0x7f63c3369740 (LWP 2219) "api.static_side" (Exiting) 0x00000000004fd472 in mutex_lock (lock=0x9ea840 <change_linking_lock>) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/utils.c:891
(gdb) where
#0  syscall_ready () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/arch/x86/x86_shared.asm:180
#1  0x00000000000008be in ?? ()
#2  0x00000000006f1018 in ksynch_wait (futex=0x9ea840 <change_linking_lock>, mustbe=1) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/unix/ksynch_linux.c:120
#3  0x00000000006cb655 in mutex_wait_contended_lock (lock=0x9ea840 <change_linking_lock>) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/unix/os.c:9288
#4  0x00000000004fd472 in mutex_lock (lock=0x9ea840 <change_linking_lock>) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/utils.c:891
#5  0x00000000004fda4e in acquire_recursive_lock (lock=0x9ea840 <change_linking_lock>) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/utils.c:1008
#6  0x00000000004bf614 in add_to_free_list (dcontext=0xffffffffffffffff, cache=0x1141048, unit=0x1141428, f=0x11a6f70, start_pc=0x117266c "po\032\001", size=28)
    at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fcache.c:3001
#7  0x00000000004b5290 in fifo_prepend_empty (dcontext=0xffffffffffffffff, cache=0x1141048, unit=0x1141428, f=0x11a6f70, start_pc=0x117266c "po\032\001", size=28)
    at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fcache.c:2280
#8  0x00000000004ca210 in fcache_remove_fragment (dcontext=0xffffffffffffffff, f=0x11a6f70) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fcache.c:3665
#9  0x000000000048c250 in fragment_delete (dcontext=0xffffffffffffffff, f=0x11a6f70, actions=135) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:3103
#10 0x0000000000484eb2 in hashtable_fragment_reset (dcontext=0xffffffffffffffff, table=0x11529f8) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:1192
#11 0x0000000000485f0a in fragment_reset_free () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:1550
#12 0x00000000004861fa in fragment_exit () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/fragment.c:1617
#13 0x000000000047224d in dynamo_shared_exit (toexit=0x1151288) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/dynamo.c:974
#14 0x00000000005ead7e in detach_on_permanent_stack (internal=true, do_cleanup=true) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/synch.c:2108
#15 0x0000000000474ff6 in dr_app_stop_and_cleanup () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/dynamo.c:2737
#16 0x000000000040be5c in main (argc=1, argv=0x7ffe113d8658) at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/suite/tests/api/static_sideline.c:177
(gdb) p *lock
$29 = {lock_requests = 1, contended_event = -1, name = 0x71f390 "change_linking_lock(recursive)@/home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/link.c:108", rank = 17, owner = 2238, 
  owning_dcontext = 0x11ba340, prev_owned_lock = 0x9eb3a0 <bb_building_lock>, count_times_acquired = 443, count_times_contended = 2, count_times_spin_pause = 0, max_contended_requests = 0, 
  count_times_spin_only = 0, prev_process_lock = 0x1152e70, next_process_lock = 0x1141908, callstack = {0x0, 0x0, 0x0, 0x0}, app_lock = false, deleted = true}
