testinstall hangs when compiled with --static #22

Closed
mppf opened this issue Apr 27, 2015 · 17 comments

Comments

@mppf

mppf commented Apr 27, 2015

On Ubuntu 14.10 (64-bit or 32-bit), which ships GCC 4.9.1,
compiling testinstall.cc with --static produces a program that
hangs. I don't see this problem on other platforms.

git clone https://github.com/google/re2.git
cd re2
make clean
make
g++ --static testinstall.cc -L obj -lre2 -I . -pthread
./a.out

The resulting a.out hangs, but if you drop --static, it does not hang.

@junyer
Contributor

junyer commented Apr 28, 2015

I've just committed 800c90a to test the static library as well. I'll see whether I can reproduce the issue now.

@junyer
Contributor

junyer commented Apr 28, 2015

Confirmed on Ubuntu 14.10.

The output of strace(1) indicates that something is sleeping on a futex and blocking indefinitely:

…
brk(0xdef200)                           = 0xdef200
brk(0xdf0000)                           = 0xdf0000
futex(0x7bc4a0, FUTEX_WAKE_PRIVATE, 1)  = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
futex(0xdd02f4, FUTEX_WAIT_PRIVATE, 0, NULL

@junyer
Contributor

junyer commented Apr 28, 2015

…
(gdb) bt
#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:85
#1  0x0000000000445bce in Lock (this=0x7da2e8) at ./util/mutex.h:111
#2  MutexLock (mu=0x7da2e8, this=<synthetic pointer>) at ./util/mutex.h:151
#3  RunStateOnByteUnlocked (c=<optimised out>, state=0x7db370, this=0x7da2d0) at re2/dfa.cc:962
…

I have no idea what changed to break this. :)

@junyer
Contributor

junyer commented Apr 28, 2015

You can #undef HAVE_RWLOCK in util/mutex.h to work around this for now, but note that the DFA state cache uses read-write locks, so contention will have a greater effect on performance.

@junyer
Contributor

junyer commented Apr 28, 2015

Interestingly, https://sourceware.org/git/?p=glibc.git;a=commit;h=b7aa8caacee9ec707835ee48d14ab46bfdbae4e9 removed …/x86_64/pthread_rwlock_wrlock.S in mid June, so I'm not sure what code Ubuntu 14.10 is running...

@junyer
Contributor

junyer commented Apr 28, 2015

The assembly implementation appears not to have changed in any pertinent way for years, so I don't think glibc is the culprit. The problem also occurs with GCC 4.8.3 on Ubuntu 14.10, so I can't blame GCC 4.9.1 either.

@junyer
Contributor

junyer commented Apr 28, 2015

I added some logging to Mutex in order to trace what's going on:

./util/mutex.h:111: 0xbaed40 pthread_rwlock_init(&mutex_, NULL)
./util/mutex.h:111: 0xbaf198 pthread_rwlock_init(&mutex_, NULL)
./util/mutex.h:113: 0xbaf198 pthread_rwlock_wrlock(&mutex_)
./util/mutex.h:111: 0xbaf348 pthread_rwlock_init(&mutex_, NULL)
./util/mutex.h:111: 0xbaf3a0 pthread_rwlock_init(&mutex_, NULL)
./util/mutex.h:114: 0xbaf198 pthread_rwlock_unlock(&mutex_)
./util/mutex.h:116: 0xbaf3a0 pthread_rwlock_rdlock(&mutex_)
./util/mutex.h:113: 0xbaf348 pthread_rwlock_wrlock(&mutex_)
./util/mutex.h:114: 0xbaf348 pthread_rwlock_unlock(&mutex_)
./util/mutex.h:113: 0xbaf348 pthread_rwlock_wrlock(&mutex_)
^C

I don't know why that last writer lock is waiting to be woken, but as I expected, RE2 doesn't seem to be doing anything wrong here.

@junyer
Contributor

junyer commented Apr 28, 2015

…
(gdb) bt
#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:34
#1  0x00000000004224c1 in re2::Mutex::Lock (this=0x7e2218) at ./util/mutex.h:113
#2  0x000000000044510b in MutexLock (mu=0x7e2218, this=<synthetic pointer>) at ./util/mutex.h:153
#3  re2::Prog::GetDFA (this=0x7e21f0, kind=re2::Prog::kLongestMatch) at re2/dfa.cc:1815
…
(gdb) bt
#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:34
#1  0x00000000004224c1 in re2::Mutex::Lock (this=0x7e23c8) at ./util/mutex.h:113
#2  0x000000000044ad48 in MutexLock (mu=0x7e23c8, this=<synthetic pointer>) at ./util/mutex.h:153
#3  re2::DFA::AnalyzeSearchHelper (this=this@entry=0x7e23b0, params=params@entry=0x7fffffffd2c0, info=info@entry=0x7e2498, flags=flags@entry=5) at re2/dfa.cc:1683
…
(gdb) bt
#0  pthread_rwlock_wrlock () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_rwlock_wrlock.S:34
#1  0x0000000000449040 in Lock (this=0x7e23c8) at ./util/mutex.h:113
#2  MutexLock (mu=0x7e23c8, this=<synthetic pointer>) at ./util/mutex.h:153
#3  RunStateOnByteUnlocked (c=98, state=0x7e2830, this=0x7e23b0) at re2/dfa.cc:962
…

Note that Mutex::Lock() has a different address in the third stack trace. For some reason, this is happening when the testinstall binary is statically linked, but not when it is dynamically linked.

@junyer
Contributor

junyer commented Apr 28, 2015

Mea culpa. I'd neglected to remove the logging from Mutex, so its code would have been quite a bit larger and thus a lot less likely to be inlined. :/

@junyer
Contributor

junyer commented Apr 28, 2015

Well, I've added more logging to Mutex in order to dump the pthread_rwlock_t internal state changes and now I'm thoroughly boggled...

When dynamically linked, pthread_rwlock_wrlock() will set __writer and then pthread_rwlock_unlock() will reset that to 0. All good.

When statically linked, pthread_rwlock_wrlock() _does not_ set __writer and so pthread_rwlock_unlock() will decrement __nr_readers, which is unsigned and thus goes from 0 to 4294967295. The second time around, pthread_rwlock_wrlock() will wait forever to be woken by a reader that never existed.

@mppf
Author

mppf commented Apr 28, 2015

Thanks for looking at it; it is bizarre. For now I'm just using the workaround of not passing --static, but this bug is probably worth reporting to Ubuntu at some point in the investigation.

@junyer
Contributor

junyer commented Apr 28, 2015

It finally occurred to me that pthread_rwlock_wrlock() must actually be setting __writer to 0 because the TID in the TLS is 0, which I confirmed with some inline assembly.

@isomer pointed me to https://gcc.gnu.org/ml/gcc-help/2010-05/msg00029.html, which suggests the use of -Wl,--whole-archive -lpthread -Wl,--no-whole-archive when linking. That worked for me, but YMMV and I still have no idea what changed in Ubuntu 14.10. :(

@junyer
Contributor

junyer commented Apr 29, 2015

Have you observed the problem on Ubuntu 15.04? If not, I can try to reproduce it – probably within the next week or two.

@mppf
Author

mppf commented Apr 29, 2015

I can confirm that the problem still exists on Ubuntu 15.04, both 32-bit and 64-bit. However, the behavior is different: the program dumps core instead of hanging. Either removing --static or adding -Wl,--whole-archive -lpthread -Wl,--no-whole-archive solves the problem.

@junyer
Contributor

junyer commented Apr 30, 2015

Ouch. I wonder whether the TID in the TLS is garbage as opposed to 0? Anyway, thanks for checking! I'll whip up a small test case – twiddling a read-write lock a few times should be sufficient, I guess – and file a bug over on Launchpad.

@junyer
Contributor

junyer commented Apr 30, 2015

I've filed https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/1450355. I'll follow up there as needed, but there's nothing more that I can do here, so I'm going to close this issue.

junyer closed this as completed Apr 30, 2015
@junyer
Contributor

junyer commented Mar 13, 2016

FYI, e37a0e8 solves the problem a different way.
