SEGV in tcmalloc #585

Closed
alk opened this issue Aug 23, 2015 · 7 comments

Contributor

alk commented Aug 23, 2015

Originally reported on Google Code with ID 582

What steps will reproduce the problem?
1. I am using tcmalloc in my code.
2. I am also using leveldb in my code.
3. I see a SEGV in tcmalloc while leveldb runs a background compaction, crashing my program.

What is the expected output? What do you see instead?

I see a SEGV in tcmalloc instead of a functioning program.

What version of the product are you using? On what operating system?

I am using gperftools version 2.1 on Ubuntu 12.04.

Please provide any additional information below.

Here is the stack trace from my optimized binaries at the time of the SEGV:

#0  SLL_Next (t=0x10715b74bec51b15) at src/linked_list.h:44
#1  SLL_Pop (list=<optimized out>) at src/linked_list.h:58
#2  Pop (this=0x18de2e0) at src/thread_cache.h:215
#3  Allocate (cl=1,
    size=<error reading variable: Cannot access memory at address 0x8>,
    this=<optimized out>) at src/thread_cache.h:367
#4  do_malloc_small (
    size=<error reading variable: Cannot access memory at address 0x8>,
    heap=<optimized out>) at src/tcmalloc.cc:1088
#5  do_malloc_no_errno (size=8) at src/tcmalloc.cc:1095
#6  cpp_alloc (nothrow=false, size=8) at src/tcmalloc.cc:1423
#7  tc_new (size=8) at src/tcmalloc.cc:1601
#8  0x00007f913e6e59ce in allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/4.6/ext/new_allocator.h:92
#9  _M_allocate (__n=<optimized out>, this=<optimized out>)
    at /usr/include/c++/4.6/bits/stl_vector.h:150
#10 std::vector<unsigned int, std::allocator<unsigned int> >::_M_insert_aux (
    this=0x25bb448, __position=..., __x=<optimized out>)
    at /usr/include/c++/4.6/bits/vector.tcc:327
#11 0x00007f913d76a9e7 in leveldb::BlockBuilder::Add(leveldb::Slice const&, leveldb::Slice const&) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#12 0x00007f913d76e8e1 in leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#13 0x00007f913d7527f2 in leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#14 0x00007f913d752ff2 in leveldb::DBImpl::BackgroundCompaction() ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#15 0x00007f913d753acb in leveldb::DBImpl::BackgroundCall() ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#16 0x00007f913d77319f in leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper(void*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#17 0x00007f913e921e9a in start_thread ()
   from /lib/x86_64-linux-gnu/libpthread.so.0
#18 0x00007f913da7accd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#19 0x0000000000000000 in ?? ()

Any ideas?

thanks,
Sameer


Reported by sameer.s.mahajan on 2013-10-17 07:23:09

Contributor Author

alk commented Aug 23, 2015

I reinstalled gperftools and recompiled debug binaries for my code, and I hit the SEGV again.
This time the crash is at src/central_freelist.cc:298: span->objects = *(reinterpret_cast<void**>(result));
Here is the call stack this time:


#0  tcmalloc::CentralFreeList::FetchFromSpans (this=0x7f9898f1caa0)
    at src/central_freelist.cc:298
#1  0x00007f9898cf1078 in tcmalloc::CentralFreeList::RemoveRange (
    this=0x7f9898f1caa0, start=0x7f982d066480, end=0x7f982d066488, N=166)
    at src/central_freelist.cc:269
#2  0x00007f9898cf4202 in tcmalloc::ThreadCache::FetchFromCentralCache (
    this=0x865e18, cl=<optimized out>, byte_size=8) at src/thread_cache.cc:160
#3  0x00007f9898d052d8 in Allocate (cl=<optimized out>, size=<optimized out>,
    this=<optimized out>) at src/thread_cache.h:364
#4  do_malloc_small (size=<optimized out>, heap=<optimized out>)
    at src/tcmalloc.cc:1088
#5  do_malloc_no_errno (size=5) at src/tcmalloc.cc:1095
#6  cpp_alloc (nothrow=false, size=5) at src/tcmalloc.cc:1423
#7  tc_newarray (size=5) at src/tcmalloc.cc:1631
#8  0x00007f98981e7574 in leveldb::Status::Status(leveldb::Status::Code, leveldb::Slice const&, leveldb::Slice const&) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#9  0x00007f98981d80be in leveldb::Version::Get(leveldb::ReadOptions const&, leveldb::LookupKey const&, std::string*, leveldb::Version::GetStats*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1
#10 0x00007f98981bd98c in leveldb::DBImpl::Get(leveldb::ReadOptions const&, leveldb::Slice const&, std::string*) ()
   from /home/storagevisor/leveldb-1.14.0/libleveldb.so.1

<valid stack from my code below>

I am using the --enable-frame-pointers configure option since I do not want to
install libunwind.

Reported by sameer.s.mahajan on 2013-10-17 10:54:31

Contributor Author

alk commented Aug 23, 2015

Thanks for raising this.

First, I need to be sure that this is not a bug in your application. Have you tried
running without tcmalloc but under either valgrind or AddressSanitizer?

Alternatively, attaching a test program, or a link to one, would help
too.

Reported by alkondratenko on 2013-10-17 16:36:43

Contributor Author

alk commented Aug 23, 2015

Yes, I have tried it under valgrind without tcmalloc, and it ran successfully.

It is a rather complex system to share in isolation. I will see how I can share a repro.

Is there any additional logging and/or profiling that I can do to help you identify
the problem? It repros consistently in my environment.

Reported by sameer.s.mahajan on 2013-10-18 07:04:10

Contributor Author

alk commented Aug 23, 2015

Ok. Can you share your compiler and toolchain?

Recently somebody reported a very similar-looking issue on Windows with the Intel compiler.
I suspect we might have a pointer aliasing bug that a more aggressive compiler is
hitting.

In order to investigate that possibility:

* Can you try building with -O0 or something like that?

* Can you try with -fno-strict-aliasing (my understanding is that both clang and icc support
this flag on GNU/Linux)?

* Can you post the disassembly of FetchFromSpans?

Reported by alkondratenko on 2013-10-18 15:37:01

Contributor Author

alk commented Aug 23, 2015

I am using g++ version 4.6.3 on Ubuntu.

Note that I was getting an error even with debug (-g) compilation.

We refactored and rewrote some code to accommodate additional functionality in
our system, and now the issue does not seem to reproduce. There were some changes in
the areas around which the failures were seen; however, I haven't yet completely analyzed
whether, and where, there were any issues in our code.

Reported by sameer.s.mahajan on 2013-10-20 03:47:03

Contributor Author

alk commented Aug 23, 2015

While analyzing some valgrind errors, I was able to isolate a small standalone leveldb program
that gives errors similar to those in our code. I have posted it to the issue here: http://code.google.com/p/leveldb/issues/detail?id=211
I am posting the update here as well in case it is related.

Reported by sameer.s.mahajan on 2013-10-23 06:47:44

Contributor Author

alk commented Aug 23, 2015

Closing then. Please reopen if you can reproduce it.

Reported by alkondratenko on 2013-10-27 00:12:15

  • Status changed: CannotReproduce

alk closed this as completed Aug 23, 2015