
libtorrent - issue #508

When cache hits 530MB bad things happen


Posted on Sep 4, 2013 by Happy Hippo

This was tested on Windows XP SP2 32-bit, libtorrent rev8869, compiled with MSVC 2008 using this command:

b2 -q --without-python --toolset=msvc-9.0 variant=release link=static runtime-link=static encryption=openssl logging=none geoip=static dht-support=on boost=source character-set=unicode boost-link=static -sBOOST_ROOT="C:\qBittorrent\boost_1_54_0" include="C:\QBITTORRENT\install\include" library-path="C:\QBITTORRENT\install\lib" -j 2

This is the relevant qbittorrent forum thread: http://qbforums.shiki.hu/index.php/topic,2042.0.html

As the user Switeck mentions, this behavior appears after 0.16.9. When setting the cache size to >=1GB and cache_expiry to 600s (the maximum allowed from the UI) and seeding (locally for testing, using LSD), he observed that at around 530MB of RAM usage this message shows up in the log: "13/07/2013 04:57:34 - Reason: Torrent file () error: Not enough space 13/07/2013 04:57:34 - An I/O error occured, 'Torrent' paused."

Also, if he tries to open the "options" window of qbittorrent, the app crashes. But this may be related to another bug.

Anyway, I tested this locally too and got the same behavior. At ~530MB Private Bytes I get the above message, referring to the currently seeding torrent, and afterwards I watch the Private Bytes drop instantly to 26-30MB (as when first launched). What I find weird is that the message is confusing. I assume it comes from libtorrent itself and not from the OS, because OS errors have their message string translated to my OS locale, while this one is always in English. Furthermore, I suspect it means "not enough space in RAM" and not "not enough space on disk".

The user also says that with 0.16.9 the same behavior happens at a much larger limit, around 1.6GB. I didn't test with 0.16.9. I provided the user with a build of libtorrent 0.16.10 with the following patch reverted: https://code.google.com/p/libtorrent/issues/attachmentText?id=405&aid=4050003000&name=disk_pool.diff&token=QDIiofRFzpNtXfrEFs8XJQJGOZU%3A1374087592961 He said that this build allowed him to go past 530MB, so that patch is probably the culprit.

How to quickly reproduce this:

  1. Seed something large (above 1GB) in libtorrent, with LSD enabled, cache >= 1GB and cache_expiry >= 600s (see the sketch after this list)
  2. Use another bittorrent client on the same machine, allow access only to localhost for LSD, and add the same torrent as in libtorrent but with a different save path
  3. Start downloading and observe RAM usage. It should drop (and maybe crash) at around 530MB
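
A minimal sketch of the seeding setup, against the 0.16.x API (the .torrent path and save path are placeholders; cache_size is counted in 16 kiB blocks, so 1 GiB is 65536 blocks):

    #include <libtorrent/session.hpp>
    #include <libtorrent/session_settings.hpp>
    #include <libtorrent/add_torrent_params.hpp>
    #include <libtorrent/torrent_info.hpp>

    int main()
    {
        libtorrent::session ses;
        ses.start_lsd(); // local service discovery only, for LAN testing

        libtorrent::session_settings s = ses.settings();
        s.cache_size = 65536; // >= 1 GiB of cache, in 16 kiB blocks
        s.cache_expiry = 600; // seconds
        ses.set_settings(s);

        libtorrent::add_torrent_params p;
        p.ti = new libtorrent::torrent_info("large.torrent"); // placeholder
        p.save_path = "C:\\seed";                             // placeholder
        ses.add_torrent(p);
        // ... keep the session alive, seed to the second client,
        // and watch the process' Private Bytes
        return 0;
    }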

Comment #1

Posted on Sep 5, 2013 by Quick Rhino

That definitely looks like a filesystem error. I can't find a Windows error with the description "not enough space", though. In your logging, you may want to include the error value and the category name as well.

It could be this one: ERROR_INSUFFICIENT_BUFFER. But the fact that it's tied to how much memory the process uses suggests it may be connected to the balance set manager.
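
A minimal sketch of the logging I mean, against the 0.16.x alert types (log_alert is a hypothetical helper you'd call for each alert popped off the session):

    #include <libtorrent/alert_types.hpp>
    #include <iostream>

    void log_alert(libtorrent::alert const* a)
    {
        using namespace libtorrent;
        if (file_error_alert const* fe = alert_cast<file_error_alert>(a))
        {
            // include the raw error value and category, not just the text
            std::cout << "file error on " << fe->file
                      << ": " << fe->error.message()
                      << " (value=" << fe->error.value()
                      << ", category=" << fe->error.category().name()
                      << ")" << std::endl;
        }
    }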

Comment #2

Posted on Sep 5, 2013 by Helpful Elephant

The point of that patch was to fix an issue where disk cache blocks would never be returned to the system; it's kind of an important patch. It's possible that something is going wrong where more memory is allocated than what's used, and the process runs out of virtual address space. After this patch, disk blocks were still allocated with VirtualAlloc, to get page-aligned buffers.

Buffers are also allocated 16 kiB at a time; perhaps there is some accounting overhead for such small allocations that's causing the virtual address space to explode. Could you look (or ask the user to look) at the virtual memory used by the process when it breaks?

Comment #3

Posted on Sep 5, 2013 by Helpful Elephant

I think this explains the underlying problem. My guess is that VirtualAlloc() fails because the process runs out of virtual address space. http://cbloomrants.blogspot.com/2009/01/01-16-09-virtual-memory.html
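
Concretely (a Windows-only sketch of the granularity issue): VirtualAlloc rounds every reservation up to the 64 kiB allocation granularity, so each 16 kiB buffer reserves four times its size in address space. ~530MB of committed cache would then correspond to roughly 2GB of reserved address space, which is the 32-bit per-process limit:

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // two consecutive 16 kiB allocations typically land 64 kiB
        // apart, because each reservation is rounded up to the
        // allocation granularity
        char* a = (char*)VirtualAlloc(NULL, 16 * 1024,
            MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        char* b = (char*)VirtualAlloc(NULL, 16 * 1024,
            MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        std::printf("a=%p b=%p\n", a, b);

        SYSTEM_INFO si;
        GetSystemInfo(&si);
        // 65536 on typical Windows systems
        std::printf("allocation granularity: %lu bytes\n",
            si.dwAllocationGranularity);
        return 0;
    }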

Comment #4

Posted on Sep 5, 2013 by Helpful Elephant

Could you please try this patch against libtorrent 0.16.11?

Attachments

Comment #5

Posted on Sep 5, 2013 by Happy Hippo

Virtual size (in Process Explorer) == 640MB

With 0.16.11+patch I don't get a crash and it goes well above 530MB. I have to run more tests, though, because I noticed:

  1. When the cache is about to fill, qbittorrent stops uploading. Then it resumes for a few seconds at a low speed and stops again, in a loop.
  2. I observed the cache not respecting the limit I had set.

Also, latest RC_0_16 doesn't compile; it complains about a "keys" variable, etc. Probably related to rev8887.

Comment #6

Posted on Sep 7, 2013 by Happy Hippo

Some updates.

First of all, when I said in my previous comment "Virtual size (in Process Explorer) == 640MB", that was for the unpatched code.

Now some more info about the unpatched code:

  1. The alert I get is file_error_alert
  2. error.value() == 12
  3. error.category().name() == generic
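
Generic error 12 is ENOMEM, whose message text on the MSVC runtime is "Not enough space", matching the alert string; a quick check:

    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        // on the MSVC CRT this prints: ENOMEM == 12: Not enough space
        std::printf("ENOMEM == %d: %s\n", ENOMEM, std::strerror(ENOMEM));
        return 0;
    }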

Now with the patch:

  1. I don't get that alert; RAM usage continues past 530MB normally.
  2. I had to set a lower limit to test the following, because my total physical RAM is 1.5GB. I set the cache to 500MB and started the same torrent on another machine on the same network (LSD). Unfortunately, the cache limit isn't respected: it grows to around 670MB (+170MB over the limit) and then still continues growing, but at a much smaller rate (a few KBs per second).
  3. The upload speed fluctuates until the peer (the other machine) is disconnected, and the seeding machine just stops all activity. I don't know if the seeding machine disconnects the peer or the peer disconnects from the seed. I suspect the seed disconnects first, probably because it can't push more data into the cache. But isn't the cache supposed to evict older items, even if they haven't expired, to make room for new data?

I acknowledge that the last 2 items are new bugs and that this bug should be considered fixed. Let me know if you want me to open a new issue.

Also please fix RC_0_16 compilation errors.

Comment #7

Posted on Sep 7, 2013 by Helpful Elephant

Could you build libtorrent with statistics on (--enable-statistics or statistics=on)? That will log a lot of stats, among other things the disk cache size and limits.
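
For example, appending the flag to the b2 command from the original report:

    b2 -q --without-python --toolset=msvc-9.0 variant=release link=static runtime-link=static statistics=on ...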

Comment #8

Posted on Sep 7, 2013 by Happy Hippo

libtorrent 0.16.11+aligned_malloc.diff+statistics

3076.0000.log: cache == 500MB (it reached 680MB), cache_expiry == 600s. Transferred a 2.9GB torrent successfully without a drop in speed.

2388.0000.log: cache == 1000MB, cache_expiry == 600s. Transfer of the same torrent stopped at around 480MB RAM usage. I observed on the leeching machine that qbittorrent banned the seeding IP because of a peer_ban_alert on that IP.

(The leeching machine is Debian sid using 0.16.10.)

Attachments

Comment #9

Posted on Sep 7, 2013 by Helpful Elephant

If the banning can be reproduced, I would be very interested in knowing what caused it. Are there any other alerts around the time it's banned? Any peer_error alerts or anything? You don't happen to have a verbose log of the peer getting banned?

According to the internal cache accounting, the cache is not exceeding the limit at all. There may be a deeper issue, perhaps a memory leak. What are you looking at that says 680 MB?
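
For reference, the internal accounting can be read out directly; a sketch assuming the 0.16.x cache_status fields (sizes are counted in 16 kiB blocks):

    #include <libtorrent/session.hpp>
    #include <iostream>

    void print_cache(libtorrent::session& ses)
    {
        libtorrent::cache_status cs = ses.get_cache_status();
        // cache_size counts all 16 kiB blocks currently in the cache,
        // read_cache_size the read portion of it
        std::cout << "cache: " << (cs.cache_size * 16) / 1024 << " MiB"
                  << " (read cache: " << (cs.read_cache_size * 16) / 1024
                  << " MiB)" << std::endl;
    }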

Comment #10

Posted on Sep 7, 2013 by Helpful Elephant

do you have any memory leak tools to profile where memory is allocated?

Comment #11

Posted on Sep 7, 2013 by Happy Hippo

Unfortunately, qbittorrent doesn't catch all alerts, only the ones it is interested in. I'll try to make modifications and log all alerts with timestamps. Is there a way to get a verbose log of a peer getting banned?
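
Something like this is what I have in mind (a sketch against the 0.16.x alert API; the loop condition is a placeholder):

    #include <libtorrent/session.hpp>
    #include <libtorrent/alert.hpp>
    #include <ctime>
    #include <iostream>
    #include <memory>

    void log_all_alerts(libtorrent::session& ses)
    {
        // receive every alert category, not just the default ones
        ses.set_alert_mask(libtorrent::alert::all_categories);
        for (;;) // placeholder: run until shutdown
        {
            ses.wait_for_alert(libtorrent::seconds(1));
            std::auto_ptr<libtorrent::alert> a = ses.pop_alert();
            while (a.get())
            {
                // epoch seconds followed by the alert's own message
                std::cout << std::time(0) << " " << a->message() << std::endl;
                a = ses.pop_alert();
            }
        }
    }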

680MB is the "private bytes" column in Process Explorer (Sysinternals).

A memory leak is plausible, and maybe it is on the qbittorrent side. However, I don't know of any tools on Windows that can help me, like Valgrind on Linux. Do you have any suggestions? On second thought, some of the extra MBs should be attributed to the space the qbittorrent code itself takes in memory, right? I mean, it is qbittorrent code + libtorrent code + libtorrent cache. But I don't think that the qbittorrent+libtorrent code can occupy 180MB of RAM by itself...

Comment #12

Posted on Sep 8, 2013 by Helpful Elephant

I believe Google's tcmalloc would work on Windows (they renamed it to something else, I can't recall, but searching for tcmalloc should give a hit).

180 MiB doesn't sound impossible, but it does sound toward the higher end.

Comment #13

Posted on Dec 7, 2014 by Helpful Elephant

I believe this was fixed when moving away from VirtualAlloc() as the page-aligned allocator for disk buffers.
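
The replacement is a plain page-aligned heap allocation, along these lines (a hedged sketch; the actual change may differ in detail):

    #include <cstdlib>
    #ifdef _WIN32
    #include <malloc.h>
    #endif

    // page-aligned buffers without VirtualAlloc's 64 kiB reservation
    // granularity; allocations come from the regular heap instead
    void* page_aligned_alloc(std::size_t bytes)
    {
    #ifdef _WIN32
        return _aligned_malloc(bytes, 4096);
    #else
        void* p = 0;
        if (posix_memalign(&p, 4096, bytes) != 0) return 0;
        return p;
    #endif
    }

    void page_aligned_free(void* p)
    {
    #ifdef _WIN32
        _aligned_free(p);
    #else
        std::free(p);
    #endif
    }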

Comment #14

Posted on Mar 16, 2015 by Happy Hippo

I just read your blog post and came here. Funnily enough, I have another user reporting a similar problem. We have a backtrace showing the crash happening while allocating memory. Here is the specific forum post: http://qbforums.shiki.hu/index.php/topic,3198.msg15720.html#msg15720 In which version did you move away from VirtualAlloc()? The user is on 1.0.3 (r10821).

Comment #15

Posted on Mar 16, 2015 by Helpful Elephant

That does look like memory allocation failing, possibly because of exhausted virtual memory.

However, the fix to the block allocator was definitely in 1.0.3.


r8918 | arvidn | 2013-09-05 03:14:42 -0400 (Thu, 05 Sep 2013) | 1 line

merge memory allocation fix from RC_0_16

Comment #16

Posted on Mar 16, 2015 by Happy Hippo

Should I open a new issue to track this? Is there any other info you want about this? Note: the user reports that the 64-bit version doesn't crash. He also reports that the crash happens when he seeds torrents with a piece size of 8MB or 16MB; smaller piece sizes don't crash.

Comment #17

Posted on Mar 16, 2015 by Helpful Elephant

I'd say it depends. There are certain circumstances where this is expected behavior. If the process legitimately is out of memory, for instance, there isn't much one can do about it. libtorrent is unfortunately sprinkled with assumptions that the memory allocator won't fail, and I don't think it's practical to try to gracefully handle all those cases.

Perhaps the user has configured the disk cache larger than what a 32-bit process can hold, for instance.

Comment #18

Posted on Mar 16, 2015 by Happy Hippo

The user has set the cache to 1500MB and it still crashed. And today he has a new post, after the linked one, that reads:

I'm not sure 8MB and 16MB pieces are exactly the cause, but I'm sure that at the same speed (16MB/s) big pieces use more memory.

Example: at 16MB/s upload with <=4MB pieces qbt uses 1500MB RAM; at 16MB/s upload with >4MB pieces qbt uses >1800MB RAM and crashes.

But what happens if someone seeds at 50MB/s with small pieces? Probably crashes too, I don't know, I can't test.

(I don't know if he had set the cache limit in the above quoted text. However, qbt puts an upper limit of 1800MB on the cache if it is compiled for 32-bit environments.)

So it seems that qbt doesn't exhaust the 2GB address space before it crashes; at least the reported RAM usage doesn't reach that limit.

Comment #19

Posted on Mar 17, 2015 by Happy Hippo

Another update from the forum: he set the cache to 1000MB, but the RAM usage continued increasing until it reached about 1800MB, and then it crashed again.

Comment #20

Posted on Mar 17, 2015 by Quick Kangaroo

Sorry, I don't speak English well, plus my feedback was confusing. qbt no longer crashes with a 1000MB limit; it crashes only when it uses more than 1800MB, and piece size is not the problem. But finally someone found an enigmatic fact: RAM usage = cache limit * ~1.3.

Some tests to prove it (RAM usage / cache limit):

  64-bit: ~5285/4096 = 1.29...
  32-bit: ~1400/1024 = 1.36...

I'm very curious to know if someone has a mathematical explanation for this coefficient.

Thanks.

Comment #21

Posted on Mar 17, 2015 by Quick Kangaroo

So, if this theory is correct, 1800/~1.3 = ~1384; that's why qbt crashes when I set the cache limit to 1400MB, but with a 1024MB limit it doesn't crash, because it never uses more than 1024*~1.3 of RAM. I'm using approximate values, but I hope you understand what I mean.

Comment #22

Posted on Mar 17, 2015 by Happy Hippo

Could the ~1.3 be some kind of overhead, like memory occupied by data structures? Or does the cache accounting take into account the size of the data structures too, and not only the size of the payload?

Comment #23

Posted on Mar 18, 2015 by Helpful Elephant

The disk cache is not the only thing that uses memory in a bittorrent client, it's just (typically) the most significant user of memory.

Comment #24

Posted on Mar 18, 2015 by Quick Kangaroo

Thanks for your help arvid. OK, the disk cache is not the only thing, but why is there a relation between the cache limit and the total memory usage? We don't understand this enigmatic fact. Is it libtorrent related?

Comment #25

Posted on Mar 18, 2015 by Happy Hippo

The disk cache is not the only thing that uses memory in a bittorrent client, it's just (typically) the most significant user of memory.

Yes, I agree. But the user reported that with the cache set to 1400MiB the crash still happened. So it means the rest of the program somehow consumed 600MiB of extra RAM. Is this normal when operating under such conditions (big cache and fiber-optic connections)? This is a genuine question, I am not trying to troll you.

Comment #26

Posted on Mar 18, 2015 by Helpful Elephant

It does sound like a lot, but not so much that it sounds completely unlikely. I imagine that if the heap is fragmented enough, using 600 MiB of virtual memory isn't all that difficult.

Comment #27

Posted on Mar 18, 2015 by Helpful Elephant

Are there any good heap profilers for Windows? That could probably point to where all that memory is going.

tcmalloc could work if there aren't any better ones.

Comment #28

Posted on Mar 18, 2015 by Happy Hippo

I have found Heapy: https://github.com/lukedodd/Heapy It works without recompiling. However, I don't know if the report it builds is verbose enough for you (I could try integrating tcmalloc if you want). It appends updated stats to the same file every 10s, so you only need to look at the last print in each file I am linking to see the memory distribution before the crash. The user did 2 runs with crashes: 1. http://pastebin.com/uzwP1RKh 2. http://pastebin.com/Cek2vEwC

Comment #29

Posted on Mar 19, 2015 by Quick Kangaroo

For these 2 crashes, I set:

  - "disk write cache size" to 1400MiB
  - "disk cache expiry" to 600s (to produce the crashes faster)
  - unchecked "Enable OS cache"
  - on Windows 8.1

Comment #30

Posted on May 16, 2015 by Happy Hippo

arvid, any news on this? Are you able to discern anything from the mem logs attached 2 comments above?

Comment #31

Posted on May 18, 2015 by Helpful Elephant

Those heap profiles (as far as I can tell) don't seem to suggest that a lot of memory is used at all. Did those runs terminate with the error?

Comment #32

Posted on May 19, 2015 by Happy Hippo

As I say in my comment above, yes, they resulted in crashes.

Comment #33

Posted on May 19, 2015 by Happy Hippo

Comment deleted

Status: Fixed

Labels:
Type-Defect Priority-Medium