When running the new test script (test_commands.sh), the iperf3 client hangs on 2 of the tests:
./src/iperf3 -c $host -P 2 -t 5 -R and ./src/iperf3 -c $host -Z -t 5
And when you ^C the client, the server dies.
Comment #1
Posted on Dec 20, 2013 by Happy Kangaroo
This happened on OSX, but Linux seems OK.
Comment #2
Posted on Dec 22, 2013 by Happy Kangaroo
This seems to reliably reproduce the problem on Linux:
#!/bin/sh
set -x
while [ 1 ]
do
    ./src/iperf3 -P 2 -c localhost -t 5
    ./src/iperf3 -P 2 -c localhost -t 5 -R
done
It works for 3-6 loops and then locks up. (One time, the server crashed.)
Hopefully that will help track it down.
Comment #3
Posted on Dec 24, 2013 by Happy Kangaroo
Running the server in gdb shows that the server is crashing on this line:
Program received signal SIGSEGV, Segmentation fault.
0x000000305784812c in vfprintf () from /lib64/libc.so.6
Which is called from here:
1808 iprintf(test, report_sum_bw_retrans_format, start_time, end_time, ubuf, nbuf, retransmits, irp->omitted?report_omitted:"");
Maybe Sasant's new patch will fix this?
Comment #4
Posted on Dec 24, 2013 by Happy Hippo
I am able to reproduce this too. With the reverse (-R) option, the server crashes:
~~~
getsockopt(5, SOL_TCP, TCP_INFO, "\1\0\0\0\0\7w\0(\21\3\0@\234\0\0\270\377\0\0\30\2\0\0\0\0\0\0\0\0\0\0"..., [104]) = 0
getsockopt(7, SOL_TCP, TCP_INFO, "\1\0\0\0\0\7w\0(\21\3\0@\234\0\0\270\377\0\0\30\2\0\0\0\0\0\0\0\0\0\0"..., [104]) = 0
write(1, "- - - - - - - - - - - - - - - - "..., 50- - - - - - - - - - - - - - - - - - - - - - - - -
) = 50
write(1, "[ 5] 8.02-9.00 sec 382 MB"..., 67[ 5] 8.02-9.00 sec 382 MBytes 3.27 Gbits/sec 5
) = 67
write(1, "[ 7] 8.02-9.00 sec 381 MB"..., 67[ 7] 8.02-9.00 sec 381 MBytes 3.26 Gbits/sec 0
) = 67
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x5} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
~~~
(gdb) bt
#0 0x000000399144908f in vfprintf () from /lib64/libc.so.6
#1 0x000000000040542a in vprintf (__arg=0x7fffffffda08,
__fmt=0x4110e0 <report_sum_bw_retrans_format> "\340SUM] %6.2f-%-6.2f sec %ss %ss/sec", ' ' <repeats 14 times>, "%s\n") at /usr/include/bits/stdio.h:38
#2 iprintf (test=test@entry=0x617010, format=0x4110e0 "\340SUM] %6.2f-%-6.2f sec %ss %ss/sec", ' ' , "%s\n")
at iperf_api.c:2405
#3 0x000000000040618b in iperf_print_intermediate (test=test@entry=0x617010) at iperf_api.c:1808
#4 0x0000000000406468 in iperf_reporter_callback (test=0x617010) at iperf_api.c:2008
#5 0x000000000040c9ac in tmr_run (nowP=nowP@entry=0x7fffffffdd10) at timer.c:189
#6 0x0000000000409f43 in iperf_run_server (test=test@entry=0x617010) at iperf_server_api.c:586
#7 0x0000000000401e92 in run (test=0x617010) at main.c:116
#8 main (argc=, argv=0x7fffffffdf68) at main.c:91
(gdb) f 0
#0 0x000000399144908f in vfprintf () from /lib64/libc.so.6
(gdb) list
43 __STDIO_INLINE int
44 getchar (void)
45 {
46 return _IO_getc (stdin);
47 }
48
49
50 # ifdef __USE_MISC
51 /* Faster version when locking is not necessary. */
52 __STDIO_INLINE int
It looks like the stack is getting corrupted somewhere, which leads to the crash.
I need to dig further to find out what is really causing it.
Comment #5
Posted on Dec 24, 2013 by Happy Dog
I've been doing some digging into this. The hang and the crash might have two different causes, or might be two different manifestations of the same problem. What follows are notes from a private email on this subject, where I was describing what I saw with FreeBSD 10.0 and -R. There's a hang but no crash.
A slightly lower-level symptom of this problem is that at the end of the test, the client tries to send a TEST_END state change message to the server over the control connection. In -R mode, the server doesn't seem to receive or read it reliably. However, if I kill the client (because it seems hung), the server immediately gets the TEST_END and tries to do the end-of-test processing (it can't do this successfully because at this point the client has died and closed its side of the control connection).
In non -R mode this part all works as expected (I see the client send the TEST_END and the server receives it immediately, as we would expect).
This is all on FreeBSD 10.0, client and server on the same machine (so far it looks like the configuration where client and server are on the same machine is particularly vulnerable to this problem).
Comment #6
Posted on Jan 3, 2014 by Happy Dog
Partial fix committed in c499d0008f7d. There was basically a deadlock between the client and server in -R mode; see the commit log for more details.
Not closing this yet...need to do some more tests to get a warm fuzzy feeling about the fix first. Also note that this doesn't address the server-side crashes that have been reported (but which I have not personally witnessed).
Comment #7
Posted on Jan 3, 2014 by Happy Dog
Fixed the -P and -R server-side crash reported via Comments 2, 3, and 4, in 423166a54849. This only affected Linux; it was a mangled printf format string that only got used on that platform (it would have been used on any other platform with retransmit statistics, but there aren't currently any).
It's clear to me now that there were multiple issues being reported in this one bug. :-p
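For readers following along: the backtrace in Comment #4 shows a format with two floating-point conversions and three `%s` conversions, while the call quoted in Comment #3 passes an integer retransmit count ahead of the final string. The `si_addr=0x5` in the strace output is consistent with that integer being read as a string pointer, though this is an inference from the thread and not something the commit log is quoted as saying. The following is a minimal sketch of a format whose conversions line up with that argument order. The format text and the `format_sum_line` helper are hypothetical stand-ins, not the strings or functions from the iperf3 sources.

```c
#include <stdio.h>

/*
 * Hypothetical summary-line format (NOT the real iperf3 string).
 * Conversions in order: double, double, string, string, int, string,
 * mirroring the argument order of the iprintf() call in Comment #3.
 */
static const char sum_format[] =
    "[SUM] %6.2f-%-6.2f sec  %ss  %ss/sec  %d  %s\n";

/*
 * Format one summary line into buf.
 * Returns the number of characters written, or -1 on error or truncation.
 */
int format_sum_line(char *buf, size_t len, double start, double end,
                    const char *ubuf, const char *nbuf,
                    int retransmits, const char *omitted)
{
    int n = snprintf(buf, len, sum_format,
                     start, end, ubuf, nbuf, retransmits, omitted);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

If the `%d` were dropped from the format while the call still passed `retransmits`, the trailing `%s` would consume the integer, which is undefined behavior and can crash inside `vfprintf`.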
Comment #8
Posted on Jan 3, 2014 by Helpful Wombat
If gcc isn't spitting out warnings on format strings as const char variables, it'd probably make sense to turn the format strings into typedefs or something to ensure that gcc spits out a warning if this kind of mismatch happens.
Comment #9
Posted on Jan 3, 2014 by Happy Dog
Good point. I don't see any warning messages for the format string mismatch (on a working copy rolled back to before my fix), but gcc isn't compiling with any warnings enabled either, as far as I can tell:
gcc -DHAVE_CONFIG_H -I. -g -O2 -MT iperf_api.o -MD -MP -MF .deps/iperf_api.Tpo -c -o iperf_api.o iperf_api.c
I'm not sure why this is...I'm used to living under -Wall and -Werror. Yet another thing to investigate.
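One way to get the compiler to flag this kind of mismatch is GCC's `format` function attribute combined with `-Wall` (which enables `-Wformat`). The sketch below is only an illustration of that approach: `checked_printf`, `PRINTF_LIKE`, and `SUM_FORMAT` are hypothetical names, not iperf3 code. It also assumes the format is visible as a string literal at the call site, since the compiler can only check conversions when it can see the string's contents.

```c
#include <stdarg.h>
#include <stdio.h>

/* The attribute is a GCC/Clang extension; expand to nothing elsewhere. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/*
 * Hypothetical printf-style wrapper. Argument 3 is the format and the
 * variadic arguments start at position 4, so the compiler can compare
 * each conversion against the type of the argument actually passed.
 */
int checked_printf(char *buf, size_t len, const char *format, ...)
    PRINTF_LIKE(3, 4);

int checked_printf(char *buf, size_t len, const char *format, ...)
{
    va_list ap;
    va_start(ap, format);
    int n = vsnprintf(buf, len, format, ap);
    va_end(ap);
    return n;
}

/* A macro keeps the format a string literal wherever it is used. */
#define SUM_FORMAT "[SUM] %d retransmits %s\n"
```

With this in place, a call such as `checked_printf(buf, sizeof buf, SUM_FORMAT, "oops", 5)` should draw a `-Wformat` warning when built with `-Wall`, and `-Werror` would turn that warning into a build failure.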
Comment #10
Posted on Jan 3, 2014 by Happy Dog
Update: Just one sub-issue remaining from this bug report...that's the hang with -Z. I've been able to observe this on Mac OS, as reported in the initial bug report. It doesn't happen every time, at least not on my MacBook; sometimes the -Z test works just fine.
So far I have not been able to reproduce this problem on my other two development platforms (FreeBSD 10 and CentOS 6).
It's not clear to me if there's something platform-specific lurking about or not, although the sendfile(2) call used by the -Z option is slightly different on the three platforms I've been using (therefore there are slightly different codepaths being used).
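To illustrate how the sendfile(2) interfaces differ, here is a rough sketch of a wrapper that hides the three calling conventions behind one function. The name `send_file_chunk` is hypothetical and this is not the code iperf3 uses for -Z; it only shows the argument order and return-value conventions of each platform as documented in their man pages.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/sendfile.h>
#else
#include <sys/uio.h>
#endif

/*
 * Hypothetical helper: send up to count bytes of in_fd, starting at
 * offset, to the socket out_sock. Returns bytes sent, or -1 on error.
 */
ssize_t send_file_chunk(int out_sock, int in_fd, off_t offset, size_t count)
{
#if defined(__linux__)
    /* Linux: destination first; returns the number of bytes written. */
    return sendfile(out_sock, in_fd, &offset, count);
#elif defined(__FreeBSD__)
    /* FreeBSD: source first; byte count comes back through sbytes. */
    off_t sent = 0;
    int rc = sendfile(in_fd, out_sock, offset, count, NULL, &sent, 0);
    return (rc == 0 || sent > 0) ? (ssize_t)sent : -1;
#elif defined(__APPLE__)
    /* macOS: source first; len is both the request and the result. */
    off_t len = (off_t)count;
    int rc = sendfile(in_fd, out_sock, offset, &len, NULL, 0);
    return (rc == 0 || len > 0) ? (ssize_t)len : -1;
#else
    (void)out_sock; (void)in_fd; (void)offset; (void)count;
    errno = ENOSYS;
    return -1;
#endif
}
```

The BSD-derived variants can report a partial transfer together with an error return, which is why those branches look at the byte count as well as the return code. Whether any of these differences is related to the -Z hang is not established in this thread.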
Comment #11
Posted on Jan 4, 2014 by Swift Giraffe
In my tests, OSX hangs every time. Linux is now working fine.
Comment #12
Posted on Jan 21, 2014 by Happy Dog
Update: I'm still seeing this issue (but not consistently) on MacOS 10.8 and MacOS 10.9.
Status: Accepted
Labels:
Type-Defect
Priority-High
Milestone-3.0-Release