Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: race between Close and Read #3507

Closed
alberts opened this issue Apr 11, 2012 · 11 comments
Closed

net: race between Close and Read #3507

alberts opened this issue Apr 11, 2012 · 11 comments
Milestone

Comments

@alberts
Copy link
Contributor

alberts commented Apr 11, 2012

What steps will reproduce the problem?

Run go.crypto/ssh unit tests repeatedly.

What do you see instead?

=== RUN TestCipherReversal-26
--- PASS: TestCipherReversal-26 (0.00 seconds)
=== RUN TestDefaultCiphersExist-26
--- PASS: TestDefaultCiphersExist-26 (0.00 seconds)
=== RUN TestClientAuthPublickey-26
--- PASS: TestClientAuthPublickey-26 (0.18 seconds)
=== RUN TestClientAuthPassword-26
--- PASS: TestClientAuthPassword-26 (0.17 seconds)
=== RUN TestClientAuthWrongPassword-26
--- PASS: TestClientAuthWrongPassword-26 (0.17 seconds)
=== RUN TestClientAuthInvalidPublickey-26
--- PASS: TestClientAuthInvalidPublickey-26 (0.17 seconds)
=== RUN TestClientAuthRSAandDSA-26
--- PASS: TestClientAuthRSAandDSA-26 (0.17 seconds)
=== RUN TestClientHMAC-26

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x70 pc=0x515479]

goroutine 19 [running]:
sync/atomic.CompareAndSwapUint32(0x70, 0x100000000, 0x0, 0x0, 0x4be29d, ...)
 go/src/pkg/sync/atomic/asm_amd64.s:12 +0xd
sync.(*Mutex).Lock(0x70, 0x0)
 go/src/pkg/sync/mutex.go:40 +0x35
net.(*netFD).Read(0x0, 0xf8401c1000, 0x100000001000, 0x0, 0x0, ...)
 go/src/pkg/net/fd.go:417 +0x42
net.(*TCPConn).Read(0xf8400a95e8, 0xf8401c1000, 0x100000001000, 0xfdc00000000, 0x0, ...)
 go/src/pkg/net/tcpsock_posix.go:87 +0xce
bufio.(*Reader).fill(0xf8400ce640, 0x100000001)
 go/src/pkg/bufio/bufio.go:77 +0xf2
bufio.(*Reader).Read(0xf8400ce640, 0xf8400a9620, 0x500000005, 0x5, 0x0, ...)
 go/src/pkg/bufio/bufio.go:142 +0x188
code.google.com/p/go.crypto/ssh.(*reader).Read(0xf84013ed20, 0xf8400a9620, 0x500000005,
0x784df0, 0x0, ...)
 go/src/pkg/code.google.com/p/go.crypto/ssh/agent.go:0 +0x72
io.ReadAtLeast(0xf8401a0870, 0xf84013ed20, 0xf8400a9620, 0x500000005, 0x7fae00000005,
...)
 go/src/pkg/io/io.go:254 +0xc3
io.ReadFull(0xf8401a0870, 0xf84013ed20, 0xf8400a9620, 0x500000005, 0xf84013ed20, ...)
 go/src/pkg/io/io.go:273 +0x69
code.google.com/p/go.crypto/ssh.(*reader).readOnePacket(0xf84013ed20, 0xf8400d3028,
0x7fae1d9ede70, 0x18)
 go/src/pkg/code.google.com/p/go.crypto/ssh/transport.go:85 +0xc3
code.google.com/p/go.crypto/ssh.(*transport).readPacket(0xf84013ed20, 0x0, 0x0, 0x0)
 go/src/pkg/code.google.com/p/go.crypto/ssh/transport.go:135 +0x25
code.google.com/p/go.crypto/ssh.(*ClientConn).mainLoop(0xf840196720, 0x0)
 go/src/pkg/code.google.com/p/go.crypto/ssh/client.go:190 +0x86
created by code.google.com/p/go.crypto/ssh.Client
 go/src/pkg/code.google.com/p/go.crypto/ssh/client.go:38 +0x16b

goroutine 1 [chan receive]:
testing.RunTests(0x400c00, 0x784580, 0x1b0000001b, 0x1, 0x7718b0, ...)
 go/src/pkg/testing/testing.go:350 +0x79f
testing.Main(0x400c00, 0x784580, 0x1b0000001b, 0x782fa8, 0x400000004, ...)
 go/src/pkg/testing/testing.go:285 +0x7a
main.main()
    go-build173108041/code.google.com/p/go.crypto/ssh/_test/_testmain.go:103 +0x91

goroutine 2 [syscall]:
created by runtime.main
 go/src/pkg/runtime/proc.c:221

goroutine 6 [syscall]:
syscall.Syscall6()
 go/src/pkg/syscall/asm_linux_amd64.s:40 +0x5
syscall.EpollWait(0xf800000006, 0xf8401a12d0, 0xa0000000a, 0xffffffff, 0xc, ...)
 go/src/pkg/syscall/zerrors_linux_amd64.go:1781 +0xa1
net.(*pollster).WaitFD(0xf8401a12c0, 0xf8401a21c0, 0x0, 0x0, 0x0, ...)
 go/src/pkg/net/fd_linux.go:146 +0x112
net.(*pollServer).Run(0xf8401a21c0, 0x0)
 go/src/pkg/net/fd.go:236 +0xe4
created by net.newPollServer
 go/src/pkg/net/newpollserver.go:35 +0x388

goroutine 20 [running]:
created by testing.RunTests
 go/src/pkg/testing/testing.go:349 +0x77c

goroutine 21 [runnable]:
code.google.com/p/go.crypto/ssh._func_005(0xf8400a95f0, 0xf8400a95d8, 0x0, 0x0)
 go/src/pkg/code.google.com/p/go.crypto/ssh/client_auth_test.go:174
created by code.google.com/p/go.crypto/ssh.newMockAuthServer
 go/src/pkg/code.google.com/p/go.crypto/ssh/client_auth_test.go:174 +0x16a

Which compiler are you using (5g, 6g, 8g, gccgo)?

6g

Which operating system are you using?

linux

Which version are you using?  (run 'go version')

go version weekly.2012-03-27 +75c94686a4d7

go.crypto 45667c991e8e

Please provide any additional information below.
@rsc
Copy link
Contributor

rsc commented Apr 11, 2012

Comment 1:

I believe that what happened is:
c.Read:
    check c.ok(), which checks c.fd != nil
    <pause>
c.Close:
    c.fd = nil
c.Read:
    <resume>
    call c.fd.Read
Then (*netFD).Read uses its receiver as if non-nil, and the crash happens.
The fix may be to delete the c.fd = nil line.  The fd knows whether it's closed or not.

Labels changed: added priority-later, go1.1, removed priority-triage.

Status changed to Accepted.

@davecheney
Copy link
Contributor

Comment 2:

I believe rsc is correct. I've tried to cook up a little test program which will blow up
pretty quickly on a 2 core machine with GOMAXPROCS=100 (probably overkill). The busier
the machine the more likely it is hit the race. 
fullung, can you please test this on your big host and see if you can replicate the
problem, then we can discuss the fix.

Attachments:

  1. netstress.go (486 bytes)

@alberts
Copy link
Contributor Author

alberts commented Apr 14, 2012

Comment 3:

Hello
Yes, as soon as I do something to make the machine a bit busier, this triggers
immediately.
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x70 pc=0x47cae6]
goroutine 479618 [running]:
sync/atomic.CompareAndSwapUint32(0x70, 0x100000000, 0xf8401b2a20, 0x0, 0x421d42, ...)
 go/src/pkg/sync/atomic/asm_amd64.s:12 +0xd
sync.(*Mutex).Lock(0x70, 0x0)
 go/src/pkg/sync/mutex.go:40 +0x35
net.(*netFD).Read(0x0, 0xf840068000, 0x200000002000, 0x0, 0x0, ...)
 go/src/pkg/net/fd.go:417 +0x42
net.(*TCPConn).Read(0xf84018c1b0, 0xf840068000, 0x200000002000, 0x2e3732310000000e, 0x0,
...)
 go/src/pkg/net/tcpsock_posix.go:87 +0xce
created by main.main
 netstress.go:35 +0x294
goroutine 1 [running]:
goroutine 2 [syscall]:
created by runtime.main
 go/src/pkg/runtime/proc.c:221
goroutine 3 [syscall]:
syscall.Syscall6()
 go/src/pkg/syscall/asm_linux_amd64.s:40 +0x5
syscall.EpollWait(0xf84006a010, 0xf84006a010, 0xa0000000a, 0xffffffff, 0xc, ...)
 go/src/pkg/syscall/zerrors_linux_amd64.go:1781 +0xa1
created by net.newPollServer
 go/src/pkg/net/newpollserver.go:35 +0x382
goroutine 4 [chan receive]:
net.(*pollServer).WaitRead(0xf840034800, 0xf84006b000, 0xf840036450, 0xb, 0x1, ...)
 go/src/pkg/net/fd.go:268 +0x73
net.(*netFD).accept(0xf84006b000, 0x401027, 0x0, 0xf840036420, 0xf84005a030, ...)
 go/src/pkg/net/fd.go:622 +0x20d
net.(*TCPListener).AcceptTCP(0xf84005a0a0, 0x1, 0x0, 0x0, 0x10, ...)
 go/src/pkg/net/tcpsock_posix.go:322 +0x71
net.(*TCPListener).Accept(0xf84005a0a0, 0x0, 0x0, 0x0, 0x0, ...)
 go/src/pkg/net/tcpsock_posix.go:332 +0x49
main._func_001(0xf84003a260, 0xf84003a250, 0x0, 0x0)
 netstress.go:19 +0x5a
created by main.main
 netstress.go:28 +0x189
go version weekly.2012-03-27 +1b2b113a2d66

@davecheney
Copy link
Contributor

Comment 4:

Hi,
Thanks for testing, could you please apply this CL which incorporates rsc's suggested
fix and see how you go. I've been running this fix for a few hours and netstress hasn't
faulted yet.
http://golang.org/cl/6002053

@alberts
Copy link
Contributor Author

alberts commented Apr 14, 2012

Comment 5:

Looks good. I've been running it on a heavily loaded machine and under strace as a way
to mix up the timings. Nohting has blown up yet.

@davecheney
Copy link
Contributor

Comment 6:

Thank you for the quick response, I have mailed the CL for wider review.

@davecheney
Copy link
Contributor

Comment 7:

Hello,
Could you please reapply http://golang.org/cl/6002053 and try these additional
test files.
unixstress.go needs a bit of tlc if you run it more than once.
The same issue probably effects iprawsock_posix.go, but it's pretty hard to test that,
so I'm going to have to trust it also benefits from this change.

Attachments:

  1. udpstress.go (465 bytes)
  2. unixstress.go (472 bytes)

@alberts
Copy link
Contributor Author

alberts commented Apr 14, 2012

Comment 8:

No panics. On a busy system, unixstress dies with:
dial unix /tmp/x: resource temporarily unavailable
and tcpstress dies with:
dial tcp 127.0.0.1:9999: cannot assign requested address
I'm guessing the UNIX one is due to the backlog filling up and the TCP one is due too
many sockets in the TIME_WAIT state.

@davecheney
Copy link
Contributor

Comment 9:

Owner changed to @davecheney.

Attachments:

  1. tcpstress2.go (488 bytes)

@davecheney
Copy link
Contributor

Comment 10:

This issue was closed by revision 1f14d45.

Status changed to Fixed.

@davecheney
Copy link
Contributor

Comment 11:

This issue was closed by revision 479138396750.

@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1 label Apr 14, 2015
davecheney added a commit that referenced this issue May 11, 2015
««« backport 5f24ff99b5f1
net: fix race between Close and Read

Fixes #3507.

Applied the suggested fix from rsc. If the connection
is in closing state then errClosing will bubble up to
the caller.

The fix has been applied to udp, ip and unix as well as
their code path include nil'ing c.fd on close. Func
tests are available in the linked issue that verified
the bug existed there as well.

R=rsc, fullung, alex.brainman, mikioh.mikioh
CC=golang-dev
https://golang.org/cl/6002053
»»»
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants