I have proxies with more than 10-20k outbound connections. At peak times I am getting errors like this:
2010/02/03 21:24:49| commBind: Cannot bind socket FD 3892 family 2 to 0.0.0.0 port 0: (98) Address already in use
According to http://www.ibm.com/developerworks/linux/library/l-sockpit/
and according to the Linux man page: "Linux will only allow port re-use with the SO_REUSEADDR option when this option was set both in the previous program that performed a bind(2) to the port and in the program that wants to re-use the port. This differs from some implementations (e.g., FreeBSD) where only the later program needs to set the SO_REUSEADDR option. Typically this difference is invisible, since, for example, a server program is designed to always set this option."
There is certainly a system-wide solution as well (sometimes required in addition to the socket option mentioned above): net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse.
Both of them help. Would it be a good idea to implement this in the code, under an #ifdef for Linux?
I guess the place is comm_fdopen6(int new_socket,
where commSetReuseAddr(new_socket); would have to be moved out so that it applies to every socket (by default I guess it is only set on listening ports; I hope it is harmless in the case of UDP).
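A minimal sketch of the idea, independent of the Lusca sources: set SO_REUSEADDR on an outbound TCP socket before binding it to 0.0.0.0 port 0, only when building for Linux and only when a flag is enabled. The function name open_outbound_socket and the reuse_addr flag are hypothetical and stand in for whatever config knob would be chosen; they are not part of comm.c.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not from comm.c): open a TCP socket for an outgoing
 * connection and bind it to 0.0.0.0 port 0 so the kernel picks the port.
 * Returns the file descriptor, or -1 on failure. */
int open_outbound_socket(int reuse_addr)
{
    struct sockaddr_in local;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

#ifdef __linux__
    /* On Linux, optionally mark the socket as reusable before bind(). */
    if (reuse_addr) {
        int on = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
            close(fd);
            return -1;
        }
    }
#else
    (void)reuse_addr; /* option left untouched on other platforms */
#endif

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(0);

    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Calling open_outbound_socket(1) would correspond to the knob being enabled, open_outbound_socket(0) to the current behaviour. Whether this actually reduces the commBind errors is exactly what would need testing.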
Comment #1
Posted on Mar 20, 2010 by Happy Horse
Hm. Wait, so what you're saying is that Linux will benefit from having SO_REUSEADDR set on -all- incoming and outbound connections?
Comment #2
Posted on Mar 20, 2010 by Grumpy Lion
Yes, because on Linux even the socket of an outgoing connection stays in TIME_WAIT for some time, which means its port cannot be reused during that period. Maybe add a knob in the config? As I understand it, enabling this option can in some cases have negative security implications.
I guess it would also be good to mention in the manuals or somewhere that anyone who sees this error message should also take a look at net.ipv4.tcp_tw_recycle and net.ipv4.tcp_tw_reuse.
Comment #3
Posted on Oct 17, 2010 by Swift Hippo
SO_REUSEADDR is generally only useful where something bind()s to a port for listening and needs to rebind quickly after a process restart or crash.
I have no evidence that this is at all useful for outbound connections. In fact, the only way I know to control that behaviour is to use the tw_recycle bits, or tweak the tcp fin/ack/syn timeouts.
Comment #4
Posted on Nov 4, 2010 by Swift Hippo
I can confirm that the net.ipv4.tcp_tw_recycle=1 setting is a general cure for this.
Comment #5
Posted on Nov 10, 2010 by Grumpy Lion
tw_recycle is harmful for NAT. I have had many situations where, once this knob is set, clients behind NAT cannot work normally with the proxy (stalled connections, or TCP connections that could not be established; I don't remember exactly, I tried it recently).
My solution seems to have been a different parameter(?): net.ipv4.tcp_orphan_retries was 0 by default, and when I set it to 1 my problem disappeared.
Comment #6
Posted on Nov 10, 2010 by Grumpy Rabbit
tcp_orphan_retries didn't work for me.
Setting /proc/sys/net/ipv4/tcp_tw_reuse does work, although I get a lot of:
TCP: time wait bucket table overflow
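To see how many sockets are actually sitting in TIME_WAIT when these messages appear, one option is to count them in /proc/net/tcp. The sketch below assumes the usual Linux layout of that file, where the fourth column is the socket state in hex and 06 means TIME_WAIT; the function names are made up for illustration, and only IPv4 sockets are covered (IPv6 sockets are listed in /proc/net/tcp6).

```c
#include <stdio.h>

#define TCP_STATE_TIME_WAIT 0x06 /* state code used in /proc/net/tcp */

/* Return 1 if a /proc/net/tcp line describes a TIME_WAIT socket, else 0.
 * The header line is rejected because its fourth column is not hex. */
int is_time_wait_line(const char *line)
{
    unsigned int state;
    /* Columns: "sl:", local addr:port, remote addr:port, then the state. */
    if (sscanf(line, "%*s %*s %*s %x", &state) != 1)
        return 0;
    return state == TCP_STATE_TIME_WAIT;
}

/* Count TIME_WAIT sockets listed in the given file; -1 if it is unreadable. */
long count_time_wait(const char *path)
{
    char line[512];
    long count = 0;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL)
        count += is_time_wait_line(line);
    fclose(fp);
    return count;
}
```

For example, count_time_wait("/proc/net/tcp") could be sampled periodically at peak time and compared against the rate of commBind errors in cache.log.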
Comment #7
Posted on Nov 10, 2010 by Grumpy Rabbit
By the way, I don't have that many connections, mgr:info shows this:
Squid Object Cache: Version LUSCA_HEAD-r14756
Start Time: Wed, 10 Nov 2010 19:41:29 GMT
Current Time: Wed, 10 Nov 2010 21:06:06 GMT
Connection information for Squid:
    Number of clients accessing cache: 4465
    Number of HTTP requests received: 3892521
    Number of ICP messages received: 0
    Number of ICP messages sent: 0
    Number of queued ICP replies: 0
    Request failure ratio: 0.03
    Average HTTP requests per minute since start: 45995.1
    Average ICP messages per minute since start: 0.0
    Select loop called: 11358493 times, 0.447 ms avg
Cache information for Squid:
    Request Hit Ratios: 5min: 0.0%, 60min: 0.0%
    Byte Hit Ratios: 5min: -3.2%, 60min: -2.6%
    Request Memory Hit Ratios: 5min: 0.0%, 60min: 0.0%
    Request Disk Hit Ratios: 5min: 0.0%, 60min: 0.0%
    Storage Swap size: 0 KB
    Storage Mem size: 179180 KB
    Mean Object Size: 0.00 KB
    Requests given to unlinkd: 0
Is there something I can try to fix this?
"commBind: Cannot bind socket FD 17164 family 2 to 0.0.0.0 port 0: (98) Address already in use"
I'm getting these around 232 times per second in cache.log.
Comment #8
Posted on Nov 15, 2010 by Grumpy Lion
half_closed_clients off?
Comment #9
Posted on Nov 16, 2010 by Happy Monkey
I can agree with nuclearcat regarding net.ipv4.tcp_tw_recycle=1 being harmful. In my tproxy setup I got exactly that behaviour: stalling, connect failures, etc.
nuclearcat: Are you running a tproxy setup?
Comment #10
Posted on Nov 16, 2010 by Happy Monkey
Er, never mind. If you're getting commBind errors, then likely not.
Comment #11
Posted on Nov 16, 2010 by Happy Monkey
What does mgr:curcounters show?
Comment #12
Posted on Nov 16, 2010 by Happy Monkey
Just furthering research on the topic in the Linux kernel:
From: inet_bind() in http://lxr.linux.no/linux+v2.6.36/net/ipv4/af_inet.c#L511
EADDRINUSE will be returned if inet_csk_get_port() from http://lxr.linux.no/linux+v2.6.36/net/ipv4/inet_connection_sock.c#L120 was unable to find a free port to bind() to from the bind bucket.
There are some checks in inet_csk_get_port() to see if sk->sk_reuse has been set, which is controlled by setsockopt(SO_REUSEADDR), so this request may actually have some merit.
Can you try it with a patch against comm.c such as:
    } else if (! sqinet_is_noaddr(&F->local_address)) {
+       commSetReuseAddr(new_socket);
        if (commBind(new_socket, &F->local_address) != COMM_OK) {
            comm_close(new_socket);
            return -1;
        }
    }
    F->local_port = sqinet_get_port(a);
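The effect of the sk_reuse check described above can be observed from user space with a small experiment: bind two TCP sockets to the same loopback port, once without and once with SO_REUSEADDR on both. This is only a sketch with made-up function names. It shows the explicit-port case, so it does not by itself prove that the option helps for the port 0 binds in the patch. On a Linux host I would expect the second bind to fail with EADDRINUSE without the option and to succeed when both sockets set it and neither is listening.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, optionally set SO_REUSEADDR, and bind it to
 * 127.0.0.1:port. Returns the fd, or -1 with errno set by the failing call. */
static int bound_socket(unsigned short port, int reuse)
{
    struct sockaddr_in addr;
    int saved;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (reuse) {
        int on = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        saved = errno;          /* keep bind()'s errno across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Bind two sockets to the same loopback port. Returns 0 if the second bind
 * succeeds, the errno of the second bind if it fails, -1 if setup fails. */
int second_bind_errno(int reuse)
{
    struct sockaddr_in got;
    socklen_t len = sizeof(got);
    int result = 0;
    int second;
    int first = bound_socket(0, reuse); /* port 0: the kernel picks a port */
    if (first < 0)
        return -1;
    if (getsockname(first, (struct sockaddr *)&got, &len) < 0) {
        close(first);
        return -1;
    }
    second = bound_socket(ntohs(got.sin_port), reuse);
    if (second < 0)
        result = errno;
    else
        close(second);
    close(first);
    return result;
}
```

Comparing second_bind_errno(0) with second_bind_errno(1) on the proxy host gives a quick check of how its kernel treats the option.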
Comment #13
Posted on Nov 18, 2010 by Swift Hippo
OK, having played with the above patch, it didn't really make a major difference when apachebenching either a transparent or a tproxied lusca.
What did help immensely was the following:
net.ipv4.tcp_max_orphans = 8192
net.ipv4.tcp_orphan_retries = 1
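Since several different knobs were suggested in this thread, it may help to check what a given box is currently running with before changing anything. The sketch below reads the values through /proc/sys, assuming the usual mapping of dots in the sysctl name to path separators. The helper names are invented for this example, and some kernels do not provide every knob (tcp_tw_recycle, for instance, was removed from later kernels), which is reported as "not available".

```c
#include <stdio.h>
#include <string.h>

/* Convert a sysctl name such as "net.ipv4.tcp_tw_reuse" into its /proc/sys
 * path. Returns 0 on success, -1 if the output buffer is too small. */
int sysctl_path(const char *name, char *out, size_t outlen)
{
    char *p;
    int n = snprintf(out, outlen, "/proc/sys/%s", name);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    /* Dots in the sysctl name map to directory separators. */
    for (p = out + strlen("/proc/sys/"); *p != '\0'; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}

/* Read the first integer of a sysctl. Returns 0 on success, -1 otherwise. */
int sysctl_read_long(const char *name, long *value)
{
    char path[256];
    FILE *fp;
    int ok;
    if (sysctl_path(name, path, sizeof(path)) != 0)
        return -1;
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Print the knobs mentioned in this thread; missing ones are reported. */
void print_thread_sysctls(void)
{
    static const char *names[] = {
        "net.ipv4.tcp_tw_reuse", "net.ipv4.tcp_tw_recycle",
        "net.ipv4.tcp_orphan_retries", "net.ipv4.tcp_max_orphans",
    };
    size_t i;
    for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
        long v;
        if (sysctl_read_long(names[i], &v) == 0)
            printf("%s = %ld\n", names[i], v);
        else
            printf("%s: not available\n", names[i]);
    }
}
```

Calling print_thread_sysctls() from a small main() before and after a tuning change gives a record of what was actually in effect when the commBind errors did or did not occur.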
Status: New
Labels:
Type-Defect
Priority-Medium
Version-1.0