
jmxtrans - issue #6

Too many TCP connections in TIME_WAIT (tcp6)


Posted on May 27, 2011 by Massive Horse

On our Graphite server I see many TCP connections on port 2003. This server (Ubuntu 10.10, 64-bit) hosts both Graphite and JMXTrans.

tcp 0 0 0.0.0.0:2003 0.0.0.0:* LISTEN

normal listen port (ipv4)

tcp6 0 0 123.123.123.123:56228 123.123.123.123:2003 TIME_WAIT
tcp6 0 0 123.123.123.123:56231 123.123.123.123:2003 TIME_WAIT
tcp6 0 0 123.123.123.123:56296 123.123.123.123:2003 TIME_WAIT

About 219 connections in TIME_WAIT (ipv6).
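To keep an eye on how many sockets sit in TIME_WAIT per protocol family, netstat output like the lines above can be tallied with a few lines of code. The following is a minimal Java sketch and not part of jmxtrans; the names `TimeWaitCounter` and `countTimeWait` are hypothetical, and it assumes the six-column layout shown above, with the connection state in the last column.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts TIME_WAIT sockets per protocol column (tcp, tcp6).
class TimeWaitCounter {

    // Expects text with one socket per line, in the six-column layout:
    // proto recv-q send-q local-address foreign-address state
    static Map<String, Integer> countTimeWait(String netstatOutput) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatOutput.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            boolean isTcpRow = cols.length >= 6 && cols[0].startsWith("tcp");
            if (isTcpRow && cols[5].equals("TIME_WAIT")) {
                counts.merge(cols[0], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample =
              "tcp 0 0 0.0.0.0:2003 0.0.0.0:* LISTEN\n"
            + "tcp6 0 0 123.123.123.123:56228 123.123.123.123:2003 TIME_WAIT\n"
            + "tcp6 0 0 123.123.123.123:56231 123.123.123.123:2003 TIME_WAIT\n";
        System.out.println(countTimeWait(sample));
    }
}
```

Listening sockets and header lines are skipped because their state column is not TIME_WAIT.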

I can't locate them with lsof on my system, and it disturbs the whole system a bit, since we run short of ports that can be opened.

I forced JMXTrans to stick with IPv4 by adding -Djava.net.preferIPv4Stack=true to the java command line, and that fixed the problem.

This is something to add to the docs or README.

Comment #1

Posted on May 27, 2011 by Grumpy Elephant

Ok, I just committed a fix. Could you please try it?

Comment #2

Posted on May 27, 2011 by Massive Horse

Tried with the fix, but no luck. After I killed jmxtrans, there were still TIME_WAIT connections (322 exactly) for about one minute.

With -Djava.net.preferIPv4Stack=true, the TIME_WAIT connections are on tcp (no more tcp6). In the dialog with GraphiteWriter, is there some sort of end-of-dialog message that needs to be sent?

Note I'm using JMXTrans with -e -s 60 (continuous run, 60s interval)

If I run JMXTrans in one-shot mode (i.e. without -e -s 60), I get about 272 connections in TIME_WAIT.

Comment #3

Posted on May 27, 2011 by Massive Horse

More on this.

When I start JMXTrans from another machine, I don't get these. So it's something related to JMXTrans and Graphite (Carbon), when hosted on the same box.

Comment #4

Posted on May 27, 2011 by Grumpy Elephant

Graphite doesn't have a dialog. You open a socket, send some data, close the socket. That is exactly what I'm doing. It seems something is off with the configuration of your box or something. If you have an option that works for you with the -D, then I'd say use it. =)
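The open, send, close cycle described above can be sketched as follows. This is an illustration in Java and not the actual GraphiteWriter code; the class name `GraphiteOneShot`, its methods, and the metric name are hypothetical. It assumes Carbon's plaintext format of one `path value timestamp` line per metric on port 2003. Because the client closes the connection first here, each call leaves one client-side socket in TIME_WAIT until the kernel's timer expires, which would be consistent with the entries outliving the jmxtrans process.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a one-shot write to Carbon's plaintext listener.
class GraphiteOneShot {

    // Builds one plaintext line: "<metric path> <value> <epoch seconds>\n".
    static String formatLine(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    // Opens a socket, sends the data, and closes the socket again.
    // There is no handshake and no end-of-dialog message.
    static void send(String host, int port, String data) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(data);
            out.flush();
        } // try-with-resources closes the writer and the socket here
    }

    public static void main(String[] args) throws IOException {
        long now = System.currentTimeMillis() / 1000L;
        String line = formatLine("servers.app01.heap.used", 42.0, now);
        if (args.length == 2) {
            // Usage: java GraphiteOneShot <host> <port>
            send(args[0], Integer.parseInt(args[1]), line);
        } else {
            // Without a target, just show what would be sent.
            System.out.print(line);
        }
    }
}
```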

Looking at a box I'm running with Ubuntu 10.04, I see a lot of TIME_WAITs, but they appear to be between jmxtrans and the JMX ports on remote servers.

tcp6 0 0 10.0.5.42%3510517:35740 app03-int:58363 TIME_WAIT
tcp6 0 0 10.0.5.42%3510517:42101 app11-int:60187 TIME_WAIT
tcp6 0 0 10.0.5.42%3510517:58072 app12-int:1101 TIME_WAIT
tcp6 0 0 10.0.5.42%3510517:37927 olp02-int:35339 TIME_WAIT

I'll have a look at that code again to make sure things are getting closed properly there as well.

Comment #5

Posted on May 28, 2011 by Massive Horse

Maybe the problem is on the Carbon side. After I stopped JMXTrans, I still saw the connections via netstat, so they are kept by the server side.

Also, I don't understand why I don't get such behaviour when JMXTrans is on another box.

Comment #6

Posted on May 31, 2011 by Grumpy Elephant

Not sure there is much more I can do here. If you find anything else that might help, I'd be happy to make code changes around this.

Comment #7

Posted on Jun 24, 2011 by Grumpy Elephant

Added socket pooling.
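The idea behind pooling is to reuse an already-open connection per destination instead of opening and closing one on every run, so fewer sockets end up in TIME_WAIT. The sketch below shows the general technique only; it is not the code that was committed, and `KeyedPool`, `borrow`, `release`, and `createdCount` are hypothetical names. A real pool would also need to validate idle connections and evict dead ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal keyed pool: one stack of idle objects per key,
// where the key could be a "host:port" string and the value a Socket.
class KeyedPool<K, V> {
    private final Map<K, Deque<V>> idle = new HashMap<>();
    private final Function<K, V> factory;
    private int created = 0;

    // The factory builds a new object when no idle one exists for the key.
    // For sockets it would wrap "new Socket(host, port)" and rethrow
    // IOException as UncheckedIOException.
    KeyedPool(Function<K, V> factory) {
        this.factory = factory;
    }

    // Returns an idle object for the key, or creates one if none is idle.
    synchronized V borrow(K key) {
        Deque<V> queue = idle.get(key);
        if (queue != null && !queue.isEmpty()) {
            return queue.pop();
        }
        created++;
        return factory.apply(key);
    }

    // Hands the object back so a later borrow can reuse it instead of reconnecting.
    synchronized void release(K key, V value) {
        idle.computeIfAbsent(key, k -> new ArrayDeque<>()).push(value);
    }

    // Number of objects the factory has had to create so far.
    synchronized int createdCount() {
        return created;
    }
}
```

With this shape, a writer borrows the connection for its destination, writes its metrics, and releases the connection without closing it, so a 60-second polling loop keeps using the same socket.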

Comment #8

Posted on Jun 24, 2011 by Grumpy Elephant

I just noticed that the JmxConnections need to be pooled too... those are sitting in TIME_WAIT as well... I'll get to that soon.

Status: Fixed

Labels:
Type-Defect Priority-Medium