Issue 85229: way to disable preconnected/speculative sockets from server side
19 people starred this issue and may be notified of changes.
 
Reported by Petrausk...@gmail.com, Jun 7, 2011
Chrome Version       : 11.0.696.77, 11.0.696.71, 12.0.742.91
OS Version: 5.0, 5.1, 6.1
URLs (if applicable) :
Other browsers tested:
     Safari 5: OK
  Firefox 4.x: OK
     IE 7/8/9: OK

What steps will reproduce the problem?
1. A client often visits an HTTPS site and looks at the same dynamic content
2. This triggers the use of "preconnected/speculative sockets"
3. The server suffers from starvation of available connections

What is the expected result?
Disable "preconnected/speculative sockets" from server side if server admins see the problem. Reduce idle time of "preconnected/speculative socket".

What happens instead?
A server with a limit of 50 clients and an average serving time of 80 ms per content item serves no more than 10 clients per second.

I am an administrator of an intranet web server. Intranet users use only IE 7/8/9. Recently we started offering some internet services for authenticated users. The server, with a lot of dynamically generated content, runs on a 16 (sixteen) year old Sun machine, and we currently have no possibility to upgrade. Internet users use a variety of browsers, and we started to see degradation of service. Apache server-status shows a lot of sessions in the "Reading Request" state. Because Apache is configured with a "Timeout" of 300 seconds, these connections in the "Reading Request" state are terminated after 300 seconds. I see these lines in access_log:
10.10.10.10 - - [06/Jun/2011:16:45:07 +0200] "-" 408 -
and these lines in error_log:
[Mon Jun  6 16:45:07 2011] [warn] [client 10.10.10.10] read request line timed out

I attached three Wireshark screenshots (I did not attach the dump file because of security concerns):
CaptureImmediateShutdown.gif - sometimes Chrome preconnects to the SSL server but immediately shuts the connection down
CaptureTimeout.gif - Chrome keeps an idle preconnected socket and the server shuts the connection down after 300 seconds
CaptureSuccessfulUsageOfPreconnectedSocket.gif - Chrome keeps a preconnected socket, the user makes a request after 52 seconds, and Chrome uses this socket to communicate with the server. I named the picture "successful usage", but as the admin of this server I don't want such behavior.

Some details about Apache configuration:
Timeout 300
KeepAlive Off #Server always sends "Connection: close" header
MaxClients 50

Dynamic content is generated from a database, the httpd child processes do not share database connections, and by increasing the MaxClients value I would exhaust database resources.

I tried searching the web with keywords from the subject, but could not find any suggestions for web site owners/administrators on how to deny this type of browser behavior from the server side.
Attachment: CaptureTimeout.gif (27.1 KB)
Attachment: CaptureImmediateShutdown.gif (18.4 KB)
Attachment: CaptureSuccessfulUsageOfPreconnectedSocket.gif (20.7 KB)
Jun 7, 2011
#1 mmenke@chromium.org
(No comment was entered for this change.)
Labels: -Area-Undefined Area-Internals Internals-Network
Jun 8, 2011
#2 rsleevi@chromium.org
+cc the networking-preconnect braintrust
Cc: willchan@chromium.org j...@chromium.org mbel...@chromium.org
Jun 8, 2011
#3 m...@whensoon.com
I can't tell from these gifs what is really going on here, or if this is even a chrome browser.  

Some observations:
a) The SSL handshake signature does not look like a recent chrome client.  Are you sure this is chrome?

b) The SSL server certainly does seem to have some problem - the time between the client hello and server hello in the first diagram is 13s.  Ouch.

c) The 3rd chart does not look to me like use of the socket after 50s of idle.  Rather, it looks like there are both HTTP and HTTPS connections to this server from the same client.  But I can't see the port # to confirm this.

Overall, I don't believe server side control of client preconnect behavior is the right answer here.  I could be convinced, but my initial thought is that system admins won't know how to configure this properly, and it will become a "voodoo configuration".

Instead, I propose more evidence be gathered.  I understand your privacy concerns, but we need to see some traces, as well as the web pages and description of user behavior causing this pattern.  I'm not at all convinced that this was preconnect causing this, or that it was even a chrome client.

Can you submit more data?
Jun 8, 2011
#4 m...@whensoon.com
One source of data would be a trace from about:net-internals.

Do the following:
   a) load the about:net-internals tab
   b) reproduce the problem
   c) Click "dump" in about:net-internals, remove any data that is private, and then send to us here.

The about:net-internals doesn't contain any web content, but it does contain URLs.  We already black out cookies, so those won't be sent.  But if you are sensitive on other headers, you'd have to block those out as well.
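For reference, the same kind of trace can also be captured non-interactively with Chrome's --log-net-log switch (the reporter uses it later in this thread); the output path here is only an example:

    chrome.exe --log-net-log=C:\temp\chrome-net-log.json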
Jun 8, 2011
#5 j...@chromium.org
The 300 seconds is probably for keep alive, and has nothing to do with speculative preconnects, which would typically disconnect in about 10 seconds (if never used). 

The 300 seconds should be a server side parameter.  It can be set as high as 300 seconds when the server wants to improve user experience at the cost of server side resources.  My first suggestion would be to reduce it.  This will increase connect time, but will reduce load on your server (which you are asserting is the critical resource for customer performance).

This bug is asking "what can the server do when it wants to use less resources, and is willing to reduce client performance."  Perhaps it is also asking what can be done to disable preconnects, asserting that they are harming performance, but I'm not clear on the evidence that this is taking place.

We recently changed the performance (client side) to avoid "learning" about preconnects if the historical connection did not happen within 10 seconds of the parent resource.  As a result, I'd expect that unless the HTTPS is truly "needed" that we won't "learn" about it.  

If a subresource is truly needed, then (if we hesitate at all in response to a challenge for credentials), we wouldn't (wastefully) abandon the connection.   If we can't "hesitate" then perhaps we need to monitor connections, and avoid pre-connection to sites that demand client credentials.  I'm adding another developer that may be able to comment on the SSL performance when credentials are requested.

I suspect that if this is a problem, the bug should be morphed to better understanding (client side) that it is wasteful to preconnect, so as to avoid this connection thrashing.

It is possible that we should support this as a hint from server, but if we can understand the problem, it seems much better to solve it adaptively client side.  This would solve it for all sites, without requiring diagnostics.
Jun 9, 2011
#6 Petrausk...@gmail.com
@ comment #3: the gifs are created from a network dump file. Because all requests use SSL, to get more information I entered the server's private RSA key into Wireshark and decrypted the SSL traffic. All three gifs were extracted from the network dump file with the Wireshark filter "tcp.stream==###", so each gif contains all packets from one TCP session.
a) this is a header from successful requests "User-Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.77 Safari/534.24"
b) that could be because at capture time the server was suffering from this problem. I'm attaching an anonymized tcpdump file in which you will see that:
  * Chrome keeps an idle connection longer than 5 seconds at times when the SSL handshake had no delays (tcp.stream 37 or 40), but the client itself initiates shutdown after 300 seconds
  * Chrome closes the connection 9.6 seconds after a delayed SSL server hello (tcp.stream 22 or 26)
  * Chrome keeps an idle connection longer than 300 seconds after a delayed SSL server hello (tcp.stream 10, 39, 46 or 47) and the server initiates shutdown after 300 seconds
I created a spreadsheet (https://spreadsheets.google.com/spreadsheet/ccc?key=0AswgSgD2-Y58dHI3TU5DeU1fV3BtZVdCT19LcDFBSXc&hl=en_US) with information from the Apache access_log, tcpdump stream numbers, and some comments.
@ comment #4: we have currently closed the services to the Chrome browser, so I will create a trace from the browser when I have a chance.
@ comment #5: Keep-alive is turned off in the Apache configuration. Because Apache has only one parameter covering both server-side dynamic content generation time and client connection activity time, I cannot reduce the "Timeout" setting (http://httpd.apache.org/docs/2.2/mod/core.html#timeout).
I don't know the internals of Chrome preconnects, but maybe this combination of a slow SSL connection and the 5- or 10-second idle socket timer creates a situation where the timer ends up in a state in which the idle connection is never shut down from the client side.
Attachment: anonymized.zip (308 KB)
Jun 28, 2011
#7 rsleevi@chromium.org
jar: Related to your suggestion https://code.google.com/p/chromium/issues/detail?id=87121#c19 and your remark in comment 5, would/should it be possible to tune preconnect aggressiveness down based on the presence/prevalence of an explicit "Connection: Close" header in HTTP/1.1 services?

Given that "Connection: Close" semantics indicate that connections SHOULD NOT be considered persistent and HTTP/1.1 applications that don't support persistent connections MUST include it every message, this (may) be a way for servers to reduce load. As you see in the reporters original Apache configuration, they're already setting "KeepAlive Off".

Admittedly, connections marked "Connection: Close" are perhaps the ones best suited to benefit from preconnect (since a primed connection may be waiting in the pool), but it may better match the server's expectation that the client should "go away" after this request.

Also, should  Issue 87121  be merged into this, based on willchan's findings in comment 18?

Jul 10, 2011
#8 Petrausk...@gmail.com
I thought it would be hard to hit this bug, but after the comments in Issue 87121 I decided to try. My new server environment: Apache 2.2.14 with the worker MPM; the important configuration settings, tuned for the site's usage scenarios, are "Timeout 300", "KeepAlive On", "MaxKeepAliveRequests 10", "KeepAliveTimeout 5". With these values I hope that in the fast-client scenario the KeepAlive feature will be used and most of the content will be downloaded over the same connection(s); with a slow client I *want* the server not to keep an idle connection longer than 5 seconds, because the connection pool (the MaxClients setting) is limited and with a lot of clients it would be exhausted. Timeout is 5 minutes, as in the earlier cases. On my Win7 Home Premium laptop I installed the latest publicly available Google Chrome (12.0.742.112). In a console window I started "netstat -n 5" to monitor hanging connections. My secure site uses frames, the same situation as in the earlier case and as in Issue 87121. The main document URL is "/dynamiccontent.main"; it loads three subdocuments in frames ("/dynamiccontent_pirmas.meniu", "/dynamiccontent_pirmas.pirmas" and "/blank.html"). The frame "/dynamiccontent_pirmas.pirmas" catches the window.onload event and reloads the frame "/dynamiccontent_pirmas.meniu". The "/dynamiccontent_pirmas.meniu" document refers to four images. To hit the bug I loaded the main document URL:
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /dynamiccontent.main HTTP/1.1" 200 1000
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /dynamiccontent_pirmas.meniu HTTP/1.1" 200 2207
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /dynamiccontent_pirmas.pirmas HTTP/1.1" 200 1323
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /dynamiccontent_pirmas.meniu HTTP/1.1" 200 2207
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /images/bg.png HTTP/1.1" 304 -
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /images/logologo.png HTTP/1.1" 304 -
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /images/mna.png HTTP/1.1" 304 -
192.168.99.99 - - [10/Jul/2011:17:38:02 +0300] "GET /images/mni.png HTTP/1.1" 304 -
and then reloaded (by clicking a link in the document) the "/dynamiccontent_pirmas.pirmas" document, which automatically reloaded "/dynamiccontent_pirmas.meniu":
192.168.99.99 - - [10/Jul/2011:17:38:27 +0300] "GET /dynamiccontent_pirmas.pirmas HTTP/1.1" 200 1323
192.168.99.99 - - [10/Jul/2011:17:38:27 +0300] "GET /dynamiccontent_pirmas.meniu HTTP/1.1" 200 2207
192.168.99.99 - - [10/Jul/2011:17:38:27 +0300] "GET /images/bg.png HTTP/1.1" 304 -
192.168.99.99 - - [10/Jul/2011:17:38:27 +0300] "GET /images/logologo.png HTTP/1.1" 304 -
192.168.99.99 - - [10/Jul/2011:17:38:27 +0300] "GET /images/mna.png HTTP/1.1" 304 -
192.168.99.99 - - [10/Jul/2011:17:38:27 +0300] "GET /images/mni.png HTTP/1.1" 304 -
For the first URL Chrome creates 6 sockets, connects to the server and performs the SSL handshake. Two of the created and preconnected sockets (262 and 264) are used to get some content from the server, two use the HTTP keep-alive feature and fetch more than one item (266 and 267), and two are preconnected but, because all content has already been fetched, are closed after 10 seconds (263 and 265). As the server admin I would hope that no idle client stays connected for more than 5 seconds (the KeepAliveTimeout setting); Chrome keeps them for 10 seconds. For very busy sites this alone could be a problem.
The second URL scenario hits the bug. Chrome creates only one socket to get the "/dynamiccontent_pirmas.pirmas" document, but this document uses JavaScript to request a reload of the "/dynamiccontent_pirmas.meniu" document. Chrome uses the keep-alive feature and fetches 4 more items from the server using the same socket. After this (or in parallel) Chrome creates 3 additional sockets and preconnects them. It uses socket 357 to get "/images/mna.png", but the two other sockets (358 and 359) stay in the preconnected state for 300 seconds, until the server closes the connections (the Timeout setting in Apache).
So for the first page load there was one preconnected SSL socket and it was closed by Chrome after 10 seconds, but for the second page load Chrome made two preconnected SSL sockets and kept them for a very long time. In the Apache server-status page these connections are shown as 'R' - reading request (as Vikram explained in Issue 87121 comment #15).
My suggestions here would be similar to rsleevi's in comment #7: keep some global timeout information per server:port (a rough sketch of this policy follows below):
 a) if the server supports keep-alive and sets a keep-alive timeout, use that timeout for preconnected sockets
 b) if the server sends "Connection: close", do not use preconnected sockets (as the server administrator expects no idle connections from clients)
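A rough sketch (Python, purely illustrative and not Chrome code) of the per-host policy proposed in (a) and (b); the class and method names are invented for the example:

    class HostConnectionPolicy:
        """Per-host:port preconnect policy learned from response headers."""

        def __init__(self):
            self.keep_alive_timeout = {}   # origin -> idle timeout advertised by the server
            self.no_preconnect = set()     # origins that sent "Connection: close"

        def observe_response(self, origin, headers):
            connection = headers.get("Connection", "").lower()
            if connection == "close":
                # Suggestion (b): the server does not want idle client connections.
                self.no_preconnect.add(origin)
            else:
                self.no_preconnect.discard(origin)
                # Suggestion (a): honor e.g. "Keep-Alive: timeout=5, max=10".
                for part in headers.get("Keep-Alive", "").split(","):
                    name, _, value = part.strip().partition("=")
                    if name == "timeout" and value.isdigit():
                        self.keep_alive_timeout[origin] = int(value)

        def may_preconnect(self, origin):
            return origin not in self.no_preconnect

        def idle_timeout(self, origin, default=10):
            # Never keep a preconnected socket idle longer than the server's own limit.
            return self.keep_alive_timeout.get(origin, default)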
Attachment: chrome-net-internals.dump.zip (14.9 KB)
Jul 11, 2011
#9 willchan@chromium.org
I think there are two issues here:
(1) There is a problem with Chrome overpreconnecting. We should perhaps be more conservative. I defer to Jim here.
(2) The server cannot handle the load.

Let's work on fixing (1) so we improve the accuracy of our preconnect target. For (2), I advise the server admin to disable HTTP keep alives and lower the timeouts. If the server considers it unacceptable for clients to keep sockets open for so long, then close the sockets. The server doesn't need to wait 300s for the client to close its socket.

Preconnect has been in Chrome since Chrome 7 or so. This is the first bug report I've seen where servers had begun complaining about it. If this is a problem for server admins, I'd like to see more server admins chime in here and ask Chrome to do something.
Jul 12, 2011
#10 Petrausk...@gmail.com
The problem with web servers is that they are all different. The very popular Apache server has only one parameter, Timeout, that governs several things (http://httpd.apache.org/docs/2.2/mod/core.html#timeout):
1. When reading data from the client, the length of time to wait for a TCP packet to arrive if the read buffer is empty.
2. When writing data to the client, the length of time to wait for an acknowledgement of a packet if the send buffer is full.
3. In mod_cgi, the length of time to wait for output from a CGI script.
4. In mod_ext_filter, the length of time to wait for output from a filtering process.
Chrome exhausts server resources because of #1, but #3 and #4 do not allow server admins to lower the Timeout value, because clients would not get results from dynamically generated pages when generation takes a long time.
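As an aside, Apache installations that can move to 2.2.15 or later get mod_reqtimeout, which bounds the request-read phase separately from the general Timeout and so may mitigate case #1 without touching #3 or #4. A minimal sketch, with values that would need tuning per site:

    LoadModule reqtimeout_module modules/mod_reqtimeout.so
    RequestReadTimeout header=10 body=30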
HTTP keep-alive has more configuration on the server side, and Chrome plays by the rules set on the server. I could accept idle client connections for 10 seconds, but Chrome breaks something in this idle unused preconnected socket timeout when SSL and frames are involved.
It is not easy to get a "public" hosting server with SSL, but I found one and created a simple dynamic application there (https://apex.oracle.com/pls/apex/f?p=27545:4:0) in case you cannot reproduce the bug in your environment. I set Chrome to reopen old tabs when it starts, then closed Chrome with my sample application in one tab and about:net-internals in the other. When I run Chrome it opens these two tabs on startup and hits the bug 9 times out of 10. I also logged one of these sessions to a file using the --log-net-log flag. The log file is attached; socket #8 hits the bug and is closed after 5 minutes.
Attachment: chrome-log-net-log.zip (9.1 KB)
Jul 20, 2011
#11 Petrausk...@gmail.com
I think it is very hard to diagnose this issue from the server side and to discover that the problems are caused by Chrome. I found only one official mention of it, here: http://www.directadmin.com/features.php?id=1138
To diagnose this issue, an Apache admin must have access to dump network traffic, must know how to examine the traffic in Wireshark, and must have access to the server's private key to decrypt the traffic and identify Chrome as the cause.
Jul 20, 2011
#12 willchan@chromium.org
@#10: If you need Apache to provide more configuration, please file a bug with Apache. Commenting here isn't going to change Apache's configurability.

@#11: I think that for many cases when debugging a server, one would need access to dump network traffic. I don't think this is an extraordinary requirement.
Jul 20, 2011
#13 Petrausk...@gmail.com
@ comments in #12:
I do not expect that commenting here will help with Apache. I am only giving an example, in reply to comment #9, of how server admins do not always have the option to lower timeouts. From the suggestion in comment #9 about lowering the server timeout, it seems that no one recognizes that there is a bug in Chrome preconnecting SSL sockets and leaving them idle for more than 10 seconds when they were never used.
About network traffic dumps: I believe I am a very good administrator and have used network dumps in everyday administration for more than 10 years, but for some reason I believed there was no way to decrypt dumped SSL traffic even with access to the server's private key. Only when I had to cope with this problem did I discover that possibility. So I use common sense when I say that it is more difficult to debug this particular problem than problems without SSL. And in bigger companies, where there are separate positions for web server admin, operating system admin and security admin, the web server admin may have no way to get access to the private key of the server certificate and identify Chrome as the reason for the idle connections.
Jul 24, 2011
#14 j...@chromium.org
Given that there is a larger server cost to pre-connect SSL, we probably should be more conservative about that class of speculative pre-connection.  Perhaps we can add a negative feedback loop to diminish our (future) speculation when we detect (as Will called it) over-pre-connection.

In more general settings, we would like to better estimate the number of needed pre-connections, based on required connections, rather than based on resource count.  That transition in our learning algorithms should significantly help to address this issue.

It is also plausible that we could detect over-pre-connection on SSL links, and disconnect sooner than a 5-minute time point. We'll have already used some server resources to acquire the connection.... but perhaps we can help by reducing further resource utilization when we detect such a state.   

All the above approaches really focus on just "being better" about our speculative estimates, so that we don't make (m)any mistakes, but require no server assistance (hints/headers) to "get this right."

We'll need to think and look at some of these options over time.

I don't really see a way to totally control this from a server side perspective.  It is mostly too late when we talk to a server... but perhaps we can update our speculative tables based on feedback from a server requesting "less speculation."  The current speculative (learned) data structures are indexed by a referrer, and offer suggested connections to sub-resources.  The question then comes as to whether it is the sub-resource host (header?) that would like to request less speculation, or the referrer host (header).   It probably wouldn't be too hard to have the referrer host header state "don't speculate about my subresources," or "don't speculate about a specific sub-resource," or maybe "don't speculate about SSL sub-resources."  More thought needs to go into this selection.

I'll assign this bug to myself, but I'll lower the priority to P3 since I'm not clear on what a good resolution would be.
Status: Assigned
Owner: j...@chromium.org
Labels: -Pri-2 Pri-3
Sep 15, 2011
#15 maels...@hotmail.com
We have this problem on the Apache server used for our SSO (Single Sign On). Chrome users consistently create unused connections (state R "Reading" when viewed with Apache mod_status) that stay active until the timeout is reached. Example for one user (the 1st GET returns the login page, the next ones are accesses to applications through SSO + redirect):

[14/Sep/2011:11:43:09 +0200] "GET /cas/login?service=.. HTTP/1.1" 200 2109
[14/Sep/2011:11:44:00 +0200] "POST /cas/login?service=... HTTP/1.1" 302 215
[14/Sep/2011:11:48:09 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:48:09 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:48:09 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:50:35 +0200] "GET /cas/login?service=... HTTP/1.1" 302 257
[14/Sep/2011:11:55:35 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:55:35 +0200] "-" 408 - "-" "-"
[14/Sep/2011:11:55:35 +0200] "-" 408 - "-" "-"
[14/Sep/2011:12:02:47 +0200] "GET /cas/login?service=... HTTP/1.1" 302 261
[14/Sep/2011:12:07:47 +0200] "-" 408 - "-" "-"
[14/Sep/2011:12:07:47 +0200] "-" 408 - "-" "-"

We've decreased the Apache Timeout to 60s to avoid exhausting Apache MaxClients too quickly under load, but this is annoying nonetheless.
Apr 19, 2012
#16 sterling...@gmail.com
I have been hitting this issue since speculative pre-connections were added to Chrome, and I suspect many others that haven't found this bug report have as well. This is becoming a serious issue for my particular environment, now that more users are running Chrome with pre-connections enabled. I'm using Apache 1.3 and cannot upgrade or migrate since my backend depends on Apache 1.3.

The symptom is that Chrome makes pre-connections to my server, holds them open (I haven't explicitly timed it, but I do see connections held open for longer than 10 seconds), and thus uses up all my available server slots, of which I have only 10. The server is quite literally doing nothing but waiting for Chrome to close or use the pre-connected sockets, which essentially causes a DoS attack, since most clients (those without pre-connected sockets) time out while trying to connect.

Again, I must state, I noticed this initially because I caused a DoS attack against my own site after upgrading Chrome last May (2011). I had hoped something would be done before actual users started using Chrome with pre-connect enabled. Please understand this is most assuredly causing issues for my environment now that it is in widespread use, and there is no easy way for me to workaround Chrome's behaviour. The only real option I have is to stand up another more capable web server in front of my Apache 1.3 servers, to proxy requests. Certainly this is a viable workaround, however it is a fair amount of work for me in terms of configuration, testing and deployment.

May I suggest that Chrome permanently disable pre-connecting to sites that timeout? Chrome now has a (shared?) database of servers that are under load to prevent Chrome users from DDoSing sites with Chrome. Can this be leveraged somehow to also disable preconnecting to sites with environments such as mine, where the maximum number of concurrent requests is quite low (i.e. 10) and cannot easily be raised?

I would greatly appreciate some more thought be put into solving this problem.

Thanks,
Dan Sterling


Apr 20, 2012
#17 Petrausk...@gmail.com
@#14: Should we repost the bug with different wording? Now I see that my expectation of Chrome reacting to some very special headers would be a point of misuse for server admins. But rsleevi's suggestion in comment #7 was very relevant: when an SSL server returns the header "Connection: close", the client must not leave any idle connections, and Chrome must terminate all current idle connections to that server's port. To be perfect, Chrome should remember this setting for the server:port combination until it gets a "Connection: keep-alive" header from the server.
Apr 20, 2012
#18 robnag...@gmail.com
This is a "me too" response.  I do have to ask what the point of pre-connections is.  Seems like an over-optimization.  We've seen similar problems with aggressively configured wget's.

Are there any proxy solutions out there that limit connections based on dynamic behavior?  I haven't found any good Apache modules that do "the right thing".

Thanks,
Rob

Apr 20, 2012
#19 willchan@chromium.org
@16: Is your issue strictly due to Apache 1.3? I'm surprised that Apache 1.3 can only handle 10 connections, that sounds wrong to me. In any case, if it's specific to Apache 1.3, then I think we have to simply ask you to upgrade your environment. Apache 1.3 was end of life'd nearly 2 years ago and Apache 2 has been out for almost a decade.

@17: Thanks for bringing rsleevi's suggestion back up. I think it is possibly reasonable. I guess it depends on how often sites use Connection: close in a reasonable manner. If lots of important web sites use it incorrectly, then I would consider it reasonable for Chromium to continue to preconnect, despite Connection: close. But I guess it makes sense to err on the side of being conservative here since Connection: close is a reasonable signal that the server is resource constrained. jar@, WDYT?

@18: Preconnect makes the web significantly faster. See http://www.belshe.com/2011/02/10/the-era-of-browser-preconnect/ for details.
Cc: -mbel...@chromium.org rtenneti@chromium.org
Labels: -OS-Windows -Internals-Network OS-All Internals-Network-HTTP
Apr 20, 2012
#20 sterling...@gmail.com
@19: It's not that Apache 1.3 can only handle 10 concurrent connections, it's that my backend, which incidentally runs on and cannot easily be separated from Apache 1.3, can only handle 10 concurrent connections without causing the host to run out of memory.

Apache provides two functions in my environment. It is both the container for my backend app, and also, since this is the easiest configuration to set up, the front-end web server. No matter what container I put my backend in, it will only be able to handle 10 concurrent connections unless it is completely redesigned. However, I could (and, indeed, should) separate the front and backend; I could stand up a separate front-end web server that accepted connections from the internet and proxied requests to my backend. Since the front-end would not be preconnecting to my backend, I would not starve backend connections, and since the front-end's resource footprint would be small, it could easily handle a large number of concurrent connections. This is the viable workaround I was referring to in @16.

Put another way, the issue is not that I should upgrade away from Apache 1.3, the issue is that I should separate my front and backend systems. However, this would require a fairly significant amount of work for me. If Chrome would recognize my environment was not able to handle pre-connections, I could put off this work in favour of more urgent tasks for a bit longer.

Additionally, I can imagine situations where it may not be possible to raise the maximum number of concurrent connections. I would hope Chrome could detect when it's communicating with a server that has limited connection slots, and configure itself so it doesn't perform what amounts to a DoS against that server.

Thank you,
Dan Sterling

Apr 20, 2012
#21 willchan@chromium.org
@20: Thanks for the explanation. That makes much more sense. Note that the way the network predictor system works is it analyzes how many concurrent URLs we are loading for www.foo.com, and uses that to feed back into how many connections it thinks we need to load the page, and will preconnect them when revisiting the site. Therefore, *ROUGHLY* speaking, we will never learn to preconnect X connections unless we have previously seen ourselves using X connections.
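A much-simplified sketch (Python, illustrative only, not the actual Chromium predictor; the names and the smoothing factor are invented for the example) of the feedback loop described above:

    from collections import defaultdict

    class PreconnectPredictor:
        """Toy model of 'learn how many connections a host actually needed'."""

        def __init__(self, smoothing=0.5, max_per_host=6):
            self.expected = defaultdict(float)   # host -> learned connection count
            self.smoothing = smoothing
            self.max_per_host = max_per_host     # never exceed the per-host limit

        def record_page_load(self, host, connections_used):
            # Blend the newly observed concurrency into the running estimate.
            old = self.expected[host]
            self.expected[host] = (1 - self.smoothing) * old + self.smoothing * connections_used

        def sockets_to_preconnect(self, host):
            # Roughly: never preconnect more than we have previously seen used.
            return min(self.max_per_host, int(self.expected[host]))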

Also, if the server is overloaded, why does it not return an error code? Chromium will understand 5XX error codes and will back off before retrying. For httpbis work on this matter, check out http://trac.tools.ietf.org/wg/httpbis/trac/ticket/255.

Chromium probably ought to try to do better to identify cases where servers are poorly designed like this and thus cannot handle much concurrency at all. I'm skeptical we will prioritize this very highly since this is really the 1% or the .1% or whatever of servers.

In summary, Chromium should definitely do a better job detecting when we've overpreconnected and thus have wasted idle connections. But in terms of pure server overload, it's probably best if the server indicates its overload situation and instructs the client to back off. And yes, there's the open question of whether or not we should use Connection: close as a signal not to preconnect. I'm tentatively in favor of adopting rsleevi's suggestion here, although I'd like to see data on this impact.
Apr 20, 2012
#22 sterling...@gmail.com
@21: I do appreciate that Chrome tries to only preconnect when it's likely the connection will be used. In my case, users may generate a fairly large number of connections for a period, and then leave a tab idle for a long period of time. This causes preconnect to open 2, 4 or possibly more connections that then sit idle until they hit the server-side timeout.

You're right that it would be best if Apache served a 503 when MaxClients is hit. Instead, new connections simply timeout. Do you know if Chrome backs off when it sees a socket connection timeout, as it would if it saw a 503?

Also, the clients that are causing issues may not see a timeout or 503, since they already have connections they can use, and can always immediately reconnect after using a preconnected socket, since the act of using it frees a slot.

As for using Connection: Close as a signal, perhaps it could be used to implement less aggressive preconnects, rather than completely disabling them. For example, preconnected sockets could be limited to 1 or 2, and/or have a short timeout, say 10 seconds? I appreciate the balance between optimizing for speed and respecting the server's Connection header; that is, I understand the desire to ignore the fact that "Close" may mean "I don't want idle sockets" -- so perhaps a balance could be struck. Certainly what I'm suggesting would be harder to implement, though.

One final thought -- the number of servers on the "open web" that are impacted by idle sockets may be small, but there may be more servers on the closed or semi-closed web (e.g. intranet servers, or servers with a restrictive robots.txt) that are more impacted by this. Given that, it may not be easy to collect data regarding Chrome's interactions with those servers.
Apr 20, 2012
#23 rsleevi@chromium.org
willchan,jar: Just so it's not lost in the discussion, I think comment #13 raises a real point about there being a probable bug/design issue with regards to the preconnect logic. jar indicated in comment #5 that preconnected-but-idle sockets should disconnect within ~10 seconds (typically).

It sounds like the act of the SSL handshake is throwing off (for the TCP socket pool) the IsConnectedAndIdle() calculation, and the "10 seconds for preconnected sockets" logic isn't being applied to the SSL pool. This is what I was trying to capture in comment https://code.google.com/p/chromium/issues/detail?id=85229#c7 , and which jar hinted at in https://code.google.com/p/chromium/issues/detail?id=87121#c19
Apr 20, 2012
#24 mmenke@chromium.org
There's also been a change, to improve battery life on mobile, where we only run the 10-second timer on Windows (it has to be run on Windows because we don't read data on "idle" sockets, and keeping unread data around too long on XP can result in BSODs).

On other platforms, we now only check for idle sockets that need to be closed when something requests a new socket, which could have implications for servers with low connection limits.
Apr 20, 2012
#25 willchan@chromium.org
@22: First, let me say I appreciate the rational discourse here. You seem very reasonable and make very valid points.

To your first point about temporary spikes, I agree that that is bad. I characterize that as us learning the appropriate number of connections incorrectly. We should fix that.

As for connection timeouts, no, we do not retry. Now that you mention it, connection timeouts are a good signal and we should feed that back into the network predictor subsystem so it learns to connect fewer.

As for the Connection: Close comment, I should note that we do time out idle preconnected sockets fairly soon. They should be closed within 10-20 seconds (we set the timeout at 10s for unused idle sockets and have a 10s periodic timer to reap timed-out sockets).

As to the open web vs intranets, I agree about that. It may be the case that, for intranet servers, we should simply disable preconnect. Preconnect's primary use is in mitigating the initial RTTs in connection establishment. In intranets, where RTTs are low, perhaps it's best to simply disable preconnect. Note my comment applies to intranets, not the public servers with restrictive robots.txt.

Just to be clear, we recognize we're making tradeoffs here. Clearly preconnect is suboptimal for some fraction of our users. We should fix any obvious bugs, as have been pointed out by yourself and others on this thread. But any global changes where there aren't good signals to identify resource-constrained servers must be evaluated against the significant overall benefit for the vast majority of the open web. As I noted, the benefits of preconnect are quite substantial, so we're very unlikely to adopt solutions that would dramatically reduce its effectiveness. But we definitely do want to fix any bugs and will happily take suggestions for good signals to clamp down or outright disable preconnect for certain servers.
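For illustration, the 10 s unused-idle timeout plus the 10 s periodic reaper described above can be sketched like this (Python, not Chromium code; names are invented, and the worst case is roughly the sum of the two values):

    import time

    UNUSED_IDLE_TIMEOUT = 10   # seconds an unused (preconnected) socket may stay idle
    REAP_INTERVAL = 10         # how often the periodic cleanup timer fires

    class IdleSocket:
        def __init__(self, sock, was_used):
            self.sock = sock
            self.was_used = was_used
            self.idle_since = time.monotonic()

    def reap_idle_sockets(idle_sockets):
        # Runs from a periodic timer every REAP_INTERVAL seconds. A never-used
        # socket idle longer than UNUSED_IDLE_TIMEOUT is closed, so in the worst
        # case it survives roughly UNUSED_IDLE_TIMEOUT + REAP_INTERVAL seconds.
        now = time.monotonic()
        remaining = []
        for s in idle_sockets:
            if not s.was_used and now - s.idle_since > UNUSED_IDLE_TIMEOUT:
                s.sock.close()
            else:
                remaining.append(s)
        return remaining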
Apr 20, 2012
#26 willchan@chromium.org
rsleevi/mmenke: Thanks for making these points. I think we're at the stage now where the thread is getting long and we've identified several areas that clearly need fixing. We should file separate bugs for the individual issues and mark them as blocking this bug.

Ryan, can you file a bug for the IsConnectedAndIdle() issue for sockets with SSL handshakes fooling our "previously used" check?

Later on today, I'll go through the bug and note other issues and file bugs for them unless someone else beats me to them.
Apr 20, 2012
#27 robnag...@gmail.com
#21: Also, if the server is overloaded, why does it not return an error code? 

Apache can't do much about this.  Once all the server slots are used up, game over.  If you have 10 slots, because you are running fat servers on a relatively small application, and say, three Chrome users click at the same time, all the slots go away instantly.  At that point, you can return no resources, but you have three users locking up your 10 servers. 

Like I mentioned, I see Chrome here "behaving badly" and if I could do something with BrowserMatch, I would, but there's no way to distinguish between a pre-connect request and a regular one unless, say, Chrome put something in an X-* header for pre-connects.

#22: servers with a restrictive robots.txt

What could we put in our robots.txt to stop pre-connections?  

I think the "intranet" point is a bit of a red herring.  It's the "large application" problem in small environments.  We size our system based on what (up until now) was a normal mechanism.  Browsers opened connections when they wanted something.  With pre-connections, you have to size your system for 4x the number of connections, and you can assume that most of the time only a small percentage of the connections are doing something.  That's, I'm pretty sure, how Google web servers work.  However, afaik, I don't think Apache supports this concept out of the box. I would love it to store and forward in its proxy, but it doesn't do that.  Rather, it immediately opens a back-end connection as soon as a front-end connection opens.  This probably could be avoided by having a better proxy, but afaik, there isn't an OSS proxy that supports store-and-forward(?).

Thanks,
Rob


Jun 20, 2012
#28 Petrausk...@gmail.com
Poke after two months. I still can't find any other bugs referring to this one, so I think no one beat willchan to it two months ago. Now that my services run on new hardware and software, and the services are again open to the Chrome browser, I can see what harm Chrome preconnection can do to poorly designed systems. I attached an Apache server-status screenshot and can guarantee that all Apache processes/threads in the "Reading Request" state are waiting on Chrome preconnected connections. I see no harm here for our system, because the Apache application module uses a shared connection pool to back-end resources, but in a previous version of the same module every Apache child had its own connection, and a situation like this would lead to resource exhaustion.
Attachment: evidence.png (30.8 KB)
Feb 12, 2013
#29 sterling...@gmail.com
I put together a quick and dirty perl script to monitor apache 1.3 using the server-status URL, and kill httpd processes that are serving preconnections when a threshold is reached.

This works around the issue for me for now. Here's the script:

https://gist.github.com/eqhmcow/4774549
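For readers who prefer not to follow the gist, the approach amounts to something like the following sketch (Python rather than Perl, illustrative only; the server-status URL, the column positions in the ExtendedStatus table and the 30-second threshold are assumptions to check against your own setup, and the process needs permission to signal the httpd children):

    import os
    import re
    import signal
    import urllib.request

    SERVER_STATUS_URL = "http://localhost/server-status"  # assumes mod_status with ExtendedStatus On
    MAX_READING_SECONDS = 30                               # assumed threshold before a worker is killed

    def kill_stuck_readers():
        html = urllib.request.urlopen(SERVER_STATUS_URL).read().decode("latin-1")
        for row in re.findall(r"<tr>(.*?)</tr>", html, re.S):
            cells = [re.sub(r"<[^>]+>", "", cell).strip()
                     for cell in re.findall(r"<td.*?>(.*?)</td>", row, re.S)]
            if len(cells) < 6:
                continue  # header row or malformed row
            pid, mode, seconds = cells[1], cells[3], cells[5]
            # Mode "R" means "reading request"; SS is seconds since the request began.
            if mode == "R" and pid.isdigit() and seconds.isdigit() \
                    and int(seconds) > MAX_READING_SECONDS:
                os.kill(int(pid), signal.SIGTERM)

    if __name__ == "__main__":
        kill_stuck_readers()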
Mar 10, 2013
#30 bugdro...@chromium.org
(No comment was entered for this change.)
Labels: -Area-Internals -Internals-Network-HTTP Cr-Internals-Network-HTTP Cr-Internals
Jul 1, 2013
#31 nebw...@gmail.com
We have an embedded web server running on a MicroBlaze in a Virtex-6, with LwIP and limited resources. These speculative preconnects use too many resources in our case, and I would vote for adding a parameter when starting Chrome so it doesn't do this, or an HTTP header or something to specify not to use it, or a way to set a maximum number of speculative preconnect sockets.

I will continue to tweak my C code to try to provide enough resources for Chrome, but in the end it may not be possible.
Jul 2, 2013
#32 cugi...@gmail.com
We have the same problem in our embedded products. We have modified the source code of the web server to send a "408 Request Timeout" and to close the socket, but it doesn't work!
It seems that the browser ignores both the status code and the TCP FIN. The only thing that seems to work is a TCP RESET... but that is not very nice.

Please consider disconnecting the socket after a 408 error code and adjusting the type of connections after this answer.

Could this be the solution?

Jul 2, 2013
#33 j...@chromium.org
re: comment 32: sending 408 "response."

Until the browser sends a request, it won't listen for (try to read) a response.  As a result, jamming a 408 into the socket before getting a request won't induce a teardown. In fact, it will leave some buffered data in the remote (client) end of the socket, waiting to be read.

More typical is to tear down the socket if you don't get a request within 10 seconds.
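A minimal sketch of that "tear down if no request arrives in time" behavior on the server side (Python, illustrative only; the 10-second value, port and response body are assumptions):

    import socket

    REQUEST_READ_TIMEOUT = 10   # seconds to wait for the first request bytes

    def serve_once(listener):
        conn, _addr = listener.accept()
        conn.settimeout(REQUEST_READ_TIMEOUT)
        try:
            data = conn.recv(4096)            # wait for the start of an HTTP request
            if data:
                # A real server would parse the request here; this sketch just answers.
                conn.sendall(b"HTTP/1.0 200 OK\r\nConnection: close\r\n\r\nhello\n")
        except socket.timeout:
            # No request arrived in time: tear the connection down rather than
            # queuing a 408 that the client will never read.
            pass
        finally:
            conn.close()

    if __name__ == "__main__":
        srv = socket.socket()
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", 8080))
        srv.listen(5)
        while True:
            serve_once(srv)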

re: comment 31: Putting a limit on the maximum number of speculative preconnect sockets.

The speculative preconnects are already bounded (restricted) by the rule to never have more than 6 connections to a single host.  That may be higher than you desire, but there is a clear limit.  Finding a way (header proclamation? other?) to further constrain this limit, especially for preconnects, seems reasonable.
Status: Available
Owner: ---
Oct 14, 2013
#34 ncohafm...@gmail.com

I never had this problem on my servers with CentOS 5 (kernel 2.6.18) and Apache.
As soon as I moved up to CentOS 6 (kernel 2.6.32), there were tons of held reading requests and 408s all over the place.
My average concurrent requests per server have gone up 500% and throw off all my monitoring and scaling.

To expect system admins to adjust timeouts for this is completely unreasonable. Who knows how many permutations of dynamic content serving are out there, and the time required to serve such content.

Personally, I believe this is a combined Chrome/Apache/kernel problem and all parties need to get involved to either fix it or do away with it.

Speaking frankly, I think speculative preconnects are an abomination. Hogging web server resources in case you MIGHT do something? That's just plain wrong any way you slice it. This is one step away from a DoS attack and I can't believe there hasn't been an uproar over it.


May 22, 2014
#35 williams...@gmail.com
The issue we are seeing is that pre-connect seems to have recently gotten more aggressive. Due to our 15-second HTTP request timeout (a connection must send a request in less than 15 seconds) our customers are seeing more 408s. Ideally we could find a transparent way to avoid the customer noticing these 408s. Frankly, some sort of reasonable disconnect/reconnect on 408 (perhaps on focus) scheme would be fine for us.
May 23, 2014
#36 willchan@chromium.org
Sending an HTTP response when there's no HTTP request sounds buggy. Why don't you just close the connection?
May 23, 2014
#37 sterling...@gmail.com
Browsers will show a blank page or an error page if the connection is closed while the HTTP request is in flight, so this still has the potential for a poor user experience.
May 23, 2014
#38 mmenke@chromium.org
There's no way for a browser to be sure if a stale socket was timed out by the server or not if a connection is just closed.  I'd assume browsers retry the request in that case - Chrome certainly does.
May 23, 2014
#39 sterling...@gmail.com
Chrome's tendency to open many connections without closing them requires workarounds that affect all browsers, so saying chrome still works in this case partially misses the point
May 23, 2014
#40 mmenke@chromium.org
Actually, I said other browsers probably do this, too.
May 23, 2014
#41 sterling...@gmail.com
So chrome causes an issue that other browsers probably handle OK; at the very least, citation needed? At worst, and to be clear chrome can cause this to happen, chrome eats up all the available server slots and no other browsers can even connect.

May 23, 2014
#42 cugi...@gmail.com
If the server sends a FIN to the browser, the browser can work out that any further request will not be answered.
This can be used to time out a socket server-side.
May 23, 2014
#43 mmenke@chromium.org
Per my comment, there's no way for a browser to know if a stale socket that was closed was timed out by the server, or was closed because the server was unhappy for some other reason, such as not liking the original HTTP request.

Other browsers work, therefore, presumably they retry in this case (Since some servers do time out sockets aggressively), or don't use stale sockets (Or don't use stale sockets that have never been used before).  So there's most likely no problem with timing out unused sockets (Or used sockets).
May 23, 2014
#44 willchan@chromium.org
Upgrade your server to SPDY / HTTP/2 and then you can send a GOAWAY frame to gracefully shut down the connection and notify the peer of the last accepted request (which eliminates this race). It will also lead browsers to open only a single connection, thereby solving this multiple-connection issue.
May 23, 2014
#45 sterling...@gmail.com
Right, the problem is that this issue only affects old (or simple, such as embedded) servers. Saying that the fix for this bug is that old servers should stop existing definitely misses the point
May 23, 2014
#46 cugi...@gmail.com
Yes, good idea... But unfortunately our embedded server, with 256 kB of RAM and 512 kB of code space (yes, kilobytes), can't manage anything other than HTTP 1.0.
We can have only 8-10 sockets. They are fine for 10-15 simultaneous users on other browsers, but if one user uses Chrome...
May 23, 2014
#47 tribuslu...@gmail.com
Since a 408 also contains a "Connection: close" header, wouldn't it be enough to check for 408 responses on the preconnect sockets and close them, as the RFC requires? I think this would already fix most of those issues, without hitting the race condition.

Anyway, this is what Mozilla did about this 9 years ago:
https://bugzilla.mozilla.org/show_bug.cgi?id=248827

May 23, 2014
#48 mmenke@chromium.org
408 is kind of weird - I don't recall anything in the HTTP spec about reading a response before request headers were even issued.

We don't actually have anything sitting around to try to read from the stream before the request has been issued, so this would be a surprisingly major architectural change. And as already noted, in the connection keep-alive case, servers generally just close the socket (possibly at the same time a new request is issued), and browsers have to be able to handle that anyway.

And then what do we do if we get some other response code?  Just wait around for another request to come in, and then randomly assign the received response to that request?  Just close the socket?

That having been said, retrying on 408 may be reasonable, though there are still a whole slew of questions if we did that (What do network extensions see?  What does devtools see?  What does NavigationTiming mean in this case, etc)
May 24, 2014
#49 sterling...@gmail.com
Let's not lose sight of the main issue here:

The problem: chrome opens many speculative connections, overwhelming a server and causing a DoS. Chrome does not close these sockets for many 10s of seconds or more.

Possible solutions:
* user can tell chrome not to overwhelm a given server (somehow? probably difficult or unintuitive)
* server can tell chrome not to preconnect (via a header? or by closing sockets instead of letting pre-connect sockets stick around forever?) and chrome can act on this by no longer preconnecting to that server
* chrome can otherwise learn not to make speculative connections to servers (somehow? probably difficult)

Right now I'm continuing to work-around this issue by using a script to hit apache 1.3's server-status page, parsing it, and using signals to kill processes that are being held open by preconnect. If I don't do this, users using chrome quickly use all available server slots, causing a DoS for all users except those chrome users with open connections.

This, as recently noted in this bug's comments, is also a problem for embedded servers with a limited ability to accept multiple connections.

May 24, 2014
#50 mmenke@chromium.org
Hrm... On Windows, we close never-used sockets after about 10 seconds, using a timer. On other platforms, we close never-used sockets after 10 seconds, but only when a socket is being requested (this is to save battery life, primarily on mobile, by only ramping up the radio when needed).

Used sockets have a much longer timeout. I assume we're talking about non-Windows clients here?
May 24, 2014
#51 mmenke@chromium.org
Oh, and Windows has the different behavior because XP has a crash issue when sockets are kept around with unread data, and we don't try to read from sockets while they're not being used by a request.  If it weren't for the XP crash issue, we'd use the same behavior everywhere.
May 26, 2014
#52 smad...@stackoverflow.com
The poor handling of 408 responses when the server's closing a never-used speculative connection seems tangential to the main thrust of this issue (server impacts and management of speculative connections), so I've opened #377581 for that.
May 27, 2014
#53 willchan@chromium.org
Thanks for forking the 408 issue to a separate thread. I've commented there and acknowledged the lack of 408 support in Chromium. That's related to this issue here, but is distinct, so please keep the 408 discussion over there.