Issue 46978: Chrome on Mac OS X randomly stops loading pages.
9 people starred this issue and may be notified of changes.
Status:  WontFix
Owner:  rdsmith@chromium.org
Closed:  Sep 2010
Cc:  ism...@chromium.org, eroman@chromium.org

Reported by paracel...@gmail.com, Jun 19, 2010
Chrome Version       : Currently 6.0.437.3, but the issue has been present throughout many previous versions, probably from the first Mac version.

After some apparently random time of using Chrome on Mac OS X 10.6, it will suddenly stop loading all new pages. The error shown is the following:

This webpage is not available.

The webpage at http://www.google.com/ might be temporarily down or it may have moved permanently to a new web address.

  More information on this error
Below is the original error message

Error 2 (net::ERR_FAILED): Unknown error.

When this happens, no new pages can be loaded until the browser is quit, and restarted. Sometimes, the error may re-occur immediately on reload, too (this just happened before posting this bug).

It seems some pages can still be loaded, however, and it seems to be pages recently browsed. It could be that some information is cached and keeps working (for a while). Perhaps DNS results?

This issue seems to have been present since the very early Mac versions, but may or may not have gotten both better and worse over time since. Since it occurs so randomly, it is hard to say. It does occur in the current version, 6.0.437.3, though.

I have been unable to figure out what triggers this malfunction. The time it takes until it happens can vary between minutes and days. In case it is a DNS issue (which is the best explanation I have so far), it might be relevant that I am using OpenDNS servers.
Jun 19, 2010
#1 paracel...@gmail.com
This has now happened several times while trying to submit this issue, and reading other issues.

Some further info: Trying to reload a page after this has happened might either result in the error above ("This webpage is not available"), or it might load the HTML page itself but not all of its other resources (leading, for instance, to an unstyled page here on the issue tracker, but with the Chromium logo in the upper corner intact).
Jun 19, 2010
#2 paracel...@gmail.com
This has actually been happening a lot more today than it has in the past. I am not sure if that is because of the latest Chrome update that just got installed, or some other local change.
Jun 19, 2010
#3 paracel...@gmail.com
More information: When this happens, I can still access the webserver running at localhost both as "localhost" and "127.0.0.1". I cannot access "www.google.com" or "72.14.221.104", which is one of Google's IP addresses. After I restart, I can access Google both through hostname and IP.
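The observation above (a literal IP fails just like a hostname, while localhost keeps working) points away from DNS and toward the connect step. A minimal sketch of how one might check the two steps separately from outside the browser follows; the function name `probe` and its return labels are made up for illustration, and it only shows what the operating system reports to an ordinary process, not what Chrome does internally.

```python
import errno
import socket


def probe(host, port=80, timeout=5.0):
    """Resolve and connect as two separate steps and report which one fails.

    Hypothetical helper for illustration; the labels it returns are not
    Chrome error codes.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Name resolution failed before any connection was attempted.
        return ("dns-failed", str(exc))

    family, socktype, proto, _canonname, addr = infos[0]
    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        sock.connect(addr)
    except OSError as exc:
        # Timeouts carry no errno; everything else is mapped to its symbol.
        if exc.errno is None:
            return ("connect-failed", "no-errno")
        return ("connect-failed", errno.errorcode.get(exc.errno, "unknown"))
    finally:
        sock.close()
    return ("ok", addr[0])
```

Calling `probe("localhost")`, `probe("127.0.0.1")`, `probe("www.google.com")` and `probe("72.14.221.104")` while the browser is failing would show whether other processes on the same machine see the same behaviour.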

Should I post the contents of about:net-internals? I saw that mentioned elsewhere.
Jun 20, 2010
#4 paracel...@gmail.com
I downgraded to beta, 5.0.375.70, and it seems this issue has now gone back to happening only rarely (it hasn't happened yet while testing) rather than all the time (as in 6.0.437.3). So it does seem some recent change has made this much worse.
Jun 20, 2010
#5 paracel...@gmail.com
Upgraded back to 6.0.437.3 to try the test suite in about:net-internals. The issue hit after less than a minute of running the browser, and the test suite reports FAIL on every single test for any external URL. If I run the tests against localhost, all of them PASS.
Jun 21, 2010
#6 paracel...@gmail.com
Managed to reproduce this error in 5.0.375.70 beta. It took about a day of browsing rather than the couple of minutes it takes in 6.0.437.3, but it did happen.

However, the error reported is slightly different in 5.0.375.70:

Error 104 (net::ERR_CONNECTION_FAILED): The attempt to connect to the server failed.

As in 6.0.437.3, this happens for all new sites opened. Some sites that were already open continue to work, for a while.
Jun 21, 2010
#7 stuartmorgan@chromium.org
(No comment was entered for this change.)
Labels: -Area-Undefined Area-Internals Internals-Network OS-Mac
Jul 12, 2010
#8 deep...@chromium.org
We are unable to repro this. Can you please attach a sample of Chrome taken when it stops loading pages?
Labels: FeedbackRequested
Jul 12, 2010
#9 paracel...@gmail.com
Sure, but what exact kind of sample?
Jul 20, 2010
#10 deep...@chromium.org
Can you get a sample of the Chrome browser process when it's stuck, and of the Chrome renderer process when you are trying to reload the page?
Jul 20, 2010
#11 paracel...@gmail.com
Do you mean a sample using Activity Monitor?

I can try, but I don't think that will help, since it doesn't actually get stuck. It just fails to open network connections, but otherwise works just as normal.

I'd expect a stack sample would just show it waiting for input as usual.
Aug 5, 2010
#12 bugdroid1@gmail.com
Verified label updated by AutoAllocator, contact AmolK or KrisR for details
Labels: Verifier-Ismail
Aug 9, 2010
#13 ism...@chromium.org
Not able to repro on the latest builds.
Can you please try the same on the latest build, and reopen if you have the problem again? If so, please attach the system (console) logs or a stack sample to help diagnose the issue.

> Do you mean a sample using Activity Monitor?

Yes.
Aug 9, 2010
#14 paracel...@gmail.com
Confirmed in 6.0.472.25. It happened again after a few minutes of use. (The 5.0 series I've been running in the meantime seems to stay working for at least a day, or several days.)

I have attached a sample taken of the Google Chrome Renderer process while trying to reload the page several times. If there are some more specific things I should do while sampling, please tell me.

The system log contains a lot of these messages, repeated every now and then. They did not appear while running the 5.0 branch. Not sure if they are relevant:

2010-08-10 01:10:07	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94603]: Class CrApplication is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:07	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94603]: Class NoOp is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:07	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94603]: Class TaskOperation is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:07	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94603]: Class WorkerPoolObjC is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:10	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94604]: Class CrApplication is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:10	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94604]: Class NoOp is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:10	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94604]: Class TaskOperation is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:10	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94604]: Class WorkerPoolObjC is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:20	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94605]: Class CrApplication is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:20	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94605]: Class NoOp is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:20	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94605]: Class TaskOperation is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:20	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94605]: Class WorkerPoolObjC is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:23	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94606]: Class CrApplication is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:23	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94606]: Class NoOp is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:23	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94606]: Class TaskOperation is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
2010-08-10 01:10:23	[0x0-0x1c15c14].com.google.Chrome[94372]	objc[94606]: Class WorkerPoolObjC is implemented in both /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Helper.app/Contents/MacOS/../../../Google Chrome Framework.framework/Google Chrome Framework and /Applications/Google Chrome.app/Contents/Versions/6.0.472.25/Google Chrome Framework.framework/Internet Plug-Ins/PDF.plugin/Contents/MacOS/PDF. One of the two will be used. Which one is undefined.
Sample of Google Chrome Renderer.txt
48.5 KB
Aug 9, 2010
#15 paracel...@gmail.com
Since the log entries referred to the PDF plugin, I tried to remove it to see what would happen. This caused the messages to stop, but the issue still remains and Chrome stopped loading pages after some minutes of running.

So it seems the log messages are unrelated.
Aug 9, 2010
#16 ism...@chromium.org
Untriaging, as we have the sample and logs to diagnose.

Status: Untriaged
Cc: ism...@chromium.org
Aug 11, 2010
#17 mark@chromium.org
1. Which sites do you experience this problem with? You use google.com in your example. Is the problem isolated to google.com, or other Google sites?

2. When this happens, visit the about:net-internals URL, switch to the Requests tab, and look at some of the stalled URL_REQUESTs. You can copy the logs from the right pane and paste them here to help us get a better idea of where things are hanging. Specifically, we should see if SPDY shows up in any of those logs.
Aug 11, 2010
#18 paracel...@gmail.com
1. Like I said, all sites, except recently visited ones. These sometimes keep working for a while after the bug hits.

2. Here is one example:

(P) t=1281561612355  +REQUEST_ALIVE                [dt=6]
(P) t=1281561612355     +URL_REQUEST_START_JOB     [dt=0]
                         --> load_flags = 65664 (ENABLE_LOAD_TIMING | VERIFY_EV_CERT)
                         --> method = "GET"      
                         --> priority = 0        
                         --> url = "http://google.com/"
(P) t=1281561612355     -URL_REQUEST_START_JOB     
(P) t=1281561612355     +URL_REQUEST_START_JOB     [dt=5]
                         --> load_flags = 65664 (ENABLE_LOAD_TIMING | VERIFY_EV_CERT)
                         --> method = "GET"      
                         --> priority = 0        
                         --> url = "http://google.com/"
(P) t=1281561612355         HTTP_CACHE_WAITING     [dt=0]
(P) t=1281561612355         HTTP_CACHE_OPEN_ENTRY  [dt=0]
(P) t=1281561612355         HTTP_CACHE_WAITING     [dt=0]
(P) t=1281561612355         HTTP_CACHE_READ_INFO   [dt=2]
(P) t=1281561612357        +PROXY_SERVICE          [dt=0]
(P) t=1281561612357            PROXY_SERVICE_RESOLVED_PROXY_LIST  
                               --> pac_string = "DIRECT"
(P) t=1281561612357        -PROXY_SERVICE          
(P) t=1281561612357         TCP_CLIENT_SOCKET_POOL_REQUESTED_SOCKET  
                            --> host_and_port = "google.com [port 80]"
(P) t=1281561612357        +SOCKET_POOL            [dt=3]
(P) t=1281561612360            SOCKET_POOL_BOUND_TO_CONNECT_JOB  
                               --> source_dependency = {"id":1077,"type":4}
(P) t=1281561612360        -SOCKET_POOL            
(P) t=1281561612360     -URL_REQUEST_START_JOB     
                         --> net_error = -2 (FAILED)
(P) t=1281561612361  -REQUEST_ALIVE     
Aug 11, 2010
#19 paracel...@gmail.com
Digging through the logs, the one thing that stands out is that all failed connections have a "SOCKET_POOL_BOUND_TO_CONNECT_JOB" under "SOCKET_POOL". None of the earlier connections that worked have that; they have "SOCKET_POOL_REUSED_AN_EXISTING_SOCKET"/"SOCKET_POOL_BOUND_TO_SOCKET" or no "SOCKET_POOL" at all.

Also, just before the first connection that fails, there is a connection that has a "SOCKET_POOL" with nothing but "CANCELLED" in it. This is a request for "http://clients1.google.com/complete/search?client=chrome&hl=en-US&q=g", which I guess might be me starting to type out "google" in the URL bar?

I will restart and see if the same pattern repeats on the next failure.
Aug 11, 2010
#20 paracel...@gmail.com
I can't reproduce the "CANCELLED" part, so disregard that. However, I can confirm that before the bug hits, some connections successfully use "SOCKET_POOL_BOUND_TO_CONNECT_JOB"/"SOCKET_POOL_BOUND_TO_SOCKET", and others use "SOCKET_POOL_REUSED_AN_EXISTING_SOCKET"/"SOCKET_POOL_BOUND_TO_SOCKET". After the bug hits, every connection that tries "SOCKET_POOL_BOUND_TO_CONNECT_JOB" stops without doing a "SOCKET_POOL_BOUND_TO_SOCKET", while "SOCKET_POOL_REUSED_AN_EXISTING_SOCKET" seems to keep working.
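The pattern described here can be checked mechanically against a pasted request log. The following is a rough sketch, assuming the text layout shown in comment #18 (one request between `+REQUEST_ALIVE` and `-REQUEST_ALIVE`); the function names and labels are invented for illustration, and the sample is an abbreviated excerpt of that comment's log.

```python
def _label(block):
    """Label one request by which socket-pool events appear in its log text."""
    new_connection = "SOCKET_POOL_BOUND_TO_CONNECT_JOB" in block
    reused = "SOCKET_POOL_REUSED_AN_EXISTING_SOCKET" in block
    got_socket = "SOCKET_POOL_BOUND_TO_SOCKET" in block
    if new_connection and not got_socket:
        return "new connection, never bound to a socket"
    if new_connection:
        return "new connection, bound to a socket"
    if reused:
        return "reused an existing socket"
    return "no socket pool activity"


def classify_requests(dump_text):
    """Split a pasted request log into requests and label each one."""
    results = []
    current = None
    for line in dump_text.splitlines():
        if "+REQUEST_ALIVE" in line:
            current = []  # start collecting a new request
        if current is not None:
            current.append(line)
        if "-REQUEST_ALIVE" in line and current is not None:
            results.append(_label("\n".join(current)))
            current = None
    return results


# Abbreviated excerpt of the failing request from comment #18.
SAMPLE = """\
(P) t=1281561612355  +REQUEST_ALIVE                [dt=6]
(P) t=1281561612357         TCP_CLIENT_SOCKET_POOL_REQUESTED_SOCKET
(P) t=1281561612357        +SOCKET_POOL            [dt=3]
(P) t=1281561612360            SOCKET_POOL_BOUND_TO_CONNECT_JOB
(P) t=1281561612360        -SOCKET_POOL
                         --> net_error = -2 (FAILED)
(P) t=1281561612361  -REQUEST_ALIVE
"""

print(classify_requests(SAMPLE))
```

Substring matching is deliberately crude; it is enough to separate requests that opened a new connection from those that reused one, which is the distinction this comment relies on.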

I am attaching a copy of a series of requests that seem to be happening just as the bug is triggered. It begins with a series of successful requests to google URLs, and then at t=1281563775386 it requests http://clients1.google.com/generate_204 which fails. After this, there are some CONNECT_JOB and SOCKET entries, and then it does a successful request using SOCKET_POOL_REUSED_AN_EXISTING_SOCKET, and another failed request using SOCKET_POOL_BOUND_TO_CONNECT_JOB. After this, all other requests fail.

requestlog.txt
16.1 KB
Aug 13, 2010
#21 paracel...@gmail.com
The issue happened again in 5.0.* (after about 38 hours of running, judging from the time of my last post, which was when I started it again), and I tried seeing what about:net-internals had to say on that version. Here are the two latest entries, one that succeeds and one that fails. (The succeeding one happened after the failed one, and like the ones in 6.0.*, it seems it reuses a socket, which works, but the one that uses "SOCKET_POOL_CONNECT_JOB" doesn't work):

1. https://mail.google.com/mail/channel/bind?VER=8&at=#######################
t=2979118202: +REQUEST_ALIVE                         [dt=222]
t=2979118203:   +URL_REQUEST_START                   [dt=221]
                   url: https://mail.google.com/mail/channel/bind?VER=8&at=#######################
t=2979118203:      PROXY_SERVICE                     [dt=  0]
t=2979118203:     +SOCKET_POOL                       [dt=  0]
t=2979118203:        "Reusing socket."
t=2979118203:        "Socket sat idle for 57256 milliseconds"
t=2979118203:     -SOCKET_POOL
t=2979118203:      HTTP_TRANSACTION_SEND_REQUEST     [dt=  0]
t=2979118203:     +HTTP_TRANSACTION_READ_HEADERS     [dt=221]
t=2979118203:        HTTP_STREAM_PARSER_READ_HEADERS [dt=220]
t=2979118424:     -HTTP_TRANSACTION_READ_HEADERS
t=2979118424:   -URL_REQUEST_START
t=2979118424:    HTTP_TRANSACTION_READ_BODY          [dt=  0]
t=2979118425: -REQUEST_ALIVE

2. http://google.com/
t=2979114840: +REQUEST_ALIVE                                             [dt=29]
t=2979114841:   +URL_REQUEST_START                                       [dt=29]
                   url: http://google.com/
t=2979114841:     +URL_REQUEST_START                                     [dt= 4]
                     url: http://google.com/
t=2979114841:        HTTP_CACHE_OPEN_ENTRY                               [dt= 0]
t=2979114841:        HTTP_CACHE_WAITING                                  [dt= 0]
t=2979114841:        HTTP_CACHE_READ_INFO                                [dt= 1]
t=2979114843:       +PROXY_SERVICE                                       [dt= 1]
t=2979114843:          PROXY_SERVICE_POLL_CONFIG_SERVICE_FOR_CHANGES     [dt= 1]
t=2979114844:       -PROXY_SERVICE
t=2979114844:       +SOCKET_POOL                                         [dt= 1]
t=2979114844:         +SOCKET_POOL_CONNECT_JOB                           [dt= 1]
                         group: http://google.com/
t=2979114844:           +HOST_RESOLVER_IMPL                              [dt= 0]
t=2979114844:              HOST_RESOLVER_IMPL_OBSERVER_ONSTART           [dt= 0]
t=2979114844:              HOST_RESOLVER_IMPL_OBSERVER_ONFINISH          [dt= 0]
t=2979114844:           -HOST_RESOLVER_IMPL
t=2979114844:            TCP_CONNECT                                     [dt= 1]
t=2979114845:         -SOCKET_POOL_CONNECT_JOB
t=2979114845:       -SOCKET_POOL
t=2979114845:     -URL_REQUEST_START
                     net error: -104 (net::ERR_CONNECTION_FAILED)
t=2979114870: -REQUEST_ALIVE

Before these, there are several failing requests, and then some that reuse a socket, and this is the newest request that doesn't reuse a socket and that does still work:

22. http://#################/favicon.ico
t=2978959493: +REQUEST_ALIVE                         [dt=343]
t=2978959493:   +URL_REQUEST_START                   [dt=341]
                   url: http://#################/favicon.ico
t=2978959493:      HTTP_CACHE_OPEN_ENTRY             [dt=  0]
t=2978959493:      HTTP_CACHE_WAITING                [dt=  0]
t=2978959493:      HTTP_CACHE_READ_INFO              [dt=  1]
t=2978959494:      PROXY_SERVICE                     [dt=  0]
t=2978959494:     +SOCKET_POOL                       [dt=  0]
t=2978959494:        "Socket sat idle for 6514 milliseconds"
t=2978959494:     -SOCKET_POOL
t=2978959494:      HTTP_TRANSACTION_SEND_REQUEST     [dt=  0]
t=2978959494:     +HTTP_TRANSACTION_READ_HEADERS     [dt=337]
t=2978959494:        HTTP_STREAM_PARSER_READ_HEADERS [dt=337]
t=2978959832:     -HTTP_TRANSACTION_READ_HEADERS
t=2978959834:   -URL_REQUEST_START
t=2978959834:    HTTP_TRANSACTION_READ_BODY          [dt=  0]
t=2978959836: -REQUEST_ALIVE
Aug 13, 2010
#22 paracel...@gmail.com
Oh, and here is the newest that has "SOCKET_POOL_CONNECT_JOB" in it:

https://seal.verisign.com/getseal?at=1&#######################
t=2978954318: +REQUEST_ALIVE                                  [dt=550]
t=2978954318:   +URL_REQUEST_START                            [dt=548]
                   url: https://seal.verisign.com/getseal?at=1&#######################
t=2978954318:      HTTP_CACHE_OPEN_ENTRY                      [dt=  0]
t=2978954318:      HTTP_CACHE_WAITING                         [dt=  0]
t=2978954318:      HTTP_CACHE_READ_INFO                       [dt=  1]
t=2978954322:      PROXY_SERVICE                              [dt=  0]
t=2978954322:     +SOCKET_POOL                                [dt=172]
t=2978954322:       +SOCKET_POOL_CONNECT_JOB                  [dt=172]
                       group: https://seal.verisign.com/
t=2978954322:         +HOST_RESOLVER_IMPL                     [dt=  0]
t=2978954322:            HOST_RESOLVER_IMPL_OBSERVER_ONSTART  [dt=  0]
t=2978954322:            HOST_RESOLVER_IMPL_OBSERVER_ONFINISH [dt=  0]
t=2978954322:         -HOST_RESOLVER_IMPL
t=2978954322:          TCP_CONNECT                            [dt=171]
t=2978954494:       -SOCKET_POOL_CONNECT_JOB
t=2978954494:     -SOCKET_POOL
t=2978954494:      SSL_CONNECT                                [dt=195]
t=2978954689:      HTTP_TRANSACTION_SEND_REQUEST              [dt=  0]
t=2978954689:     +HTTP_TRANSACTION_READ_HEADERS              [dt=174]
t=2978954689:        HTTP_STREAM_PARSER_READ_HEADERS          [dt=174]
t=2978954864:     -HTTP_TRANSACTION_READ_HEADERS
t=2978954866:   -URL_REQUEST_START
t=2978954869: -REQUEST_ALIVE
Aug 13, 2010
#23 paracel...@gmail.com
Just had this hit 5.0.375.126 again, less than two hours after last time. Strangely enough, the last request that worked before the failure seems to have been a favicon.ico file again. I have no idea if this is a coincidence or significant, but I figured it's worth mentioning. Here are the two requests just after and before the issue hits:

6. http://www.reddit.com/
t=2986538500: +REQUEST_ALIVE                                             [dt=3]
t=2986538500:   +URL_REQUEST_START                                       [dt=3]
                   url: http://www.reddit.com/
t=2986538500:     +URL_REQUEST_START                                     [dt=3]
                     url: http://www.reddit.com/
t=2986538500:        HTTP_CACHE_OPEN_ENTRY                               [dt=0]
t=2986538500:        HTTP_CACHE_WAITING                                  [dt=0]
t=2986538500:        HTTP_CACHE_READ_INFO                                [dt=1]
t=2986538502:       +PROXY_SERVICE                                       [dt=0]
t=2986538502:          PROXY_SERVICE_POLL_CONFIG_SERVICE_FOR_CHANGES     [dt=0]
t=2986538503:       -PROXY_SERVICE
t=2986538503:       +SOCKET_POOL                                         [dt=1]
t=2986538503:         +SOCKET_POOL_CONNECT_JOB                           [dt=1]
                         group: http://www.reddit.com/
t=2986538503:           +HOST_RESOLVER_IMPL                              [dt=0]
t=2986538503:              HOST_RESOLVER_IMPL_OBSERVER_ONSTART           [dt=0]
t=2986538503:              HOST_RESOLVER_IMPL_OBSERVER_ONFINISH          [dt=0]
t=2986538503:           -HOST_RESOLVER_IMPL
t=2986538503:            TCP_CONNECT                                     [dt=0]
t=2986538504:         -SOCKET_POOL_CONNECT_JOB
t=2986538504:       -SOCKET_POOL
t=2986538504:     -URL_REQUEST_START
                     net error: -104 (net::ERR_CONNECTION_FAILED)
t=2986538504: -REQUEST_ALIVE

7. http://dagobah.biz/favicon.ico
t=2986392078: +REQUEST_ALIVE                                       [dt=54]
t=2986392078:   +URL_REQUEST_START                                 [dt=53]
                   url: http://dagobah.biz/favicon.ico
t=2986392078:      HTTP_CACHE_OPEN_ENTRY                           [dt= 0]
t=2986392078:      HTTP_CACHE_CREATE_ENTRY                         [dt= 0]
t=2986392078:      HTTP_CACHE_WAITING                              [dt= 0]
t=2986392078:     +PROXY_SERVICE                                   [dt= 0]
t=2986392078:        PROXY_SERVICE_POLL_CONFIG_SERVICE_FOR_CHANGES [dt= 0]
t=2986392079:     -PROXY_SERVICE
t=2986392079:     +SOCKET_POOL                                     [dt= 0]
t=2986392079:        "Reusing socket."
t=2986392079:        "Socket sat idle for 1269 milliseconds"
t=2986392079:     -SOCKET_POOL
t=2986392079:      HTTP_TRANSACTION_SEND_REQUEST                   [dt= 0]
t=2986392079:     +HTTP_TRANSACTION_READ_HEADERS                   [dt=51]
t=2986392079:        HTTP_STREAM_PARSER_READ_HEADERS               [dt=51]
t=2986392131:     -HTTP_TRANSACTION_READ_HEADERS
t=2986392131:   -URL_REQUEST_START
t=2986392131:    HTTP_TRANSACTION_READ_BODY                        [dt= 0]
t=2986392132: -REQUEST_ALIVE

Aug 16, 2010
#24 mikesm...@chromium.org
Can you please take a look at this data? Thanks!
Status: Assigned
Owner: ero...@chromium.org
Labels: -FeedbackRequested Mstone-7
Aug 16, 2010
#25 eroman@chromium.org
Re-assigning to Randy, who is helping me look at network errors.
Owner: rdsm...@chromium.org
Cc: ero...@chromium.org
Aug 16, 2010
#26 rdsmith@chromium.org
@paracelsus: Just because I don't have enough reading material :-}, the next time this bug happens, could you get a full net-internals dump and send it to me?  (But see caveat at end of this entry.)  In version 6.0.490.1 (which should be the latest dev release) the way to do this is "about:net-internals" -> Data tab, click "Dump to Text", and then copy and paste the result into an email to me (rdsmith@chromium.org).  The full dump has information on DNS requests and other net-related things that might have a bearing on this problem.  If you could send one of those right after an error (along with a description of which page the error happened on), and then send one later when you've navigated to the same page without having a problem, that would be very useful.

Caveat: that dump can have private information in it (cookies, specifically).  The most recent dev branch has a check box (default checked) to remove the cookies from the report (which is probably fine, since this doesn't sound like a cookie related problem, at least at the moment).  If you have a version that doesn't have that check box, you might want to go through the text file and edit out the cookies.  Either way, send the dump directly to me rather than attaching it to the issue--better safe than sorry.

Thanks very much!

Aug 17, 2010
#27 rdsmith@chromium.org
@paracelsus: Thanks for the dump.  It looks to me as if the location generating the -2 error code (which is a generic "failed" error) is generating it from an underlying OS error, and it would be useful to know what error that is.  When we create the -2 error code, we log the original OS error.  That log generally goes to standard error.  For a mac application run in the normal way, that might go to the system logs (the earlier log information you posted matches stuff I see on stderr when I run Chrome that way), but the way I know to capture it is to run chrome in a shell buffer and capture the output from it.  Would you be willing to do that and reproduce the error running chrome in that context?  Steps:
* Make sure chrome isn't running on the system.
* In a shell window, execute
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome 
* Reproduce the issue.
* Send me the output in the shell buffer (I'm specifically looking for lines that end 
"mapped to net::ERR_FAILED")

You might also want to scan your system logs for that string; if you can correlate that output in time with the log you recently sent me, that would be fine too (I just don't want to chase a wild goose error message that doesn't have to do with this problem.)
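Scanning a saved copy of the shell output or a system log for that string could be sketched as follows. The only detail taken from this comment is the suffix "mapped to net::ERR_FAILED"; the function name and the file name `chrome_stderr.txt` in the usage note are hypothetical, and nothing is assumed about what precedes the suffix on a line.

```python
SUFFIX = "mapped to net::ERR_FAILED"


def find_mapped_errors(lines):
    """Return the lines (trailing whitespace removed) that end with SUFFIX."""
    matches = []
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith(SUFFIX):
            matches.append(stripped)
    return matches
```

For example, `find_mapped_errors(open("chrome_stderr.txt", encoding="utf-8", errors="replace"))` would list the candidate lines from a saved capture, which can then be matched by time against the net-internals dump.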

Thanks!

Aug 17, 2010
#28 paracel...@gmail.com
It should indeed go into the logs if it's printed to stderr, but searching through old logs reveals nothing at all. It doesn't seem to get printed at all.

I'll try running it manually in a shell later, but I suspect it won't output anything then either.
Aug 18, 2010
#29 rdsmith@chromium.org
Huh.  Interesting.  When you run it in the command line, do it with --log-level=1.  That should be the default, but maybe there's something about release versions that sets the log level bar higher.  (If you do get the error message I'm looking for, take an about:net-internals dump again, just so I can correlate the one with the other).

Aug 19, 2010
#30 paracel...@gmail.com
All right, I finally got around to running it on the command line, with --log-level=1. However, still no messages at all, except for the spam about duplicated classes on startup. Nothing at all is logged after that, including when the ERR_FAILEDs start happening.
Aug 23, 2010
#31 rdsmith@chromium.org
So I went carefully through your most recent log, and it looks to me as if, from some point in time onward, every actual connection attempt returns error 64 from the operating system (the requests that still succeed are re-using sockets that are already connected).  On my Mac that error is shown in sys/errno.h as EHOSTDOWN.
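For anyone wanting to check what a raw errno value means on their own machine, here is a small sketch using Python's standard `errno` and `os` modules. The helper name `describe_errno` is made up for illustration. Errno numbers are platform-specific: 64 is EHOSTDOWN in the Mac's sys/errno.h as noted above, but the same number can mean something else on other systems, so the symbolic name is the safer thing to compare.

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, OS message) for an errno value on this platform."""
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


# Look up by symbol (portable) and by the raw number from the log (not portable).
print(describe_errno(errno.EHOSTDOWN))
print(describe_errno(64))
```

On a Mac both lines should describe the same error; on other platforms the second line may name a different one.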

I'm at a loss to come up with something that would be chrome-process specific that would wedge the connect() syscall in an EHOSTDOWN state.   But we've sometimes seen situations where chrome (because it aggressively pre-fetches and uses multiple connections for minimum delay for the user) triggers responses in routers/firewalls that something bad is happening, and they start blocking outgoing packets, and I could see that leading to repeated EHOSTDOWNs.   So based on that theory, a couple of questions:
* What is the general shape of your network topology?  What are the routers/switches/firewalls between you and the main net?
* When this happens, could you try connecting to the same set of web sites through Firefox or Safari and see if it goes through?  If it does, could you retry in chrome (without restarting) afterwards?  (This theory would suggest that it's not the restart of chrome that fixes the problem so much as waiting for some timeout on an intermediate router, so it might work/fail in another web browser depending on the time delay between trying in Chrome and trying in that browser.  Thus I expect it to fail in FF or Safari, but if it works, we want to confirm that Chrome is still failing before concluding that it's Chrome-specific.)
* Similarly to the above, the next time it happens, could you leave chrome alone for say five minutes (I'm trying to do a bit of overkill :-}) and then see if reloading the page works?  

Let me know how that goes, and we'll figure out the next step.  Thanks!

Aug 23, 2010
#32 paracel...@gmail.com
> What is the general shape of your network topology?  What are the routers/switches/firewalls between you and the main net?

Just a wired connection to some generic ADSL modem/NAT router.

> When this happens, could you try connecting to the same set of web sites through Firefox or Safari and see if it goes through?

I am pretty sure I've done this (while reporting bugs), and everything works perfectly in other browsers.

> If it does, could you retry in chrome (without restarting) afterwards?

I forget if I've tried this already, but I'll try it next time, and also the other things.

One thing that has come to mind during this is that I am running Parallels, and it does install some kind of network driver of its own to talk to the net through. I'm not sure of the exact details of how that works, but I'd imagine it could add extra confusion to the networking stack.

(Also, there is the fact that this happens only rarely in 5.0 and all the time in 6.0 that might be some kind of hint, if anything obvious has changed between the two versions.)
Aug 23, 2010
#33 kr...@chromium.org
(No comment was entered for this change.)
Labels: -Verifier-Ismail
Aug 26, 2010
#34 rdsmith@chromium.org
(No comment was entered for this change.)
Labels: -Mstone-7 Mstone-X
Aug 31, 2010
#35 rdsmith@chromium.org
@paracelsus: Any luck seeing if we can prove/disprove the "overwhelming something in the network stack down the line from Chrome" theory?  I've gotten very attached to it, and if I have to let go of it I'd rather know sooner rather than later :-}.

Also, a couple of questions about your infrastructure:
* What's your desktop configuration?  (I.e. what's your host operating system, what version of Parallels are you running, and what kind of hardware do you have?)  I'm debating trying to set up a local instance and see if I can reproduce the problem locally.
* If it does turn out to be downstream from Chrome, are you in a position where you might be able to get packet traces for the problem?  I realize that that may be above and beyond the call of duty, but if this does turn out to be us overwhelming a downstream entity I'd love to get what information I can on the network level signature so that we can try and figure out workarounds.

Thanks much (in advance and for all your help so far).

Aug 31, 2010
#36 paracel...@gmail.com
I did some more testing, running 6.0 beside 5.0. When 6.0 stops loading pages, loading the same page in 5.0 does not change anything (and it does load fine in 5.0). Also, leaving it alone for a while seems to do nothing either.

For network monitoring, I tried running "tcpdump | grep slashdot" and then going to slashdot.org. Doing it in 5.0 produced the expected burst of packets, but doing it in 6.0 while it was refusing to load caused zero lines to appear. Not sure if there's any activity that didn't match the grep filter that happened, since there's so much noise I can't tell.

The machine is an oldish iMac, from 2006 or so, running the latest Snow Leopard and Parallels 5.0.9220.
Aug 31, 2010
#37 rdsmith@chromium.org
That's ... fascinating.  So, just to confirm I understand the test and have nailed down the relative details: You were running Chrome 6 and Chrome 5 in the same virtual machine at the same time.  You worked with chrome 6 until it stopped loading pages, then you went over to chrome 5 and checked to make sure you could view the web page chrome 6 had just balked on, and then went back to Chrome 6 and found that it was still refusing to load that page?

Two questions:
* Was the chrome 5 instance untouched (started up but not used for browsing) before you tried loading the page?  (I want to make sure that the chrome 5 browser didn't have a cached network connection to the page in question).
* Can you reproduce the chrome 6 part of the experiment with the tcpdump going into a file (without the grep) and send me the net-internals dump and the tcpdump output?  I'm happy to dig through them to figure out the relevant activity.  Bring up about:net-internals and start tcpdump tracking before you start browsing; the dumps will be bigger, but that'll give me pretty clear before and after situations to examine.  Send them direct to me so we avoid sharing cookie details (which don't seem relevant to this problem anyway--feel free to nuke them if you want).

Thanks very much!


Aug 31, 2010
#38 paracel...@gmail.com
Actually, I was running them on the host OS, with a VM running by the side. I mostly mentioned the VM because it installs its own drivers which might mess with things, possibly. But otherwise correct.

The Chrome 5 instance was running from before, but I tried picking some test pages that would not have been loaded by either browser.

I will try to arrange a tcpdump in a while.
Sep 6, 2010
#39 paracel...@gmail.com
All right, sent data dumps.

This issue has suddenly become a lot more urgent for me, though, since v6 got pushed out over autoupdate, meaning I can no longer use Chrome at all. Is there a way to disable autoupdate until this gets resolved?
Sep 7, 2010
#40 paracel...@gmail.com
All right, I just did a binary search through the old snapshots of Chromium, to find where this bug got introduced. (Or rather, where the more severe version got introduced. I assume all the versions I tested still suffer from the same issue if you let them run for a long time, but testing that was infeasible.)

So far, I have confirmed that revision 49371 has the bug. Revision 49234 does not seem to have the bug, though I will need to run it for a little longer to be sure. These two at least seem to bracket it.
Sep 8, 2010
#41 paracel...@gmail.com
I've been running 49234 for 16 hours now and it hasn't broken yet, so it seems it really does not have the bug. So revisions 49234 and 49371 do bracket the introduction of this bug (or rather, the change that makes this bug much worse).

I tried reading through the commit logs, but I don't really know what a lot of the stuff referenced is, so I guess I'm not a huge help there. http://src.chromium.org/viewvc/chrome?view=rev&revision=49285 and http://src.chromium.org/viewvc/chrome?view=rev&revision=49355 stand out as sounding a little like they could have something to do with it, but there are probably other candidates too that I am missing.
Sep 8, 2010
#42 rdsmith@chromium.org
Yeah, I've just been scanning that list myself.  Nothing's jumping out at me--it *could* be a couple of things, but none of them seem likely.  And doing a round trip binary search (I compile a build, you give me a yes/no on it) really doesn't seem like the right use of our time.  

Just to write down what I found from the last set of logs you sent me:
* Same signature as last time; once one TCP_CONNECT_ATTEMPT fails, all following ones fail.
* The first TCP_CONNECT_ATTEMPT that fails (and all following ones) have no corresponding entry in the tcpdump file, though the ones before it do.  
* There is a trace in the file after that connect attempt that has the SYN flag set (indicating an initial connect).  It's replied to with an immediate reset.  I don't think it's Google Chrome; the IP address named doesn't show up on our logs.  (A reverse DNS lookup returns NXDOMAIN, which seems weird to me, so I'll call it out to you, but I don't think it's relevant to this problem:

21:33:02.670577 IP us.c3.cx.61470 > 81.200.24.121.40500: Flags [S], seq 3670595700, win 8192, options [mss 1460,nop,wscale 0,nop,nop,TS val 160326445 ecr 0,sackOK,eol], length 0
21:33:02.711409 IP 81.200.24.121.40500 > us.c3.cx.61470: Flags [R.], seq 0, ack 3670595701, win 0, length 0
)
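
A trace like the one quoted above can be scanned mechanically for connection attempts that were answered with a reset. The following is a rough sketch only: it assumes tcpdump's default one-line text format exactly as quoted, and the helper names (parse, rejected_connects) are invented for this illustration.

```python
import re

# Matches the start of a default-format tcpdump line:
#   <time> IP <src> > <dst>: Flags [<flags>], ...
LINE_RE = re.compile(r"^(\S+) IP (\S+) > (\S+): Flags \[([^\]]*)\]")

SAMPLE = [
    "21:33:02.670577 IP us.c3.cx.61470 > 81.200.24.121.40500: Flags [S], "
    "seq 3670595700, win 8192, options [mss 1460,nop,wscale 0,nop,nop,"
    "TS val 160326445 ecr 0,sackOK,eol], length 0",
    "21:33:02.711409 IP 81.200.24.121.40500 > us.c3.cx.61470: Flags [R.], "
    "seq 0, ack 3670595701, win 0, length 0",
]


def parse(line):
    """Split one tcpdump line into timestamp, endpoints and TCP flags."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    ts, src, dst, flags = m.groups()
    return {"ts": ts, "src": src, "dst": dst, "flags": flags}


def rejected_connects(lines):
    """Return (syn_time, reset_time, resetting_host) for each SYN answered by a reset."""
    pending = {}  # (src, dst) of an unanswered SYN -> its timestamp
    rejected = []
    for line in lines:
        pkt = parse(line)
        if pkt is None:
            continue  # not a line in the expected format
        if pkt["flags"] == "S":
            pending[(pkt["src"], pkt["dst"])] = pkt["ts"]
        elif "R" in pkt["flags"]:
            key = (pkt["dst"], pkt["src"])  # reset travels in the reverse direction
            if key in pending:
                rejected.append((pending.pop(key), pkt["ts"], pkt["src"]))
    return rejected


for syn_ts, rst_ts, host in rejected_connects(SAMPLE):
    print(f"SYN at {syn_ts} reset by {host} at {rst_ts}")
```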

This suggests to me that there's something wonky on-node in relationship to the particular chrome process that's making connections (the IO process; we do all net connections from a single process).  Along those lines, if it's easy for you to reproduce this and do an "lsof -c Google" I'd be curious about the output (ideally while Chrome is working and after it stops); I'm curious about how many file descriptors are being used for network connections in the failing case.  
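
To compare the working and failing cases, the lsof output could be summarized by descriptor type. This is a sketch under assumptions: it expects the usual lsof column layout (COMMAND PID USER FD TYPE ...) with no spaces in the command column, and the sample text below is invented for illustration, not real output from the affected machine.

```python
from collections import Counter

# Invented sample in the usual lsof column layout; not real output.
SAMPLE_LSOF = """\
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
Google    123 me     cwd  DIR  14,2   1190     2 /
Google    123 me     45u  IPv4 0x0a1b 0t0      TCP 10.0.0.2:61470->192.0.2.10:80 (ESTABLISHED)
Google    123 me     46u  IPv4 0x0a1c 0t0      TCP 10.0.0.2:61471->192.0.2.11:443 (ESTABLISHED)
Google    123 me     47u  unix 0x0a1d 0t0      ->0x0a1e
"""


def count_fd_types(lsof_text):
    """Count open descriptors per TYPE column (fifth whitespace-separated field)."""
    counts = Counter()
    for line in lsof_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue  # skip the header and anything too short to parse
        counts[fields[4]] += 1
    return counts


def network_fd_count(lsof_text):
    """Number of descriptors whose TYPE is an internet socket."""
    counts = count_fd_types(lsof_text)
    return counts["IPv4"] + counts["IPv6"]


print(network_fd_count(SAMPLE_LSOF), dict(count_fd_types(SAMPLE_LSOF)))
```

Running it once on output captured while Chrome is working and once after it stops would give two numbers to compare.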

But I don't think that's going to lead directly to a solution, and given that the problem looks to be in the native network stack rather than in Chrome's, I'm not sure instrumenting a kernel and sending it to you for more data will help either.    I suspect that the best pathway at this point is for me to try to reproduce it in house.  I know you're running parallels and Mac OS 10.6 as host, and that you see this running Chrome 6 in the Host OS; is there anything else about your configuration that might be strange?  What hardware are you using?  What guest OS(es) are you using?

Sep 8, 2010
#43 paracel...@gmail.com
It might be worth building a couple versions on either side of a commit that looks suspicious, in case that happens to catch it, since that can be done in one go. With this luck, though, it'll probably fail. Well, it'll also narrow the range a bit, I suppose.

Also, my hardware is a somewhat old iMac - iMac5,1, apparently, so this: http://www.everymac.com/systems/apple/imac/stats/imac-core-2-duo-2.0-17-inch-specs.html
The machine has some history, so I am not sure what else might have been installed in the past that could cause issues. My only other machine is a PPC, so I can't use it for comparative testing, either...

Parallels is running an old Windows 2000, but I doubt that affects it, because I tried shutting down and quitting Parallels, and the bug still showed up. (At least I think I did this, my memory is a bit hazy.)

I'll give lsof a try in a bit.
Sep 8, 2010
#44 rdsmith@chromium.org
> It might be worth building a couple versions on either side of a commit that looks suspicious, in 
> case that happens to catch it, since that can be done in one go. With this luck, though, it'll 
> probably fail. Well, it'll also narrow the range a bit, I suppose.

I'm happy to do this if you want to do the testing; your time is the one I'm most afraid of wasting.  Once I get stuff configured I can probably script things to build arbitrary builds (it will take me a couple of hours of work to get the configuration done, though).  The reason I'm not more gung-ho is that I didn't really see commits that looked suspicious to me--I have to argue myself into believing that any of the commits in that list are relevant, which almost certainly means that I don't have the right clue yet as to what's going on.  In theory, the binary search would only take seven tests--want to gear up and try that?  We can drop the number of round trips by bundling builds.
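
The bisection between the two bracketing revisions mentioned earlier (49234 good, 49371 bad) can be sketched as follows. This is only an illustration: has_bug stands in for a manual test of a build, HYPOTHETICAL_CULPRIT is a made-up revision used to simulate that test, and the sketch assumes every revision in the range can be built and gives a clean yes/no answer.

```python
def bisect_first_bad(good, bad, has_bug):
    """Find the first bad revision between a known-good and a known-bad one.

    Returns (first_bad_revision, number_of_builds_tested).
    """
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if has_bug(mid):
            bad = mid   # bug already present: culprit is at or before mid
        else:
            good = mid  # still clean: culprit is after mid
    return bad, tests


# Made-up culprit revision, used only to simulate the manual yes/no test.
HYPOTHETICAL_CULPRIT = 49300

first_bad, tests = bisect_first_bad(
    49234, 49371, lambda rev: rev >= HYPOTHETICAL_CULPRIT
)
print(first_bad, tests)
```

With 137 revisions in the range, the loop needs seven or eight tests depending on where the culprit falls.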



Sep 8, 2010
#45 paracel...@gmail.com
I can try, I already did one run with the public builds. It's worth a shot.

(Also, it is faster to find a build that HAS the bug, since it triggers pretty early. If it doesn't have it, it needs to run for a while so I can be sure.)
Sep 9, 2010
#46 eroman@chromium.org
Hi paracelsus, thanks for all the data!
(It is really helpful in getting to the bottom of this, and I appreciate the time you are taking to do this)

I have an idea on what could be the problem:
In your Chrome 6 logs, the browser has determined that IPv6 is not supported (via a probe test), and consequently disabled it. This could be why we eventually get EHOSTDOWN.

Could you try running Chrome with the IPv6 probe disabled? To do so, run this command-line:
  "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --enable-ipv6

If that works, we have a pretty good idea on what the problem is.

If it still fails, then could you send us a NetLog from a *working* version of Chrome (by comparing a log from working chrome and from failing chrome we can see where they differed).

Thanks!
Sep 9, 2010
#47 paracel...@gmail.com
I tried --enable-ipv6 on Chrome 6, but it still fails as before. Also, I sent a net dump from Chromium 49234 while it was working fine to Randy.
Sep 9, 2010
#48 rdsmith@chromium.org
Willing to send an about:net-internals dump from the --enable-ipv6 experiment?  Eric and I are putting our heads together here and I'm setting up to send you builds for bisecting in the background (though I'd like to come up with a different/another approach besides that one--a bug that exists in both but with different frequencies of occurrence could have close to zero relationship to the actual change that increased the frequency of occurrence.)

Sep 9, 2010
#49 paracel...@gmail.com
Sent.
Sep 19, 2010
#50 fabian.m...@gmail.com
I just wanted to weigh in. I also have this problem. I have been experiencing it on three different OS X machines. After an undefined random time Chrome just stops working at all. Open pages with Ajax can't load (e.g. image-gallery scripts) new content and new pages just won't load. I have to close all running processes and re-open chrome. All other Browsers work fine with the webpages. 

All other browsers means SR Ware Iron, Firefox, Safari and Opera (always latest version; as this error has been bugging me for quite some time I just say "latest" as those have been different versions along the line).

I switched to the beta build of Chrome yesterday, hoping the error would be fixed there but no luck. The problem is so severe that I have to look at alternative browsers again (which I really don't want). I hope you can find the error. Ah and I have the error regardless of the network I use to connect to the internet (tried: work - heavily firewalled corporate infrastructure, university infrastructure, eduroam wireless, Vodafone Germany UMTS broadband, at home via WLAN and cable to a FritzBox router, at various friends' places with DSL through a router).

Anything I can do to help out?
Sep 19, 2010
#51 paracel...@gmail.com
As a workaround for you and anyone else until this gets fixed, I recommend using this:

http://build.chromium.org/buildbot/continuous/mac/2010-06-08/49234/chrome-mac.zip

It is the last version to not have the more severe version of the bug, and it will not auto-update to a broken version. I'm using it now and it's been working the same as the older versions, breaking only once a day or so.
Sep 19, 2010
#52 paracel...@gmail.com
@fabian.meyer: Oh, also: Could you tell us if you have Parallels installed? That was one of my pet theories for what might cause the bug.
Sep 19, 2010
#53 rdsmith@chromium.org
@fabian.meyer: Please do let us know if you have Parallels installed on any or all of your machines, along with any other configuration details of those machines--I'm interested in both similarities and differences between them.  All the same OS version?  Any network related software installed on all of them?  Anti-virus scanners?  You get the idea.  Put that info in the bug report so that @paracelsus can comment as well.  

Also, if you have a chance, giving me a network internals dump from one or more of your machines after you reproduce the problem would be useful.  It's probably got the same signature as @paracelsus' problem, but if it doesn't, that would be very illuminating.  To get a network internals dump:
+ Start the browser
+ Create a new tab and visit "about:net-internals"
+ Switch back to your first tab.
+ Reproduce the problem
+ Switch back to the net-internals tab and click "dump to text"
+ Copy and paste the result into a file and mail me (rdsmith@chromium.org) that file as an attachment.  (Please don't attach it to the bug report--depending on how old a version of chrome you have, it may have cookies in it, and it's better not to make those public.)

Thanks!!

Sep 20, 2010
#54 fabian.m...@gmail.com
I don't have parallels installed - but I have VirtualBox as a VM. I will attach the requested info once the bug shows again. For antivirus I have Sophos Antivirus on all my machines. The built-in firewall is active.

Interestingly I just stumbled upon something. There seemed to be an old version of Little Snitch still lurking in my main machine. I thought I had uninstalled it about a year ago, but today it popped up blocking Skype. So now I uninstalled it and so far the problem has not occurred again. The other machine I use does not have Little Snitch but the problem still occurs there.
Sep 20, 2010
#55 rdsmith@chromium.org
@fabian.meyer: And just to confirm, you have VirtualBox on all three machines you've seen this on?  And are you running Chrome in the host or the guest operating system?  And what is the other OS, the one you're not running Chrome in?

I hadn't thought about the built-in firewall; that's a possibility.  I'll look into that along with the virtual machine 
Sep 20, 2010
#56 fabian.m...@gmail.com
Yes VirtualBox is on all machines. But they are usually not actively running any VMs at the time of the problems (so Chrome is on the host system). The problem first started about 10.6.2 I think but all machines were updated to the latest OS asap. So right now they are all on 10.6.4.

As the problem seems to be gone for now (yesterday: about 5 times / hour; today as well - after uninstalling Little Snitch nothing for at least 3 hours) my guess would be Little Snitch. 

I have to check if the third machine (at work) also has Little Snitch. I haven't installed it, but every now and then someone else has to use it, so it is not impossible. I'll be at that machine Wednesday and report back. 
Sep 20, 2010
#57 paracel...@gmail.com
I thought I was not running Little Snitch, but I have in the past, so I wonder if I am in the same situation, and have it left somewhere on the system. That would certainly sound like some kind of explanation.

I will have to investigate this later today when I get back to that machine.
Sep 20, 2010
#58 j...@slushpupie.com
I've been running the latest dev version on Mac (10.6.4) with none of the problems described here until I updated to 7.0.517.8, and now I get the same problems.   I do have VMWare Fusion and VirtualBox installed, though I don't think that is an issue since all other browsers work fine.  I do have a "fairly heavy" IPv6 usage, though, as native IPv6 (not tunneled) is in use on my network.  I would be happy to provide any details if they are useful (net-internals, etc).
Sep 20, 2010
#59 paracel...@gmail.com
Indeed I did still have Little Snitch left on my system.

Specifically, I had an ancient version, 1.2.3. It seemed to still be turned on, but I am pretty sure it has not managed to block a connection for a long time. It certainly had no rule for Chrome or Chromium. I think the upgrade to 10.6 might have broken it.

Now I have uninstalled it, and it seems Chrome 6 is working fine. At least it has not stopped loading pages yet after fifteen minutes or so of testing, which usually would be enough to kill it. More testing is needed, though.

So it does seem it might just be the culprit. Probably specifically the old version that doesn't work right on new systems. I noticed the preferences panel was 32 bit. Is there some 32 bit issue for kexts too? I forget. Either way, somehow it seems to not manage to block anything except Chrome, and only sometimes. This is a bit of a mystery.

I am not sure if it is worth digging further into why Chrome triggers it or not. I might suggest checking for it being installed, and warning people it causes problems. You'd have to figure out which versions specifically cause problems, though.

Sep 20, 2010
#60 rdsmith@chromium.org
@fabian.meyer: If I understand you correctly, one of the three machines you've seen this problem on does *not* have Little Snitch installed on it?  Could you confirm that?  If so, we should probably still keep digging, although maybe at a lower priority, since most other people's failures seem to be correlated with that program.

@jay@slushpupie.com: Could you check to see if you have Little Snitch installed on your system?  That seems to be tightly correlated with the problems others are seeing.

@Paracelsus: That's awesome!  Obviously, please raise a flag if you see the problem again.  If this does turn out to be tightly Little Snitch correlated, I'm not inclined to worry too much about it.  It certainly strikes me as plausible that an old misconfigured version of a program that blocks outgoing connections would result in this problem :-}.  Thanks very much for all the effort you've put in on this.



Sep 20, 2010
#61 j...@slushpupie.com
@rdsmith@chromium.org: No, not installed, and never has been installed. Interesting concept, though, I doubt I would want that on my system.  
Sep 20, 2010
#62 rdsmith@chromium.org
@jay: I share your reluctance--I was debating installing it locally to play around with it, and decided to delay that as I'd feel inclined to do a re-image afterwards :-}.  But the fact that you don't have it installed makes your example the most interesting at the moment.  Could you follow the debugging instructions in comment 53 and send me the net-internals dump at rdsmith@chromium.org?  

Thanks much in advance!

Sep 20, 2010
#63 rdsmith@chromium.org
@jay: Could you also send me the output of a kextstat on your machine?  Little snitch is apparently a kernel extension interposing on the network stack, and it seems worthwhile scanning your network extensions and seeing if there's anything else that looks suspicious in that class.

@paracelsus, fabian.meyer: Could one of you send me the kextstat from a machine that has Little Snitch installed on it?  I'd like to see what that looks like as well.  Thanks!
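
For scanning a long kextstat listing, a small filter like this might save some eyeballing. It is a sketch under assumptions: it expects the usual kextstat column layout (Index, Refs, Address, Size, Wired, then "Name (Version)"), the keyword list is just a starting guess, and the sample lines are invented for illustration rather than taken from any of the machines discussed here.

```python
import re

# Index Refs Address Size Wired Name (Version) <Linked Against>
KEXT_RE = re.compile(r"^\s*\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\S+)\s+\(([^)]*)\)")

# Invented sample lines in kextstat's usual layout; not real output.
SAMPLE_KEXTSTAT = """\
Index Refs Address    Size       Wired      Name (Version) <Linked Against>
    1   63 0x0        0x0        0x0        com.apple.kpi.bsd (10.4.0)
   92    0 0x5d1ad000 0x1b000    0x1a000    at.obdev.nke.LittleSnitch (1.2.3) <7 5 4 3 1>
   95    3 0x5d230000 0x2f000    0x2e000    org.virtualbox.kext.VBoxDrv (3.2.8) <7 5 4 3 1>
"""


def suspicious_kexts(kextstat_text,
                     keywords=("littlesnitch", "parallels", "vmware", "virtualbox")):
    """Return (bundle_name, version) for loaded kexts whose name contains a keyword."""
    hits = []
    for line in kextstat_text.splitlines():
        m = KEXT_RE.match(line)
        if not m:
            continue  # header line or unexpected format
        name, version = m.groups()
        if any(k in name.lower() for k in keywords):
            hits.append((name, version))
    return hits


for name, version in suspicious_kexts(SAMPLE_KEXTSTAT):
    print(name, version)
```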

Sep 20, 2010
#64 fabian.m...@gmail.com
If the machine at work has little snitch, I'll send the kextstat. My main machine is now snitch-free and my older machine has just been wiped clean for selling it later on - so no samples there.

...but looking at the kextstat I just noticed that parallels wasn't cleanly uninstalled either. I got some serious housekeeping to do, I think. :-/
Sep 20, 2010
#65 paracel...@gmail.com
My machine also no longer has it installed, and I don't seem to have the installer any more, either.

I did check that it was specifically version 1.2.3, though. I think it wasn't misconfigured, though, probably just that the old version was incompatible with 10.6. How exactly it managed to only block Chrome, only sometimes, and without actually doing any of the useful things it is supposed to do (like open up a window telling me it is blocking a new program) is a mystery, though.

If you do want to investigate further, I suggest finding a similarly old version, and installing that, and then seeing if it is blocking anything at all. Mine apparently wasn't, even though it was supposed to, which may be key to reproducing the problem.
Sep 21, 2010
#66 rdsmith@chromium.org
@jay: Your network trace has a different signature than the ones I've been looking at, so I suspect you've got a different problem.  I'll file a new bug once I've dug a little more.  In the meantime, could you tell me what behavior you see from the browser?  What error does the browser report?  If you remember, what website was the first one to fail in the trace that you sent me, and how much more browsing or exploring (and of what websites) did you do after that?

Thanks in advance for any info ...

Sep 21, 2010
#67 j...@slushpupie.com
@rdsmith: I can't recall any specific site that triggers it.  I've not been able to trip it on demand yet, but I'll keep using it until I notice a pattern. It does happen every time within an hour or so, sometimes much quicker. The behavior I see is when I click on a link or go to a new site manually the browser says down in the status bar "Sending request..." (*not* "Waiting for <whateversite>...").  It never gets past that.  I have to quit the browser to get it back, and it seems sometimes the browser does not want to quit nicely (sometimes a force quit is needed).  No errors are reported. I need to try and isolate the browser (meaning shut off all other http traffic) and get a tcpdump to be sure, but between some hurried tcpdumps and lsof I don't *think* Chrome is making *any* network requests via the OS once this problem occurs.  Some obvious settings that might be relevant:

 * OS X 10.6.4
 * Firewall enabled (built-in OS supplied one, no third party)
 * Case sensitive filesystem
 * A "fair" amount of IPv6 traffic (way more than an average user)
 * I've tried with DNS pre-fetching both on and off, no change
 * VMWare, VirtualBox, and TUN/TAP are the only network related kernel extensions. None of these running when the problem occurs, however.

If you want any other info, just ask; I'd be happy to help.

Sep 21, 2010
#68 paracel...@gmail.com
Yes, that is definitely a different issue.
Sep 21, 2010
#69 j...@slushpupie.com
@paracelsus: Apologies, then. I didn't mean to clutter up this bug report. At first glance it seemed to be the same problem.
Sep 22, 2010
#71 fabian.m...@gmail.com
So I've investigated my machine at work but couldn't reproduce the error all day. I checked for Little Snitch nevertheless and found no traces of it. To me it seems to be resolved. If the problem shows its face again, I'll report back. 
Thanks for the help everyone. I'm pretty happy to be able to use Chrome normally again. 
Sep 23, 2010
#72 rdsmith@chromium.org
@jay: Thanks for the details, sorry I didn't get back to you before this.  And don't worry about mixing it in with this bug--I'd much rather have the pattern of splitting out a new bug than trying to coalesce different bugs referring to the same issue.

Speaking of which, I've created http://crbug.com/56688 to track this new issue; folks who are interested (hopefully including jay :-}) can follow progress there.

Sep 23, 2010
#73 rdsmith@chromium.org
[Summarizing bug for use of future searches.]

Bug description: On Mac OS X, after some period of browsing, web pages will stop loading; instead, the standard Chrome "This webpage is not available" error page will be displayed.  Once this has happened, no other web pages can be shown; Chrome must be quit and restarted to recover.  The problem does not affect other processes (whether Chrome or some other browser or program).  A net-internals dump will show that all TCP_CONNECT_ATTEMPTs (after the failure occurs) will result in an os_error = 64 (EHOSTDOWN).  Note for interpreting net-internals dumps: The output is not in chronological order; however, repeated "os_error = 64"s should be considered diagnostic.
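
A quick way to check a saved net-internals text dump for this signature is sketched below. Only the "os_error = 64" string comes from the dumps discussed in this bug; the rest of the sample text, the function names, and the threshold of two repeats are assumptions made for illustration. Because the dump is not chronological, the sketch simply counts occurrences rather than looking at their order.

```python
import re
from collections import Counter

OS_ERROR_RE = re.compile(r"os_error\s*=\s*(\d+)")

# Invented excerpt; only the "os_error = 64" text mirrors the real dumps.
SAMPLE_DUMP = """\
TCP_CONNECT_ATTEMPT  address = "192.0.2.10:80"
TCP_CONNECT_ATTEMPT  os_error = 64
TCP_CONNECT_ATTEMPT  address = "192.0.2.11:80"
TCP_CONNECT_ATTEMPT  os_error = 64
TCP_CONNECT_ATTEMPT  os_error = 64
"""


def os_error_counts(dump_text):
    """Count how often each os_error value appears in a text dump."""
    return Counter(int(n) for n in OS_ERROR_RE.findall(dump_text))


def matches_signature(dump_text, code=64, min_repeats=2):
    """True if the given os_error code is repeated at least min_repeats times."""
    return os_error_counts(dump_text)[code] >= min_repeats


print(os_error_counts(SAMPLE_DUMP), matches_signature(SAMPLE_DUMP))
```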

Root Cause: All machines with this signature have been found to have old versions of the program Little Snitch (an outgoing connections blocker) installed on them.  This can be confirmed by looking at the installed kernel extensions through the program "kextstat".

Resolution: Complete uninstall of Little Snitch.

I'm marking the bug as "WONT_FIX" on this basis; anyone on this bug who thinks that's a mistake, please feel free to tell me so and I'll reconsider.

Status: WontFix
Oct 12, 2012
#74 bugdroid1@chromium.org
This issue has been closed for some time. No one will pay attention to new comments.
If you are seeing this bug or have new data, please click New Issue to start a new bug.
Labels: Restrict-AddIssueComment-Commit
Mar 10, 2013
#75 bugdroid1@chromium.org
(No comment was entered for this change.)
Labels: -Area-Internals -Internals-Network Cr-Internals Cr-Internals-Network