Issue 57: Networking intermittent using Docker
What steps will reproduce the problem?
1. Use the backports Debian kernel to make a node. Make sure canForwardIP is on. Make sure net.ipv4.ip_forward=1.
2. Install Docker.
3. Log into a Docker container via (sudo docker run -t -i base bash).
4. Run a bunch of commands in that container that use networking (apt-get update; apt-get install python). These commands work 3 out of 10 times.
5. I have also tried commands like gsutil. These are also intermittent.

What is the expected output? What do you see instead?
The expected output is that networking works consistently.

What version of the product are you using? On what operating system?
Debian 7 as supplied by Google, with the backports kernel. This has been confirmed by more than just me: https://groups.google.com/d/msg/docker-user/N085Aq3oX5Y/Ee5S-CGP2soJ
Dec 15, 2013
#2
da...@genomebridge.org
Dec 16, 2013
The above Docker instance was started exactly the way Proppy at Google (Johan Euphrosine) recommends (http://docs.docker.io/en/master/installation/google/).
Dec 16, 2013
David, rather than a network problem, I think you may be running into disk I/O waits due to a small Persistent Disk. How big is the Persistent Disk you're running from? As of the GA launch, PD I/O scales with disk size: larger disks are faster. Details here: https://cloud.google.com/developers/articles/compute-engine-disks-price-performance-and-persistence

Can you attach a larger PD (we generally recommend ~500 GB) and store the Docker files there as a test? I'll also run through the examples and propose doc updates to default to a larger PD.
Status:
Accepted
Owner: briandorsey@google.com
Dec 16, 2013
We've run the Docker node with /var/lib/docker mounted on a 2 TB disk and still have the same issue. I'm going to run this again just to be 100% sure.
Dec 16, 2013
Confirmed. /var/lib/docker symlinks to /docker, which is a 2 TB Persistent Disk (that we use ephemerally).

    sudo docker run -t -i base bash
    root@8ec72ac4fcbe:/# apt-get update
    Ign http://archive.ubuntu.com quantal InRelease
    Hit http://archive.ubuntu.com quantal Release.gpg
    Hit http://archive.ubuntu.com quantal Release
    Hit http://archive.ubuntu.com quantal/main amd64 Packages
    40% [Waiting for headers]
    (hangs here and never finishes)

Have you been able to make Docker work with consistent networking in GCE?
Dec 16, 2013
Our actual boot disk is only 10 GB (as is standard), but our /var/lib/docker is 2 TB.
Dec 16, 2013
I'm starting to work through reproducing and troubleshooting this, but it seems like you're ahead of me. :) One thing I plan to try is using a larger boot disk. If you have time, please give it a try as well. Steps below.

You can manually create a larger disk from an image using gcutil:

    $ gcutil adddisk --source_image=backports-debian-7 --size_gb=500 docker-test-big

Then add the instance with:

    $ gcutil addinstance --disk=docker-test-big,boot docker-test-big

(The boot partition itself will still be only 10 GB, so that's all the OS will be able to see, which should be fine for a test. Instructions for repartitioning the root partition are here: https://developers.google.com/compute/docs/disks?hl=en#repartitionrootpd)
Dec 16, 2013
I followed the above steps, so while the test disk is 500 GB, only 10 GB are on the root partition; the rest is unpartitioned. I then followed the rest of Proppy's steps. It still hangs:

    dbernick@docker-test-big:~$ sudo docker run -t -i base bash
    Unable to find image 'base' (tag: latest) locally
    Pulling repository base
    b750fe79269d: Download complete
    27cf78414709: Download complete
    root@89a0be719f63:/# apt-get update
    Ign http://archive.ubuntu.com quantal InRelease
    Hit http://archive.ubuntu.com quantal Release.gpg
    Hit http://archive.ubuntu.com quantal Release
    Hit http://archive.ubuntu.com quantal/main amd64 Packages
    40% [Waiting for headers]
    (hang)
Dec 16, 2013
Hrm... I was just able to run 'apt-get update' successfully using the stock directions (10 GB boot disk):

    $ sudo docker run -t -i base bash
    Unable to find image 'base' (tag: latest) locally
    Pulling repository base
    b750fe79269d: Download complete
    27cf78414709: Download complete
    root@09b6f86e289a:/# apt-get update
    Ign http://archive.ubuntu.com quantal InRelease
    Hit http://archive.ubuntu.com quantal Release.gpg
    Hit http://archive.ubuntu.com quantal Release
    Hit http://archive.ubuntu.com quantal/main amd64 Packages
    Get:1 http://archive.ubuntu.com quantal/universe amd64 Packages [5274 kB]
    Get:2 http://archive.ubuntu.com quantal/multiverse amd64 Packages [131 kB]
    Get:3 http://archive.ubuntu.com quantal/main Translation-en [660 kB]
    Get:4 http://archive.ubuntu.com quantal/multiverse Translation-en [100 kB]
    Get:5 http://archive.ubuntu.com quantal/universe Translation-en [3648 kB]
    Fetched 9813 kB in 15s (646 kB/s)
    Reading package lists... Done

Not sure what is different in my environment compared to yours. :/ I'm using an n1-standard-1 in us-central1-b.
Dec 16, 2013
It works sometimes, but it's intermittent; it usually fails. That's the issue. I just ran it this way. I'm using an n1-standard-1 in us-central1-a. Should I use us-central1-b?

    docker run -t -i tianon/debian /bin/bash
    Unable to find image 'tianon/debian' (tag: latest) locally
    Pulling repository tianon/debian
    0510dba62421: Download complete
    4f9975c87b56: Download complete
    f2d4b32a0e66: Download complete
    05b866649fa8: Download complete
    f815021ef20d: Download complete
    0ff04e2946d2: Download complete
    b1f77df5b54f: Download complete
    764a25351209: Download complete
    a1390ca6935c: Download complete
    511136ea3c5a: Download complete
    3ead6dd57737: Download complete
    6660520c5eda: Download complete
    root@0b31034f1017:/# apt-get update
    Get:1 http://ftp.us.debian.org wheezy Release.gpg [1672 B]
    Get:2 http://ftp.us.debian.org wheezy-updates Release.gpg [836 B]
    Get:3 http://ftp.us.debian.org wheezy Release [168 kB]
    Get:4 http://security.debian.org wheezy/updates Release.gpg [836 B]
    Get:5 http://ftp.us.debian.org wheezy-updates Release [124 kB]
    Get:6 http://ftp.us.debian.org wheezy/main amd64 Packages [5848 kB]
    Get:7 http://ftp.us.debian.org wheezy-updates/main amd64 Packages [2905 B]
    100% [Waiting for headers] 1021 kB/s 0s
    100% [Waiting for headers]
    (hangs)
Dec 16, 2013
I just tried it in us-central1-b and it all worked. Can you see if it always works for you in us-central1-a? If it does, then I imagine it's a zone issue.
Dec 16, 2013
I see a hang in us-central1-a. I'll reply back here when I have more info.
Dec 16, 2013
Yay! Neither of us is crazy! We can absolutely use us-central1-b if all of our quotas are moved to us-central1-b.
Dec 16, 2013
Intermittent errors strike again. I am no longer convinced that we have a root cause: my example "working" and "broken" VMs are now both working. I am working on a boot script that can test for this programmatically rather than interactively. Then we can start running batches of test runs and use statistics to narrow things down.
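A batch test along those lines could look like the following minimal Python sketch. It assumes the host can run `sudo docker run base apt-get update` non-interactively, and it treats a run that exceeds a timeout as the "[Waiting for headers]" hang seen above. The function name `run_trials`, the 120-second timeout, and the default command are illustrative choices, not the boot script mentioned in this thread.

```python
import subprocess
from typing import Sequence, Tuple

# Illustrative default: a non-interactive version of the manual repro steps.
DEFAULT_CMD = ("sudo", "docker", "run", "base", "apt-get", "update")


def run_trials(cmd: Sequence[str] = DEFAULT_CMD,
               trials: int = 10,
               timeout_s: float = 120.0) -> Tuple[int, int]:
    """Run `cmd` `trials` times and return (successes, trials).

    A trial counts as a failure if the command exits non-zero or is still
    running after `timeout_s` seconds (the hang described in this issue).
    """
    successes = 0
    for _ in range(trials):
        try:
            result = subprocess.run(
                list(cmd),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before re-raising.
            continue
        if result.returncode == 0:
            successes += 1
    return successes, trials
```

For example, calling `run_trials(trials=50)` on one instance per zone gives success counts that can be compared directly. Killing the `docker` client on timeout may leave the container itself running, so a longer batch would probably also want to clean up stopped or orphaned containers.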
Dec 16, 2013
So it's NOT safe to use us-central1-b? Have you seen the error there, or only in us-central1-a?
Dec 18, 2013
We've found the root cause. Docker assumes an MTU of 1500 for container interfaces, whereas the eth0 NIC on GCE instances has an MTU of 1460, and that mismatch is hitting a bug in the network virtualization in us-central1-a.

For the short term, one workaround is to add

    ifconfig eth0 mtu 1460

to the script that you run inside your Docker container, before you do any network traffic. Another workaround is:

    docker run -lxc-conf="lxc.network.mtu = 1460"

We've validated with a run of 10 successful apt-get updates that this workaround appears to correct things. Longer term, we will fix the bug in the virtualization.
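To put "10 successful apt-get updates" in context, here is a small worked calculation in Python. It assumes runs are independent and uses the 3-in-10 success rate from the original report as the "still broken" baseline; both are simplifications, and the function names are only illustrative.

```python
def prob_all_succeed(p_success: float, runs: int) -> float:
    """Probability that `runs` independent trials all succeed."""
    return p_success ** runs


def failure_rate_upper_bound(runs: int, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the failure rate after `runs`
    successes and zero failures (exact binomial, Clopper-Pearson).

    Solves (1 - p) ** runs == 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


# Chance of 10 straight successes if the bug were still present at the
# reported 3/10 success rate (roughly 5.9e-06).
print(prob_all_succeed(0.3, 10))
# Largest failure rate still consistent with 10/10 successes at 95%
# confidence (roughly 0.26).
print(failure_rate_upper_bound(10))
```

So 10 clean runs make it very unlikely that the original failure rate persists, but on their own they only bound the remaining failure rate to about 26% or less; ruling out rarer hangs would need longer batches.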
Dec 26, 2013
The Docker team has added a command-line flag to set the MTU for all containers:

    docker -d -mtu 1460

You can now set this once on the daemon and no longer need the ifconfig or lxc-conf steps for each container.
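One way to confirm the setting took effect is to compare the MTU inside a container with the host's eth0. The Python sketch below reads MTUs from Linux sysfs (`/sys/class/net/<iface>/mtu`) and shows the TCP MSS each MTU implies, assuming IPv4 and TCP headers without options (20 + 20 bytes). The helper names are hypothetical, and the MSS figures illustrate the size mismatch rather than the exact failure inside the GCE virtualization layer.

```python
from pathlib import Path
from typing import Optional

# IPv4 header (20 bytes) + TCP header (20 bytes), assuming no options.
IPV4_TCP_HEADER_BYTES = 40


def read_mtu(iface: str, sysfs_net: Path = Path("/sys/class/net")) -> int:
    """Read an interface's MTU from Linux sysfs."""
    return int((sysfs_net / iface / "mtu").read_text().strip())


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload per segment that fits in one IPv4 packet of `mtu` bytes."""
    return mtu - IPV4_TCP_HEADER_BYTES


def mtu_warning(container_mtu: int, host_mtu: int) -> Optional[str]:
    """Return a warning if the container's MTU exceeds the host NIC's, else None."""
    if container_mtu <= host_mtu:
        return None
    return (f"container MTU {container_mtu} > host MTU {host_mtu}: "
            f"container advertises TCP MSS {mss_for_mtu(container_mtu)}, "
            f"but only {mss_for_mtu(host_mtu)} fits on the host link")
```

For example, running `mtu_warning(read_mtu("eth0"), 1460)` inside a container would flag the default 1500-byte container MTU, and should return None once the daemon is started with `-mtu 1460`.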
Status:
Fixed