Issue 10: A whole lot of HTTPD processes
21 people starred this issue and may be notified of changes.
Status:  Fixed
Owner:  abl...@google.com
Closed:  Nov 2010


Reported by j.m.milm...@gmail.com, Nov 3, 2010
What steps will reproduce the problem?
1. Install
2. Turn on
3. restart apache.

What is the expected output? 

Before enabling pagespeed the server normally has about 6 httpd processes.

What do you see instead?
Once it's enabled there are about 30, all using between 1% and 2.5% CPU time.

What version of the product are you using (please check X-Mod-Pagespeed
header)?
X-Mod-Pagespeed=0.9.0.0-128

On what operating system?
Fedora 13

Which version of Apache?
Server version: Apache/2.2.16 (Unix)
Server built:   Jul 27 2010 15:58:08

Which MPM?

Please provide any additional information below, especially a URL or an
HTML file that exhibits the problem.



Nov 3, 2010
#1 j.m.milm...@gmail.com
Here's a screenshot of the processes.
Screen-shot-2010-11-04-at-5.35.26-PM.png
52.3 KB
Nov 4, 2010
#2 googlemi...@japje.nl
I experienced the same issue when testing this module today.

Module version:
0.9.0.0-128

Os:
Ubuntu 10.4

Apache2:
Server version: Apache/2.2.14 (Ubuntu)
Server built:   Aug 19 2010 03:20:29

Server MPM:
Prefork

Additional info:
The pages I tried to open are built by WordPress (with caching modules).

Nov 4, 2010
Project Member #3 sligocki@google.com
Thanks for the comments. mod_pagespeed is expected to use more processing power because we are running transformations (including expensive image transformations). We will be looking into reducing this.
Nov 4, 2010
#4 MWild1
Hi Shawn. The issue here is not processing power, but actual number of spawned processes. I can confirm the above reports - I had over 100 Apache processes after installing mod_pagespeed.

A normal shutdown of Apache did not remove these processes and I had to forcefully kill them. Is it intentional that these processes should be spawned?

It's a production site and I can't play around too much, but if you have any questions about my configuration etc. then I am happy to provide them. In the meantime I have disabled the module.

Thanks!
Matthew
Nov 4, 2010
#5 themee...@gmail.com
I can confirm this as well.  
Nov 4, 2010
#6 lwhittaker
Hi. Also confirmed; I had to disable this until it is resolved.
Nov 4, 2010
#7 Wesley.S...@gmail.com
On a non-production server, I was able to reproduce this pretty quickly by setting the cleaning interval to a second and refreshing frequently to simulate high traffic. The only requests to the server were for the rewrite_images test.


X-Mod-Pagespeed=0.9.0.0-128

Debian Lenny x64 (v 5.0.6)

Server Version: Apache/2.2.16 (Unix) PHP/5.2.12 mod_ssl/2.2.16 OpenSSL/0.9.8g mod_fastcgi/2.4.6 mod_fcgid/2.3.4
Server Built: Oct 8 2010 16:39:09 

MPM: prefork


Nov 4, 2010
#8 kmr...@gmail.com
I confirm this issue as well and had to disable the mod. I will be checking up on this issue until it is resolved. If there is anything I can do, let me know.
Nov 5, 2010
Project Member #9 sligocki@google.com
We're looking into this. We expect some httpd processes to linger for a short time while they run asynchronous resource requests and rewrite images. We think this may be exacerbated by turning the module on for the first time under load.
Owner: jmaes...@google.com
Labels: -Priority-Medium Priority-High
Nov 5, 2010
#10 abl...@google.com
I believe this is due to a deadlock in serf_url_async_fetcher.  Each time someone loads an image-rich page, there's a chance of deadlocking the process.  When a certain percentage of httpd processes are deadlocked and unable to process requests, apache spins up more.
Status: Accepted
Owner: abl...@google.com
Labels: -Priority-High Priority-Critical
Nov 5, 2010
#11 hqar...@gmail.com
I can also confirm this on CentOS 5.5, Apache 2.6.18-194.8.1.el5 Prefork. 

Running PHP 5.2.10 with xcache 1.3.0

Modules (mod_pagespeed disabled): core prefork http_core mod_so mod_auth_basic mod_auth_digest mod_authn_file mod_authn_default mod_authz_host mod_authz_user mod_authz_default mod_include mod_log_config mod_env mod_mime_magic mod_expires mod_deflate mod_headers mod_setenvif mod_mime mod_status mod_vhost_alias mod_negotiation mod_dir mod_alias mod_rewrite mod_proxy mod_proxy_balancer mod_cache mod_mem_cache mod_instaweb mod_php5 mod_proxy_ajp 

The number of sleeping Apache processes climbs until the server limit is reached. Raising that limit causes all 8 GB of memory to be used, along with swap usage and very high load.
Nov 5, 2010
#12 Wesley.S...@gmail.com
In regards to comment #10, for general reference: This will eventually become a 503 error on each new page request to that Apache server, once the number of deadlocked processes hits MaxClients.
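To illustrate the mechanism described in comments #10 and #12, here is a toy simulation. It is not mod_pagespeed or Apache code: the per-request deadlock probability, the pool-refill rule, and all names and numbers are assumptions chosen only to show how stuck workers push the process count toward MaxClients, after which requests get rejected.

```python
import random


def simulate(requests, deadlock_prob, healthy_target, max_clients, seed=0):
    """Toy model of a prefork server whose workers can deadlock.

    All parameters are illustrative assumptions, not measured values:
    each request deadlocks the worker serving it with probability
    `deadlock_prob`, and the parent tops the pool back up to
    `healthy_target` usable workers until `max_clients` is reached.
    Returns (total_processes, deadlocked, rejected_requests).
    """
    rng = random.Random(seed)
    deadlocked = 0
    rejected = 0
    for _ in range(requests):
        # Workers the parent can still keep alive next to the stuck ones.
        healthy = min(healthy_target, max_clients - deadlocked)
        if healthy <= 0:
            # Every slot is held by a stuck process: the client gets an error.
            rejected += 1
            continue
        if rng.random() < deadlock_prob:
            deadlocked += 1
    healthy = max(0, min(healthy_target, max_clients - deadlocked))
    return deadlocked + healthy, deadlocked, rejected


total, stuck, rejected = simulate(
    requests=10_000, deadlock_prob=0.05, healthy_target=6, max_clients=150
)
print(f"processes={total} deadlocked={stuck} rejected={rejected}")
```

With a deadlock probability of zero the pool stays at its normal size; with any positive probability the stuck count only ever grows, which matches the "climbs until the limit is reached" reports in this thread.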
Nov 5, 2010
#13 webmas...@misabueso.com
I installed on the 3rd, and (unwillingly, since I firmly believe this is a great project) had to uninstall on the 5th. The server never crashed, but there were high loads and erratic behaviour, including serving blank pages. I've put together some graphs, probably useless, sorry about that, but they may illustrate what happened.
page_speed_results_041110.png
20.8 KB
Nov 5, 2010
Project Member #14 sligocki@google.com
Thanks everyone for the reports. We have been able to reproduce this and have a patch that solves the process explosion for us. Stay tuned for an update coming ASAP.
Nov 5, 2010
Project Member #15 jmara...@google.com
We have a fix brewing for this problem.  We'll get it out to you as soon as possible.

Nov 5, 2010
Project Member #16 jmara...@google.com
https://code.google.com/p/modpagespeed/source/detail?r=146

Note: this fix has not been made available in binary form yet.  We will release new binary distributions incorporating these changes on Monday Nov 8.

Those building using the open-source instructions on 
https://code.google.com/p/modpagespeed/wiki/HowToBuild can try out these improvements immediately.

Status: Fixed
Nov 9, 2010
#17 googlemi...@japje.nl
I just tested the new(er) version 0.9.1.1-r171 and I still have the same issue.

Apache spawns many processes and load climbs to 25+ within a few seconds.
Nov 9, 2010
#18 clement....@gmail.com
I still have the bug with r173 on Debian x64.
Apache spawns many processes a few seconds after...
Nov 9, 2010
#19 abl...@google.com
Darn, sorry you're still having issues.  Thanks for the reports.

We need some more specific information to figure this out.
1. Can you post the url of your site?
2. Can you attach your apache configuration?
3. Exactly how many apache processes are being run before and after the installation of mod_pagespeed?
4. What are those processes doing?  A screenshot of "top" like in #1 is helpful.  Even more helpful would be to run `sudo strace -p $PID` for a bunch of different PIDs (the PID is the number in the first column on `top` or the second column of `ps -efwww | egrep 'httpd|apache2'`).  The strace command will attach to the process and tell you what it's doing.  Is it sitting there in an accept(), read(), or futex()?  Is it busily running through a bunch of stuff?  Just look at it for a second or two and then hit ctrl-C to break out.  Grab a random sampling of 5 or so and let us know what's going on.
Thanks again; with your help we'll get to the bottom of this.
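The sampling in step 4 can be partly scripted. The sketch below is only an illustration: it parses text in the `ps -ef` layout (PID in the second column, as noted above), keeps the httpd/apache2 rows, and picks a random handful of PIDs to hand to `sudo strace -p`. The function names, the sample size, and the sample process table are made up for the example.

```python
import random
import re

HTTPD_RE = re.compile(r"\b(httpd|apache2)\b")


def httpd_pids(ps_output):
    """Return PIDs of httpd/apache2 processes from `ps -ef`-style text.

    Assumes the PID is the second whitespace-separated column and the
    command is the eighth, as in the usual Linux `ps -ef` layout.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        if HTTPD_RE.search(fields[7]):
            pids.append(int(fields[1]))
    return pids


def strace_commands(ps_output, k=5, seed=None):
    """Build strace command lines for up to k randomly chosen httpd PIDs."""
    pids = httpd_pids(ps_output)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(pids, min(k, len(pids))))
    return [f"sudo strace -p {pid}" for pid in chosen]


# Made-up sample in the `ps -ef` layout; real input would come from
# running `ps -efwww` on the affected server.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
root      1201     1  0 10:02 ?        00:00:01 /usr/sbin/httpd -k start
apache    1210  1201  1 10:02 ?        00:00:07 /usr/sbin/httpd -k start
apache    1211  1201  2 10:02 ?        00:00:09 /usr/sbin/httpd -k start
mysql     1300     1  5 10:03 ?        00:01:12 /usr/libexec/mysqld
"""

for command in strace_commands(SAMPLE, k=2, seed=1):
    print(command)
```

The script only prints the commands; strace still needs root and is stopped by hand with ctrl-C, as described above.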
Status: New
Nov 9, 2010
#20 nickz...@gmail.com
Hello,

I'm still getting the same issue as well. CentOS 5.5 32-bit.

Date: Tue, 09 Nov 2010 17:37:54 GMT
Server: Apache/2.2.16 (Unix) mod_ssl/2.2.16 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 PHP/5.2.9
X-Powered-By: PHP/5.2.9
X-Mod-Pagespeed: 0.9.1.1-173
Vary: Accept-Encoding
Content-Length: 12175
Content-Type: text/html

Since the comment was too long, I attached everything in a file.

Any ideas?
details.txt
36.6 KB
Nov 9, 2010
Project Member #21 jmara...@google.com
Could you also post your apache error log?

Nov 9, 2010
#22 nickz...@gmail.com
Sure, here it is. I captured the error_log while re-enabling mod_pagespeed and restarting Apache, and killed it after the load went up to 10.

Thanks.
error_log.txt
18.9 KB
Nov 10, 2010
#23 abl...@google.com
Thanks for all those details, nickzoid.  But the error_log.txt you attached doesn't seem to have any mod_pagespeed logs.  Was this taken with mod_pagespeed enabled?  If not, could you enable it briefly and send a new snippet?  If so, can you try increasing your loglevel?

Also, it's interesting that mysql is taking up 36% of your CPU.  Is that normal for your site when not running mod_pagespeed?  Do you have images, javascript, or css stored in mysql?
Nov 10, 2010
#24 mark.har...@gmail.com
Hi,

I don't think this is fixed. It worked fine for two hours, then had lots of Apache processes going and hit MaxClients.

[Wed Nov 10 23:10:47 2010] [error] log_message_handler: [1110/231047:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll success status 0 (110)
[Wed Nov 10 23:10:47 2010] [notice] child pid 22156 exit signal Segmentation fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23447 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23746 exit signal Segmentation fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23750 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23787 exit signal Segmentation fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23864 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23944 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23981 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24016 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24098 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24185 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24345 exit signal Aborted (6)
[Wed Nov 10 23:10:48 2010] [notice] child pid 24653 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23482 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23510 exit signal Aborted (6)
[Wed Nov 10 23:10:50 2010] [notice] child pid 23961 exit signal Segmentation fault (11)
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
[Wed Nov 10 23:10:58 2010] [error] log_message_handler: [1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll success status 0 (110)
[Wed Nov 10 23:10:58 2010] [error] log_message_handler: [1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll success status 0 (110)
[Wed Nov 10 23:10:58 2010] [error] log_message_handler: [1110/231058:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll success status 0 (110)
pure virtual method called
terminate called without an active exception
[Wed Nov 10 23:11:21 2010] [error] log_message_handler: [1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll success status 0 (110)
[Wed Nov 10 23:11:21 2010] [error] log_message_handler: [1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll success status 0 (110)
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception

Load got to 200+

2.6.18-164.15.1.el5xen #1 SMP Wed Mar 17 12:04:23 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
Server version: Apache/2.2.3
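
For a long excerpt like the one in this comment, the child exit notices can be tallied by signal to see how crashes split between aborts and segfaults. This is a minimal sketch, assuming the `[notice] child pid N exit signal NAME (n)` line format shown above; the function name is made up, and the sample holds three lines copied from the excerpt.

```python
import re
from collections import Counter

# Matches Apache 2.2-style notices such as:
#   [notice] child pid 22156 exit signal Segmentation fault (11)
EXIT_RE = re.compile(r"\[notice\] child pid (\d+) exit signal (.+?) \((\d+)\)")


def tally_exit_signals(log_text):
    """Count Apache child exits by signal name in error_log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


# Three lines copied from the excerpt above.
SAMPLE = """\
[Wed Nov 10 23:10:47 2010] [notice] child pid 22156 exit signal Segmentation fault (11)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23447 exit signal Aborted (6)
[Wed Nov 10 23:10:47 2010] [notice] child pid 23750 exit signal Aborted (6)
"""

for name, count in tally_exit_signals(SAMPLE).most_common():
    print(f"{name}: {count}")
```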

Nov 10, 2010
Project Member #25 jmara...@google.com
Mark, we agree we still have some issues that need to be resolved which have similar symptoms.

We do know that since our Nov 8 update, where we fixed a deadlock scenario that we were able to reproduce, the number of reports of this behavior went down.  But we are not done.

Could you tell us more about your setup?  Where does your content come from?  Static files?  PHP?  WordPress?  Is MySQL involved?  Of course those technologies are pretty stable, but evidence suggests that in some cases our interaction with them is not well tuned.

Are you using the pre-fork MPM?   Or are you trying out mod_pagespeed in threaded or event modes?

Is your site image-rich?

Could you publish the URL to your site?

Could you attach your pagespeed.conf file?

Thanks for your help!

Nov 10, 2010
Project Member #26 jmara...@google.com
This is a rather uninformative error message that we were printing:

[Wed Nov 10 23:11:21 2010] [error] log_message_handler: [1110/231121:ERROR:net/instaweb/apache/serf_url_async_fetcher.cc(638)] Poll success status 0 (110)

In our next release we'll at least print out the text for error 110, which is "Connection timed out".
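
As a quick way to check such a code by hand, the sketch below looks up the symbolic name and message for an OS error number. It assumes the 110 in the log is the Linux errno value, which is consistent with the "Connection timed out" text quoted above; the helper name is made up for illustration.

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME: message' for an OS error number on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


# ETIMEDOUT is 110 on Linux; the numeric value differs on other systems,
# so look it up by name rather than hard-coding 110.
print(describe_errno(errno.ETIMEDOUT))
```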

I think what's happening is that a transient spike in load, say, from image compression, is causing the system to become unresponsive, which causes Apache to spawn more processes to handle incoming requests.  This makes the situation worse.

We have been load testing but haven't seen this on our systems, even with an image-rich page.  But aside from image compression, in our system, we have nothing really compute intensive going on because we load-test using static mirrors of web sites.

There is something on your system, Mark, that we believe is helping Apache & mod_instaweb go unstable.  Independent of that there are steps that we can and should take in mod_pagespeed to shed load and avoid harming system performance.  But it would be helpful to replicate your serving infrastructure to repro your problem and convince ourselves it's fixed.

Nov 11, 2010
#27 mark.har...@gmail.com
Hi,

It's www.profileheaven.com   Medium traffic, Setup is
XEN:
core app: Nginx -> apache (php+some css + php + some js)
images/media: Nginx -> nginx -> images/css/js  

Obviously I can't rewrite the images/media setup without also putting Apache on that server, but I fear I/O issues via cache/images on the SAN.


The site itself is rich in dynamic content (still working on fixing this, but it's a big job). However, the load was steady for a good couple of hours with only a slight increase; then shortly after, the Apache processes went wild and the load skyrocketed as it went up to MaxClients/threads.


Nov 11, 2010
#28 mark.har...@gmail.com
Sorry, to confirm: it's not threads, it's the worker process setup.
Nov 11, 2010
#29 mark.har...@gmail.com
Our MySQL setup is master/slave with a read/write split; however, I think the layer this is targeting in Apache comes after the transaction has completed. I'd have to provide the usage graphs (Cacti) on a non-public system, as the data is commercially sensitive.
Nov 11, 2010
Project Member #30 sligocki@google.com
@mark.harburn, worker MPM is multi-threaded: http://httpd.apache.org/docs/2.0/mod/worker.html
Nov 11, 2010
Project Member #31 jmara...@google.com
OK -- we do not support the 'worker' MPM or the 'thread' MPM.

We would like to do so -- just haven't tested it yet.

We did find and fix a problem recently that could easily explain this error, and it's in the 'trunk' if you want to build from source.  But it's not in a binary distribution yet.  But even if you do this, we haven't load-tested 'worker' or 'threads' yet so you may see other issues.

Please stay tuned; support of more modern MPMs is bubbling to the top of our priority queue.

Nov 11, 2010
#32 mark.har...@gmail.com
Hi,

It's prefork; I was posting these in a rush, apologies. Bad day in the office and all that...
Nov 11, 2010
#33 mark.har...@gmail.com
Here's the config. I turned everything on to make sure we got full performance.
pagespeed.txt
5.3 KB
Nov 12, 2010
#34 nickz...@gmail.com
Hey guys,

I have changed the LogLevel to Warn; attached is the error_log after enabling pagespeed again on my webserver. By the way, I'm not sure if it's because of the low traffic I'm getting today, but the server didn't get overloaded immediately. It took about 10 to 12 minutes to start getting high load.



error_log.txt
1.1 MB
Nov 12, 2010
#35 mark.har...@gmail.com
Ok,

Something a bit more useful from the error logs: just before all the segfaults, we get a lot of:

[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)

I'd disabled all css enhancements other than the ones in the core.
Nov 12, 2010
#36 abl...@google.com
Ok, I was able to reproduce a process-blowup on my machine, and I have a fix for it.  It will be hitting the SVN repository shortly, with binaries to come soon after.  So hopefully this is fixed for real.

nickzoid, your error_log shows some very troubling messages that suggest memory corruption.  I just opened Issue 79 to track that.

For everyone else, I encourage you to update from SVN if possible, or try the new binaries when they hit, and let me know if your process blowup disappears.
Status: Fixed
Nov 13, 2010
#37 mark.har...@gmail.com
I built from source, and after around one hour I got the following. I assume the fix had been committed by the time I built this?

top - 16:44:30 up 24 days,  1:55,  2 users,  load average: 978.57, 1140.73, 645.74

Built:   mod-pagespeed-beta-0.9.1.1-184.x86_64.rpm

pure virtual method called
terminate called without an active exception
pure virtual method called
terminate called without an active exception
[Sat Nov 13 16:36:24 2010] [notice] child pid 1745 exit signal Aborted (6)
[Sat Nov 13 16:36:26 2010] [notice] child pid 2026 exit signal Aborted (6)

Nov 13, 2010
Project Member #38 sligocki@google.com
@mark.harburn, I created a new issue 81 for the "ReadUTFChar not supported (non-icu build)" error message. Please provide any more details and let me know if you're still getting that message.
Nov 17, 2010
#39 clement....@gmail.com
Hi,

I still have the problem on r220.

The load average grows very fast. It happens only when I load a PrestaShop website.

Do you need any information to resolve this bug?

Thanks.
Nov 17, 2010
#40 nickz...@gmail.com
Hello,

I just tested the .1-228 version and I'm still getting the same overload issue.


Nov 18, 2010
#41 cleverc...@gmail.com
mod-pagespeed-beta 0.9.8.1-r215
Debian Lenny x64 5.0.6
Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny9

I confirm the same problem.
Nov 19, 2010
Project Member #42 jmara...@google.com
Hi nickzoid, clement, and clevercold.

We were able to reproduce a server-load problem caused by Issue 85.  This issue has now been resolved in the SVN tree and the new branch:

gclient config http://modpagespeed.googlecode.com/svn/tags/0.9.10.1/src (r238 or later on the trunk).

If you can build from source, please let us know if this fixes the symptom for you.
