
phusion-passenger - issue #16
*** Cannot initialize Passenger: write() failed: Bad file descriptor (9) - FreeBSD 7
What steps will reproduce the problem?
- Install Ruby, Apache 2.2.8 from ports on FreeBSD 7.0
- Install Passenger via ruby-gems
- Configure Apache as instructed by 'passenger-install-apache2-module'
- Start Apache
What is the expected output? What do you see instead?
Expected to see (in httpd-error.log):
[36970:ApplicationPoolClientServer.h:426] Client 0x8291200: received message: ['setMax', '20']
[36970:ApplicationPoolClientServer.h:426] Client 0x8291200: received message: ['setMaxIdleTime', '120']
However, instead I get:
[Mon Apr 14 15:36:29 2008] [notice] Apache/2.2.8 (FreeBSD) mod_ssl/2.2.8 OpenSSL/0.9.8e DAV/2 Phusion_Passenger/1.0.1 configured -- resuming normal operations
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)[77505:ApplicationPoolClientServer.h:394] Cannot send a file descriptor: Bad file descriptor (9) --- aborting!
[Mon Apr 14 15:36:29 2008] [notice] seg fault or similar nasty error detected in the parent process
The Apache parent process then dies, so Apache no longer accepts incoming connections.
What version of the product are you using? On what operating system?
FreeBSD supertax 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Thu Nov 22 14:59:36 NZDT 2007
root@supertax:/usr/obj/usr/src/sys/GENERIC amd64
Ruby 1.8.6
RubyGems 1.1.1
Passenger 1.0.1
Apache 2.2.8
Comment #1
Posted on Apr 14, 2008 by Swift Lion
The unit tests fail also:
In test:
./Apache2ModuleTests
ApplicationPoolServerTest: ..[77749:../ext/apache2/ApplicationPoolClientServer.h:394] Cannot send a file descriptor: Bad file descriptor (9) --- aborting!
rake aborted!
Command failed with status (): [./Apache2ModuleTests...]
/usr/local/lib/ruby/gems/1.8/gems/passenger-1.0.1/Rakefile:216
Comment #2
Posted on Apr 14, 2008 by Swift Lion
Worked fine for me on FreeBSD 6.2-RELEASE
Comment #3
Posted on Apr 14, 2008 by Swift Lion
If I create the file /etc/libmap.conf with:
libthr.so.3 libkse.so.3
The tests will actually run and pass a few, then eventually hang at:
In test:
./Apache2ModuleTests
ApplicationPoolServerTest: ....[5=F]
ApplicationPoolServer_ApplicationPoolTest: [1=X][2=X]
Looks like threading bugs in both libthr and libkse
Comment #4
Posted on Apr 14, 2008 by Quick Cat
Worked fine for me on FreeBSD 6.3-RELEASE
Did you try 7.0-RELEASE instead of FreeBSD 7.0-BETA3?
Comment #5
Posted on Apr 14, 2008 by Swift Lion
Yes, no change
Comment #6
Posted on Apr 14, 2008 by Grumpy Camel
(No comment was entered for this change.)
Comment #7
Posted on Apr 15, 2008 by Swift Lion
Incidentally, if I set libmap.conf on FreeBSD 6.2 to:
libpthread.so.2 libthr.so.2
(So they all use the libthr threading instead of KSE)
All the tests pass successfully
Comment #8
Posted on Apr 15, 2008 by Grumpy Camel
So if I understand it correctly, libthr.so.2 is a replacement for libpthread, and it was introduced in newer versions of FreeBSD? Is libthr also available on FreeBSD 6?
Comment #9
Posted on Apr 15, 2008 by Grumpy Camel
Hm, FreeBSD 5 has libthr as well, but if I link to libthr in FreeBSD 5 then the unit tests segfault at startup. What happens if one links to libthr in FreeBSD 6?
Comment #10
Posted on Apr 15, 2008 by Swift Lion
In FreeBSD 5 and 6, libpthread is libkse (an M:N threading library). libthr is a 1:1 threading library provided as an alternative. In FreeBSD 7, libthr became the default as it outperformed libkse in almost all situations.
In FreeBSD 6, if I use libthr (as mapped in libmap), all the tests run successfully.
Comment #11
Posted on Apr 15, 2008 by Grumpy Camel
Okay, so I think the solution is to autodetect FreeBSD >= 6 in the Rakefile, and then use '-lthr' instead of '-lpthread'.
Currently I have a deadline for a non-Passenger-related project, so a patch would be much appreciated. :)
What happened to libpthread by the way? Why does the new version fail while the older versions pass?
Comment #12
Posted on Apr 15, 2008 by Swift Lion
I think we may have got confused along the way here!
FreeBSD 6, libpthread (libkse): Passes
FreeBSD 6, libthr: Passes
FreeBSD 7, libkse: Fails
FreeBSD 7, libthr: Fails
Comment #13
Posted on Apr 16, 2008 by Swift Lion
Seems to be dying in MessageChannel#writeFileDescriptor(int fileDescriptor) at line 290 (MessageChannel.h):
if (sendmsg(fd, &msg, 0) == -1) {
throw SystemException("Cannot send file descriptor with sendmsg()", errno);
}
If I hack in a ::write() (MessageChannel defines its own write()) to either fd or fileDescriptor (which gets added to the msghdr struct passed to sendmsg()) just before the sendmsg(), I don't get EBADF from either descriptor.
It seems both descriptors are valid, but sendmsg() is failing.
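For anyone who wants to check sendmsg() descriptor passing in isolation, here is a minimal sketch over a Unix socket pair. It is not Passenger's MessageChannel code: the names FdControlBuffer, sendFd, recvFd and roundTripWorks are made up for illustration, and it assumes a POSIX system with SCM_RIGHTS support. The control buffer is sized with CMSG_SPACE() and cmsg_len is set with CMSG_LEN(), which is the usual way to keep the layout valid on both 32-bit and 64-bit platforms.

```cpp
#include <cassert>
#include <cstring>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper type: control buffer for exactly one descriptor.
// The union forces cmsghdr alignment; CMSG_SPACE() includes any padding
// the platform requires after the payload.
union FdControlBuffer {
    struct cmsghdr header;
    char data[CMSG_SPACE(sizeof(int))];
};

// Send fdToSend over the Unix socket `channel`. Returns true on success.
bool sendFd(int channel, int fdToSend) {
    char dummy = '!';  // one byte of ordinary data accompanies the control message
    struct iovec vec;
    vec.iov_base = &dummy;
    vec.iov_len = 1;

    FdControlBuffer control;
    std::memset(&control, 0, sizeof(control));

    struct msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &vec;
    msg.msg_iovlen = 1;
    msg.msg_control = control.data;
    msg.msg_controllen = sizeof(control.data);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);  // the buffer is large enough for one header
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));  // header plus payload, no trailing padding
    std::memcpy(CMSG_DATA(cmsg), &fdToSend, sizeof(int));

    return sendmsg(channel, &msg, 0) != -1;
}

// Receive one descriptor from `channel`. Returns the new descriptor, or -1.
int recvFd(int channel) {
    char dummy;
    struct iovec vec;
    vec.iov_base = &dummy;
    vec.iov_len = 1;

    FdControlBuffer control;
    std::memset(&control, 0, sizeof(control));

    struct msghdr msg;
    std::memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &vec;
    msg.msg_iovlen = 1;
    msg.msg_control = control.data;
    msg.msg_controllen = sizeof(control.data);

    if (recvmsg(channel, &msg, 0) <= 0) {
        return -1;
    }
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET
        || cmsg->cmsg_type != SCM_RIGHTS
        || cmsg->cmsg_len < CMSG_LEN(sizeof(int))) {
        return -1;
    }
    int fd;
    std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

// Pass the write end of a pipe across a socket pair, then write through the
// received copy and read the bytes back from the pipe's read end.
bool roundTripWorks() {
    int sock[2];
    int pipeFds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sock) == -1) {
        return false;
    }
    if (pipe(pipeFds) == -1) {
        close(sock[0]);
        close(sock[1]);
        return false;
    }

    int received = -1;
    char buf[4] = {0};
    bool ok = sendFd(sock[0], pipeFds[1]);
    if (ok) {
        received = recvFd(sock[1]);
        ok = received != -1
            && write(received, "ping", 4) == 4
            && read(pipeFds[0], buf, 4) == 4
            && std::memcmp(buf, "ping", 4) == 0;
    }

    if (received != -1) {
        close(received);
    }
    close(pipeFds[0]);
    close(pipeFds[1]);
    close(sock[0]);
    close(sock[1]);
    return ok;
}
```

If roundTripWorks() returns false on a system where both descriptors are otherwise usable, that would point at the descriptor-passing path rather than at the descriptors themselves.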
Comment #14
Posted on Apr 16, 2008 by Swift Lion
If I put the write()s after the sendmsg(), they also return without error and with the correct number of bytes written.
Comment #15
Posted on Apr 16, 2008 by Grumpy Camel
This problem is confirmed to be a kernel bug in FreeBSD 7. The file descriptor passing unit test in Ruby also fails on FreeBSD 7.
Comment #16
Posted on Apr 16, 2008 by Swift Lion
Further to this... it seems there's a problem in FreeBSD 7 with FD passing with sendmsg() in general.
The Ruby 1.8.6 unit test for FD passing also fails, without any threading involved.
Comment #17
Posted on Apr 19, 2008 by Grumpy Camel
Hi. I suspect that this is not a FreeBSD kernel issue, but rather a mistake in the file descriptor passing code in Ruby (and in Passenger, since that's based on the code in Ruby). We found a very similar issue on 64-bit MacOS X, and it was fixed a few minutes ago.
Could you test the latest development version (i.e. the one in git) against FreeBSD 7? Thanks.
Comment #18
Posted on Apr 19, 2008 by Swift Lion
Hi,
The tests hang after the second test, as per below:
[root@supertax ~/FooBarWidget-passenger-dfc09b7292ab499fa0f8b99af0d42872fd4ea8ef/test]# ./Apache2ModuleTests
ApplicationPoolServerTest: ..^C
At that stage the foreground process is 'ruby spawn_server.rb' and waiting in 'select()'
So I compiled the MessageChannel tests by themselves, ran them, and got the following:
[root@supertax ~/FooBarWidget-passenger-dfc09b7292ab499fa0f8b99af0d42872fd4ea8ef/test]# ./MyTests
MessageChannelTest: .....hello
It hangs there and is stuck in state 'piperd'.
If I install it into Apache, update the config to the correct locations and start apache I get the following constantly repeating in httpd-error.log:
[Sun Apr 20 10:50:20 2008] [notice] child pid 37703 exit signal Abort trap (6)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)[Sun Apr 20 10:50:22 2008] [notice] child pid 37711 exit signal Abort trap (6)
Cannot initialize Passenger: write() failed: Bad file descriptor (9)
Cannot initialize Passenger: write() failed: Bad file descriptor (9)[Sun Apr 20 10:50:23 2008] [notice] child pid 37713 exit signal Abort trap (6)
[Sun Apr 20 10:50:23 2008] [notice] child pid 37712 exit signal Abort trap (6)
Cannot initialize Passenger: write() failed: Bad file descriptor (9)
Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)
Comment #19
Posted on Apr 20, 2008 by Grumpy Camel
"* Cannot initialize Passenger: write() failed: Bad file descriptor (9)[Sun Apr 20 10:50:22 2008]..."
The "Cannot initialize Passenger" message is missing a newline ("\n"). This bug was fixed a while ago, so the fact that you see it makes me suspect that you didn't compile/install the module correctly. Could you check whether that's the case? Thanks.
Comment #20
Posted on Apr 20, 2008 by Grumpy Camel
And please pull the latest version from git. I've added some more error checking code to the unit tests.
Comment #21
Posted on Apr 20, 2008 by Swift Lion
See my log attached. The "load: 0.00 cmd ruby ..." output is from Ctrl+T (SIGINFO), which Linux doesn't have. It still hangs after the first two tests, but as you can see, if I run it over and over again it sometimes starts to pass a few.
Still no newlines, unless I'm using GitHub wrong.
- typescript.txt 10.06KB
Comment #22
Posted on Apr 20, 2008 by Grumpy Camel
Sorry, my mistake. The newline fix was not in the 'master' branch but in a private branch that I haven't merged yet.
Your typescript seems to be fine. Then maybe it really is FreeBSD that's bugged.
Comment #23
Posted on Apr 22, 2008 by Swift Rabbit
Same issue. From httpd-error.log:
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)
FreeBSD 6.2-RELEASE-p9 (SMP)
Passenger was installed via "gem install passenger". The system is a 2x4-core Xeon setup.
I have a test configuration upon which passenger seems to be working fine:
FreeBSD 6.2-RELEASE-p8 (GENERIC)
That machine is a single 333 MHz Pentium II, and on it Passenger was compiled from source.
Comment #24
Posted on Apr 22, 2008 by Grumpy Camel
Jacob, is your Xeon setup 64-bit? And could you give the development version (git repository) a try?
Comment #25
Posted on Apr 23, 2008 by Swift Rabbit
Whoops, I guess that would have been an important thing to mention.
Yes, the Xeon setup is running AMD64. I gave the development version a try yesterday after reading through here a bit, but I got pretty much the same thing:
[Tue Apr 22 15:51:33 2008] [notice] Apache/2.2.8 (FreeBSD) mod_ssl/2.2.8 OpenSSL/0.9.8g DAV/2 SVN/1.4.4 PHP/5.2.5 with Suhosin -Patch Phusion_Passenger/1.0.1 configured -- resuming normal operations
[Tue Apr 22 15:51:33 2008] [notice] child pid 74446 exit signal Abort trap (6)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)
[Tue Apr 22 15:51:34 2008] [notice] child pid 74448 exit signal Abort trap (6)
[Tue Apr 22 15:51:34 2008] [notice] child pid 74447 exit signal Abort trap (6)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)
Comment #26
Posted on May 2, 2008 by Grumpy Monkey
Hi
I guess it's somewhat related: I cannot start Apache 2.2.8 with Passenger 1.04. The error log says: Fatal error 'mutex is on list' at line 540 in file /usr/src/lib/libpthread/thread/thr_mutex.c (errno = 0)
I'm running FreeBSD 6.2-RC1 (SMP) on a PowerEdge 2450 (I guess it's two 4-year-old 64-bit Xeons)
Will test the git dev version
Comment #27
Posted on May 2, 2008 by Grumpy Monkey
Ok I've just cloned the passenger repository and tests are successful. Just installed too. Hope it's ok now.
Comment #28
Posted on May 7, 2008 by Happy Giraffe
I found this thread while searching for the error in comment #26. I'm just trying to install Passenger on my FreeBSD box (6.2, 32-bit). I compiled Apache 2.2.8 from ports and installed Passenger with gems.
I get "Fatal error 'mutex is on list' at line 540 in file /usr/src/lib/libpthread/thread/thr_mutex.c (errno = 0)"
This is while loading the passenger module (I don't initialize anything else).
Basically, should I just get the latest passenger from cvs?
Comment #29
Posted on May 7, 2008 by Happy Giraffe
Alright, I did the same (checked out the repository) and it fixed my issues.
Comment #30
Posted on May 7, 2008 by Swift Lion
With Passenger 1.0.5, all the MessageChannelTests pass perfectly on FreeBSD 7. The tests involving Ruby still end up throwing EBADF. So it seems the C++/Apache side of the FD passing now works... the Ruby side is still broken.
Comment #31
Posted on May 22, 2008 by Grumpy Monkey
Anything new? :(
Comment #32
Posted on May 22, 2008 by Grumpy Camel
No new information.
Comment #33
Posted on May 24, 2008 by Grumpy Monkey
The FreeBSD-bugs mailing list says:
"It's not clear to me what the issue is (or is claimed to be). Some comments say file descriptor passing is broken, some of them say it is a bug in the third party code, some of them say there is a thread library problem (but the error 'mutex is already on list' indicates it's a mis-compiled binary).
Can we get a clear statement of the bug report please? :)"
Comment #34
Posted on May 24, 2008 by Grumpy Camel
We don't know any more than they do. :( If someone can come up with file descriptor passing code that does work on FreeBSD 7, then that would prove that there's a (portability?) bug in our code.
Comment #35
Posted on May 24, 2008 by Grumpy Monkey
I don't have much experience with C development; however, I can test any code you provide. Regards
Comment #36
Posted on May 26, 2008 by Quick Rabbit
I don't know if this helps, but adding '-D__APPLE__' to the Rakefile on the dev version allows the tests to get a bit further.
Pristine clone:
ApplicationPoolServerTest: .....Rails Error: Unable to access log file. Please ensure that /root/passenger/test/stub/railsapp/log/production.log exists and is chmod 0666. The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
* Exception Errno::EBADF in application (Bad file descriptor - sendmsg(2)) (process 57497):
With '-D__APPLE__':
ApplicationPoolServerTest: ....[5=F]
ApplicationPoolServer_ApplicationPoolTest: [1=X][2=X][3=X][4=X][5=X][6=X][7=X][8=X][9=X][10=X][11=X][12=X][13=X][14=X][15=X][16=X][17=X]
MessageChannelTest: .....
...and then it just hangs there indefinitely. Also, using the module in Apache results in:
[Mon May 26 16:12:20 2008] [notice] child pid 56399 exit signal Abort trap (6)
[ pid=56400 file=Hooks.cpp:400 time=05/26/08 16:12:21.665 ]: Cannot initialize Passenger in an Apache child process: write() failed: Bad file descriptor (9) (this warning is harmless if you're currently restarting or shutting down Apache)
being repeated over and over in the error-log.
Comment #37
Posted on May 26, 2008 by Quick Rabbit
Running 'rake --trace test' produces the following:
ApplicationPoolServerTest: .....
ApplicationPoolServer_ApplicationPoolTest: [1=X][2=X][3=X][4=X][5=X][6=X]Assertion failed: (!ret), function unlock, file ../boost/thread/pthread/mutex.hpp, line 66.
[7=X][8=X][9=X][10=X][11=X][12=X][13=X][14=X][15=X][16=X][17=X]
MessageChannelTest: ........../stub/../../lib/passenger/utils.rb:223: [BUG] rb_sys_fail(No valid file descriptor received.) - errno == 0
Comment #38
Posted on May 27, 2008 by Quick Rabbit
Okay, this is definitely a 64-bit problem. Phusion Passenger 1.0.5 works fine on a 64-bit FreeBSD 7, if all ports are built for i386.
Comment #39
Posted on May 27, 2008 by Quick Rabbit
http://lists.canonical.org/pipermail/kragen-hacks/2002-January/000292.html has some code to do file descriptor passing, and it works flawlessly on FreeBSD 7/amd64.
novarg# uname -r
7.0-RELEASE-p1
novarg# uname -p
amd64
novarg# ./test.sh
hellofdpass' is up to date.
portlisten' is up to date.
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello, world
Connection closed by foreign host.
Terminated
Terminated
Comment #40
Posted on May 27, 2008 by Grumpy Camel
(No comment was entered for this change.)
Comment #41
Posted on Jun 5, 2008 by Grumpy Camel
http://blog.typosphere.org/2008/06/05/moving-from-trac-to-redmine-and-other-upcoming-plans is reporting success on FreeBSD 7.
Comment #42
Posted on Jun 5, 2008 by Quick Rabbit
I've got it running on FreeBSD 7, too, but only in 32-bit mode. I suspect that's what they've done, too.
Comment #43
Posted on Jun 19, 2008 by Happy Wombat
Hi, anything new on this?
Comment #44
Posted on Jun 19, 2008 by Happy Wombat
Comment deleted
Comment #45
Posted on Jun 19, 2008 by Grumpy Monkey
@42: how did you compile it in 32-bit mode?
Comment #46
Posted on Jun 30, 2008 by Quick Rabbit
I compiled it in a chroot with an i386 world installed in it. It's by no means optimal :-)
Comment #47
Posted on Jul 11, 2008 by Quick Rabbit
The closest I've come to finding a cause of this bug is the unp_externalize() function in the FreeBSD kernel, which is responsible for copying file descriptors between processes.
It seems to treat the incoming fds as if each were the size of a 'struct file *', which won't work if pointers are 8 bytes and fds are 4. The XNU kernel source has the same function, imported from FreeBSD many years ago and altered somewhat since, and it mentions that the code assumes pointers are sizeof(int).
What bothers me, then, is that it works on 64-bit Mac OS X and that the example code I linked to earlier works. I'm no kernel hacker, so I'm not sure I'll be able to get any further than this, if this is even the right path.
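The size mismatch described above can be made concrete with a small sketch. It only reports the sizes a user-space program sees; it does not reproduce or confirm anything about the kernel code. The struct FdPassingSizes and the functions fdPassingSizes and slotSizesDiffer are hypothetical names introduced here for illustration, and the actual values depend on the platform.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical summary of the sizes involved in passing one descriptor.
struct FdPassingSizes {
    std::size_t fdSize;       // sizeof(int): what user space puts in the message
    std::size_t pointerSize;  // sizeof(void *): 8 on typical 64-bit systems, 4 on 32-bit
    std::size_t cmsgLen;      // CMSG_LEN(sizeof(int)): header plus payload
    std::size_t cmsgSpace;    // CMSG_SPACE(sizeof(int)): header, payload and padding
};

FdPassingSizes fdPassingSizes() {
    FdPassingSizes s;
    s.fdSize = sizeof(int);
    s.pointerSize = sizeof(void *);
    s.cmsgLen = CMSG_LEN(sizeof(int));
    s.cmsgSpace = CMSG_SPACE(sizeof(int));
    // Padding can only add bytes, so the space is never smaller than the length.
    assert(s.cmsgSpace >= s.cmsgLen);
    return s;
}

// True when code that sizes descriptor slots by pointer width would disagree
// with code that sizes them by int width.
bool slotSizesDiffer(const FdPassingSizes &s) {
    return s.pointerSize != s.fdSize;
}
```

On a 32-bit build the two slot sizes normally coincide, which would be consistent with the reports in this thread that i386 builds work, though that alone does not prove where the fault lies.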
Comment #48
Posted on Jul 22, 2008 by Grumpy Hippo
Hi folks, any news? :)
Comment #49
Posted on Jul 22, 2008 by Happy Wombat
Works for me in 32-bit mode on FreeBSD 7
Comment #50
Posted on Jul 24, 2008 by Grumpy Camel
I've taken some time to install 64-bit FreeBSD 7 myself. The problem turns out to be caused by Ruby. Ruby's UNIXSocket#send_io and UNIXSocket#recv_io implementations are broken on 64-bit FreeBSD. I've overridden them with our own implementation, and now all tests pass.
This problem should be fixed as of commit 2de5e1c93be18d167b69d0a1b95ca76e4fccece5.
Comment #51
Posted on Jul 24, 2008 by Grumpy Monkey
Can't wait for the next stable gem :)
Comment #52
Posted on Jul 28, 2008 by Quick Rabbit
Confirmed working - thanks!
Comment #53
Posted on Jul 31, 2008 by Grumpy Camel
You guys can go ahead and use the git version. It's stable enough: wiki.rubyonrails.org is running on it.
Status: Fixed
Labels:
Type-Defect
Priority-Medium
OpSys-BSD
Portability
Milestone-2.1.0