
phusion-passenger - issue #16

*** Cannot initialize Passenger: write() failed: Bad file descriptor (9) - FreeBSD 7


Posted on Apr 14, 2008 by Swift Lion

What steps will reproduce the problem?

  1. Install Ruby, Apache 2.2.8 from ports on FreeBSD 7.0
  2. Install Passenger via ruby-gems
  3. Configure Apache as instructed by 'passenger-install-apache2-module'
  4. Start Apache

What is the expected output? What do you see instead?

Expected to see (in httpd-error.log):

[36970:ApplicationPoolClientServer.h:426] Client 0x8291200: received message: ['setMax', '20']
[36970:ApplicationPoolClientServer.h:426] Client 0x8291200: received message: ['setMaxIdleTime', '120']

However, instead I get:

[Mon Apr 14 15:36:29 2008] [notice] Apache/2.2.8 (FreeBSD) mod_ssl/2.2.8 OpenSSL/0.9.8e DAV/2 Phusion_Passenger/1.0.1 configured -- resuming normal operations
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)[77505:ApplicationPoolClientServer.h:394] Cannot send a file descriptor: Bad file descriptor (9) --- aborting!
[Mon Apr 14 15:36:29 2008] [notice] seg fault or similar nasty error detected in the parent process

The Apache parent process then dies, so Apache no longer accepts incoming connections.

What version of the product are you using? On what operating system?

FreeBSD supertax 7.0-BETA3 FreeBSD 7.0-BETA3 #0: Thu Nov 22 14:59:36 NZDT 2007
root@supertax:/usr/obj/usr/src/sys/GENERIC amd64

Ruby 1.8.6 RubyGems 1.1.1 Passenger 1.0.1 Apache 2.2.8

Comment #1

Posted on Apr 14, 2008 by Swift Lion

The unit tests fail also:

In test:

./Apache2ModuleTests

ApplicationPoolServerTest: ..[77749:../ext/apache2/ApplicationPoolClientServer.h:394] Cannot send a file descriptor: Bad file descriptor (9) --- aborting!
rake aborted!
Command failed with status (): [./Apache2ModuleTests...]
/usr/local/lib/ruby/gems/1.8/gems/passenger-1.0.1/Rakefile:216

Comment #2

Posted on Apr 14, 2008 by Swift Lion

Worked fine for me on FreeBSD 6.2-RELEASE

Comment #3

Posted on Apr 14, 2008 by Swift Lion

If I create the file, /etc/libmap.conf with:

libthr.so.3 libkse.so.3

The tests will actually run and pass a few, then eventually hang at:

In test:

./Apache2ModuleTests

ApplicationPoolServerTest: ....[5=F]
ApplicationPoolServer_ApplicationPoolTest: [1=X][2=X]

Looks like threading bugs in both libthr and libkse

Comment #4

Posted on Apr 14, 2008 by Quick Cat

Worked fine for me on FreeBSD 6.3-RELEASE

Have you tried 7.0-RELEASE instead of FreeBSD 7.0-BETA3?

Comment #5

Posted on Apr 14, 2008 by Swift Lion

Yes, no change

Comment #6

Posted on Apr 14, 2008 by Grumpy Camel

(No comment was entered for this change.)

Comment #7

Posted on Apr 15, 2008 by Swift Lion

Incidentally, if I set libmap.conf on FreeBSD 6.2 to:

libpthread.so.2 libthr.so.2

(So they all use the libthr threading instead of KSE)

All the tests pass successfully

Comment #8

Posted on Apr 15, 2008 by Grumpy Camel

So if I understand it correctly, libthr.so.2 is a replacement for libpthread, and it was introduced in newer versions of FreeBSD? Is libthr also available on FreeBSD 6?

Comment #9

Posted on Apr 15, 2008 by Grumpy Camel

Hm, FreeBSD 5 has libthr as well, but if I link to libthr in FreeBSD 5 then the unit tests segfault at startup. What happens if one links to libthr in FreeBSD 6?

Comment #10

Posted on Apr 15, 2008 by Swift Lion

In FreeBSD 5 and 6, libpthread is libkse (an M:N threading library). libthr is a 1:1 threading library provided as an alternative. In FreeBSD 7, libthr became the default as it outperformed libkse in almost all situations.

In FreeBSD 6, if I use libthr (as mapped in libmap), all the tests run successfully.

Comment #11

Posted on Apr 15, 2008 by Grumpy Camel

Okay, so I think the solution is to autodetect FreeBSD >= 6 in the Rakefile, and then use '-lthr' instead of '-lpthread'.

Currently I have a deadline for a non-Passenger-related project, so a patch would be much appreciated. :)

What happened to libpthread by the way? Why does the new version fail while the older versions pass?

Comment #12

Posted on Apr 15, 2008 by Swift Lion

I think we may have got confused along the way here!

FreeBSD 6, libpthread (libkse): Passes
FreeBSD 6, libthr: Passes
FreeBSD 7, libkse: Fails
FreeBSD 7, libthr: Fails

Comment #13

Posted on Apr 16, 2008 by Swift Lion

Seems to be dying in MessageChannel#writeFileDescriptor(int fileDescriptor) at line 290 (MessageChannel.h):

if (sendmsg(fd, &msg, 0) == -1) {
        throw SystemException("Cannot send file descriptor with sendmsg()", errno);
}

If I hack in a ::write() (MessageChannel defines its own write()) to either fd or fileDescriptor (which gets added to the msghdr struct passed to sendmsg()) just before the sendmsg(), I don't get EBADF from either descriptor.

It seems both descriptors are valid, but sendmsg() is failing.

Comment #14

Posted on Apr 16, 2008 by Swift Lion

If I put the write()s after the sendmsg(), they also return without error and with the correct number of bytes written.

Comment #15

Posted on Apr 16, 2008 by Grumpy Camel

This problem is confirmed to be a kernel bug in FreeBSD 7. The file descriptor passing unit test in Ruby also fails on FreeBSD 7.

Comment #16

Posted on Apr 16, 2008 by Swift Lion

Further to this... it seems there's a problem in FreeBSD 7 with FD passing with sendmsg() in general.

The Ruby 1.8.6 unit test for FD passing also fails, without any threading involved.

http://pastie.org/181634

Comment #17

Posted on Apr 19, 2008 by Grumpy Camel

Hi. I suspect that this is not a FreeBSD kernel issue, but rather a mistake in the file descriptor passing code in Ruby (and in Passenger, since that's based on the code in Ruby). We found a very similar issue on 64-bit MacOS X, and it was fixed a few minutes ago.

Could you test the latest development version (i.e. the one in git) against FreeBSD 7? Thanks.

Comment #18

Posted on Apr 19, 2008 by Swift Lion

Hi,

The tests hang after the second test, as per below:

[root@supertax ~/FooBarWidget-passenger-dfc09b7292ab499fa0f8b99af0d42872fd4ea8ef/test]# ./Apache2ModuleTests

ApplicationPoolServerTest: ..^C

At that stage the foreground process is 'ruby spawn_server.rb' and waiting in 'select()'

So I compiled the MessageChannel tests by themselves, ran them, and got the following:

[root@supertax ~/FooBarWidget-passenger-dfc09b7292ab499fa0f8b99af0d42872fd4ea8ef/test]# ./MyTests

MessageChannelTest: .....hello

It hangs there and is stuck in state 'piperd'.

If I install it into Apache, update the config to the correct locations and start apache I get the following constantly repeating in httpd-error.log:

[Sun Apr 20 10:50:20 2008] [notice] child pid 37703 exit signal Abort trap (6)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)[Sun Apr 20 10:50:22 2008] [notice] child pid 37711 exit signal Abort trap (6)
Cannot initialize Passenger: write() failed: Bad file descriptor (9) Cannot initialize Passenger: write() failed: Bad file descriptor (9)[Sun Apr 20 10:50:23 2008] [notice] child pid 37713 exit signal Abort trap (6)
[Sun Apr 20 10:50:23 2008] [notice] child pid 37712 exit signal Abort trap (6)
Cannot initialize Passenger: write() failed: Bad file descriptor (9) Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)

Comment #19

Posted on Apr 20, 2008 by Grumpy Camel

"* Cannot initialize Passenger: write() failed: Bad file descriptor (9)[Sun Apr 20 10:50:22 2008]..."

The "Cannot initialize Passenger" message is missing a newline ("\n"). This bug was fixed a while ago, so the fact that you see it makes me suspect that you didn't compile/install the module correctly. Could you check whether that's the case? Thanks.

Comment #20

Posted on Apr 20, 2008 by Grumpy Camel

And please pull the latest version from git. I've added some more error checking code to the unit tests.

Comment #21

Posted on Apr 20, 2008 by Swift Lion

See my log attached. The load: 0.00 cmd ruby ... output is from Ctrl+T or SIGINFO, which Linux doesn't have, but it's still hung after the first two tests. But as you can see if I run it over and over again it sometimes starts to pass a few.

No newlines still, unless I'm working github wrong.

Attachments

Comment #22

Posted on Apr 20, 2008 by Grumpy Camel

Sorry, my mistake. The newline fix was not in the 'master' branch but in a private branch that I haven't merged yet.

Your typescript seems to be fine. Then maybe it really is FreeBSD that's bugged.

Comment #23

Posted on Apr 22, 2008 by Swift Rabbit

Same issue. From httpd-error.log:

* Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)* Cannot initialize Passenger: write() failed: Bad file descriptor (9)

FreeBSD 6.2-RELEASE-p9 (SMP)

Passenger was installed via "gem install passenger". The system is a 2x4-core Xeon setup.

I have a test configuration upon which passenger seems to be working fine:

FreeBSD 6.2-RELEASE-p8 (GENERIC)

Which is a single 333MHz Pentium II. On this machine, Passenger was compiled from source.

Comment #24

Posted on Apr 22, 2008 by Grumpy Camel

Jacob, is your Xeon setup 64-bit? And could you give the development version (git repository) a try?

Comment #25

Posted on Apr 23, 2008 by Swift Rabbit

Whoops, I guess that would have been an important thing to mention.

Yes, the Xeon setup is running AMD64. I gave the development version a try yesterday after reading through here a bit, but I got pretty much the same thing:

[Tue Apr 22 15:51:33 2008] [notice] Apache/2.2.8 (FreeBSD) mod_ssl/2.2.8 OpenSSL/0.9.8g DAV/2 SVN/1.4.4 PHP/5.2.5 with Suhosin -Patch Phusion_Passenger/1.0.1 configured -- resuming normal operations
[Tue Apr 22 15:51:33 2008] [notice] child pid 74446 exit signal Abort trap (6)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)
[Tue Apr 22 15:51:34 2008] [notice] child pid 74448 exit signal Abort trap (6)
[Tue Apr 22 15:51:34 2008] [notice] child pid 74447 exit signal Abort trap (6)
* Cannot initialize Passenger: write() failed: Bad file descriptor (9)

Comment #26

Posted on May 2, 2008 by Grumpy Monkey

Hi

I guess this is somewhat related: I cannot start Apache 2.2.8 with Passenger 1.0.4. The error log says:

Fatal error 'mutex is on list' at line 540 in file /usr/src/lib/libpthread/thread/thr_mutex.c (errno = 0)

I'm running FreeBSD 6.2-RC1 (SMP) on a PowerEdge 2450 (I guess it's two 4-year-old 64-bit Xeons)

Will test the git dev version

Comment #27

Posted on May 2, 2008 by Grumpy Monkey

Ok I've just cloned the passenger repository and tests are successful. Just installed too. Hope it's ok now.

Comment #28

Posted on May 7, 2008 by Happy Giraffe

I found this thread searching for the error in comment #26. I'm just trying to install passenger on my freebsd box (6.2, 32-bit). I compiled apache 2.2.8 from ports and installed passenger with gems.

I get "Fatal error 'mutex is on list' at line 540 in file /usr/src/lib/libpthread/thread/thr_mutex.c (errno = 0)"

This is while loading the passenger module (I don't initialize anything else).

Basically, should I just get the latest passenger from cvs?

Comment #29

Posted on May 7, 2008 by Happy Giraffe

Alright, I did the same (checked out the repository) and it fixed my issues.

Comment #30

Posted on May 7, 2008 by Swift Lion

With Passenger 1.0.5, all the MessageChannelTests pass perfectly on FreeBSD 7. The tests involving Ruby still end up throwing EBADF. So it seems the C++/Apache side of the FD passing now works... the Ruby side is still broken.

Comment #31

Posted on May 22, 2008 by Grumpy Monkey

Anything new ? :(

Comment #32

Posted on May 22, 2008 by Grumpy Camel

No new information.

Comment #33

Posted on May 24, 2008 by Grumpy Monkey

The FreeBSD-bugs mailing list says:

"It's not clear to me what the issue is (or is claimed to be). Some comments say file descriptor passing is broken, some of them say it is a bug in the third party code, some of them say there is a thread library problem (but the error 'mutex is already on list' indicates it's a mis-compiled binary).

Can we get a clear statement of the bug report please? :)"

Comment #34

Posted on May 24, 2008 by Grumpy Camel

We don't know any more than they do. :( If someone can come up with file descriptor passing code that does work on FreeBSD 7, then that would prove that there's a (portability?) bug in our code.

Comment #35

Posted on May 24, 2008 by Grumpy Monkey

I don't have a deep experience in C development; however I may test any code you can provide. Regards

Comment #36

Posted on May 26, 2008 by Quick Rabbit

I don't know if this helps, but adding '-D__APPLE__' to the Rakefile on the dev version allows the tests to get a bit further.

Pristine clone:

ApplicationPoolServerTest: .....Rails Error: Unable to access log file. Please ensure that /root/passenger/test/stub/railsapp/log/production.log exists and is chmod 0666. The log level has been raised to WARN and the output directed to STDERR until the problem is fixed.
* Exception Errno::EBADF in application (Bad file descriptor - sendmsg(2)) (process 57497):

With '-D__APPLE__':

ApplicationPoolServerTest: ....[5=F]
ApplicationPoolServer_ApplicationPoolTest: [1=X][2=X][3=X][4=X][5=X][6=X][7=X][8=X][9=X][10=X][11=X][12=X][13=X][14=X][15=X][16=X][17=X]
MessageChannelTest: .....

...and then it just hangs there indefinitely. Also, using the module in Apache results in:

[Mon May 26 16:12:20 2008] [notice] child pid 56399 exit signal Abort trap (6)
[ pid=56400 file=Hooks.cpp:400 time=05/26/08 16:12:21.665 ]: Cannot initialize Passenger in an Apache child process: write() failed: Bad file descriptor (9) (this warning is harmless if you're currently restarting or shutting down Apache)

being repeated over and over in the error-log.

Comment #37

Posted on May 26, 2008 by Quick Rabbit

Running 'rake --trace test' produces the following:

ApplicationPoolServerTest: .....
ApplicationPoolServer_ApplicationPoolTest: [1=X][2=X][3=X][4=X][5=X][6=X]Assertion failed: (!ret), function unlock, file ../boost/thread/pthread/mutex.hpp, line 66.
[7=X][8=X][9=X][10=X][11=X][12=X][13=X][14=X][15=X][16=X][17=X]
MessageChannelTest: ........../stub/../../lib/passenger/utils.rb:223: [BUG] rb_sys_fail(No valid file descriptor received.) - errno == 0

Comment #38

Posted on May 27, 2008 by Quick Rabbit

Okay, this is definitely a 64-bit problem. Phusion Passenger 1.0.5 works fine on a 64-bit FreeBSD 7, if all ports are built for i386.

Comment #39

Posted on May 27, 2008 by Quick Rabbit

http://lists.canonical.org/pipermail/kragen-hacks/2002-January/000292.html has some code to do file descriptor passing, and it works flawlessly on FreeBSD 7/amd64.

novarg# uname -r
7.0-RELEASE-p1
novarg# uname -p
amd64

novarg# ./test.sh hellofdpass' is up to date.
portlisten' is up to date.
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
hello, world
Connection closed by foreign host.
Terminated
Terminated

Comment #40

Posted on May 27, 2008 by Grumpy Camel

(No comment was entered for this change.)

Comment #41

Posted on Jun 5, 2008 by Grumpy Camel

http://blog.typosphere.org/2008/06/05/moving-from-trac-to-redmine-and-other-upcoming-plans is reporting success on FreeBSD 7.

Comment #42

Posted on Jun 5, 2008 by Quick Rabbit

I've got it running on FreeBSD 7, too, but only in 32-bit mode. I suspect that's what they've done, too.

Comment #43

Posted on Jun 19, 2008 by Happy Wombat

Hi, anything new on this?

Comment #44

Posted on Jun 19, 2008 by Happy Wombat

Comment deleted

Comment #45

Posted on Jun 19, 2008 by Grumpy Monkey

@42: how did you compile it in 32-bit mode?

Comment #46

Posted on Jun 30, 2008 by Quick Rabbit

I compiled it in a chroot with an i386-world installed in it. It's by no means optimal :-)

Comment #47

Posted on Jul 11, 2008 by Quick Rabbit

The closest I've come to finding a cause of this bug is in the unp_externalize() function in the FreeBSD kernel, which is responsible for copying the file descriptors between processes.

It seems it thinks the incoming fd's are 'struct file *' in size, which won't work if pointers are 8 bytes and fd's are 4. The XNU kernel source has the same function, imported from FreeBSD many years ago, and altered somewhat in the meantime, and it mentions that the code assumes pointers are sizeof(int).

What bothers me, then, is the fact that it works on 64-bit Mac OS X, and that the example code I linked to earlier works. I'm no kernel hacker, so I'm not sure I'll be able to get any further than this, if this is even the right path.
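
To make the size question concrete, the sketch below reports the sizes involved on whatever machine it is compiled for. It does not reproduce or confirm the kernel behaviour described above. The struct and function names are invented for illustration, and the "descriptors implied" arithmetic is simply the usual reading of cmsg_len as header plus a whole number of ints.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cstddef>
#include <cstdio>

// Hypothetical record of the sizes relevant to SCM_RIGHTS messages.
struct ControlSizes {
    std::size_t descriptorSize;  // sizeof(int): one file descriptor
    std::size_t pointerSize;     // sizeof(void *): e.g. a 'struct file *'
    std::size_t lenForOneFd;     // CMSG_LEN(sizeof(int)): header + payload
    std::size_t spaceForOneFd;   // CMSG_SPACE(sizeof(int)): header + payload + padding
};

ControlSizes measureControlSizes() {
    ControlSizes s;
    s.descriptorSize = sizeof(int);
    s.pointerSize = sizeof(void *);
    s.lenForOneFd = CMSG_LEN(sizeof(int));
    s.spaceForOneFd = CMSG_SPACE(sizeof(int));
    return s;
}

// Number of int-sized descriptors that fit in a control message whose
// cmsg_len field is `cmsgLen` (payload bytes divided by sizeof(int)).
std::size_t descriptorsImpliedBy(std::size_t cmsgLen) {
    if (cmsgLen < CMSG_LEN(0)) {
        return 0;  // shorter than a bare header: no payload at all
    }
    return (cmsgLen - CMSG_LEN(0)) / sizeof(int);
}

void printControlSizes() {
    ControlSizes s = measureControlSizes();
    std::printf("sizeof(int)=%zu sizeof(void *)=%zu\n",
                s.descriptorSize, s.pointerSize);
    std::printf("CMSG_LEN(sizeof(int))=%zu CMSG_SPACE(sizeof(int))=%zu\n",
                s.lenForOneFd, s.spaceForOneFd);
}
```

Comparing the output of printControlSizes() from an i386 build and an amd64 build shows where the pointer size and the padding diverge from the descriptor size, which is the mismatch this comment suspects.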

Comment #48

Posted on Jul 22, 2008 by Grumpy Hippo

hi folks, any news :) ?

Comment #49

Posted on Jul 22, 2008 by Happy Wombat

works for me in 32 bit mode on FreeBSD 7

Comment #50

Posted on Jul 24, 2008 by Grumpy Camel

I've taken some time to install 64-bit FreeBSD 7 myself. The problem turns out to be caused by Ruby. Ruby's UNIXSocket#send_io and UNIXSocket#recv_io implementations are broken on 64-bit FreeBSD. I've overridden them with our own implementation, and now all tests pass.

This problem should be fixed as of commit 2de5e1c93be18d167b69d0a1b95ca76e4fccece5.

Comment #51

Posted on Jul 24, 2008 by Grumpy Monkey

Can't wait for the next stable gem :)

Comment #52

Posted on Jul 28, 2008 by Quick Rabbit

Confirmed working - thanks!

Comment #53

Posted on Jul 31, 2008 by Grumpy Camel

You guys can go ahead and use the git version. It's stable enough: wiki.rubyonrails.org is running on it.

Status: Fixed

Labels:
Type-Defect Priority-Medium OpSys-BSD Portability Milestone-2.1.0