What steps will reproduce the problem?
1. Leave molniya running on my Debian sid system, which runs icinga 1.2.1.
2. Molniya terminates at a random time with the trace found below.
What is the expected output? What do you see instead?
Software should be robust and not bomb out. Yeah, I could run it in a while loop, but it could just be robust in the first place. Why not back off for $time and retry parsing the file, or something like that?
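To make the back-off suggestion concrete, here is a minimal sketch of what I mean. It is not molniya's code: `with_backoff`, its parameters, and the doubling delay are all my own assumptions, and the only thing taken from the trace is that the parser raises a RuntimeError.

```ruby
# Hypothetical helper (not part of molniya): run a block, and if it raises
# RuntimeError, sleep and try again with a delay that doubles each time.
# The sleeper argument is only there so the waiting can be stubbed out.
def with_backoff(attempts = 5, base_delay = 0.5, sleeper = Kernel.method(:sleep))
  tries = 0
  begin
    yield
  rescue RuntimeError
    tries += 1
    # Give up and re-raise the original error after the last attempt.
    raise if tries >= attempts
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end
```

The refresh could then wrap the parse call in `with_backoff { ... }`, so a status file that is unparsable for a moment does not take the whole process down, while a file that stays broken still surfaces the error.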
What version of the product are you using? On what operating system?
trunk, r47
debian sid, icinga 1.2.1
Please provide any additional information below.
/usr/src/molniya-trunk/nagios.rb:75:in `parse_object': unexpected line: (RuntimeError)
	from /usr/src/molniya-trunk/nagios.rb:55:in `parse_status'
	from /usr/src/molniya-trunk/nagios.rb:496:in `parse'
	from /usr/lib/ruby/1.8/pathname.rb:812:in `open'
	from /usr/lib/ruby/1.8/pathname.rb:812:in `open'
	from /usr/src/molniya-trunk/nagios.rb:496:in `parse'
	from /usr/src/molniya-trunk/nagios.rb:159:in `_refresh'
	from /usr/src/molniya-trunk/nagios.rb:127:in `refresh_if_needed'
	from /usr/lib/ruby/1.8/monitor.rb:242:in `synchronize'
	from /usr/src/molniya-trunk/nagios.rb:124:in `refresh_if_needed'
	from /usr/src/molniya-trunk/nagios.rb:442:in `refresh_if_needed'
	from /usr/src/molniya-trunk/nagios.rb:164:in `contents'
	from /usr/src/molniya-trunk/molniya.rb:445:in `status_report'
	from /usr/src/molniya-trunk/molniya.rb:623:in `update_status_msg'
	from /usr/src/molniya-trunk/molniya.rb:568:in `run'
	from /usr/src/molniya-trunk/molniya.rb:801:in `launch'
	from -e:1
Comment #1
Posted on Nov 2, 2010 by Happy Kangaroo
Right now I seem to have icinga in a state which makes this problem reproducible. I've saved both status.dat and . Let me know if you want those, as I'm not going to attach them to a public bug.
In addition, here's the relevant part of an strace -f run. Maybe that gives you an idea of what's happening.
[pid 10688] gettimeofday({1288680165, 578538}, NULL) = 0
[pid 10688] stat64("/var/lib/icinga/status.dat", {st_mode=S_IFREG|0664, st_size=53296, ...}) = 0
[pid 10688] stat64("/var/lib/icinga/status.dat", {st_mode=S_IFREG|0664, st_size=53296, ...}) = 0
[pid 10688] open("/var/lib/icinga/status.dat", O_RDONLY|O_LARGEFILE) = 3
[pid 10688] fstat64(3, {st_mode=S_IFREG|0664, st_size=53296, ...}) = 0
[pid 10688] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb776c000
[pid 10688] read(3, "################################"..., 4096) = 4096
[pid 10688] read(3, "apping=0\n\tpercent_state_change=0"..., 4096) = 4096
[pid 10688] read(3, "=0\n\tcheck_command=check-host-ali"..., 4096) = 4096
[pid 10688] read(3, "0\n\tretry_interval=1.000000\n\teven"..., 4096) = 4096
[pid 10688] rt_sigprocmask(SIG_SETMASK, [PIPE], NULL, 8) = 0
[pid 10688] close(3) = 0
[pid 10688] munmap(0xb776c000, 4096) = 0
[pid 10688] write(2, "/usr/src/molniya-trunk/nagios.rb"..., 53/usr/src/molniya-trunk/nagios.rb:75:in `parse_object') = 53
[pid 10688] write(2, ": ", 2: ) = 2
[pid 10688] write(2, "unexpected line: ", 17unexpected line: ) = 17
[pid 10688] write(2, " (", 2 () = 2
[pid 10688] write(2, "RuntimeError", 12RuntimeError) = 12
[pid 10688] write(2, ")\n", 2)
) = 2
Hope that helps.
Comment #2
Posted on Nov 3, 2010 by Happy Kangaroo
When one specific host came back online I was able to start molniya again. After looking at both status files, there are two things that look like possible causes to me:
- there is an empty line in that host's status block when it's down
- the plugin_output field contains an IPv6 address
See also attached snippets.
PS: I'm not convinced that this is also the source of the random termination during parsing. More likely it fails parsing something else too.
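For illustration, here is a rough sketch of a block parser that would tolerate both suspected causes. I have not checked how nagios.rb actually parses these blocks, so `parse_block` and the sample field values are hypothetical. It skips blank lines, splits each line on the first "=" only, and puts the offending line into the error message, which the current "unexpected line: " message leaves empty.

```ruby
# Hypothetical sketch (not the code in nagios.rb): parse the body lines of
# one status.dat block, i.e. the lines after "hoststatus {" up to "}".
def parse_block(lines)
  fields = {}
  lines.each do |raw|
    line = raw.strip
    next if line.empty?      # tolerate blank lines inside a block
    break if line == "}"     # end of the block
    # Split on the first "=" only, so a value may itself contain "=".
    key, value = line.split("=", 2)
    raise "unexpected line: #{raw.inspect}" if value.nil?
    fields[key] = value
  end
  fields
end
```

With this, a line such as `plugin_output=CRITICAL - 2001:db8::1: rta=nan` ends up as one field whose value keeps the colons and the second "=", and an empty line inside the block is simply ignored.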
- singlet-status-snippets.txt 3.46KB
Status: New
Labels:
Type-Defect
Priority-Medium