What happened: I installed 9front onto a qcow2 image through QEMU. I finished the installation, and QEMU rebooted. I killed the boot, and restarted QEMU to boot into the 9front installation. However, it hung just after reaching the boot sector.
What was expected: After printing that the PC is booting from the Hard Drive / MBR, 9front should boot.
Steps to reproduce:
0. Install QEMU (I use Homebrew's bottled QEMU 2.1.2)
1. Download 9front ISO (9front-3853.02ebd469f43a.iso.bz2)
2. Create new QEMU image: qemu-img create -f qcow2 9front.qcow2.img 20G
3. Boot 9front ISO: qemu-system-i386 -hda 9front.qcow2.img -cdrom 9front-3853.02ebd469f43a.iso -boot d -vga std -m 1G
4. Install 9front onto "9front.qcow2.img"
5. Press enter at [finish], QEMU reboots (into the ISO, since the QEMU call hasn't changed)
6. Kill QEMU
7. Boot 9front installation: qemu-system-i386 -hda 9front.qcow2.img -boot c -vga std -m 1G
8. Boot hangs
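The steps above can be sketched as a small script. This is a dry-run sketch only: the file names come from the report, while the `DRY_RUN` switch and the function names are illustrative assumptions. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. File names are from the report;
# DRY_RUN and the function names are illustrative assumptions.
IMG=9front.qcow2.img
ISO=9front-3853.02ebd469f43a.iso
DRY_RUN=${DRY_RUN:-1}

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 2: create the qcow2 disk image.
create_image() { run qemu-img create -f qcow2 "$IMG" 20G; }

# Step 3: boot the installer ISO with the image attached.
boot_installer() { run qemu-system-i386 -hda "$IMG" -cdrom "$ISO" -boot d -vga std -m 1G; }

# Step 7: boot from the installed disk (this is where the hang is reported).
boot_disk() { run qemu-system-i386 -hda "$IMG" -boot c -vga std -m 1G; }

create_image
boot_installer   # steps 4-6 are manual: install, press enter at [finish], kill QEMU
boot_disk
```

Running it with `DRY_RUN=0` would execute the commands for real; the manual installer steps still have to be done by hand between the two boots.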
Comment #1
Posted on Sep 28, 2014 by Happy Hippo
that seems odd. there are way too many dots here. the pbs is responsible for loading the 2nd stage loader "9bootfat" from the root of the 9fat partition.
boot the iso, and in a rio window, type:
9fs 9fat
then compare the files:
/n/9fat/9bootfat with /386/9bootfat
like:
ls -l /n/9fat/9bootfat /386/9bootfat
md5sum /n/9fat/9bootfat /386/9bootfat
they should be identical.
Comment #2
Posted on Sep 28, 2014 by Swift Bird
So /n/9fat/9bootfat didn't exist. I reran the installation, and I tried hjfs as well, but it didn't help.
Comment #3
Posted on Sep 28, 2014 by Happy Hippo
that would make sense, the pbs is scanning the root directory looking for the file.
the 9fat partition is setup in the "bootsetup" step of the install process. re-run this step and check for error messages (scroll up if it scrolled away).
Comment #4
Posted on Sep 28, 2014 by Swift Bird
Okay, just reinstalled again. If I run md5sum /n/9fat/9bootfat /386/9bootfat at the end of the installation, just before finishing and rebooting, then /n/9fat/9bootfat is present, and the md5 sums match.
However, after rebooting, the boot still hangs. I'm stuck on the divide error bug, but I bet /n/9fat/9bootfat wouldn't exist, if I could get to rio.
Comment #5
Posted on Sep 28, 2014 by Happy Hippo
maybe we're just rebooting too fast, before qemu flushes its data to the disk? at least you got the kernel booted now. the divide by zero panic is caused by the stats(1) command reading /dev/sysstat (the little window that graphs system statistics).
you might just wait a bit before hitting enter on the bootargs prompt to avoid this.
you can also try the kernel i just made that has the fix:
http://www.felloff.net/usr/cinap_lenrek/9pcf.alexchandel
you can copy it to 9fat renamed as 9pcf.
another thing, the 9front kernel is a multiboot image. you can try loading it directly with qemu with the -kernel option. plan9.ini (contents) can be passed as -initrd option.
Comment #6
Posted on Sep 28, 2014 by Grumpy Bird
alexchandel: remember you need to run 9fs 9fat to mount the 9fat partition. /n/9fat will not be mounted until you do so. also note: /n/9fat will only be accessible from the same namespace where you run 9fs 9fat.
Comment #7
Posted on Sep 28, 2014 by Swift Bird
Nice, I booted with 9pcf.alexchandel with the command: qemu-system-i386 -hda 9front.qcow2.img -cdrom 9front-3853.02ebd469f43a.iso -boot d -vga std -m 1G -kernel 9pcf.alexchandel -initrd plan9.ini
As soon as the GUI is drawn, a screen with this error flashes:
Plan 9 Console
i8042: 08 returned to the ea command
It disappears quickly, and then there's a "kernel fault: no user process" panic. I've attached a screenshot.
Comment #8
Posted on Sep 28, 2014 by Grumpy Dog
what is the content of the plan9.ini you passed to qemu?
Comment #9
Posted on Sep 28, 2014 by Swift Bird
@mischief It's:
```
config for initial cd booting
cdboot=yes
mouseport=ask
monitor=ask
vgasize=ask
bootfile=/386/9pcf
```
Comment #10
Posted on Sep 28, 2014 by Swift Bird
And yeah, I ran 9fs 9fat before checking each time, and from within the same window. In fact /n/9fat was empty. Also it's worth noting that for my past three posts, I chose cwfs64x during installation.
When I use hjfs and boot with qemu-system-i386 -hda 9front.qcow2.img -cdrom 9front-3853.02ebd469f43a.iso -boot d -vga std -m 1G -kernel 9pcf.alexchandel -initrd plan9.ini, the "panic: kernel fault: no user process" error doesn't occur. However, /n/9fat is still empty.
Moreover, booting with qemu-system-i386 -hda 9front.qcow2.img -boot c -vga std -m 1G -kernel 9pcf.alexchandel gives lots of errors, mostly along the lines of "can't open, /rc not found".
Comment #11
Posted on Sep 28, 2014 by Swift Bird
To summarize, 9front appears to create the second stage bootloader on the hard drive during installation, but after rebooting it's gone. Booting off the hard drive hangs; it's only possible to boot off the ISO. Even booting off the hard drive using a kernel image (thus skipping the bootloader) still fails.
Additionally, after installation, if the HD's filesystem is cwfs64x, booting off the ISO will panic with "kernel fault: no user process".
Comment #12
Posted on Sep 28, 2014 by Happy Hippo
decoded the panic, but it makes no sense. it would mean that the machp[0] array contains 0x9 for the mach address of cpu0. this entry gets only set once to a fixed address and then is never touched.
term% ktrace -i f0108507 f0015b24
src(0xf0108507); // dumpstack+0x10
// data at 0xf0015b2c? f0163141
src(0xf0163141); // panic+0xd2
// data at 0xf0015c54? f010867a
src(0xf010867a); // fault386+0xd2
// data at 0xf0015d04? f0107c14
src(0xf0107c14); // trap+0x15b
// data at 0xf0015dc4? f01005ec
src(0xf01005ec); // forkret
//passing interrupt frame; last pc found at sp=0xf0015dc4
// data at 0xf0015e04? f013882e
src(0xf013882e); // ps2mouseputc+0x19
// data at 0xf0015e38? f01f2add
src(0xf01f2add); // i8042intr+0x7a
// data at 0xf0015e58? f0107c14
src(0xf0107c14); // trap+0x15b
// data at 0xf0015f18? f01005ec
src(0xf01005ec); // forkret
//passing interrupt frame; last pc found at sp=0xf0015f18
// data at 0xf0015f58? f010055b
src(0xf010055b); // halt+0xe
// data at 0xf0015f64? f015d946
src(0xf015d946); // idlehands+0x11
// data at 0xf0015f70? f020bee6
src(0xf020bee6); // runproc+0x160
// data at 0xf0015fa4? f020b6b5
src(0xf020b6b5); // sched+0x165
// data at 0xf0015fd0? f020b463
src(0xf020b463); // schedinit+0x85
// data at 0xf0015fe4?
acid: src(0xf013882e); // ps2mouseputc+0x19
/sys/src/9/pc/mouse.c:99
 94	int buttons, dx, dy;
 95
 96	/*
 97	 * Resynchronize in stream with timing; see comment above.
 98	 */
 99	m = MACHP(0)->ticks;
 100	if(TK2SEC(m - lasttick) > 2)
 101		nb = 0;
 102	lasttick = m;
 103
 104	/*
acid: asm(ps2mouseputc)
ps2mouseputc 0xf0138815 SUBL $0x28,SP
ps2mouseputc+0x3 0xf0138818 MOVL packetsize(SB),DI
ps2mouseputc+0x9 0xf013881e MOVL nb$1(SB),SI
ps2mouseputc+0xf 0xf0138824 MOVL c+0x0(FP),BX
ps2mouseputc+0x13 0xf0138828 MOVL machp(SB),AX
ps2mouseputc+0x19 0xf013882e MOVL 0x24(AX),BP <- fault
ps2mouseputc+0x1c 0xf0138831 MOVL BP,CX
Comment #13
Posted on Sep 28, 2014 by Happy Hippo
ok, i could reproduce this now after many tries in qemu for windows. the trick is to keep twitching the mouse constantly during boot. fix committed in rd2af87472b59. see the explanation there. i built another kernel for you to test under:
Comment #14
Posted on Sep 28, 2014 by Swift Bird
The panic no longer occurs. However, I just noticed some abnormalities during the install:
Ream the filesystem? (yes, no)[yes]
Starting cwfs64x file server for /dev/sdC0/fscache
Reaming filesystem
bad nvram key
bad authentication id
bad authentication domain
nvrcheck: can't read nvram
config: config: config: auth disabled
config: config: config: config: config: config: config: currnt fs in "main"
cmd_users: cannot access /adm/users
63-bit cwfs as of Thu Sep 4 20:04:10 2014
last boot Sun Sep 28 17:06:33 2014
Configuring cwfs64x file server for /dev/sdC0/fscache
% mount -c /srv/cwfs /n/newfs
Mounting cwfs64x file server for /dev/sdC0other
% mount -c /srv/cwfs /n/other other
The bootsetup still appears error free:
dossrv: serving #s/dos
% dd -bs 512 -count 1 -if /dev/sdC0/9fat -of /tmp/pbs.bak
1+0 records in
1+0 records out
Initializing Plan 9 FAT partition
% disk/format -r 2 -d -b /n/newfs/386/pbs /dev/sdC0/9fat
Initializing FAT file system
type hard, 12 tracks, 255 heads, 63 secors/track, 512 bytes/sec
used 4096 bytes
% mount -c /srv/dos /n/9fat /dev/sdC0/9fat
% rm -f /n/9fat/9bootfat /n/9fat/plan9.ini /n/9fat/9pcf
% cp /n/newfs/386/9bootfat /n/9fat/9bootfat
% chmod +al /n/9fat/9bootfat
% cp /tmp/plan9.ini /n/9fat/plan9.ini
% cp /n/newfs/386/9pcf /n/9fat/9pcf
% cp /tmp/pbs.bak /n/9fat
% unmount /n/9fat
Regardless, /n/9fat is still empty when I reboot and run "9fs 9fat". And attempting to boot into the HD still hangs at "MBR...pbs....."
Moreover, attempting to boot into the HD using the kernel flag (qemu-system-i386 -hda 9front.qcow2.img -boot c -vga std -m 1G -kernel 9pcf.alexchandel) throws bad nvram key errors and more, screenshot attached. When I type ls in the terminal, it errors with:
checktag pc=9b4f cw"/dev/sdC0/fscache"w"/dev/sdC0/fsworm"(11305) tag/path=Tnone/0; expected Tdir
ls: . :phase error -- cannot happen
Comment #15
Posted on Sep 28, 2014 by Happy Hippo
the messages from the installation are expected. these are ok. but after reboot, the fat is missing and the cwfs filesystem is partially corrupted. my guess would be that we're just too fast in rebooting? and qemu doesn't flush the changes out to the qcow image for some reason?
reads and writes to /dev/sdXX/parts are uncached and synchronous. the plan9 kernel has no buffer caches. and dossrv writes immediately. maybe qemu expects us to issue write barriers to really flush stuff to the disk?
maybe just wait a minute after installation when it prompts for the [finish] step?
i can try checking qemu source in the meantime...
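One way to probe the flushing theory from the host side is to open the image with an explicit cache mode. The following is a minimal sketch, assuming QEMU's `-drive` option with its `cache=` setting; the helper name `drive_cmd` is made up for illustration, it only prints the command, and whether it changes the outcome on this setup is untested.

```shell
#!/bin/sh
# Sketch: print a QEMU command that opens the image with a given cache mode.
# drive_cmd is a hypothetical helper name; writethrough asks QEMU to flush
# after each guest write instead of relying on the guest to request a flush.
IMG=9front.qcow2.img

drive_cmd() {
    mode=${1:-writethrough}
    echo "qemu-system-i386 -drive file=$IMG,format=qcow2,if=ide,cache=$mode -boot c -vga std -m 1G"
}

drive_cmd writethrough
```

If the corruption disappears with `cache=writethrough` but returns with `cache=writeback`, that would point at unflushed data or qcow2 metadata being lost when QEMU is killed.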
Comment #16
Posted on Sep 28, 2014 by Happy Hippo
short explanation of what checktag messages are:
the cwfs fileserver uses blocks (of 16k in case of cwfs64x) where it stores some redundant checking info at the end (the tag). the tag contains the type of the block (file-data/directory/indirect pointer blocks...) and the qid (file number). it always checks the tag to see that the block it just read is what it expected.
a tag of Tnone/0 means the tag is zero. the block appears to be zeroed out. ... like it was never written.
Comment #17
Posted on Sep 28, 2014 by Swift Bird
I waited ~20 minutes, same result. Is it possible that 9front is corrupting the filesystem when it's shut down? Zeroed out blocks might be a result of qcow2 corruption.
Comment #18
Posted on Sep 28, 2014 by Happy Hippo
cwfs writes changes to disk lazily. that is, there's a background process that flushes dirty blocks to disk. but waiting 20 minutes is a bit crazy. it should be a few seconds at max. even with qemu's slow i/o, not more than 10 seconds max.
dossrv on the other hand writes immediately. the write() syscall will not return until dossrv did the whole roundtrip to disk. what puzzles me is that your fat filesystem is missing.
this corruption cannot be explained with the lazy writing of cwfs.
maybe it has something to do with the qemu configuration? can you try using a sparse file for the disk image? maybe the qcow got damaged with all this testing?
people have used qemu with 9front for a while now, but these issues didn't come up yet.
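Both suggestions in this comment can be tried from the host with `qemu-img`. A minimal sketch, assuming the image name from the report; the raw image name `9front.raw.img` and the function names are hypothetical, and the functions only print the commands:

```shell
#!/bin/sh
# Sketch: commands for checking the qcow2 image and retrying with a raw image.
# RAW and the function names are illustrative assumptions.
IMG=9front.qcow2.img
RAW=9front.raw.img
ISO=9front-3853.02ebd469f43a.iso

# Check the qcow2 image's metadata for inconsistencies.
check_cmd() { echo "qemu-img check $IMG"; }

# Create a raw image of the same size (typically a sparse file on the host).
raw_create_cmd() { echo "qemu-img create -f raw $RAW 20G"; }

# Boot the installer against the raw image instead of the qcow2 one.
raw_install_cmd() {
    echo "qemu-system-i386 -drive file=$RAW,format=raw,if=ide -cdrom $ISO -boot d -vga std -m 1G"
}

check_cmd
raw_create_cmd
raw_install_cmd
```

If an install onto the raw image survives a restart, the problem is more likely in the qcow2 layer than in 9front's writes.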
Comment #19
Posted on Sep 28, 2014 by Happy Hippo
another theory. maybe the ide controller that qemu emulates doesn't work right?
you could try using virtio instead.
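Switching the disk to virtio is a change to the QEMU command line only. A minimal sketch, assuming QEMU's `-drive ... if=virtio` setting; `virtio_boot_cmd` is a made-up helper name that just prints the command. The disk will probably appear under a different /dev/sd name inside 9front, so a fresh install onto the virtio disk is likely needed.

```shell
#!/bin/sh
# Sketch: print a QEMU command that attaches the image as a virtio block
# device instead of the default emulated IDE disk.
# virtio_boot_cmd is a hypothetical helper name.
IMG=9front.qcow2.img

virtio_boot_cmd() {
    echo "qemu-system-i386 -drive file=$IMG,if=virtio -boot c -vga std -m 1G"
}

virtio_boot_cmd
```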
Comment #20
Posted on Sep 28, 2014 by Swift Bird
Just noticed, when I restart QEMU by entering fshalt in 9front, and then system_reset in the QEMU console, the filesystem is preserved, and /n/9fat has its contents. However, if I restart QEMU in any other way, including killing it while 9front is idle, then the filesystem is corrupted.
Comment #21
Posted on Sep 28, 2014 by Swift Bird
http://wiki.qemu.org/Features/Qcow2DataIntegrity recommends using I/O barriers to avoid data corruption.
Comment #22
Posted on Sep 28, 2014 by Swift Bird
Never mind, I was using the wrong image. fshalt followed by system_reset still results in a corrupted filesystem.
Comment #23
Posted on Dec 28, 2014 by Grumpy Dog
any progress here? is this still reproducible?
Comment #24
Posted on Jan 1, 2015 by Swift Bird
The newest ISO, 9front-4045, still exhibits the same hang when the reported steps are performed (install, [finish], QEMU restarts, kill QEMU, restart QEMU without the cdrom arg, hangs at boot).
Comment #25
Posted on Jan 2, 2015 by Grumpy Dog
are you still using qemu 2.1.2, and the same qemu arguments as in the original bug report? i can try to reproduce on this version, but i only have linux to test on. i have never had a problem like you described, and i've tried quite a number of qemu versions during ethervirtio development. it could be an osx-specific issue, or an issue with how brew packages qemu.
Status: NeedsTesting
Labels:
Type-Other
Priority-Low