Issue 33: swapon fails on android G1 (ARM)
What steps will reproduce the problem?
Compile compcache with the latest CodeSourcery toolchain for ARM against kernel 2.6.27.
Push the compcache modules to the device.
insmod works fine and dmesg reports normal output.
swapon reboots the device.
This works fine in the Android emulator, however, and I have no idea why it fails on the device. Any ideas, or suggestions on how to debug this? I'm thinking of putting some sleeps in whatever code is triggered by swapon, but I'm not sure where to look. Any clues would be very helpful.
Jun 25, 2009
I have slightly different results with compcache on android-msm-2.6.29: I am able to swapon, but afterwards any process allocating memory immediately segfaults.
Jun 25, 2009
I've tried with gcc; the device still works fine after swapon, but what follows is very similar to the behavior described in Issue 2: processes get segfaults or bus errors.
Jun 25, 2009
So you got compcache working reliably after swapon on a G1? If so, which gcc version and tools did you use to build it, and did you compile the Android kernel and lzo modules with that compiler as well, or only compcache?
Jun 26, 2009
> I have slightly different results with compcache on android-msm-2.6.29- I am able to swapon, but afterwards any process allocating memory immediately segfaults.
Do you also see any warnings from compcache in the kernel logs? It's quite difficult for me to debug this issue since I don't have this hardware and I lack experience with this processor.
Jun 26, 2009
Hopefully steve.kondik can get you some more output; personally, with cat /proc/kmsg I'm getting nothing before the device reboots.
Jun 28, 2009
OK, I've got some more to report: with 2.6.29 things seem a lot better. With lzo built into the kernel, insmod and swapon actually work. I can check /proc/ramzswap and see a high GoodCompress, but after torturing it for a while it crashes the user interface (that is my theory; it's not a reboot, as I don't lose the connection to the phone and can watch the kernel logs this time). Note that the "send sigkill" to a process is completely normal and happens all the time on Android. However, there seems to be no output at all from compcache in here.
<4>[ 316.945526] send sigkill to 568 (app_process), adj 14, size 4436
<4>[ 324.601165] select 612 (app_process), adj 15, size 4411, to kill
<4>[ 324.601196] send sigkill to 612 (app_process), adj 15, size 4411
<6>[ 346.488891] binder: release 134:323 transaction 6478 in, still active
<6>[ 346.489135] binder: send failed reply for transaction 6478 to 194:505
<6>[ 346.744750] binder: 194 invalid dec strong, ref 1079 desc 17 s 0 w 1
<6>[ 346.754028] binder: 423 invalid dec strong, ref 8585 desc 17 s 0 w 1
<6>[ 346.760559] binder: 585 invalid dec strong, ref 9347 desc 17 s 0 w 1
<6>[ 348.089965] request_suspend_state: wakeup (0->0) at 341189074786 (2009-06-28 19:54:03.283935557 UTC)
<3>[ 348.092315] init: untracked pid 371 exited
<3>[ 348.093719] init: untracked pid 383 exited
<3>[ 348.094207] init: untracked pid 390 exited
<3>[ 348.094635] init: untracked pid 414 exited
<3>[ 348.133636] init: untracked pid 190 exited
<3>[ 348.133911] init: untracked pid 273 exited
<3>[ 348.134277] init: untracked pid 621 exited
<3>[ 348.140106] init: untracked pid 266 exited
<3>[ 348.140563] init: untracked pid 352 exited
<3>[ 348.160003] init: untracked pid 194 exited
<3>[ 348.160461] init: untracked pid 423 exited
<3>[ 348.160705] init: untracked pid 585 exited
<6>[ 381.844940] request_suspend_state: wakeup (0->0) at 374944049146 (2009-06-28 19:54:37.038909917 UTC)
<6>[ 384.697967] binder: release 112:127 transaction 10775 in, still active
<6>[ 384.698333] binder: send failed reply for transaction 10775 to 645:653
<6>[ 385.784729] htc-acoustic: open
<6>[ 385.845764] htc-acoustic: mmap
<6>[ 385.846740] htc-acoustic: ioctl
<6>[ 385.846954] htc-acoustic: ioctl: ACOUSTIC_ARM11_DONE called 678.
<6>[ 385.849548] htc-acoustic: ioctl: ONCRPC_ACOUSTIC_INIT_PROC success.
<6>[ 385.849792] htc-acoustic: release
<6>[ 385.890563] snd_set_device 1 1 1
<6>[ 385.901885] snd_set_volume 0 0 5
<6>[ 385.903289] snd_set_volume 1 0 5
<6>[ 385.912017] snd_set_volume 3 0 5
<6>[ 385.913360] snd_set_volume 2 0 5
<6>[ 386.833923] snd_set_volume 256 0 5
Jun 29, 2009
I checked adb logcat during a soft restart and also while an application failed to start. I'm not much wiser from this output, and I'm a bit unsure where to go from here in debugging this.
Jul 01, 2009
Ah, I don't have this hardware, and there is nothing in the logs that can help me debug this issue. I promise a bounty of $100 to whoever gets it working on ARM :) I am serious!
Jul 01, 2009
What about posting the debug output to the Google Android dev group? A few Google employees monitor that board. Maybe they can help out. http://groups.google.com/group/android-platform
Jul 04, 2009
For the record, it seems to work fine on the Beagleboard, an ARM-based single board computer. This is a Cortex-A8, while the G1 uses an ARM11; that could certainly be a factor. Details:
- Kernel and compcache were built natively on the Beagleboard, using a standard Debian gcc 4.3.2.
- I'm running a kernel 2.6.30 from the linux-omap git tree, no other patches.
- compcache 0.5.3 built just fine, and "use_ramzswap.sh 32768 /dev/mmcblk0p3" ran fine with no errors.
- As a quick stress test, I fired up firefox in a VNC session, resulting in /proc/ramzswap giving ~24k reads, ~32k writes, ~75M OrigDataSize, ~23M ComprDataSize. This sure looks like it's actually working. (Also, firefox was actually usable, which is a first for me on this board.)
- Finally, unuse_ramzswap got rid of the swap as expected.
I'm not sure how helpful this is; the hardware is pretty different from the G1. But it does suggest that there's hope, since it works on at least one ARM device.
Jul 05, 2009
It is useful, but could you try stress testing it some more? I can also get ramzswap to report everything working; it's not until after some stress testing has occurred that things actually start to fail.
Jul 05, 2009
Hi,
I have been monitoring the functionality of compcache on a Nokia N810, which has an
OMAP2420 processor, which is of course ARM.
I am getting similar errors with my N810, like random reboots at times. I have been
monitoring dmesg and /proc/ramzswap, but to no avail at this point. The kernel
version the N810 uses is 2.6.21-omap1. Maybe some kernel debugging would help with this,
but I'm not familiar with such "lore" :). So I am just reporting a different ARM
device on this thread.
So swapon and {use,unuse}_ramzswap.sh work, but after a while of usage (like opening
the browser and pdf reader), the tablet crashes for an unknown reason.
Jul 05, 2009
More stress testing on the Beagleboard: a full kernel compile with -j8 (typically something like ~50M in swap according to free, and gcc processes were definitely swapping), combined with bits of firefox, stress ( http://weather.ou.edu/~apw/projects/stress/ ) for another 30M-60M of memory usage, and video streaming to my laptop. No faults as far as I can tell after several hours and over 11M reads and 6M writes according to /proc/ramzswap. It also shows no FailedReads/Writes or InvalidIO, and the resulting kernel works. I'd say it's solid. If there are any particular tests that might be helpful, let me know. And if anything does come up, I'll be sure to update.
Jul 05, 2009
Thank you all for the help so far. Summarizing a bit:
- Cortex-A8 (Beagleboard): seems to work fine.
- OMAP2420 (Nokia N810): no problems with module load/unload and swapon/swapoff, but apps crash or the system reboots after some time.
- ARM11 (Android G1): swapon reboots the device.
I will try reading about these ARM variations and maybe we will get some clues ...
Jul 05, 2009
It's possible that the issue here is the same as the one described here: http://www.linux-mips.org/archives/linux-mips/2008-11/msg00038.html
Jul 06, 2009
I'd just like to point out that with recent kernels on Android, the device doesn't reboot; the interface does. That is a rather big difference, as the kernel stays up. Note that the device works fine with normal swap.
Jul 06, 2009
Yeah, I tested the latest compcache on the N810 last night and experienced only interface freezing; the device still reacts to button presses and the ssh connection stays alive, although dmesg revealed nothing special.
Jul 09, 2009
I can confirm that compcache doesn't reboot my Android G1 with the 2.6.29 kernel. I've set up an 8meg compcache swapfile with swappiness at 60 and it actually works pretty well. I can get things to crash left and right if I set swappiness to 100, though.
Jul 09, 2009
Seems like once the swapfile starts getting full and reaching the end of the file, that's when processes start crashing. There's some corruption somewhere.
Jul 09, 2009
> I've set up a 8meg compcache swapfile with swappiness to 60 and it actually works pretty well.
> I can get things to crash left and right if I set swappiness to 100 though.
With swappiness set to 100, compcache will quickly fill up. Maybe with so much memory pinned by compcache, you are running into the OOM killer? Do you see any OOM kill messages in the logs?
> Seems like once the swapfile starts getting full and reaching the end of the file, that's when processes start crashing. There's some corruption somewhere.
Seems like a good test case. I can try this at least on my system (x64).
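One way to turn the "fill it up and look for corruption" idea into a repeatable test is to fill memory with a known per-page pattern, let the system swap, and later check that every page still holds its pattern. The following is a minimal Python sketch under that assumption; `page_pattern`, `fill_pages`, `find_corrupt_pages`, and the page count are hypothetical names and values chosen for illustration, and in practice the allocation would need to exceed free RAM so that pages actually reach the swap device.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def page_pattern(index):
    """Return a deterministic PAGE_SIZE-byte pattern unique to this page index."""
    digest = hashlib.sha256(str(index).encode()).digest()  # 32 bytes
    return digest * (PAGE_SIZE // len(digest))


def fill_pages(count):
    """Allocate `count` page-sized buffers, each filled with its own pattern."""
    return [bytearray(page_pattern(i)) for i in range(count)]


def find_corrupt_pages(pages):
    """Return the indices of pages whose contents no longer match their pattern."""
    return [i for i, page in enumerate(pages) if page != page_pattern(i)]


if __name__ == "__main__":
    pages = fill_pages(1024)  # 4 MB here; raise this until the system swaps
    # Run other memory-hungry workloads at this point to force swapping.
    print("corrupt pages:", find_corrupt_pages(pages))
```

A non-empty result after the pages have been swapped out and back in would point at data corruption rather than at the OOM killer.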
Jul 09, 2009
The processes aren't being killed. They're crashing with segfaults and tracebacks.
Jul 09, 2009
Could there be an issue with the kernel writing into the same memory space where the compcache swap resides, since both are using the same memory?
Jul 09, 2009
I would require the following data:
- /proc/cpuinfo
- /proc/meminfo
- /var/log/messages (on some systems it's /var/log/kernel)
The above data is needed for *each* of the following devices:
- Cortex-A8 (Beagleboard)
- OMAP2420 (Nokia N810)
- ARM11 (Android G1)
Jul 09, 2009
Here's cpuinfo and meminfo. There is no /var/log/messages or /var/log/kernel on the android g1.
# cat /proc/cpuinfo
cat /proc/cpuinfo
Processor : ARMv6-compatible processor rev 2 (v6l)
BogoMIPS : 245.36
Features : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 6TEJ
CPU variant : 0x1
CPU part : 0xb36
CPU revision : 2
Hardware : trout
Revision : 0080
Serial : 0000000000000000
# cat /proc/meminfo
cat /proc/meminfo
MemTotal: 97908 kB
MemFree: 2192 kB
Buffers: 536 kB
Cached: 24640 kB
SwapCached: 12 kB
Active: 37608 kB
Inactive: 44392 kB
Active(anon): 27096 kB
Inactive(anon): 30292 kB
Active(file): 10512 kB
Inactive(file): 14100 kB
Unevictable: 252 kB
Mlocked: 0 kB
SwapTotal: 8188 kB
SwapFree: 7244 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 57088 kB
Mapped: 14676 kB
Slab: 6152 kB
SReclaimable: 868 kB
SUnreclaim: 5284 kB
PageTables: 3072 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 57140 kB
Committed_AS: 742092 kB
VmallocTotal: 155648 kB
VmallocUsed: 53404 kB
VmallocChunk: 44028 kB
#
Jul 09, 2009
> Here's cpuinfo and meminfo. There is no /var/log/messages or /var/log/kernel on the android g1.
OK, then send the output of:
- uname -a
- the kernel config file (/boot/config*) -- maybe this will be missing on the G1.
Jul 09, 2009
# uname -a
Linux localhost 2.6.29-cm #1 PREEMPT Thu Jul 2 19:13:31 EDT 2009 armv6l GNU/Linux
Jul 10, 2009
All info is from the same device. For config.gz, I should be on the same config as dwang5 (or a very similar one); I don't have the config for the exact build I'm using.
Linux localhost 2.6.29-cm #2 PREEMPT Sun Jun 28 02:29:13 EDT 2009 armv6l GNU/Linux
For the kernel log I pulled /proc/kmsg.
Jul 14, 2009
On ARMv6 and newer:
- Caches are VIPT (Virtually Indexed, Physically Tagged).
- Caches are write-back.
So, I think these crashes happen due to the following:
- On swap read, ramzswap gets a 'bio' page which is mapped to a kernel virtual address, say V(k). All the above systems have mem <= 1G, so kmap simply gives a lowmem address.
- The data cache at the location corresponding to VA == V(k) now contains the decompressed data. This cache location is tagged with the decompressed page's physical address, say P.
- However, the corresponding RAM location still contains some stale data (write-back cache).
- Now this page is mapped to a userspace VA, say V(u).
- The data cache at location V(u) has a tag different from P (the decompressed page's physical address), so it goes to RAM to fetch the data.
- The corresponding RAM location still has stale data. We fetch this stale data into the cache location for VA == V(u) <---------------
- Thus the user gets stale data and segfaults.
Thus, we need to call flush_dcache_page() after writing out the decompressed page in ramzswap_read(). However, as mentioned in this mail: http://www.linux-mips.org/archives/linux-mips/2008-11/msg00038.html ... this solution will not work "as is", but some workaround should still be doable. I will try to upload a custom compcache version with this fix, and let's see if it solves the issue.
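The aliasing scenario described above can be illustrated with a toy model. This is a minimal Python sketch, not kernel code: `ToyCache`, `swap_in`, and the addresses are hypothetical, and the model is deliberately simplified (each virtual alias gets its own cache line, and dirty data reaches RAM only on an explicit flush). Real hardware behaviour depends on the cache geometry. The `flush` method stands in for the role that flush_dcache_page() plays in the proposed fix.

```python
class ToyCache:
    """Toy write-back data cache whose lines are looked up by virtual address."""

    def __init__(self, ram):
        self.ram = ram      # physical address -> data
        self.lines = {}     # virtual address -> (physical address, data, dirty)

    def write(self, vaddr, paddr, data):
        # Write-back: only the cache line is updated, RAM is left untouched.
        self.lines[vaddr] = (paddr, data, True)

    def read(self, vaddr, paddr):
        line = self.lines.get(vaddr)
        if line is not None and line[0] == paddr:
            return line[1]                  # hit through this alias
        data = self.ram[paddr]              # miss: fetch from RAM
        self.lines[vaddr] = (paddr, data, False)
        return data

    def flush(self, vaddr):
        # Write a dirty line back to RAM and drop it from the cache.
        paddr, data, dirty = self.lines.pop(vaddr)
        if dirty:
            self.ram[paddr] = data


def swap_in(flush_after_decompress):
    """Model one swap read: kernel writes via V(k), user reads via V(u)."""
    P, V_K, V_U = 0x1000, 0xC0001000, 0x40002000   # hypothetical addresses
    cache = ToyCache(ram={P: "stale"})
    cache.write(V_K, P, "decompressed")   # page filled through the kernel alias
    if flush_after_decompress:
        cache.flush(V_K)                  # the step the proposed fix adds
    return cache.read(V_U, P)             # user space reads through its alias


if __name__ == "__main__":
    print(swap_in(flush_after_decompress=False))
    print(swap_in(flush_after_decompress=True))
```

In this model the user-side read returns the stale RAM contents unless the kernel-side line is flushed first, which mirrors the sequence of steps listed in the comment.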
Jul 14, 2009
OK, sounds great. I will try it on my N810 when you have the version ready.
Jul 14, 2009
Awesome. Based on how much this helps on my main computer (with 4 GB of memory), I can't imagine how much of an improvement it'll be on my G1 with 96 MB of memory. I'll be testing the second you push out a test version.
Jul 14, 2009
Looking forward to the new version as well. Thanks!
Jul 15, 2009
Please try the attached compcache test version. Thanks.
Jul 15, 2009
Thanks Nitin! Would somebody mind posting the compiled android .29 modules? Thanks!
Jul 15, 2009
Totally untested; going to bed now, will test tomorrow morning.
Jul 15, 2009
Thank you! Thank you! Running with a 24meg swap file (25% of available ram) and swappiness set to 100. No crashes! Running the imeem streaming music player in the background while loading up gmail, calendar, browser, maps, and market. Awesome!
Jul 15, 2009
here's the cat output. 74% compression, is that good?
# cat /proc/ramzswap
cat /proc/ramzswap
DiskSize: 24476 kB
NumReads: 99829
NumWrites: 55941
FailedReads: 0
FailedWrites: 0
InvalidIO: 0
PagesDiscard: 0
ZeroPages: 228
GoodCompress: 74 %
NoCompress: 6 %
PagesStored: 5890
PagesUsed: 2249
OrigDataSize: 23560 kB
ComprDataSize: 8333 kB
MemUsedTotal: 8996 kB
#
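To put those numbers in context, the overall compression ratio can be computed from OrigDataSize, ComprDataSize, and MemUsedTotal. The following is a minimal Python sketch, assuming the `Name: value unit` layout shown in the output above; `parse_ramzswap` and `summarize` are hypothetical helper names and are not part of compcache.

```python
def parse_ramzswap(text):
    """Parse 'Name: value [unit]' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue  # skip lines that are not 'Name: value' pairs
        stats[name.strip()] = int(fields[0])
    return stats


def summarize(stats):
    """Derive overall ratios from the sizes reported in kB."""
    orig = stats["OrigDataSize"]
    compr = stats["ComprDataSize"]
    used = stats["MemUsedTotal"]
    return {
        # compressed data as a percentage of the original data
        "compressed_pct": round(100.0 * compr / orig, 1),
        # total memory used (data plus overhead) as a percentage of the original
        "mem_used_pct": round(100.0 * used / orig, 1),
        # memory saved compared with keeping the pages uncompressed
        "saved_kb": orig - used,
    }


SAMPLE = """\
DiskSize: 24476 kB
GoodCompress: 74 %
NoCompress: 6 %
PagesStored: 5890
PagesUsed: 2249
OrigDataSize: 23560 kB
ComprDataSize: 8333 kB
MemUsedTotal: 8996 kB
"""

if __name__ == "__main__":
    print(summarize(parse_ramzswap(SAMPLE)))
```

For the figures posted here, the stored data occupies roughly 35% of its original size. This ratio is a different quantity from the GoodCompress percentage that ramzswap reports.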
Jul 15, 2009
one question, is the swappiness setting considered? Will using 60 or 100 make a difference?
Jul 15, 2009
Issue 2 has been merged into this issue.
Jul 15, 2009
> Running with a 24meg swap file (25% of available ram) and swappiness set to 100. No crashes!
Great news! Just to confirm, did you run the test on a G1 or on an emulator?
> here's the cat output. 74% compression, is that good?
It's a bit unusual. I usually see ~90% for GoodCompress. Also, 6% for "NoCompress" doesn't look too good.
> one question, is the swappiness setting considered? Will using 60 or 100 make a difference?
The higher the swappiness, the more quickly ramzswap will fill up. For the kernel it's just another swap device, to which the swappiness value applies.
Jul 15, 2009
Actual G1 hardware.
Jul 15, 2009
Testing this as well on a G1, using kernel 2.6.29 with the jacHEROski ROM 1.4C (the kernel is CM's). Everything loaded just fine.
# cat /proc/ramzswap
DiskSize: 63473 kB
MemLimit: 14684 kB
NumReads: 507
NumWrites: 2577
FailedReads: 0
FailedWrites: 0
InvalidIO: 0
PagesDiscard: 0
ZeroPages: 117
GoodCompress: 100 %
NoCompress: 0 %
PagesStored: 1820
PagesUsed: 352
OrigDataSize: 7280 kB
ComprDataSize: 1394 kB
MemUsedTotal: 1408 kB
BDevNumReads: 108
BDevNumWrites: 640
I have a 64mb swap partition that I am using in conjunction.
Question: Does the lzo_compress.ko and lzo_decompress.ko have to be loaded as well?
Jul 15, 2009
> Question: Does the lzo_compress.ko and lzo_decompress.ko have to be loaded as well?
Yes, they must be loaded.
Jul 15, 2009
The lzo_compress.ko and lzo_decompress.ko modules are already loaded in cyanogen's kernel.
Jul 15, 2009
Fantastic, then this works wonderfully! The music app on Hero is actually usable now, and the people app works great!
Jul 16, 2009
It seems to be in pretty widespread testing on the G1 now, without any reports of crashes: http://forum.xda-developers.com/showthread.php?t=537236 Very nice work!
Jul 16, 2009
OK... so now the status of the issue is:
1- Cortex-A8 (Beagleboard) -- seems to work even without the fix (see comment #13).
2- OMAP2420 (Nokia N810) -- crashes without the fix. No testing done with the fix.
3- ARM11 (Android G1) -- crashes without the fix. The fix resolved the issue.
So, testing is now needed for case (2): Nokia N810 (test version uploaded in comment #33).
Jul 16, 2009
Yup, I'm going to test it once I get my VMware running again, so I can compile the test version in scratchbox.
Jul 16, 2009
OK, I got it compiled on the device itself, no problems whatsoever. Thanks for this, Nitin :)
Jul 16, 2009
Nokia-N810-43-7:~# free
total used free shared buffers
Mem: 126796 124004 2792 0 4
Swap: 31692 31688 4
Total: 158488 155692 2796
Nokia-N810-43-7:~# cat /proc/ramzswap
DiskSize: 31696 kB
NumReads: 5688
NumWrites: 13113
FailedReads: 0
FailedWrites: 0
InvalidIO: 0
PagesDiscard: 0
ZeroPages: 180
GoodCompress: 52 %
NoCompress: 20 %
PagesStored: 7743
PagesUsed: 4295
OrigDataSize: 30972 kB
ComprDataSize: 15267 kB
MemUsedTotal: 17180 kB
Jul 16, 2009
Thank you all for your testing efforts. The fix is now committed to the default and multiple_rzs branches, so it will be included in compcache-0.6.
Status: Fixed
Jul 19, 2009
Just FYI, compcache-0.6pre2 now includes this fix.