Issue 33: swapon fails on Android G1 (ARM)
Status:  Fixed
Owner:  ----
Closed:  Jul 16
Type-Defect
Priority-Medium


Reported by aagaa...@gmail.com, Jun 25, 2009
What steps will reproduce the problem?
Compile compcache with the latest CodeSourcery toolchain for ARM against
kernel 2.6.27.
Push the compcache modules to the device.
insmod works fine; dmesg reports normal output.
swapon reboots the device.

However, this works fine in the Android emulator. I have no idea why this
happens.

Any ideas? Or ideas on how to debug this? I'm thinking of putting some sleeps in
whatever code is triggered on a swapon, but I'm not sure where to look. If
you could give me any clues, that'd be very helpful.
Comment 1 by steve.kondik, Jun 25, 2009
I have slightly different results with compcache on android-msm-2.6.29: I am able to
swapon, but afterwards any process allocating memory immediately segfaults.
Comment 2 by ondra.herman, Jun 25, 2009
I've tried with gcc. The device still works fine after swapon, but what follows after
that is very similar to the behavior described in Issue 2: processes get
segfaults or bus errors.
Comment 3 by aagaa...@gmail.com, Jun 25, 2009
So you got compcache working reliably after swapon on a G1? If so, what gcc version
and what tools did you use to build it? Did you compile the Android kernel / LZO modules
with that compiler as well, or only compcache?
Comment 4 by nitingupta910, Jun 26, 2009
> I have slightly different results with compcache on android-msm-2.6.29- I am able to
> swapon, but afterwards any process allocating memory immediately segfaults.

Do you also see any warnings from compcache in the kernel logs?
It's quite difficult for me to debug this issue since I don't have this hardware, and I
lack experience with this processor.


Comment 5 by aagaa...@gmail.com, Jun 26, 2009
Hopefully steve.kondik can get you some more output. Personally, with cat /proc/kmsg
I'm getting nothing before the device reboots.
Comment 6 by aagaa...@gmail.com, Jun 28, 2009
OK, I have some more to report: with 2.6.29, things seem a lot better.

With LZO built into the kernel, insmod and swapon actually work.

I can check /proc/ramzswap and see a high GoodCompress, but after torturing it for a while
it crashes the user interface (that's my theory; it's not a reboot, as I don't lose the
connection with the phone and can watch the kernel logs this time).


Note that the "send sigkill to" messages are completely normal and happen all the time
on Android. However, there seems to be no output from compcache here.

<4>[  316.945526] send sigkill to 568 (app_process), adj 14, size 4436
<4>[  324.601165] select 612 (app_process), adj 15, size 4411, to kill
<4>[  324.601196] send sigkill to 612 (app_process), adj 15, size 4411
<6>[  346.488891] binder: release 134:323 transaction 6478 in, still active
<6>[  346.489135] binder: send failed reply for transaction 6478 to 194:505
<6>[  346.744750] binder: 194 invalid dec strong, ref 1079 desc 17 s 0 w 1
<6>[  346.754028] binder: 423 invalid dec strong, ref 8585 desc 17 s 0 w 1
<6>[  346.760559] binder: 585 invalid dec strong, ref 9347 desc 17 s 0 w 1
<6>[  348.089965] request_suspend_state: wakeup (0->0) at 341189074786 (2009-06-28 19:54:03.283935557 UTC)
<3>[  348.092315] init: untracked pid 371 exited
<3>[  348.093719] init: untracked pid 383 exited
<3>[  348.094207] init: untracked pid 390 exited
<3>[  348.094635] init: untracked pid 414 exited
<3>[  348.133636] init: untracked pid 190 exited
<3>[  348.133911] init: untracked pid 273 exited
<3>[  348.134277] init: untracked pid 621 exited
<3>[  348.140106] init: untracked pid 266 exited
<3>[  348.140563] init: untracked pid 352 exited
<3>[  348.160003] init: untracked pid 194 exited
<3>[  348.160461] init: untracked pid 423 exited
<3>[  348.160705] init: untracked pid 585 exited
<6>[  381.844940] request_suspend_state: wakeup (0->0) at 374944049146 (2009-06-28 19:54:37.038909917 UTC)
<6>[  384.697967] binder: release 112:127 transaction 10775 in, still active
<6>[  384.698333] binder: send failed reply for transaction 10775 to 645:653
<6>[  385.784729] htc-acoustic: open
<6>[  385.845764] htc-acoustic: mmap
<6>[  385.846740] htc-acoustic: ioctl
<6>[  385.846954] htc-acoustic: ioctl: ACOUSTIC_ARM11_DONE called 678.
<6>[  385.849548] htc-acoustic: ioctl: ONCRPC_ACOUSTIC_INIT_PROC success.
<6>[  385.849792] htc-acoustic: release
<6>[  385.890563] snd_set_device 1 1 1
<6>[  385.901885] snd_set_volume 0 0 5
<6>[  385.903289] snd_set_volume 1 0 5
<6>[  385.912017] snd_set_volume 3 0 5
<6>[  385.913360] snd_set_volume 2 0 5
<6>[  386.833923] snd_set_volume 256 0 5
Comment 7 by aagaa...@gmail.com, Jun 29, 2009
I checked adb logcat during a soft restart, and also while an application failed
to start.

I'm not much wiser from this output, and I'm a bit unsure where to go from
here in debugging this.
android runtime shutting down and restarting.txt
24.6 KB
app failing to start.txt
3.9 KB
Comment 8 by nitingupta910, Jul 01, 2009
Ah, I don't have this hardware, and there is nothing in the logs that can help me debug
this issue.

I promise a bounty of $100 to whoever gets it working on ARM :) I am serious!

Comment 9 by dwang5, Jul 01, 2009
What about posting the debug output to the Google Android dev group? There are a few
Google employees who monitor that board. Maybe they can help out.

http://groups.google.com/group/android-platform


Comment 10 by edanaher, Jul 04, 2009
For the record, it seems to work fine on the Beagleboard, an ARM-based single board
computer.  This is a Cortex-A8, while the G1 uses an ARM11; that could certainly be a
factor.

Details:
- Kernel and compcache were built natively on the Beagleboard, using a standard
Debian gcc 4.3.2.
- I'm running a kernel 2.6.30 from the linux-omap git tree, no other patches.
- compcache 0.5.3 built just fine, and "use_ramzswap.sh 32768 /dev/mmcblk0p3" ran
fine with no errors.
- As a quick stress test, I fired up firefox in a VNC session, resulting in
/proc/ramzswap giving ~24k reads, ~32k writes, ~75M OrigDataSize, ~23M ComprDataSize.
This sure looks like it's actually working. (Also, firefox was actually usable,
which is a first for me on this board.)
- Finally, unuse_ramzswap.sh got rid of the swap as expected.

I'm not sure how helpful this is; the hardware is pretty different from the G1.  But
it does suggest that there's hope, since it works on at least one ARM device.
Comment 11 by aagaa...@gmail.com, Jul 05, 2009
It is useful, but could you try stress testing it some more? I can also get ramzswap
to report everything working; it's not until after some stress testing has occurred
that things actually start to fail.
Comment 12 by suomalainen.aleksi, Jul 05, 2009
Hi,
I have been monitoring the functionality of compcache on a Nokia N810, which has an
OMAP2420 processor (ARM, of course).
I am getting similar errors with my N810, like random reboots at times. I have been
monitoring dmesg and /proc/ramzswap, but to no avail so far. The kernel
version the N810 uses is 2.6.21-omap1. Maybe some kernel debugging would help here,
but I'm not familiar with such "lore" :). So I am just reporting a different ARM
device on this thread.

So swapon and {use,unuse}_ramzswap.sh work, but after a while of usage (like opening
the browser and PDF reader), the tablet crashes for an unknown reason.
Comment 13 by edanaher, Jul 05, 2009
More stress testing on the Beagleboard: a full kernel compile with -j8 (typically
something like ~50M in swap according to free, and gcc processes were definitely
swapping), combined with bits of firefox, stress
(http://weather.ou.edu/~apw/projects/stress/) for another 30M-60M of memory usage,
and video streaming to my laptop.

No faults as far as I can tell after several hours and over 11M reads and 6M writes
according to /proc/ramzswap.  It also shows no FailedReads/Writes or InvalidIO, and
the resulting kernel works.  I'd say it's solid.

If there are any particular tests that might be helpful, let me know.  And if
anything does come up, I'll be sure to update.
Comment 14 by nitingupta910, Jul 05, 2009
Thank you all for your help so far.

Summarizing a bit:
 - Cortex-A8 (Beagleboard): seems to work fine.
 - OMAP2420 (Nokia N810): no problems with module load/unload and swapon/swapoff but
apps crash or system reboots after some time.
 - ARM11 (Android G1): swapon reboots the device.

I will try reading about these ARM variations and maybe we will get some clues ...

Comment 15 by nitingupta910, Jul 05, 2009
It's possible that the issue here is the same as the one described here:

http://www.linux-mips.org/archives/linux-mips/2008-11/msg00038.html

Comment 16 by aagaa...@gmail.com, Jul 06, 2009
I'd just like to point out that on recent kernels on Android, the device doesn't
reboot; the interface does. That's a rather big difference, as the kernel stays up.

Note that the device works fine with normal swap.
Comment 17 by suomalainen.aleksi, Jul 06, 2009
Yeah, I just tested the latest compcache on the N810 last night and experienced only
interface freezing; the device still reacts to button presses and the SSH
connection stays alive, although dmesg revealed nothing special.
Comment 18 by dwang5, Jul 09, 2009
I can confirm that compcache doesn't reboot my Android G1 with the 2.6.29 kernel.

I've set up an 8 MB compcache swap file with swappiness set to 60, and it actually works
pretty well.

I can get things to crash left and right if I set swappiness to 100, though.
Comment 19 by dwang5, Jul 09, 2009
Seems like once the swapfile starts getting full and reaching the end of the file,
that's when processes start crashing.  There's some corruption somewhere.


Comment 21 by nitingupta910, Jul 09, 2009
> I've set up a 8meg compcache swapfile with swappiness to 60 and it actually works
> pretty well.
> I can get things to crash left and right if I set swappiness to 100 though.

With swappiness set to 100, compcache will quickly fill up. Maybe with so much memory
pinned by compcache, you are running into the OOM killer? Do you see any OOM kill
messages in the logs?

> Seems like once the swapfile starts getting full and reaching the end of the file,
> that's when processes start crashing.  There's some corruption somewhere.

Seems like a good test case. I can try this at least on my system (x64).
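The scenario dwang5 describes (processes see corrupted data once the swap device fills up) can be made self-checking by writing a known pattern and verifying it later. A minimal sketch in Python, assuming 4 KiB pages and using hypothetical helper names (nothing here is part of compcache): fill a buffer with a per-page pattern, hold it while other memory pressure forces swapping, then re-read it and list any blocks whose contents changed.

```python
import hashlib

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on the systems in this thread


def page_pattern(index: int) -> bytes:
    """Deterministic, fairly compressible content for logical page `index`."""
    seed = hashlib.sha256(index.to_bytes(8, "little")).digest()  # 32 bytes
    # Repeating the 32-byte seed fills exactly one page and keeps it compressible.
    return seed * (PAGE_SIZE // len(seed))


def fill(num_pages: int) -> bytearray:
    """Allocate num_pages logical pages and write the known pattern into each."""
    buf = bytearray(num_pages * PAGE_SIZE)
    for i in range(num_pages):
        buf[i * PAGE_SIZE:(i + 1) * PAGE_SIZE] = page_pattern(i)
    return buf


def find_corrupt_pages(buf: bytearray) -> list[int]:
    """Return indices of logical pages that no longer match their pattern.

    The bytearray is not guaranteed to be page-aligned, so these are 4 KiB
    blocks; a corrupted kernel page still shows up in the blocks it overlaps.
    """
    num_pages = len(buf) // PAGE_SIZE
    return [
        i for i in range(num_pages)
        if buf[i * PAGE_SIZE:(i + 1) * PAGE_SIZE] != page_pattern(i)
    ]
```

To exercise swap, call fill() with more pages than MemFree allows, keep the buffer alive while other apps run, then call find_corrupt_pages(); an empty list means no corruption was detected in this process's memory.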

Comment 22 by dwang5, Jul 09, 2009
The processes aren't being killed.   They're crashing with segfaults and tracebacks.
Comment 23 by dwang5, Jul 09, 2009
Could there be an issue with the kernel writing to the same memory space where the
compcache swap resides, since both use the same memory?
Comment 24 by nitingupta910, Jul 09, 2009
I would need the following data:
 - /proc/cpuinfo
 - /proc/meminfo
 - /var/log/messages (on some systems it's /var/log/kernel)

The above data is needed for *each* of the following devices:
 - Cortex-A8 (Beagleboard)
 - OMAP2420 (Nokia N810)
 - ARM11 (Android G1)

Comment 25 by dwang5, Jul 09, 2009
Here's cpuinfo and meminfo. There is no /var/log/messages or /var/log/kernel on the
Android G1.

# cat /proc/cpuinfo
Processor       : ARMv6-compatible processor rev 2 (v6l)
BogoMIPS        : 245.36
Features        : swp half thumb fastmult edsp java
CPU implementer : 0x41
CPU architecture: 6TEJ
CPU variant     : 0x1
CPU part        : 0xb36
CPU revision    : 2

Hardware        : trout
Revision        : 0080
Serial          : 0000000000000000
# cat /proc/meminfo
MemTotal:          97908 kB
MemFree:            2192 kB
Buffers:             536 kB
Cached:            24640 kB
SwapCached:           12 kB
Active:            37608 kB
Inactive:          44392 kB
Active(anon):      27096 kB
Inactive(anon):    30292 kB
Active(file):      10512 kB
Inactive(file):    14100 kB
Unevictable:         252 kB
Mlocked:               0 kB
SwapTotal:          8188 kB
SwapFree:           7244 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:         57088 kB
Mapped:            14676 kB
Slab:               6152 kB
SReclaimable:        868 kB
SUnreclaim:         5284 kB
PageTables:         3072 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       57140 kB
Committed_AS:     742092 kB
VmallocTotal:     155648 kB
VmallocUsed:       53404 kB
VmallocChunk:      44028 kB
#
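Figures like these are easy to check programmatically. A small sketch, assuming the `Key: value kB` layout shown above (the function names are hypothetical): it turns /proc/meminfo text into a dict of kB values and derives swap in use as SwapTotal minus SwapFree. For the G1 output above, that is 8188 - 7244 = 944 kB of the 8 MB swap device.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value (kB)}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip shell prompts and lines without a "Key:" prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields: dict[str, int]) -> int:
    """Swap currently in use, in kB."""
    return fields["SwapTotal"] - fields["SwapFree"]
```

On the device itself, the input would come from `open("/proc/meminfo").read()`.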
Comment 26 by nitingupta910, Jul 09, 2009
> Here's cpuinfo and meminfo.  There is no /var/log/messages or /var/log/kernel on the
> android g1.

OK, then send the output of:
 - uname -a
 - the kernel config file (/boot/config*), which may be missing on the G1.
Comment 27 by dwang5, Jul 09, 2009
# uname -a
Linux localhost 2.6.29-cm #1 PREEMPT Thu Jul 2 19:13:31 EDT 2009 armv6l GNU/Linux
config.gz
9.3 KB
Comment 28 by aagaa...@gmail.com, Jul 10, 2009
All info is from the same device. For config.gz, I should be on the same (or a very
similar) kernel as dwang5. I don't have the config for the exact build I'm using.

Linux localhost 2.6.29-cm #2 PREEMPT Sun Jun 28 02:29:13 EDT 2009 armv6l GNU/Linux

For the kernel log, I pulled /proc/kmsg.
cpuinfo
286 bytes
kmsg
23.8 KB
meminfo
899 bytes
Comment 29 by nitingupta910, Jul 14, 2009
On ARMv6 and newer:
 - Caches are VIPT (Virtually Indexed, Physically Tagged)
 - Writeback caches.

So, I think these crashes are happening due to the following:
 - On swap read, ramzswap gets a 'bio' page, which is mapped to a kernel VA, say
V(k). All of the above systems have <= 1G of memory, so kmap simply returns a lowmem address.
 - The data cache location corresponding to VA == V(k) now contains the decompressed
data. This cache location is tagged with the decompressed page's physical address,
say P.
 - However, the corresponding RAM location still contains stale data (writeback cache).
 - Now this page is mapped to a userspace VA, say V(u).
 - The data cache location for V(u) has a tag different from P (the decompressed page's
physical address), so the CPU goes to RAM to fetch the data.
 - The corresponding RAM location still has stale data. We fetch this stale data
into the cache location for VA == V(u) <---------------
 - Thus the user gets stale data and segfaults.

Thus, we need to call flush_dcache_page() after writing out the decompressed page in
ramzswap_read(). However, as mentioned in this mail:
http://www.linux-mips.org/archives/linux-mips/2008-11/msg00038.html
... this solution will not work "as is", but some workaround should still be doable.
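The failure sequence above can be replayed in a toy model. This is not kernel code: it assumes a write-back, virtually indexed, physically tagged cache with one line per page "color" and exactly two colors (hypothetical numbers; whether real ARMv6 parts alias depends on the cache way size relative to the 4 KiB page), and it represents each page by a single value. Writing through the kernel alias V(k) and then reading through a user alias V(u) of a different color returns stale RAM contents; cleaning the page first, roughly what flush_dcache_page() is meant to achieve, returns the decompressed data.

```python
PAGE_SIZE = 4096
NUM_COLORS = 2  # assumption: one cache way spans two pages, so VA bit 12 picks the set


class ViptWritebackCache:
    """Toy VIPT write-back data cache: one line per page color, one value per page."""

    def __init__(self, ram):
        self.ram = ram    # physical page number -> value held in RAM
        self.lines = {}   # color -> [ppn tag, cached value, dirty flag]

    @staticmethod
    def color(va):
        # Virtually indexed: the set is selected by virtual address bits.
        return (va // PAGE_SIZE) % NUM_COLORS

    def _evict(self, c):
        line = self.lines.pop(c, None)
        if line is not None and line[2]:
            self.ram[line[0]] = line[1]  # write dirty data back to RAM

    def write(self, va, ppn, value):
        c = self.color(va)
        line = self.lines.get(c)
        if line is None or line[0] != ppn:
            self._evict(c)
        self.lines[c] = [ppn, value, True]  # write-back: RAM is not updated yet

    def read(self, va, ppn):
        c = self.color(va)
        line = self.lines.get(c)
        if line is not None and line[0] == ppn:
            return line[1]  # physically tagged hit
        self._evict(c)
        self.lines[c] = [ppn, self.ram[ppn], False]  # miss: refill from RAM
        return self.ram[ppn]

    def flush_page(self, ppn):
        """Hypothetical analogue of flush_dcache_page(): clean and invalidate ppn."""
        victims = [c for c, line in self.lines.items() if line[0] == ppn]
        for c in victims:
            self._evict(c)


def swap_in(flush):
    """Replay the sequence described above for physical page P = 7."""
    cache = ViptWritebackCache(ram={7: "stale"})
    v_k, v_u = 0x0000, 0x1000  # kernel and user aliases with different colors
    cache.write(v_k, 7, "decompressed")  # decompression writes via V(k)
    if flush:
        cache.flush_page(7)
    return cache.read(v_u, 7)  # the process then reads the page via V(u)
```

`swap_in(False)` yields "stale" and `swap_in(True)` yields "decompressed". If V(u) happened to share V(k)'s color, the read would hit the dirty line and no corruption would occur, which is one way the same code can behave differently across ARM variants.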

I will try to upload a custom compcache version with this fix, and let's see if it
solves the issue.

Comment 30 by suomalainen.aleksi, Jul 14, 2009
OK, sounds great. I will try it on my N810 when you have the version.
Comment 31 by aagaa...@gmail.com, Jul 14, 2009
Awesome! Based on how much this helps on my main computer (with 4 GB of memory), I can't
imagine how much of an improvement it'll be on my G1 with 96 MB of memory. I'll be
testing the second you push out a test version.
Comment 32 by dwang5, Jul 14, 2009
Looking forward to the new version as well. Thanks!
Comment 33 by nitingupta910, Jul 15, 2009
Please try compcache test version attached. Thanks.
compcache-0.5.3_arm_test1.tar.gz
17.6 KB
Comment 34 by dwang5, Jul 15, 2009
Thanks Nitin!

Would somebody mind posting the compiled Android .29 modules? Thanks!
Comment 35 by aagaa...@gmail.com, Jul 15, 2009
Totally untested; going to bed now, will test tomorrow morning.
arm_test1.tbz2
92.2 KB
Comment 36 by dwang5, Jul 15, 2009
thank you! thank you!

Running with a 24 MB swap file (25% of available RAM) and swappiness set to 100. No
crashes!

Running imeem streaming music player in the background while loading up gmail,
calendar, browser, maps, and market.

awesome!
Comment 37 by dwang5, Jul 15, 2009
Here's the cat output. 74% compression; is that good?

# cat /proc/ramzswap
DiskSize:          24476 kB
NumReads:          99829
NumWrites:         55941
FailedReads:           0
FailedWrites:          0
InvalidIO:             0
PagesDiscard:          0
ZeroPages:           228
GoodCompress:         74 %
NoCompress:            6 %
PagesStored:        5890
PagesUsed:          2249
OrigDataSize:      23560 kB
ComprDataSize:      8333 kB
MemUsedTotal:       8996 kB
#
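The overall compression ratio can be computed directly from OrigDataSize and ComprDataSize rather than read off GoodCompress. A sketch, assuming the `Key: value [unit]` layout shown above (function names are hypothetical): for these numbers, 23560 kB of original data is held in 8333 kB of compressed data, about 2.83:1, and MemUsedTotal (8996 kB) is about 38% of OrigDataSize.

```python
def parse_ramzswap(text: str) -> dict[str, float]:
    """Parse /proc/ramzswap output ('Key:  value [unit]') into numbers."""
    stats = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()  # handles both space- and tab-separated output
        if sep and parts:
            try:
                stats[key.strip()] = float(parts[0])
            except ValueError:
                pass  # skip fields that are not numeric
    return stats


def summarize(stats: dict[str, float]) -> dict[str, float]:
    """Overall compression ratio and total memory cost relative to stored data."""
    orig = stats["OrigDataSize"]
    return {
        "ratio": orig / stats["ComprDataSize"],  # e.g. 2.83 means 2.83:1
        "mem_used_pct": 100 * stats["MemUsedTotal"] / orig,
    }
```

Comparing MemUsedTotal rather than ComprDataSize against OrigDataSize gives the actual RAM cost of what is stored.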
Comment 38 by dwang5, Jul 15, 2009
One question: is the swappiness setting taken into account? Will using 60 or 100 make a
difference?
Comment 39 by nitingupta910, Jul 15, 2009
Issue 2 has been merged into this issue.
Comment 40 by nitingupta910, Jul 15, 2009
> Running with a 24meg swap file (25% of available ram) and swappiness set to 100.  No
> crashes!

Great news! Just to confirm, did you run the test on a G1 or on an emulator?

> here's the cat output.  74% compression, is that good?
It's a bit unusual. I usually see ~90% for GoodCompress. Also, 6% for NoCompress
doesn't look too good.

> one question, is the swappiness setting considered?  Will using 60 or 100 make a
> difference?

The higher the swappiness, the more quickly ramzswap will fill up. To the kernel, it's just
another swap device, so the swappiness value applies as usual.
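Swappiness is the vm.swappiness sysctl, exposed as /proc/sys/vm/swappiness (default 60); the 2.6 kernels discussed here accept 0 to 100, and much newer kernels allow up to 200. A small sketch with hypothetical helper names, assuming root access for the real file; the path is a parameter so the helpers can also be exercised against an ordinary file:

```python
SWAPPINESS_PATH = "/proc/sys/vm/swappiness"


def get_swappiness(path: str = SWAPPINESS_PATH) -> int:
    """Read the current swappiness value."""
    with open(path) as f:
        return int(f.read().strip())


def set_swappiness(value: int, path: str = SWAPPINESS_PATH) -> None:
    """Write a new swappiness value (root is required for the real sysctl)."""
    if not 0 <= value <= 100:  # 2.6-era range; newer kernels accept up to 200
        raise ValueError(f"swappiness must be 0..100, got {value}")
    with open(path, "w") as f:
        f.write(f"{value}\n")
```

For example, `set_swappiness(60)` restores the default after experimenting with 100.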

Comment 41 by dwang5, Jul 15, 2009
Actual g1 hardware.
Comment 42 by greg.hypta, Jul 15, 2009
Testing this as well on a G1, using kernel 2.6.29

jacHEROski ROM 1.4C (kernel is CM's)

Everything loaded just fine. 

# cat /proc/ramzswap                 
DiskSize:	   63473 kB
MemLimit:	   14684 kB
NumReads:	     507
NumWrites:	    2577
FailedReads:	       0
FailedWrites:	       0
InvalidIO:	       0
PagesDiscard:	       0
ZeroPages:	     117
GoodCompress:	     100 %
NoCompress:	       0 %
PagesStored:	    1820
PagesUsed:	     352
OrigDataSize:	    7280 kB
ComprDataSize:	    1394 kB
MemUsedTotal:	    1408 kB
BDevNumReads:	     108
BDevNumWrites:	     640

I have a 64 MB swap partition that I am using in conjunction with it.

Question: Do lzo_compress.ko and lzo_decompress.ko have to be loaded as well?
Comment 43 by nitingupta910, Jul 15, 2009
> Question: Does the lzo_compress.ko and lzo_decompress.ko have to be loaded as well?

Yes, they must be loaded.
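One way to confirm this is to look the names up in /proc/modules, whose first column is the module name. A sketch with a hypothetical helper name; note that if LZO is built into the kernel (as in comment 6), nothing is listed there, so this check would report the modules as missing even though the code is present.

```python
REQUIRED = ("lzo_compress", "lzo_decompress")


def missing_modules(modules_text: str, required=REQUIRED) -> list[str]:
    """Return the required module names absent from /proc/modules text."""
    loaded = {line.split()[0] for line in modules_text.splitlines() if line.strip()}
    return [name for name in required if name not in loaded]
```

On the device: `missing_modules(open("/proc/modules").read())` returns an empty list when both modules are loaded.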
Comment 44 by dwang5, Jul 15, 2009
The lzo_compress.ko and lzo_decompress.ko modules are already loaded in Cyanogen's kernel.
Comment 45 by greg.hypta, Jul 15, 2009
Fantastic, then this works wonderfully!

The music app on Hero is actually usable now, and the people app works fantastic!
Comment 46 by aagaa...@gmail.com, Jul 16, 2009
It seems to be in pretty widespread testing on the G1 now, without any reports of crashes:
http://forum.xda-developers.com/showthread.php?t=537236

Very nice work!
Comment 47 by nitingupta910, Jul 16, 2009
OK... so now the status of the issue is:
 1- Cortex-A8 (Beagleboard)  -- seems to work even without the fix (see comment #13).
 2- OMAP2420 (Nokia N810)    -- crashes without fix. No testing done with the fix.
 3- ARM11 (Android G1)       -- crashes without fix. Fix resolved the issue.

So, now testing is needed for case (2): Nokia N810.
(test version uploaded in comment #33).

Comment 48 by suomalainen.aleksi, Jul 16, 2009
Yup, I'm going to test it when I get my VMware running again to compile the testing
version in Scratchbox.
Comment 49 by suomalainen.aleksi, Jul 16, 2009
OK, I got it compiled on the device itself, no problems whatsoever. Thanks for this,
Nitin :)
Comment 50 by suomalainen.aleksi, Jul 16, 2009
Nokia-N810-43-7:~# free
              total         used         free       shared      buffers
  Mem:       126796       124004         2792            0            4
 Swap:        31692        31688            4
Total:       158488       155692         2796
Nokia-N810-43-7:~# cat /proc/ramzswap
DiskSize:          31696 kB
NumReads:           5688
NumWrites:         13113
FailedReads:           0
FailedWrites:          0
InvalidIO:             0
PagesDiscard:          0
ZeroPages:           180
GoodCompress:         52 %
NoCompress:           20 %
PagesStored:        7743
PagesUsed:          4295
OrigDataSize:      30972 kB
ComprDataSize:     15267 kB
MemUsedTotal:      17180 kB
Comment 51 by nitingupta910, Jul 16, 2009
Thank you all for your testing efforts. The fix is now committed to the default and
multiple_rzs branches, so it will be included in compcache-0.6.

Status: Fixed
Comment 52 by nitingupta910, Jul 19, 2009
Just FYI, compcache-0.6pre2 now includes this fix.

Hosted by Google Code