
weed-fs - issue #15

Stress test corrupts volume


Posted on Feb 5, 2013 by Quick Horse

What steps will reproduce the problem?
1. rm -rf /tmp/weed && mkdir -p /tmp/weed && bin/weed master -mdir=/tmp/weed -debug=true
2. bin/weed volume -debug=true -dir=/tmp/weed
3. ./filestore-upload-test -weed=:9333 -request.num=10000 (from https://github.com/tgulacsi/filestore-upload-test.git)

What is the expected output? What do you see instead?
Expected: OK. Instead, after some time I got:
404 cannot retrieve file. [read bytes 0 error File Entry Not Found! Needle 17254 Memory 32809] [read error: File Entry Not Found! Needle 17254 Memory 32809 /4,43665ad11d0b]

What version of the product are you using? On what operating system?
0b7a235c1746ae23186d7ec9c707fc019ec25c25
Linux waterhouse 3.2.0-4-amd64 #1 SMP Debian 3.2.35-2 x86_64 GNU/Linux

Please provide any additional information below.
This appears after some time, following some "cannot write to local filesystem" errors.

Comment #1

Posted on Feb 5, 2013 by Grumpy Cat

Can you show the "cannot write to local filesystem" errors?

If a write to disk fails and an error code is properly returned, the file key should not be used. If it is used anyway, the 404 error is expected.

Comment #2

Posted on Feb 6, 2013 by Quick Horse

You're right.

I cannot reproduce this easily.

I've also run into partial writes (pipe closed on localhost), and into real corruption: after a partial write, the data size in the needle header was read as some huge number, which resulted in a memory panic.

Please consider the attached patch for ensuring full writes (it seeks back to the beginning on needle append error). Hope this helps. Tested with a small tmpfs, which produces lots of "no space left on device" errors :)

sudo umount -lf /tmp/weed; mkdir -p /tmp/weed && sudo mount -o size=128M,mode=4777 -t tmpfs tmpfs /tmp/weed && bin/weed master -mdir=/tmp/weed -debug=true & bin/weed volume -dir=/tmp/weed -debug=true

GThomas

Attachments

Comment #3

Posted on Feb 6, 2013 by Grumpy Cat

Adjusted minor text and checked in to the HEAD.

Please verify.

Comment #4

Posted on Feb 8, 2013 by Quick Horse

Thanks!

I've slept on it, and maybe the seek-back is not enough: when appending, we always seek to the end of the data file, so the earlier seek back has no effect.

Attached is a patch which uses os.File.Truncate to truncate the file to its previous size on error, to avoid partial writes.

With this patch, after "no space left on device", both "volume fix" and "volume export" run without error.

GThomas

Attachments

Comment #5

Posted on Feb 10, 2013 by Grumpy Cat

Thanks! Merged into HEAD now!

Comment #6

Posted on Feb 10, 2013 by Quick Horse

Thanks for accepting!

Status: Fixed

Labels:
Type-Defect Priority-Medium