
weed-fs - issue #26

Could not download uploaded files


Posted on Jul 4, 2013 by Swift Rhino

After uploading a million files, I could not download them. I got the following messages on the servers:

File Entry Not Found! Needle 1246720000 Memory 69510... ....
File Entry Not Found! Needle 1895825706 Memory 76422... ....
File Entry Not Found! Needle 1929380015 Memory 44888... ....

I used the attached file to upload

Attachments

Comment #1

Posted on Jul 4, 2013 by Grumpy Cat

Can you do an "ls -al" on the volume server's directory where the *.dat and *.idx files are stored?

And I suppose the disk has enough space left, right?

Comment #2

Posted on Jul 4, 2013 by Swift Rhino

Yes, there is 4.4 TB of free space. Please see the attached screenshot.

Attachments

Comment #3

Posted on Jul 4, 2013 by Grumpy Cat

Looks like some .dat file is exceeding the size limit, 32*1024*1024*1024 = 34359738368 bytes.

I will need to add one additional check at the volume server level to prevent this.
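
For context, a rough sketch of where that 32 GiB number appears to come from, assuming the needle map stores each offset as a uint32 in units of NeedlePaddingSize (8 bytes), as the write() code in comment #9 does:

    package main

    import (
        "fmt"
        "math"
    )

    // NeedlePaddingSize matches the constant used in the write() code in comment #9.
    const NeedlePaddingSize = 8

    func main() {
        // The needle map stores each needle's offset as uint32(offset/NeedlePaddingSize),
        // so a volume can address roughly 2^32 * 8 bytes before offsets stop fitting.
        maxAddressable := (int64(math.MaxUint32) + 1) * NeedlePaddingSize
        fmt.Println(maxAddressable)                 // 34359738368
        fmt.Println(int64(32) * 1024 * 1024 * 1024) // 34359738368, the limit quoted above
    }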

Comment #4

Posted on Jul 4, 2013 by Grumpy Cat

Checked in a fix just now. I have not tried your test suite; please run it to confirm.

Comment #5

Posted on Jul 4, 2013 by Swift Rhino

I just tested and errors still occurred.

There is no message indicating that the .dat file is exceeding the size limit.

Comment #6

Posted on Jul 4, 2013 by Swift Rhino

I just checked, and only two volumes, 18 and 21, can be downloaded.

All the other volumes cannot be downloaded because the wrong header value is read.

Comment #7

Posted on Jul 4, 2013 by Swift Rhino

An error occurred when converting between uint32 and uint64:

[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:156) Append offset uint32: %!(EXTRA uint32=1565170750)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:157) Append offset uint64: %!(EXTRA int64=12521366002)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Needle).Append:71) Appended header: %!(EXTRA []uint8=[101 136 31 154 0 0 0 0 0 80 66 228 0 0 122 221])
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:166) Write n.Size: 31453, Needle id: 5260004, Needle cookie%!(EXTRA uint32=1703419802)
[2013/07/04 16:17:46.242178] [TRAC] (main.PostHandler:222) Uploaded file size: %!(EXTRA uint32=31391)
[2013/07/04 16:17:46.242178] [TRAC] (main.PostHandler:226) Upload completed
[2013/07/04 16:17:46.242178] [TRAC] (main.GetOrHeadHandler:114) Download: /13,5042e465881f9a
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:197) Volume Id: 13
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:198) Append offset uint32: %!(EXTRA uint32=1565170750)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:199) Read offset uint64: %!(EXTRA int64=12521366000)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Needle).Read:139) Read header: %!(EXTRA []uint8=[0 0 101 136 31 154 0 0 0 0 0 80 66 228 0 0])

Write: the value in uint32 = 1565170750, uint64 = 12521366002
Read: the value in uint32 = 1565170750, uint64 = 12521366000

Comment #8

Posted on Jul 4, 2013 by Swift Rhino

There is an error when computing the padding size, because 12521366002 % 8 != 0.
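
To illustrate the mismatch seen in comment #7, here is a minimal sketch (reusing the NeedlePaddingSize constant from the code in comment #9) of how an offset that is not 8-aligned gets silently rounded down when it is stored and read back:

    package main

    import "fmt"

    const NeedlePaddingSize = 8

    func main() {
        // Actual append offset from the write-side log in comment #7.
        writeOffset := int64(12521366002)

        // The needle map stores uint32(offset/NeedlePaddingSize); the remainder
        // (12521366002 % 8 = 2) is silently dropped.
        stored := uint32(writeOffset / NeedlePaddingSize)

        // The read side multiplies back, landing 2 bytes before the real header.
        readOffset := int64(stored) * NeedlePaddingSize

        fmt.Println(stored)     // 1565170750, the uint32 value in both logs
        fmt.Println(readOffset) // 12521366000, not 12521366002
    }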

Comment #9

Posted on Jul 4, 2013 by Swift Rhino

We should check the padding value when writing files. I added the code below and it works fine:

func (v *Volume) write(n *Needle) (size uint32, err error) {
	if v.readOnly {
		err = fmt.Errorf("%s is read-only", v.dataFile)
		return
	}
	v.accessLock.Lock()
	defer v.accessLock.Unlock()
	var offset int64
	if offset, err = v.dataFile.Seek(0, 2); err != nil {
		return
	}

	//check padding
	if offset%NeedlePaddingSize != 0 {
		offset = offset + (NeedlePaddingSize - offset%NeedlePaddingSize)
		if offset, err = v.dataFile.Seek(offset, 0); err != nil {
			return
		}
	}
	//end

	if size, err = n.Append(v.dataFile, v.Version()); err != nil {
		if e := v.dataFile.Truncate(offset); e != nil {
			err = fmt.Errorf("%s\ncannot truncate %s: %s", err, v.dataFile, e)
		}
		return
	}
	nv, ok := v.nm.Get(n.Id)
	if !ok || int64(nv.Offset)*NeedlePaddingSize < offset {
		logger.LoggerVolume.Trace("Write n.Size: %d, Needle id: %d, Needle cookie", n.Size, n.Id, n.Cookie)
		_, err = v.nm.Put(n.Id, uint32(offset/NeedlePaddingSize), n.Size)
	}
	return
}

Comment #10

Posted on Jul 5, 2013 by Grumpy Cat

Can you please attach the whole volume.go file that was used to generate the logs in comment #7?

Your fix seems able to avoid the problem with 7/8 probability, because a random offset has only a 1/8 chance of passing your check.

Comment #11

Posted on Jul 5, 2013 by Swift Rhino

Please find the attached volume.go file.

Attachments

Comment #12

Posted on Jul 5, 2013 by Grumpy Cat

Thanks! Was your error output in comment #7 generated after my fix?

My fix applies at write time. So if you continue to read or write existing volumes, you will still see errors.

To use my fix, you would need to clean everything and restart your test from an empty system.

Comment #13

Posted on Jul 5, 2013 by Swift Rhino

Hi Chris,

I tested yesterday and the files were not written to the full volumes. I don't think your fix can fix this error.

Comment #14

Posted on Jul 5, 2013 by Grumpy Cat

I thought more about your fix. It can ensure the current file is written more or less correctly, but it will likely overwrite other existing files.

So we need to ensure that when the size limit is exceeded, we fail the write attempt and ask the user to get another file id from the master.
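
A minimal sketch of the kind of guard being described, assuming a check before appending in the volume write path (the MaxVolumeSize constant, the helper name, and the error text are illustrative, not the actual committed fix):

    package storage

    import "fmt"

    // MaxVolumeSize is the 32 GiB limit quoted in comment #3.
    const MaxVolumeSize = 32 * 1024 * 1024 * 1024

    // checkVolumeSizeLimit is a hypothetical guard for the volume write path:
    // if appending this needle would push the .dat file past the limit, the write
    // fails and the client must ask the master for another file id.
    func checkVolumeSizeLimit(offset, needleTotalSize int64) error {
        if offset+needleTotalSize > MaxVolumeSize {
            return fmt.Errorf("volume size limit %d exceeded, current size is %d", MaxVolumeSize, offset)
        }
        return nil
    }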

Comment #15

Posted on Jul 5, 2013 by Grumpy Cat

Comment deleted

Comment #16

Posted on Jul 5, 2013 by Swift Rhino

There is no message "Volume Size Limit %d Exceeded! Current size is %d" in the log file.

Comment #17

Posted on Jul 5, 2013 by Swift Rhino

Hi Chris, can you please explain: "but it will likely overwrite other existing files"?

If something goes wrong when writing or computing the padding value of a file (an I/O interrupt, ...), every later file will be stored wrongly.

I added this code to make sure that if something goes wrong with one file, it will not affect later files.

Comment #18

Posted on Jul 5, 2013 by Grumpy Cat

I think your guess is right: my fix seems unrelated to this issue. (But it should be OK to leave the fix in.)

We need to find out why, and by how much, the offset can differ from what we expected. There are several possibilities:
1. we had an error when writing a previous file.
2. the offset returned from v.dataFile.Seek(0, 2) is wrong by a few bytes.
3. the offset returned from v.dataFile.Seek(0, 2) is wrong randomly.

If it is case 3, we will overwrite existing files.

Can you help to identify which case is causing your problem?

Comment #19

Posted on Jul 5, 2013 by Grumpy Cat

Hi, Hieu,

Your fix should be good. Currently, the actual disk write is done in several write() calls. If one of them fails, the offset becomes incorrect, making all the following files wrong.

It would be helpful to find out what really was wrong in the first place, but your fix should be a very good way to prevent all following file read/write errors.

Comment #20

Posted on Jul 5, 2013 by Grumpy Cat

Checked in the fix to HEAD. Thanks!

If possible, please let me know what error caused the padding misalignment in the first place.

Status: Fixed

Labels:
Type-Defect Priority-Medium