After uploading a million files, I could not download them. I got the following messages on the servers:
File Entry Not Found! Needle 1246720000 Memory 69510...
File Entry Not Found! Needle 1895825706 Memory 76422...
File Entry Not Found! Needle 1929380015 Memory 44888...
I used the attached script to upload the files:
- uploader_downloader.pl 4.32KB
Comment #1
Posted on Jul 4, 2013 by Grumpy Cat
Can you do a "ls -al" for the volume server's directory, where the *.dat and *.idx files are stored?
And I suppose the disk has enough space left, right?
Comment #2
Posted on Jul 4, 2013 by Swift Rhino
Yes, there is 4.4 TB of free space. Please see the uploaded screenshot.
Comment #3
Posted on Jul 4, 2013 by Grumpy Cat
Looks like some .dat file is exceeding the size limit of 32*1024*1024*1024 = 34359738368 bytes.
I will need to add an additional check at the volume server level to prevent this.
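For illustration only (this is just a back-of-the-envelope check, not the project's code): assuming offsets are kept as uint32 counts of 8-byte NeedlePaddingSize units, as the write() code later in this thread suggests, 2^32 such units works out to exactly this limit:

package main

import "fmt"

func main() {
	const NeedlePaddingSize = 8
	// largest byte offset addressable by a uint32 offset in 8-byte units
	maxAddressable := (uint64(1) << 32) * NeedlePaddingSize
	fmt.Println(maxAddressable)                  // 34359738368
	fmt.Println(uint64(32) * 1024 * 1024 * 1024) // 34359738368
}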
Comment #4
Posted on Jul 4, 2013 by Grumpy Cat
Checked in a fix just now. I have not tried your test suite; please run it to confirm.
Comment #5
Posted on Jul 4, 2013 by Swift Rhino
I just tested and the errors still occur.
There is no message indicating that the .dat file is exceeding the size limit.
Comment #6
Posted on Jul 4, 2013 by Swift Rhino
I just checked, and only volumes 18 and 21 can be downloaded.
None of the other volumes can be downloaded, because the wrong header value is read.
Comment #7
Posted on Jul 4, 2013 by Swift Rhino
An error occurs when converting between uint32 and uint64:
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:156) Append offset uint32: %!(EXTRA uint32=1565170750)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:157) Append offset uint64: %!(EXTRA int64=12521366002)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Needle).Append:71) Appended header: %!(EXTRA []uint8=[101 136 31 154 0 0 0 0 0 80 66 228 0 0 122 221])
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).write:166) Write n.Size: 31453, Needle id: 5260004, Needle cookie%!(EXTRA uint32=1703419802)
[2013/07/04 16:17:46.242178] [TRAC] (main.PostHandler:222) Uploaded file size: %!(EXTRA uint32=31391)
[2013/07/04 16:17:46.242178] [TRAC] (main.PostHandler:226) Upload completed
[2013/07/04 16:17:46.242178] [TRAC] (main.GetOrHeadHandler:114) Download: /13,5042e465881f9a
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:197) Volume Id: 13
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:198) Append offset uint32: %!(EXTRA uint32=1565170750)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Volume).read:199) Read offset uint64: %!(EXTRA int64=12521366000)
[2013/07/04 16:17:46.242178] [TRAC] (storage.(*Needle).Read:139) Read header: %!(EXTRA []uint8=[0 0 101 136 31 154 0 0 0 0 0 80 66 228 0 0])
Write: The value in uint32 = 1565170750, uint64 = 12521366002
Read: The value in uint32 = 1565170750, uint64 = 12521366000
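A minimal standalone illustration with the numbers above (my own sketch, not code from the attached volume.go): the needle map stores uint32(offset / NeedlePaddingSize), so the 2-byte remainder of the unaligned write offset is lost and the read side reconstructs a smaller offset:

package main

import "fmt"

const NeedlePaddingSize = 8

func main() {
	writeOffset := int64(12521366002) // actual end-of-file offset at write time

	// stored in the needle map, as in v.nm.Put(n.Id, uint32(offset/NeedlePaddingSize), n.Size)
	stored := uint32(writeOffset / NeedlePaddingSize)

	// reconstructed on the read side by multiplying back
	readOffset := int64(stored) * NeedlePaddingSize

	fmt.Println(stored)                   // 1565170750
	fmt.Println(readOffset)               // 12521366000
	fmt.Println(writeOffset - readOffset) // 2 bytes lost, so the header is read from the wrong position
}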
Comment #8
Posted on Jul 4, 2013 by Swift Rhino
An error occurs when computing the padding size, because 12521366002 % 8 != 0.
Comment #9
Posted on Jul 4, 2013 by Swift Rhino
We should check the padding value when writing files. I added the code below and it works fine:
func (v *Volume) write(n *Needle) (size uint32, err error) {
	if v.readOnly {
		err = fmt.Errorf("%s is read-only", v.dataFile)
		return
	}
	v.accessLock.Lock()
	defer v.accessLock.Unlock()
	var offset int64
	if offset, err = v.dataFile.Seek(0, 2); err != nil {
		return
	}
	//check padding: if the end of the data file is not aligned to
	//NeedlePaddingSize, advance to the next aligned offset before appending
	if offset%NeedlePaddingSize != 0 {
		offset = offset + (NeedlePaddingSize - offset%NeedlePaddingSize)
		if offset, err = v.dataFile.Seek(offset, 0); err != nil {
			return
		}
	}
	//end
	if size, err = n.Append(v.dataFile, v.Version()); err != nil {
		//roll back the partial append so the next write starts from a clean offset
		if e := v.dataFile.Truncate(offset); e != nil {
			err = fmt.Errorf("%s\ncannot truncate %s: %s", err, v.dataFile, e)
		}
		return
	}
	nv, ok := v.nm.Get(n.Id)
	if !ok || int64(nv.Offset)*NeedlePaddingSize < offset {
		//the needle map stores the offset as a uint32 count of NeedlePaddingSize units
		logger.LoggerVolume.Trace("Write n.Size: %d, Needle id: %d, Needle cookie: %d", n.Size, n.Id, n.Cookie)
		_, err = v.nm.Put(n.Id, uint32(offset/NeedlePaddingSize), n.Size)
	}
	return
}
Comment #10
Posted on Jul 5, 2013 by Grumpy Cat
Can you please attach the whole volume.go file that was used to generate the logs in comment #7?
Your fix seems to avoid the problem with 7/8 probability, because a random offset has a 1/8 chance of passing your test.
Comment #11
Posted on Jul 5, 2013 by Swift Rhino
Please find the attached volume.go file.
- volume.go 8.52KB
Comment #12
Posted on Jul 5, 2013 by Grumpy Cat
Thanks! Was the error output in comment #7 generated after my fix?
My fix applies at write time, so if you continue to read or write existing volumes, you will still see errors.
To use my fix, you would need to clean everything and restart your test from an empty system.
Comment #13
Posted on Jul 5, 2013 by Swift Rhino
Hi Chris,
I tested yesterday, and the files were not being written to full volumes; I don't think your fix can fix this error.
Comment #14
Posted on Jul 5, 2013 by Grumpy Cat
I re-thought your fix. It can ensure the current file is written more or less correctly, but it will likely overwrite other existing files.
So we need to ensure that when the size limit is exceeded, we fail the write attempt and ask the user to get another file id from the master.
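A rough sketch of that idea (the names here are hypothetical, and this is not the committed fix): check the current offset against the volume size limit right after the Seek(0, 2) in write(), and fail the write so the client can request a new file id from the master:

// hypothetical helper, not the committed fix: refuse to append once the
// data file has reached the volume size limit
const MaxVolumeSize int64 = 32 * 1024 * 1024 * 1024 // 34359738368 bytes

func exceedsSizeLimit(offset int64) error {
	if offset >= MaxVolumeSize {
		return fmt.Errorf("Volume Size Limit %d Exceeded! Current size is %d", MaxVolumeSize, offset)
	}
	return nil
}

// in write(), right after offset, err = v.dataFile.Seek(0, 2):
//     if err = exceedsSizeLimit(offset); err != nil {
//         return
//     }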
Comment #15
Posted on Jul 5, 2013 by Grumpy Cat
Comment deleted
Comment #16
Posted on Jul 5, 2013 by Swift Rhino
There is no "Volume Size Limit %d Exceeded! Current size is %d" message in the log file.
Comment #17
Posted on Jul 5, 2013 by Swift Rhino
Hi Chris, can you please explain: "but it will likely over-write on other existing files"?
If something goes wrong with writing or computing the padding value of a file (an I/O interrupt, ...), every later file will be stored wrongly.
I added this code to make sure that if something goes wrong with one file, it will not affect later files.
Comment #18
Posted on Jul 5, 2013 by Grumpy Cat
I think your guess is right: my fix seems unrelated to this issue (but it should be OK to leave it in).
We need to find out why the offset can differ from what we expected, and by how much. There are several possibilities:
1. we hit an error when writing a previous file.
2. the offset returned from v.dataFile.Seek(0, 2) is wrong by a few bytes.
3. the offset returned from v.dataFile.Seek(0, 2) is wrong randomly.
If it is case 3, we will overwrite existing files.
Can you help identify which case is causing your problem?
Comment #19
Posted on Jul 5, 2013 by Grumpy Cat
Hi Hieu,
Your fix should be good. The actual disk writing is currently done in several write() calls; if one of them fails, the offset becomes incorrect, making all the following files wrong.
It would be helpful to find out what really went wrong in the first place, but your fix should be a very good way to prevent all of the following file read/write errors.
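To illustrate what I mean (hypothetical code, not the real Needle.Append; it assumes the usual io import): one logical append is made of several write() calls, and a failure after the first of them leaves the data file at a length that is most likely not 8-byte aligned, so every later needle's stored uint32 offset points a few bytes before its real header:

// hypothetical sketch of a multi-part append, not the real Needle.Append
func appendNeedleParts(w io.Writer, header, body, padding []byte) error {
	if _, err := w.Write(header); err != nil {
		return err
	}
	if _, err := w.Write(body); err != nil {
		// the header (and possibly part of the body) is already on disk,
		// so the file length is now unaligned; the next write() seeks to
		// that unaligned end and every following uint32(offset/NeedlePaddingSize)
		// maps back to the wrong place
		return err
	}
	_, err := w.Write(padding)
	return err
}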
Comment #20
Posted on Jul 5, 2013 by Grumpy Cat
Checked in the fix to HEAD. Thanks!
If possible, please let me know what error caused the padding misalignment in the first place.
Status: Fixed
Labels:
Type-Defect
Priority-Medium