Add parity file support [$5] #314
From langner....@gmail.com on September 10, 2013 01:41:51

Possible PAR2 implementations:
- http://www.quickpar.org.uk/
- http://hp.vector.co.jp/authors/VA021385/
- http://paulhoule.com/phpar2/index.php
- http://chuchusoft.com/par2_tbb/index.html
There is an open source implementation here (in Java): And one here (in C#):
Added $5 bounty to try to get more attention to this feature...
Is parity recovery no longer seen as needed? Or is this concept already incorporated into a supported compression method?
Parity is still a nice feature to add. We don't have any way to combat block loss other than being lucky that the blocks are still on the machine.
Ok, thanks. I'm thinking of adding par2 at least to the backup process, to prevent bit rot. I'm thinking the par2 set of files would be zipped so there is a 1:1 file ratio. That's why I'm starting with my subfolder file limit of 2000, since I may have another 2000 par2 files in a directory.
Would you see it as a separate file, so that it's easier to later "add" parity for existing backups? I had been thinking a little about how best to deal with that, since we obviously want to grandfather in existing backups without making them reupload everything. Perhaps parity could be added when it is detected missing on verification? That way we don't even download more data.
I'm thinking a separate file. The par files can be toggled by an option and added to an existing backup, the idea being that it all automagically creates par files and repairs files as they traverse to and from the remote backend. I think the adding of par files can be fairly easily accomplished; that could go first. Then I would look to add auto-recovery during the retrieval of remote files: if a volume file is bad, then check for and pull the par files, repair the file, re-upload the repaired file, and use the repaired file for whatever purpose it was pulled.
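The retrieve-then-repair flow described here can be sketched as plain control flow. This is a hypothetical illustration, not Duplicati code (Duplicati itself is C#; Python is used here only for brevity), with the backend and par2 steps injected as callables so the logic can be shown without a real backend or par2 binary:

```python
# Hypothetical sketch of the retrieval flow proposed above. The backend
# operations (download, upload) and the par2 repair step are passed in as
# callables; none of these names exist in Duplicati.

def retrieve_with_repair(name, download, hash_ok, par2_repair, upload):
    """Download a volume; if its hash fails, repair from parity and re-upload."""
    path = download(name)
    if hash_ok(path):
        return path
    # Hash mismatch: pull the par2 set, repair, and push the fixed copy back
    # so the remote copy is healed as well.
    par_path = download(name + ".par2")
    repaired = par2_repair(path, par_path)
    upload(name, repaired)
    return repaired
```

The key design point, matching the comment above, is that the repaired file is both re-uploaded and then used for whatever operation triggered the download.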
I'm wondering what options to provide:

- `enable-parity-file`: (bool) true/false
- `parity-file-redundancy`: (int) 1-100

From the user's perspective, would just having `parity-file-redundancy` be enough? If it equals 0 then parity is off, and 1-100 would have it enabled. Or are both options needed? With both, it allows `parity-file-redundancy` to have a default of, say, 5.
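The single-option semantics being weighed here (0 disables parity, 1-100 sets the redundancy percent) can be sketched in a few lines. The option name and default below are the ones proposed in this thread, not actual Duplicati options, and the sketch is illustrative Python rather than the project's C#:

```python
# Illustrative sketch of the proposed single-option scheme:
# parity-file-redundancy = 0 turns parity off; 1-100 enables it at that
# redundancy percentage. A default of 5 follows the suggestion above.

def parse_parity_redundancy(value, default=5):
    """Return (enabled, percent) for a parity-file-redundancy setting."""
    pct = default if value is None else int(value)
    if not 0 <= pct <= 100:
        raise ValueError("parity-file-redundancy must be between 0 and 100")
    return (pct > 0, pct)
```

With this shape, a separate boolean `enable-parity-file` option becomes unnecessary, which is the simplification argued for below.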
@BlueBlock - my vote is for a single option. Default for
That makes it simple. I have some ideas for how to allow existing backups to add par2 files, but any approach will mean pulling down a copy in order to add them. So far I have creation of a par2 file for each uploaded backend file. When retrieving a backend file, if the hash does not match, then it pulls the par2, performs a repair, and also uploads the repaired file to the backend. Then the repaired file is used for whatever operation. I'm figuring even 1% redundancy would be good protection against possible bit rot. I'd probably run at 5-10% for really important things.
I like the idea of going with 1 consolidated option. For the default value, would "1" not be better? At 1% the parity file is pretty small and offers at least some protection. Note that since a parity file exists in a 1:1 relationship with backend files, the number of files effectively doubles. This won't be a problem once I have the subfolders work done.
Maybe... The only negative I can think of is if people are using backends that are nearing the max file limit. That may push it over the edge.
The subfolder solution will take care of that, even for existing backups.
Would it be useful to add "Parity" to the usage-reporter feature stats?
I believe the usage reporter only runs on Google App Engine.
OK, thanks. I'll look at it a bit and see if I can run a copy of it.
What is this saying? Is this a proposal for the future if the feature gets done, or is it of any use right now?
This interests me quite a lot, because so little is known about which Duplicati versions are being used. "Survey for Linux users, which version of Mono do you have installed?"
What would happen in a compaction situation where some backed-up files/blocks were deleted? Wouldn't the whole par2 become invalid, so you would have to re-retrieve the whole backup set to recreate a new par2?
I was just asking whether it would be useful to track the subfolder feature usage in the usage reporter.
Each backend volume file has a par2 file generated during the upload process, so any changes would go together.
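The per-volume create step would presumably shell out to par2cmdline. A minimal sketch of building that invocation follows; the `-r` (redundancy percent) and `-n` (number of recovery files) flags are real par2cmdline options, while the wrapper function itself is hypothetical and written in Python only for illustration:

```python
# Hypothetical helper that builds (but does not run) the par2cmdline command
# for one uploaded volume. -rN sets the redundancy percentage; -n1 asks for a
# single recovery file, preserving the 1:1 volume-to-parity mapping discussed
# in this thread.

def par2_create_cmd(volume_path, redundancy_pct):
    return [
        "par2", "create",
        f"-r{redundancy_pct}",   # percent of recovery data
        "-n1",                   # one recovery file per volume
        volume_path + ".par2",   # parity archive to write
        volume_path,             # the backend volume being protected
    ]
```

The returned list would be handed to a process launcher (e.g. `subprocess.run` in Python, or `System.Diagnostics.Process` in Duplicati's C#).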
For Windows I have par2cmdline.exe, but does anyone have a recommendation on handling this on the Linux side as an external dependency? Do we just require the user to install par2cmdline through apt-get, etc.?
I think this would be a good way to go, as it is easy to check for the presence of the tool and notify the user that it needs to be installed if it's not present and the user wishes to use the parity feature. (I know the question was posed a while ago, but the issue is still open, I have only just stumbled upon it, and I would love to see parity file support, so I thought I'd add my $0.02. I hope that's alright!)
This issue has been mentioned on Duplicati. There might be relevant details there: https://forum.duplicati.com/t/bountysource-pocketing-peoples-money/10092/1
This could be done as a warning message when parity is enabled but the binaries are not found.
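That check-and-warn behavior is cheap to implement: probe PATH for the binary, and only complain when the user has actually enabled parity. A sketch, again in Python for brevity (the function and option names are hypothetical; `shutil.which` is the standard-library way to probe PATH):

```python
# Sketch of the suggested behavior: warn only when parity is enabled AND no
# par2 binary is found on PATH. Names here are illustrative, not Duplicati's.

import shutil

def check_par2_available(parity_enabled, binary="par2"):
    """Return a warning string, or None if no warning is needed."""
    if parity_enabled and shutil.which(binary) is None:
        return (f"Parity is enabled but '{binary}' was not found on PATH; "
                "install par2cmdline or disable the parity option.")
    return None
```

On Windows the bundled par2cmdline.exe path could be checked instead; on Linux this degrades gracefully to the apt-get suggestion above.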
There's also a Reed-Solomon implementation for C#: https://github.com/antiduh/ErrorCorrection. Edit: and a C# parchive implementation: https://github.com/heksesang/Parchive.NET
I propose to implement a simple Reed-Solomon code, then store the code of each
With the PAR2 standard, the number of files will at least double, somewhat reducing directory efficiency.
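As a toy illustration of the erasure-coding idea behind this proposal: a single XOR parity block is the simplest erasure code, able to rebuild any one lost block, and Reed-Solomon generalizes it to tolerate multiple losses. This sketch is purely pedagogical (Python, not Duplicati code, and not PAR2-compatible):

```python
# Toy single-parity erasure code: the parity block is the XOR of all data
# blocks, so any ONE missing block can be rebuilt from the survivors plus the
# parity. A real implementation would use Reed-Solomon to survive k losses.

def xor_parity(blocks):
    """XOR equal-length blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def recover(blocks, parity, lost_index):
    """Rebuild blocks[lost_index] from the surviving blocks and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost_index]
    return xor_parity(survivors + [parity])
```

PAR2 applies the Reed-Solomon version of this over 16-bit words, which is why it can repair an arbitrary damaged region rather than one whole block.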
I think this would be sub-optimal, because if you decided to change the parity ratio or just stop using it completely, you'd need to reshuffle your whole backup, whereas if parity were stored in separate files, only they would need to be touched. Also, if your backup destination is nearing a file limit, I think you have bigger problems to worry about: if your backup suddenly grows in size, you'd be unable to back up anyway.
@mxxcon Why is a reshuffle required to change the parity ratio? The parity is calculated based on the final zip files; if the parity format changed, you only need to update the parity data stored in
Using external parity files is also viable, though I don't think using Par2 is an elegant way.
Though I'm not quite familiar with the underlying logic of
"How the backup process works" gets into dblock and dlist a little, but glosses over dindex, which is just an index to its dblock. There's not a constant relationship between a dlist file and block storage (dblock and dindex); see "Compacting files at the backend". Deduplication means that new dlist files can refer to the same blocks, if the source didn't change. All of the references are hashes.
@ts678 Thanks for the key information! Several points are unclear to me:
No, but it should be there. A missing dindex will be recreated from DB information. A DB Recreate will prefer dindex over dblock (because they're smaller), but will search all the dblocks if it has to. Duplicati.CommandLine.RecoveryTool uses dblock, not dindex.
I don't think anything guarantees it, but it's typical and it should be so.
Reconstructing a partial temporary database is also used for Direct restore from backup files. Without dindex, if you had a 10 TB backup and you wanted to direct-restore a 1 KB file, you would download 10 TB looking through blocks to find the right blocks.
So yes, it's redundant, but it's also very useful.
Then where is the parity for the whole dindex file? How does one get to the parity data in dindex if the dindex file itself gets corrupted?
I don't follow. This proposal assumes a dlist tied to a dindex, but this isn't so. The dblock and dindex will change during compact.
@ts678 Thanks for your excellent explanation! I read through the code these days, and I found that my initial understanding of the
So I plan to continue with what @BlueBlock proposed, with a little modification. A separate PR was created.
This issue has been mentioned on Duplicati. There might be relevant details there: https://forum.duplicati.com/t/the-compacting-process-is-very-dangerous/10832/17
From kenneth@hexad.dk on December 07, 2010 22:27:02
If a volume archive is broken, either due to transfer issues or an issue on the backend, the data will likely not be recoverable, especially if it is encrypted.
One solution to this would be to simply store a parity file together with the volumes.
The most prominent parity application I know of is par2: http://www.par2.net/
Original issue: http://code.google.com/p/duplicati/issues/detail?id=314