File tools (based on md5):
- build file lists for different storage media (md5fwalker)
- index file data from different storage media (md5findex)
- search for duplicate files (md5fdupl)
How to use
Basically, only two scripts are needed: md5fwalker and md5fdupl. The first calculates the md5 hash of every file under a path you specify; the second searches for duplicates and removes them.
Example:
./md5fwalker.py -p /data01 -f files/homepc.data01.files
Advice:
- keep all file lists in one folder (option -f). This makes them easier to manage later.
- avoid a top-level path (option -p) such as "/" or "C:/". It is more efficient to point at the directory (folder) where your files (documents, music, video, photos, etc.) are located.
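For readers who want to see what hashing a directory tree involves, here is a minimal sketch in Python. It is not the md5fwalker source: the function names and the 'md5;path;size;mtime' record layout are assumptions made for illustration (only the 'path;size;mtime' part appears in the output examples of this README).

```python
import hashlib
import os
import time


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_and_hash(root):
    """Yield one 'md5;path;size;mtime' record per regular file under root.

    The record layout is an assumption, not the documented md5fwalker format.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            info = os.stat(path)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
            yield f"{md5_of_file(path)};{path};{info.st_size};{stamp}"
```

Calling walk_and_hash("/data01") and writing each record on its own line would give a list file similar in spirit to the one produced with option -f.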
Once the file with the calculated md5 hashes has been created, you can search for duplicates.
Example:
./md5fdupl.py -f files/homepc.data01.files
Output:
/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
/data01/docs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
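The idea behind the duplicate search can be sketched as grouping records by their md5 hash and keeping only groups with more than one entry. This is an illustration, not the md5fdupl source; the function name and the 'md5;path;size;mtime' input layout are assumptions.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group 'md5;path;size;mtime' records by hash; return groups of 2 or more.

    Each returned entry keeps the 'path;size;mtime' part, matching the
    output shown above. The input layout is an assumed one.
    """
    groups = defaultdict(list)
    for record in records:
        # Split only on the first ';' so the rest of the record stays intact.
        md5, rest = record.rstrip("\n").split(";", 1)
        groups[md5].append(rest)
    return [entries for entries in groups.values() if len(entries) > 1]
```

Files with identical content share a hash, so every returned group lists copies of the same file.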
To save the result to a file:
./md5fdupl.py -f files/homepc.data01.files > dupl.files
If dupl.files lists duplicates that you want to delete, mark each file to delete with a star (*) at the beginning of its line:
*/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
/data01/docs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
Save the file. To delete all files marked with a star, use option -r:
./md5fdupl.py -r dupl.files
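The removal step can be pictured roughly as follows. This is a hedged sketch, not the md5fdupl source: the function names and the dry_run parameter are hypothetical, and it assumes each marked line has the form '*path;size;mtime' as in the example above.

```python
import os


def marked_paths(lines):
    """Return the file paths from lines that start with a star."""
    paths = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("*"):
            continue  # unmarked lines are kept
        # Drop the star, then cut the trailing ';size;mtime' fields.
        paths.append(line[1:].rsplit(";", 2)[0])
    return paths


def remove_marked(lines, dry_run=True):
    """Delete every marked file; with dry_run=True only report the paths."""
    selected = marked_paths(lines)
    if not dry_run:
        for path in selected:
            if os.path.isfile(path):
                os.remove(path)
    return selected
```

Defaulting to a dry run is a design choice of this sketch: it lets you review the selected paths before anything is deleted.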
To find duplicates across two or more directories or devices (for example CDs, DVDs, USB HDDs, etc.), calculate the md5 hashes with md5fwalker (see above) for each directory or device, then run md5fdupl.py with option -d:
./md5fdupl.py -d files
Output:
home-pc.data01.files/home-pc.data05.files:122
home-pc.data01.files/dvd0013.files:8
home-pc.data01.files/usbhdd120.data02.files:14
dvd0003.files/dvd0012.files:11
Each line means the following:
- 1st line: there are 122 duplicates between the data01 and data05 folders on the home computer
- 2nd line: there are 8 duplicates between the home computer and a DVD (label 'dvd0013' in the DVD catalog)
- 3rd line: there are 14 duplicates between the home computer and the data02 folder on an external USB disk
- 4th line: there are 11 duplicates between two DVDs (labels 'dvd0003' and 'dvd0012' in the DVD catalog)
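A summary like this can be approximated by comparing the hash sets of every pair of list files. The sketch below is an assumption about the approach, not the md5fdupl source: the function name is hypothetical, and it counts distinct hashes shared by two lists, which may differ from how md5fdupl counts duplicates.

```python
from itertools import combinations


def pairwise_duplicate_counts(hashes_by_list):
    """Map each pair of list names to the number of md5 hashes they share.

    hashes_by_list maps a list-file name to the set of md5 hashes it holds.
    Pairs with nothing in common are left out.
    """
    counts = {}
    for name_a, name_b in combinations(sorted(hashes_by_list), 2):
        shared = len(hashes_by_list[name_a] & hashes_by_list[name_b])
        if shared:
            counts[(name_a, name_b)] = shared
    return counts
```

Each resulting entry could then be printed as 'name_a/name_b:count' to resemble the output above.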
To get more detailed information about the duplicates in a pair of lists, use md5fdupl with option -t:
./md5fdupl.py -t files/home-pc.data01.files:files/home-pc.data05.files
Output:
files/home-pc.data01.files: /data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
files/home-pc.data05.files: /data05/mydocs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
To delete duplicates, replace the prefix 'files/home-pc.data01.files: ' or 'files/home-pc.data05.files: ' with a star (*) on the line of each file you want to delete, then run md5fdupl with option -r.
Example:
File: dupl.files
*/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
files/home-pc.data05.files: /data05/mydocs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
Run md5fdupl.py script with option -r
./md5fdupl.py -r dupl.files