Code license: New BSD License
Labels: python, file, sync, md5, duplicate
Featured downloads:
md5filetools-0.2.zip
Project owners:
  uandrey

File tools (based on md5):

How to use

Only two scripts are needed: md5fwalker and md5fdupl. The first calculates the md5 hash of every file under a path you specify; the second searches for duplicates and removes them.

Example:

./md5fwalker.py -p /data01 -f files/homepc.data01.files
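
The source of md5fwalker is not shown on this page, so the following is only a minimal sketch of the idea behind it: walk a directory tree and compute the md5 hash of every file. The names `md5_of_file` and `walk_hashes` and the returned tuple layout are assumptions made for illustration, not the script's actual code or hash file format.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_hashes(root):
    """Yield (md5, path, size) for every regular file below root.

    The tuple layout is hypothetical; the real hash file may differ.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                yield md5_of_file(path), path, os.path.getsize(path)
```

Calling `list(walk_hashes("/data01"))` would produce one record per file, which could then be written to a hash file.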

Advices:

Once the file with the calculated md5 hashes has been created, you can search it for duplicates.

Example:

./md5fdupl.py -f files/homepc.data01.files

Output:

/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
/data01/docs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
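
Finding duplicates amounts to grouping records by hash and keeping the groups that contain more than one path. Here is a sketch of that step, assuming the records are `(md5, path)` pairs. The pairing, the name `find_duplicates`, and the sample values are illustrative, since the real layout of the hash file is not documented here.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group (md5, path) records and return only hashes seen more than once."""
    by_hash = defaultdict(list)
    for md5, path in records:
        by_hash[md5].append(path)
    return {md5: paths for md5, paths in by_hash.items() if len(paths) > 1}


# Placeholder hashes "h1"/"h2" stand in for real md5 digests, and
# notes.txt is a made-up path used only to show a non-duplicate.
records = [
    ("h1", "/data01/docs/recipes.pdf"),
    ("h1", "/data01/docs/cooking/recipes.pdf"),
    ("h2", "/data01/docs/notes.txt"),
]
duplicates = find_duplicates(records)
```

Here `duplicates` holds a single entry for `"h1"` with both recipes.pdf paths; the file with a unique hash is dropped.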

To save the result to a file:

./md5fdupl.py -f files/homepc.data01.files > dupl.files

If dupl.files lists duplicates that you want to delete, mark each file to be deleted with a star (*) at the beginning of its line:

*/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
/data01/docs/cooking/recipes.pdf;83042;2009-02-12 21:10:09

Save the file. To delete all files whose lines are marked with a star, use option -r:

./md5fdupl.py -r dupl.files
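
To make the marking convention concrete, here is a sketch of how star-marked lines could be picked out of such a file. It assumes the `path;size;date` layout seen in the md5fdupl output. The helper `marked_paths` is hypothetical: it only lists the paths and deletes nothing.

```python
def marked_paths(lines):
    """Return the paths of lines that start with '*' ('path;size;date' layout)."""
    paths = []
    for line in lines:
        line = line.strip()
        if not line.startswith("*"):
            continue
        # Drop the star, then split size and date off from the right so that
        # a ';' inside the path itself does not break parsing.
        path, _size, _date = line[1:].rsplit(";", 2)
        paths.append(path)
    return paths


sample = [
    "*/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09",
    "/data01/docs/cooking/recipes.pdf;83042;2009-02-12 21:10:09",
]
print(marked_paths(sample))  # ['/data01/docs/recipes.pdf']
```

Printing the list before deleting anything gives a chance to check that only the intended copies are marked.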

To find duplicates across two or more directories or devices (for example CDs, DVDs, or USB HDDs), calculate a hash file with md5fwalker (see above) for each directory or device, then run md5fdupl.py with option -d and the directory that holds the hash files:

./md5fdupl.py -d files

Output:

home-pc.data01.files/home-pc.data05.files:122
home-pc.data01.files/dvd0013.files:8
home-pc.data01.files/usbhdd120.data02.files:14
dvd0003.files/dvd0012.files:11

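Let's explain each line. It names two hash files separated by '/', followed by ':' and a number, presumably the count of duplicate files shared by that pair.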
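
One way to read this output is as a count of the hashes shared by each pair of hash files. The sketch below computes such counts with set intersection. This is an assumption about what -d reports, not the script's actual logic, and `shared_counts` and the sample hash sets are made up for illustration.

```python
from itertools import combinations


def shared_counts(hash_sets):
    """Map each pair of hash file names to the number of hashes they share."""
    counts = {}
    # Sort by name so every pair is reported in a stable order.
    for (name_a, a), (name_b, b) in combinations(sorted(hash_sets.items()), 2):
        common = len(a & b)
        if common:
            counts[(name_a, name_b)] = common
    return counts


# Placeholder hashes; real sets would hold md5 digests read from hash files.
hash_sets = {
    "dvd0003.files": {"h1", "h2"},
    "dvd0012.files": {"h2", "h3"},
    "dvd0013.files": {"h9"},
}
counts = shared_counts(hash_sets)
```

With these sample sets only the pair `dvd0003.files` / `dvd0012.files` shares a hash, so `counts` has one entry with the value 1. Pairs with nothing in common are left out.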

To get more detailed information about the duplicates shared by two hash files, use md5fdupl with option -t:

./md5fdupl.py -t files/home-pc.data01.files:files/home-pc.data05.files

Output:

files/home-pc.data01.files: /data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
files/home-pc.data05.files: /data05/mydocs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
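
Each line of this output carries the hash file name as a prefix, separated from the record by a colon and a space. Here is a sketch of splitting such a line, assuming that separator and the `path;size;date` layout hold for every line. `parse_detail_line` is a hypothetical helper, not part of the scripts.

```python
def parse_detail_line(line):
    """Split 'hashfile: path;size;date' into (hashfile, path, size, date)."""
    # Split on the first ': ' only; the time part of the date contains ':'
    # but never ': ', so it is left untouched.
    source, record = line.rstrip("\n").split(": ", 1)
    path, size, date = record.rsplit(";", 2)
    return source, path, int(size), date


line = "files/home-pc.data01.files: /data01/docs/recipes.pdf;83042;2009-02-12 21:10:09"
source, path, size, date = parse_detail_line(line)
```

After this call `source` is `files/home-pc.data01.files` and `path` is `/data01/docs/recipes.pdf`, which shows which device holds each copy.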

To delete duplicates, replace the prefix 'files/home-pc.data01.files: ' or 'files/home-pc.data05.files: ' with a star (*) on the line of each file you want to delete, then run md5fdupl with option -r.

Example:

File: dupl.files

*/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
files/home-pc.data05.files: /data05/mydocs/cooking/recipes.pdf;83042;2009-02-12 21:10:09

Run the md5fdupl.py script with option -r:

./md5fdupl.py -r dupl.files
