
File tools (based on md5):

  • builds file lists for different storage media (md5fwalker)
  • indexes file data from different storage media (md5findex)
  • searches for duplicate files (md5fdupl)

How to use

In most cases only two scripts are needed: md5fwalker and md5fdupl. The first calculates the md5 hash of every file under a path you specify; the second finds and removes duplicates.

Example:

./md5fwalker.py -p /data01 -f files/homepc.data01.files

Advice:

  • keep all file lists in one folder (option -f). This makes them easier to manage later.
  • avoid a global path (option -p) such as "/" or "C:/". It is more effective to point at the directory (folder) where your files (documents, music, video, photos, etc.) are located.
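
The page does not show what md5fwalker writes into the list file, so the following is only a minimal sketch of the idea: walk a directory, hash each file, and write one line per file. The function names and the line layout "md5;path;size;mtime" are assumptions made for illustration (the path, size and timestamp fields mirror the md5fdupl output shown on this page); the real script may differ.

```python
import hashlib
import os
import time


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def walk_and_hash(root):
    """Yield (md5, path, size, mtime) for every regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            stat = os.stat(path)
            mtime = time.strftime(
                "%Y-%m-%d %H:%M:%S", time.localtime(stat.st_mtime)
            )
            yield md5_of_file(path), path, stat.st_size, mtime


def write_list(root, list_path):
    """Write one 'md5;path;size;mtime' line per file (assumed layout)."""
    with open(list_path, "w", encoding="utf-8") as out:
        for record in walk_and_hash(root):
            out.write(";".join(str(field) for field in record) + "\n")
```

Reading files in chunks keeps memory use flat for large videos or disk images. In this sketch the list file should live outside the scanned directory, otherwise it would be hashed along with everything else.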

Next, once the file with the calculated md5 hashes has been created, you can search for duplicates.

Example:

./md5fdupl.py -f files/homepc.data01.files

Output:

/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
/data01/docs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
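
As a rough illustration of how such a duplicate search can work, the sketch below groups list entries by hash and keeps only the groups with more than one file. It assumes each list line has the hypothetical layout "md5;path;size;mtime"; that layout and the function names are not taken from md5fdupl itself.

```python
from collections import defaultdict


def parse_record(line):
    """Split an assumed 'md5;path;size;mtime' line; the path may contain ';'."""
    md5, rest = line.rstrip("\n").split(";", 1)
    path, size, mtime = rest.rsplit(";", 2)
    return md5, path, int(size), mtime


def find_duplicates(lines):
    """Return groups of records that share an md5, in first-seen order."""
    groups = defaultdict(list)
    for line in lines:
        if line.strip():
            md5, path, size, mtime = parse_record(line)
            groups[md5].append((path, size, mtime))
    return [records for records in groups.values() if len(records) > 1]
```

Each returned group can then be printed as "path;size;mtime" lines, which matches the shape of the output above.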

To save the result to a file:

./md5fdupl.py -f files/homepc.data01.files > dupl.files

If dupl.files lists duplicates that you want to delete, mark each file to be deleted with a star (*) at the beginning of its line:

*/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
/data01/docs/cooking/recipes.pdf;83042;2009-02-12 21:10:09

Save the file. To delete all files whose lines are marked with a star, use option -r:

./md5fdupl.py -r dupl.files
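
To make the star convention concrete, here is a small sketch of how starred lines could be parsed and the corresponding files removed. It is not the code of md5fdupl; the function names and the dry_run switch are assumptions added for illustration.

```python
import os


def starred_paths(lines):
    """Return paths from lines that start with '*' (format: *path;size;mtime)."""
    paths = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("*"):
            # Drop the star, then the trailing size and timestamp fields.
            paths.append(line[1:].rsplit(";", 2)[0])
    return paths


def remove_starred(lines, dry_run=True):
    """Delete starred files; with dry_run=True only report what would go."""
    removed = []
    for path in starred_paths(lines):
        if os.path.isfile(path):
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Defaulting to a dry run is a deliberate choice in this sketch: it lets you review the list of files before anything is actually deleted.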

To find duplicates across two or more directories or devices (for example CDs, DVDs, USB HDDs, etc.), first create an md5 hash file with md5fwalker (see above) for each directory or device, then run md5fdupl.py on the folder that holds these files:

./md5fdupl.py -d files

Output:

home-pc.data01.files/home-pc.data05.files:122
home-pc.data01.files/dvd0013.files:8
home-pc.data01.files/usbhdd120.data02.files:14
dvd0003.files/dvd0012.files:11

Each line means the following:

  • 1st line: 122 duplicates on the home computer, between folders data01 and data05
  • 2nd line: 8 duplicates between the home computer and a DVD (label 'dvd0013' in the DVD catalog)
  • 3rd line: 14 duplicates between the home computer and folder data02 on an external USB disk
  • 4th line: 11 duplicates between two DVDs (labels 'dvd0003' and 'dvd0012' in the DVD catalog)
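
A summary like this can be produced by intersecting the hash sets of every pair of list files. The sketch below shows that idea under the assumption that the md5 is the first ';'-separated field of each list line; the function names and the sorted pair order are illustrative and not taken from md5fdupl.

```python
from itertools import combinations


def load_hashes(lines):
    """Collect the md5 field (assumed to be the first ';'-separated field)."""
    return {line.split(";", 1)[0] for line in lines if line.strip()}


def pairwise_duplicate_counts(lists):
    """Map 'name_a/name_b' to the number of hashes both lists share."""
    hashes = {name: load_hashes(lines) for name, lines in lists.items()}
    counts = {}
    for a, b in combinations(sorted(hashes), 2):
        shared = len(hashes[a] & hashes[b])
        if shared:
            counts[f"{a}/{b}"] = shared
    return counts
```

Here `lists` maps a list file name to its lines. Pairs with no shared hashes are left out, as in the output above. Note that this counts distinct shared hashes, which may differ from how the real script counts duplicates.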

To get more detailed information about the duplicates shared by two lists, use the md5fdupl script with option -t:

./md5fdupl.py -t files/home-pc.data01.files:files/home-pc.data05.files

Output:

files/home-pc.data01.files: /data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
files/home-pc.data05.files: /data05/mydocs/cooking/recipes.pdf;83042;2009-02-12 21:10:09

To delete duplicates, replace the prefix 'files/home-pc.data01.files: ' or 'files/home-pc.data05.files: ' with a star (*) on the lines of the files you want to remove, then run md5fdupl with option -r.

Example:

File: dupl.files

*/data01/docs/recipes.pdf;83042;2009-02-12 21:10:09
files/home-pc.data05.files: /data05/mydocs/cooking/recipes.pdf;83042;2009-02-12 21:10:09
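
When there are many duplicates, editing each line by hand gets tedious. The sketch below shows one way the marking step could be automated: every line that carries the prefix of a chosen list is rewritten with a star. The helper name mark_list is hypothetical and is not part of the md5 tools.

```python
def mark_list(lines, list_name):
    """Replace the 'list_name: ' prefix with '*' on matching lines."""
    prefix = list_name + ": "
    marked = []
    for line in lines:
        if line.startswith(prefix):
            marked.append("*" + line[len(prefix):])
        else:
            marked.append(line)
    return marked
```

For example, calling mark_list with the two output lines above and 'files/home-pc.data01.files' yields the contents of dupl.files shown in this example. Review the result before running md5fdupl with option -r, since every starred file will be deleted.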

Then run the md5fdupl.py script with option -r:

./md5fdupl.py -r dupl.files