
This is a set of Java programs to find duplicate directories or files and generate scripts to automatically delete them. It is still in a beta state (though I have used it many times).

I DON'T TAKE RESPONSIBILITY IF YOU DELETE YOUR IMPORTANT FILES BY RUNNING THIS PROGRAM (for example, Unix links can make the same file appear under two different names).

(This is the same project as http://sourceforge.net/projects/off-dup-finder/ ; see the wiki at http://sourceforge.net/p/off-dup-finder/wiki/Home/ .)

Usage: "java Main 1/2 [dir1/dir2]" (or "java -jar offDupFinder.jar 1/2 [dir1/dir2]"). Don't put a trailing slash on the directory name as of now.

1. "java Main 1 [dir1 ] "(find duplicate dir/files) - generates two script files rm1.sh and rm1_files.sh which will later be used to clear up duplicate files/dirs sorted in inverse size order.

2. "java Main 2 [dir2 ] " (compare) compares dir2 with previously compared [dir ] to generate delete scripts for [dir2 ] in "second" subdirectory - rm2_files.sh and rm2_unsorted.sh.

3. (Execute with care - this actually deletes files after comparing.) Then run "java exec_rmdirs 1/2" (pass "1" after step 1, "2" after step 2) or "java exec_rmfiles 1/2" to auto-delete dirs/files after an automatic compare, with optional pauses. These read rm1.sh (exec_rmdirs 1), rm1_files.sh (exec_rmfiles 1), rm2_unsorted.sh (exec_rmdirs 2), and rm2_files.sh (exec_rmfiles 2) from the current directory.


"java exec_rmfiles 1 0 0 0" - if you can't wait for 10 secs pause between deleting files.- ideal for cleaning up pics/vids/docs.

"java exec_rmdirs 1 0 0 0" - if you can't wait for 10 secs pause between deleting files.- ideal for cleaning up folders.

---

Specifying the minimum file size (hardcoded)

Edit AnalyzeFiles4.java and change minsize manually from 10000 to 0 (this will make the run take much longer). The minimum dir size can be passed as an argument in "java Main 1/2 dir f t 0 0" (wait for the wiki update and the next version 0.7 for these tweaks).

So if you have a million files and half a million of them are just 43-byte error files, change this value and run "java Main 1 dirname".
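For context, here is a minimal sketch of what such a hardcoded minimum-size cutoff does while scanning a tree. The MIN_SIZE constant and the walking code are illustrative, not the actual AnalyzeFiles4.java.

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.*;

    // Illustrative only: skip files below a hardcoded minimum size while
    // collecting candidates for duplicate analysis.
    public class MinSizeScanSketch {
        static final long MIN_SIZE = 10_000;   // set to 0 to consider every file (much slower)

        public static void main(String[] args) throws IOException {
            List<Path> candidates = new ArrayList<>();
            Files.walkFileTree(Paths.get(args[0]), new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (attrs.size() >= MIN_SIZE) candidates.add(file);
                    return FileVisitResult.CONTINUE;
                }
            });
            System.out.println(candidates.size() + " files at or above " + MIN_SIZE + " bytes");
        }
    }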

Misc:

"java SuperDir [dir1 ] [dir2 ] f "- will say whether dir1 is super directory of dir2 after comaring each and every file (unlike merge tool which doesn't say this)

More options are coming, such as detection inside archive files and a cyclic best-subtree calculation for choosing what to delete.

Features:

1. Detects matches at the highest directory level and reports them (and makes sure there is no conflict: items already marked for deletion are not compared again).

2. Also detects partial matches of directories (not released yet, as it has lots of possibilities).

3. Detects duplicate files not covered in steps 1 and 2 and reports them.

4. Sorts by file/directory size and shows the biggest on top.

5. Two-way compare: reports directories to be deleted from the second directory (and also generates a copy script from the second to the first).

6. Filtered directory compare based on pattern and min/max size.

7. Good speed: generating the scripts for millions of files can take less than 5 minutes.

8. Proper log backups, and partial support for analyzing archived files.

9. Offline mode (using "ls -laR" output on Ubuntu Linux to report duplicate directories/files). The file deletion program is open source (4 Java files: diff, SuperDir, removefiles, and removedirs).

10. Two simple modes of duplicate search: exhaustive (ignore file names) and non-exhaustive (consider file names); see the sketch after this list.
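As a rough illustration of the difference between the two modes (not the project's actual detection code), duplicates can be grouped by size and content hash, with the file name optionally added to the key:

    import java.io.IOException;
    import java.nio.file.*;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.*;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Illustrative sketch: group files by (size + content hash), and optionally
    // by file name too. "Exhaustive" ignores the name, "non-exhaustive" includes it.
    public class DupGroupSketch {
        public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
            boolean exhaustive = args.length > 1 && args[1].equals("exhaustive");
            Map<String, List<Path>> groups = new HashMap<>();

            try (Stream<Path> walk = Files.walk(Paths.get(args[0]))) {
                for (Path p : walk.filter(Files::isRegularFile).collect(Collectors.toList())) {
                    MessageDigest md = MessageDigest.getInstance("SHA-256");
                    String key = Files.size(p) + ":"
                            + Base64.getEncoder().encodeToString(md.digest(Files.readAllBytes(p)));
                    if (!exhaustive) key += ":" + p.getFileName();   // non-exhaustive: names must also match
                    groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
                }
            }
            groups.values().stream()
                  .filter(g -> g.size() > 1)                         // only report real duplicates
                  .forEach(g -> System.out.println("duplicates: " + g));
        }
    }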

To come:

1. A GUI for this.

2. SuperFile - detect when one file is corrupted and a better copy exists (e.g., partially downloaded YouTube videos).

3. Zip extraction and duplicate search inside zips.

4. Better cyclic subtree detection, with the user prompted to choose which of n subtrees to delete.

5. Integration into a CMS.

6. Auto-archive to DVDs, and maintain a DB recording which DVD each archived file/dir is on.
