
fdupes - issue #8

fdupes: option to replace duplicates with hard links


Posted on Oct 8, 2009 by Happy Lion

Debian bug #284274 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=284274

From: Rupert Levene <rlevene@uwaterloo.ca>

It would be nice to have the option of telling fdupes to replace duplicate files with hard links. This would be a more symmetric behaviour than using symlinks.


From: Javier Fernández-Sanguino Peña <jfs@computer.org>

Attached is a patch to the program sources (through the use of a dpatch patch in the Debian package) that adds a new -L / --linkhard option to fdupes. This option replaces all duplicate files with hard links, which is useful for reducing disk usage.

It has been tested only slightly, but the code looks (to me) about right.


Comment #1

Posted on Jul 28, 2010 by Swift Horse

There is a typo in the help output in this patch: after the --debug line, "each set of duplicates without prompting the user" is printed again by mistake, repeated from the --linkhard description.

Comment #2

Posted on Aug 1, 2010 by Happy Lion

I can see this (it's in the output of 'fdupes --help'); I'd just remove the line

  printf(" \teach set of duplicates without prompting the user\n");

after --debug.

Comment #3

Posted on Aug 26, 2010 by Massive Wombat

The man page has a typo: it advertises the option as "--hardlink", but the actual option is "--linkhard".

Comment #4

Posted on Dec 4, 2010 by Quick Lion

I ran fdupes with this option. Unfortunately, I forgot that I was on a vfat partition. It looks like this basically deleted all files. Recovery in progress using photorec.

Comment #5

Posted on Oct 30, 2011 by Swift Giraffe

I have an independently developed patch to do the same thing. It's a bit more careful about creating the links, so it should fail safely on filesystems that don't support hard links. It also merges sets of previously hardlinked files correctly (see issue 22).

The patch includes some code cleanups and optimizations as well, which I split out separately:

0001-Whitespace-cleanups.patch - No functional changes, just whitespace changes to improve readability.

0002-Use-strdup-instead-of-malloc-strcpy-pairs.patch
  - Use strdup instead of separate malloc/strcpy calls.
  - Use calloc instead of individually clearing members.
  - Avoid multiple passes over strings when concatenating dir and dirinfo->d_name.
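To illustrate the last item, here is a minimal sketch of joining a directory and an entry name with a single strlen() per string and memcpy() for the copies, instead of a strcpy()/strcat() chain that rescans the destination. join_path is a hypothetical helper, not a function from fdupes or the patch, and the trailing-slash handling is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from fdupes or the patch): build "dir/name"
 * measuring each input once and copying with memcpy(), so no string is
 * scanned more than once.
 * Returns a malloc'd string the caller must free(), or NULL on failure. */
char *join_path(const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);

    size_t dlen = strlen(dir);
    size_t nlen = strlen(name);
    /* Only add a separator if dir is non-empty and lacks a trailing '/'. */
    size_t need_sep = (dlen > 0 && dir[dlen - 1] != '/') ? 1 : 0;

    char *path = malloc(dlen + need_sep + nlen + 1);
    if (path == NULL)
        return NULL;

    memcpy(path, dir, dlen);
    if (need_sep)
        path[dlen] = '/';
    /* Copy name together with its terminating NUL. */
    memcpy(path + dlen + need_sep, name, nlen + 1);
    return path;
}
```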

0003-Cache-stat-2-results.patch
  - Cache results from stat(2) calls for improved performance.
  - Reorganize grokdir() flow to avoid multiple free/continue blocks.

0004-Add-relink-support.patch
  - Rework relink function to start with a temp file in the right directory, then rename it to the correct name. This ensures that any likely errors are detected before the file is lost.
  - relinkfiles correctly handles duplicate files across multiple filesystems, linking together those files that reside on the same filesystem.
  - Update sort to:
    - Prefer the file with more hardlinks.
    - Fall back to a filename comparison if link counts and mtimes are the same to ensure stable results.
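A sketch of the temp-link-then-rename idea described in 0004, assuming POSIX link(2) and rename(2). The name relink and the ".relink.<pid>" suffix are illustrative, not taken from the patch. Because the new link is created first, a failure such as EXDEV (different filesystems) or EPERM (no hard link support, as on the vfat partition in comment #4) leaves the duplicate in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replace the duplicate `dup` with a hard link to `keep` (sketch only).
 * The temporary name is `dup` plus a suffix, so it lives in the same
 * directory, and therefore on the same filesystem, as `dup`.
 * Returns 0 on success, -1 on failure; on failure `dup` is untouched. */
int relink(const char *keep, const char *dup)
{
    struct stat ks, ds;
    assert(keep != NULL && dup != NULL);

    if (stat(keep, &ks) != 0 || stat(dup, &ds) != 0)
        return -1;
    /* Already the same inode: nothing to do. Skipping this case also
     * matters because rename(2) between two links to the same file
     * succeeds without removing the temporary name. */
    if (ks.st_dev == ds.st_dev && ks.st_ino == ds.st_ino)
        return 0;

    size_t len = strlen(dup) + 32;
    char *tmp = malloc(len);
    if (tmp == NULL)
        return -1;
    /* Illustrative suffix; a real tool would have to handle collisions. */
    snprintf(tmp, len, "%s.relink.%ld", dup, (long)getpid());

    /* Step 1: create the new link under the temporary name. Any error
     * (EXDEV, EPERM, ...) shows up here, before anything is removed. */
    if (link(keep, tmp) != 0) {
        free(tmp);
        return -1;
    }
    /* Step 2: replace dup with the new link in one rename(2). */
    if (rename(tmp, dup) != 0) {
        unlink(tmp);
        free(tmp);
        return -1;
    }
    free(tmp);
    return 0;
}
```

Since rename(2) replaces its target atomically, there is no moment at which the duplicate's name is missing.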

Relink mode automatically enables --hardlinks; it doesn't make much sense to ignore existing hardlinked files while creating new ones. It also skips empty files, since hard links save nothing in that case. If the user really wants to link the empty files, we provide a --relinkempty mode that will do so.


Comment #6

Posted on Jun 17, 2012 by Grumpy Lion

The patch as presented in the original post is buggy when some of the files are on two different filesystems, as reported at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=677419:

> By mistake, I have been using fdupes -rL on what I thought were directories, but one was a symbolic link to a directory on another file system. It found duplicates across file systems... and removed one end. As an aside, the end where it removed files was /lib, and I was deduplicating chroots, so it found plenty of duplicates... try running anything without ld.so...

Anyways, this can easily be reproduced. Assuming /mnt/a and /mnt/b are two different filesystems:

  $ echo a > /mnt/a/a
  $ echo a > /mnt/a/b
  $ echo a > /mnt/b/a
  $ fdupes -rL /mnt/a /mnt/b
  [+] /mnt/a/a
  [h] /mnt/a/b
  -- unable to create a hardlink for the file: Invalid cross-device link
  [!] /mnt/a/a
  $ ls /mnt/b
  $
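One way to rule out this failure, in line with the relinkfiles behaviour described in comment #5, is to plan the links per filesystem before touching anything. This sketch assumes the st_dev value of each file in a duplicate set is already known. plan_links is a hypothetical helper, and the quadratic scan is only for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical planning step (not fdupes' code): for each file in one
 * duplicate set, pick the first earlier file on the same filesystem to
 * link it to. keeper[i] == -1 means file i is the first one seen on its
 * filesystem and is kept as-is. Files on different filesystems are never
 * paired, so links made from this plan never cross devices. */
void plan_links(const dev_t *devs, size_t n, long *keeper)
{
    assert(n == 0 || (devs != NULL && keeper != NULL));
    for (size_t i = 0; i < n; i++) {
        keeper[i] = -1;
        for (size_t j = 0; j < i; j++) {
            if (devs[j] == devs[i]) {
                keeper[i] = (long)j;  /* first file on this device */
                break;
            }
        }
    }
}
```

For the reproduction above, /mnt/a/a and /mnt/a/b share a device and would be linked together, while /mnt/b/a would be kept as the first file on its own filesystem instead of being removed.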

Status: New

Labels:
Type-Defect Priority-Medium