FDUPES(1) General Commands Manual FDUPES(1)
NAME
fdupes - finds duplicate files in a given set of directories
SYNOPSIS
fdupes [ options ] DIRECTORY ...
DESCRIPTION
Searches the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte
comparison.
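The three-stage comparison described above can be sketched in Python as follows. This is an illustrative sketch only, not the fdupes implementation; the function names `md5_of` and `find_duplicates` are hypothetical and chosen for this example.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths with identical content: size, then MD5, then bytes."""
    # Stage 1: files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Stage 2: split each size bucket by MD5 signature.
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[md5_of(path)].append(path)
        for candidates in by_hash.values():
            # Stage 3: a byte-by-byte check confirms each match.
            remaining = list(candidates)
            while len(remaining) > 1:
                first, rest = remaining[0], remaining[1:]
                same = [first] + [p for p in rest
                                  if filecmp.cmp(first, p, shallow=False)]
                if len(same) > 1:
                    groups.append(same)
                remaining = [p for p in rest if p not in same]
    return groups
```

The cheap size check runs first so that signatures are computed only for files that could possibly match, and the final byte comparison guards against two different files sharing a signature.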
OPTIONS
-r --recurse
for every directory given follow subdirectories encountered within
-R --recurse:
for each directory given after this option follow subdirectories encountered within (note the ':' at the end of option; see the
Examples section below for further explanation)
-s --symlinks
follow symlinked directories
-H --hardlinks
normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior
-n --noempty
exclude zero-length files from consideration
-f --omitfirst
omit the first file in each set of matches
-A --nohidden
exclude hidden files from consideration
-1 --sameline
list each set of matches on a single line
-S --size
show size of duplicate files
-m --summarize
summarize duplicate files information
-q --quiet
hide progress indicator
-d --delete
prompt user for files to preserve, deleting all others (see CAVEATS below)
-N --noprompt
when used together with --delete, preserve the first file in each set of duplicates and delete the others without prompting the user
-v --version
display fdupes version
-h --help
displays help
SEE ALSO
md5sum(1)
NOTES
Unless -1 or --sameline is specified, duplicate files are listed together in groups, each file displayed on a separate line. The groups are
then separated from each other by blank lines.
When -1 or --sameline is specified, spaces and backslash characters (\) appearing in a filename are preceded by a backslash character.
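A script that reads single-line output has to undo this escaping before it can use the filenames. The sketch below shows one way to do that in Python, assuming filenames on a line are separated by unescaped spaces; `split_sameline` is a hypothetical helper, not part of fdupes.

```python
def split_sameline(line):
    """Split one single-line match set into its filenames."""
    names, current, escaped = [], [], False
    for ch in line.rstrip("\n"):
        if escaped:
            # The previous character was a backslash: keep this one literally.
            current.append(ch)
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == " ":
            # An unescaped space ends the current filename.
            if current:
                names.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        names.append("".join(current))
    return names


# A name with a space, a name with a backslash, and a plain name.
print(split_sameline(r"a\ b.txt c\\d.txt e.txt"))
```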
EXAMPLES
fdupes a --recurse: b
will follow subdirectories under b, but not those under a.
fdupes a --recurse b
will follow subdirectories under both a and b.
CAVEATS
If fdupes returns with an error message such as fdupes: error invoking md5sum it means the program has been compiled to use an external
program to calculate MD5 signatures (otherwise, fdupes uses internal routines for this purpose), and an error has occurred while attempting
to execute it. If this is the case, the specified program should be properly installed prior to running fdupes.
When using -d or --delete, care should be taken to insure against accidental data loss.
When used together with options -s or --symlinks, a user could accidentally preserve a symlink while deleting the file it points to.
Furthermore, when specifying a particular directory more than once, all files within that directory will be listed as their own duplicates,
leading to data loss should a user preserve a file without its "duplicate" (the file itself!).
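One way to guard against the repeated-directory case is to remove repeated arguments before they reach fdupes. The following Python sketch does this by resolving each path; `unique_directories` is a hypothetical helper. It assumes that comparing resolved paths is sufficient, and it does not detect a directory nested inside another one that is searched recursively.

```python
import os


def unique_directories(directories):
    """Drop arguments that resolve to a directory already listed."""
    seen = set()
    unique = []
    for directory in directories:
        # realpath resolves symlinks, "." components, and trailing slashes.
        resolved = os.path.realpath(directory)
        if resolved not in seen:
            seen.add(resolved)
            unique.append(directory)
    return unique
```

The first spelling of each directory is kept, so the order of the remaining arguments is unchanged.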
AUTHOR
Adrian Lopez <adrian2@caribe.net>
FDUPES(1)
rdfind(1) rdfind rdfind(1)
NAME
rdfind - finds duplicate files
SYNOPSIS
rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...
DESCRIPTION
rdfind finds duplicate files across and/or within several directories. It calculates checksum only if necessary. rdfind runs in O(N log(N))
time with N being the number of files.
If two (or more) equal files are found, the program decides which of them is the original and the rest are considered duplicates. This is
done by ranking the files to each other and deciding which has the highest rank. See section RANKING for details.
If you need better control over the ranking than given, you can use some preprocessor which sorts the file names in desired order and then
run the program using xargs. See examples below for how to use find and xargs in conjunction with rdfind.
To include files or directories that have names starting with -, use rdfind ./- to not confuse them with options.
RANKING
Given two or more equal files, the one with the highest rank is selected to be the original and the rest are duplicates. The rules of rank-
ing are given below, where the rules are executed from start until an original has been found. Given two files A and B which have equal
content, the ranking is as follows:
If A was found while scanning an input argument earlier than B, A is higher ranked.
If A was found at a depth lower than B, A is higher ranked (A closer to the root).
If A was found earlier than B, A is higher ranked.
The last rule is needed when two files are found in the same directory (obviously not given in separate arguments, otherwise the first rule
applies) and gives the same order between the files as the operating system delivers the files while listing the directory. This is operat-
ing system specific behaviour.
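Because the rules are applied in order until one of them decides, the ranking amounts to a lexicographic comparison on three values. The Python sketch below illustrates this under that reading; the `Found` record and `pick_original` function are hypothetical names for this example and do not reflect rdfind's internals.

```python
from typing import NamedTuple


class Found(NamedTuple):
    path: str
    arg_index: int  # position of the input argument being scanned
    depth: int      # directory depth below that argument
    order: int      # overall discovery order


def pick_original(equal_files):
    """Return the highest-ranked file among files with equal content."""
    # Tuples compare element by element, mirroring the three rules in order.
    return min(equal_files, key=lambda f: (f.arg_index, f.depth, f.order))


files = [
    Found("backup/report.txt", arg_index=1, depth=0, order=0),
    Found("home/old/report.txt", arg_index=0, depth=1, order=1),
    Found("home/report.txt", arg_index=0, depth=0, order=2),
]
print(pick_original(files).path)
```

Here the file directly under the first argument wins, even though it was discovered last.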
OPTIONS
Searching options etc:
-ignoreempty true|false
Ignore empty files. Default is true.
-followsymlinks true|false
Follow symlinks. Default is false.
-removeidentinode true|false
removes items found which have identical inode and device ID. Default is true.
-checksum md5|sha1
what type of checksum to be used: md5 or sha1. Default is md5.
Action options:
-makesymlinks true|false
Replace duplicate files with symbolic links
-makehardlinks true|false
Replace duplicate files with hard links
-makeresultsfile true|false
Make a results file results.txt (default) in the current directory.
-outputname name
Make the results file name to be "name" instead of the default results.txt.
-deleteduplicates true|false
Delete (unlink) files.
General options:
-sleep Xms
sleeps X milliseconds between reading each file, to reduce load. Default is 0 (no sleep). Note that only a few values are supported
at present: 0,1-5,10,25,50,100 milliseconds.
-n, -dryrun
displays what would have been done; does not actually delete or link anything.
-h, -help, --help
displays brief help message.
-v, -version, --version
displays version number.
EXAMPLES
Search for duplicate files in home directory and a backup directory:
rdfind ~ /mnt/backup
Delete duplicates in a backup directory:
rdfind -deleteduplicates true /mnt/backup
Search for duplicate files in directories called foo:
find . -type d -name foo -print0 |xargs -0 rdfind
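Since files found under earlier arguments rank higher, the order of the arguments controls which copy is treated as the original. The Python sketch below builds an argument list with preferred directories first and applies the ./ prefix to names starting with -. The helper `rdfind_command` is hypothetical and only constructs the list; it does not run rdfind.

```python
def rdfind_command(directories, preferred):
    """Build an rdfind argument list with preferred directories first."""
    # False sorts before True, so preferred directories come first.
    ordered = sorted(directories, key=lambda d: (d not in preferred, d))
    # Names starting with "-" would be read as options, so prefix "./".
    safe = ["./" + d if d.startswith("-") else d for d in ordered]
    return ["rdfind"] + safe


print(rdfind_command(["/mnt/backup", "/home/me"], preferred={"/home/me"}))
```

The resulting list can be passed to a process launcher such as `subprocess.run` once the order looks right.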
FILES
results.txt (the default name, which can be changed with the -outputname option; see above). The results file contains one row per
duplicate file found, along with a header row explaining the columns. A text field describes why the file is considered a duplicate:
DUPTYPE_UNKNOWN some internal error
DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the original.
DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing the directory in the same input argument as the original)
DUPTYPE_OUTSIDE_TREE the file is found during processing another input argument than the original.
ENVIRONMENT
DIAGNOSTICS
EXIT VALUES
0 on success, nonzero otherwise.
BUGS/FEATURES
When specifying the same directory twice, it keeps the first encountered as the most important (original), and the rest as duplicates. This
might not be what you want.
The -makesymlinks option creates absolute links. This might not be what you want. To create relative links instead, you may use the
symlinks(2) command, which is able to convert absolute links to relative links.
Older versions unfortunately contained a misspelling on the word occurrence. This is now corrected (since 1.3), which might affect user
scripts parsing the output file written by rdfind.
There are lots of enhancements left to do. Please contribute!
SECURITY CONSIDERATIONS
Avoid manipulating the directories while rdfind is reading. rdfind is quite brittle in that case. Especially, when deleting or making
links, rdfind can be subject to a symlink attack. Use with care!
AUTHOR
Paul Dreik 2006, reachable at rdfind@pauldreik.se. Rdfind can be found at http://rdfind.pauldreik.se/
Do you find rdfind useful? Drop me a line! It is always fun to hear from people who actually use it and what data collections they run it
on.
THANKS
Several persons have helped with suggestions and improvements: Niels Moller, Carl Payne and Salvatore Ansani. Thanks also to you who tested
the program and sent me feedback.
VERSION
1.3.1 (release date 2012-05-07) svn id: $Id: rdfind.1 766 2012-05-07 17:26:17Z pauls $
COPYRIGHT
This program is distributed under GPLv2 or later, at your option.
SEE ALSO
md5sum(1), find(1), symlinks(2)
May 2012 1.3.1 rdfind(1)