12-04-2006
script that detects duplicate files in directory
I need help with a script that accepts one argument, goes through all the files under a directory, and prints a list of possible duplicate files. As its output, it prints zero or more lines, each containing a space-separated list of filenames. All the files listed on one line have the same MD5 hash, i.e., they are believed to be identical.
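One possible starting point, as a minimal sketch rather than a finished answer: hash every regular file under the directory, sort so that equal hashes end up next to each other, and print each group of two or more names on one line. The function name list_dupes is made up for illustration. The sketch assumes GNU md5sum (32 hex digits, two separator characters, then the name) and filenames without newlines or backslashes, which md5sum escapes. Names containing spaces will be ambiguous in space-separated output.

```sh
#!/bin/sh
# list_dupes DIR (hypothetical name): print one line per set of files
# under DIR that share an MD5 hash, with the names separated by spaces.
list_dupes() {
    find "$1" -type f -exec md5sum {} + |
        LC_ALL=C sort |
        awk '{
            hash = substr($0, 1, 32)   # the MD5 hash itself
            name = substr($0, 35)      # everything after the two separators
            if (hash == prev) {
                # same hash as the previous line: extend the current group
                line = line " " name
                n++
            } else {
                # new hash: print the finished group if it had duplicates
                if (n > 1) print line
                line = name; n = 1; prev = hash
            }
        }
        END { if (n > 1) print line }'
}
```

It would be called as `list_dupes /some/dir`. A directory with no duplicates produces no output at all, which matches the "zero or more lines" requirement.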
Others/optional
If the -s switch is specified, the script should not print a list of all duplicate files; instead, it should print the number of duplicates (for example, 4 duplicate copies of 3 files) and how much extra space the duplicates take up. Note: this summary information should only be displayed if the -s switch is present; if it is not present, every line in the output should display a set of duplicate files.
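The summary for -s could be computed along these lines. This is only a sketch under assumptions: the function name dupe_summary and the wording of the output line are invented here, "extra space" is taken to mean the total size of every copy beyond the first in each set, and GNU md5sum output is assumed (filenames with newlines or backslashes are not handled).

```sh
#!/bin/sh
# dupe_summary DIR (hypothetical name): count duplicate copies under DIR
# and add up the bytes they occupy beyond one copy per set.
dupe_summary() {
    find "$1" -type f -exec md5sum {} + |
        while IFS= read -r entry; do
            hash=${entry%% *}       # text before the first space
            name=${entry#* ?}       # text after the two separator characters
            size=$(wc -c < "$name") # file size in bytes
            printf '%s %s\n' "$hash" "$size"
        done |
        awk '{
            seen[$1]++
            # every file after the first with a given hash is a duplicate copy
            if (seen[$1] > 1) { copies++; bytes += $2 }
            # count each set once, when its second member shows up
            if (seen[$1] == 2) sets++
        }
        END {
            printf "%d duplicate copies of %d file(s), %d extra bytes\n",
                   copies, sets, bytes
        }'
}
```

A wrapper script could use `getopts s` to decide whether to print this summary or the per-set listing.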
FDUPES(1) General Commands Manual FDUPES(1)
NAME
fdupes - finds duplicate files in a given set of directories
SYNOPSIS
fdupes [ options ] DIRECTORY ...
DESCRIPTION
Searches the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte
comparison.
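The order of checks described above (size, then MD5, then byte-by-byte) can be illustrated for a single pair of files. This is a sketch of the idea only, not how fdupes is implemented; the helper name same_file is hypothetical, and md5sum(1) and cmp(1) are assumed to be installed.

```sh
#!/bin/sh
# same_file A B (hypothetical helper): succeed only if A and B pass all
# three checks, running the cheapest check first.
same_file() {
    # 1. Files of different sizes can never be duplicates.
    [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] || return 1
    # 2. Compare MD5 signatures; reading from stdin keeps the file
    #    names out of the md5sum output being compared.
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ] || return 1
    # 3. Confirm byte by byte, which rules out a hash collision.
    cmp -s "$1" "$2"
}
```

For example, `same_file a.txt b.txt && echo duplicate` prints "duplicate" only when all three checks agree.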
OPTIONS
-r --recurse
include files residing in subdirectories
-s --symlinks
follow symlinked directories
-H --hardlinks
normally, when two or more files point to the same disk area they are treated as non-duplicates; this option will change this behavior
-n --noempty
exclude zero-length files from consideration
-f --omitfirst
omit the first file in each set of matches
-1 --sameline
list each set of matches on a single line
-S --size
show size of duplicate files
-q --quiet
hide progress indicator
-d --delete
prompt user for files to preserve, deleting all others (see CAVEATS below)
-v --version
display fdupes version
-h --help
display help
SEE ALSO
md5sum(1)
NOTES
Unless -1 or --sameline is specified, duplicate files are listed together in groups, each file displayed on a separate line. The groups are
then separated from each other by blank lines.
When -1 or --sameline is specified, spaces and backslash characters (\) appearing in a filename are preceded by a backslash character.
CAVEATS
If fdupes returns with an error message such as fdupes: error invoking md5sum it means the program has been compiled to use an external
program to calculate MD5 signatures (otherwise, fdupes uses internal routines for this purpose), and an error has occurred while attempting
to execute it. If this is the case, the specified program should be properly installed prior to running fdupes.
When using -d or --delete, care should be taken to ensure against accidental data loss.
When used together with options -s or --symlinks, a user could accidentally preserve a symlink while deleting the file it points to.
Furthermore, when specifying a particular directory more than once, all files within that directory will be listed as their own duplicates,
leading to data loss should a user preserve a file without its "duplicate" (the file itself!).
AUTHOR
Adrian Lopez <adrian2@caribe.net>
FDUPES(1)