A duplicate file script Question


 
# 1  
Old 01-26-2011

Hello, my main goal is to find duplicate files. I dug up some simple code on the internet, and my only question is whether it will work on large files and/or directories with numerous files.

Code:
#!/bin/bash  DIR="/pwd"
  for file1 in ${DIR}; do
      for file2 in ${DIR}; do
          if [ $file1 != $file2 ]; then
            DIFF=`diff "$file1" "$file2" -q`
          if [ "${DIFF%% *}" != "Files" ]; then
             echo "duplicate: $file1 $file2"
               break
          fi
          fi
      done
  done
  echo "done"

Thank you in advance !
# 2  
Old 01-26-2011
I don't think that script's going to work. It certainly won't work on directories with lots of files in them, or work very efficiently.

I'd suggest the fdupes utility, which makes an efficient tree to compare files with.
This User Gave Thanks to Corona688 For This Post:
# 3  
Old 01-26-2011
You could do something like:
Code:
 find /dir -type f -exec md5sum {} + | awk '{ if (cc[$1]) { print substr($0, 35); } else { cc[$1]++; } }'

This prints the names of the duplicate files (not the first file with a given checksum, only the later ones that match it).
It walks the whole tree under /dir. To keep it out of child subdirectories, add the flag "-maxdepth 1" directly after /dir.
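For reuse, the same idea can be wrapped in a function. This is only a sketch: the name list_duplicates is made up for illustration, and it assumes GNU md5sum (32 hex digits, two separator characters, then the file name) and file names without newlines or backslashes.

```shell
#!/bin/bash
# Sketch: print every file under a directory whose MD5 checksum has already
# been seen on an earlier file. list_duplicates is a hypothetical name.
# Assumes GNU md5sum output and no newlines or backslashes in file names.
list_duplicates() {
    local dir=${1:-.}
    # "{} +" hands many files to each md5sum run instead of one run per file.
    find "$dir" -type f -exec md5sum {} + |
        sort |
        # The file name starts at column 35, so names with spaces stay whole.
        awk 'seen[$1]++ { print substr($0, 35) }'
}
```

Calling list_duplicates /dir prints one path per line. The sort step only makes the choice of "first" file predictable.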
# 4  
Old 01-26-2011
@BionicMonk
The code you posted is not of the best quality, even ignoring the mispost on line 1 and the fact that it does not work.
It would fail with large numbers of files or if any filename contains a space character. How quickly it would fail depends on the Operating System.
On further examination, it does not work at all: there is no code in the script to process a list of files. Each "for" loop iterates over the single word held in ${DIR}, never over the contents of that directory.
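For comparison, here is a rough sketch of what a working pairwise version might look like. The function name find_duplicate_pairs is made up, and it assumes bash and a single flat directory. It still compares every pair of files, so the work grows with the square of the file count.

```shell
#!/bin/bash
# Sketch of a working pairwise comparison; find_duplicate_pairs is a
# hypothetical name. Assumes bash (arrays) and a single flat directory.
find_duplicate_pairs() {
    local dir=${1:-.}
    local -a files=()
    local f i j
    # A quoted glob expands to the directory entries and keeps names
    # containing spaces intact.
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            files+=("$f")
        fi
    done
    # Visit each unordered pair once. cmp -s prints nothing and returns 0
    # only when the two files are byte-for-byte identical.
    for ((i = 0; i < ${#files[@]}; i++)); do
        for ((j = i + 1; j < ${#files[@]}; j++)); do
            if cmp -s "${files[i]}" "${files[j]}"; then
                echo "duplicate: ${files[i]} ${files[j]}"
            fi
        done
    done
}
```

Run it as find_duplicate_pairs /some/dir. For more than a few hundred files, a checksum-based approach is likely to be much faster.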

What Operating System and version do you have?
What Shell do you use?

As the previous poster implied, the practical way to determine whether a file is a duplicate is to checksum the file. The name of the checksum program varies considerably depending on which Operating System you are running.
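To illustrate the naming problem, a script can probe for whichever tool is installed. This is a sketch only: checksum_file is a made-up name, the order of preference is an assumption, and other systems may need other commands.

```shell
#!/bin/bash
# Sketch: print a checksum line for one file using whichever tool exists.
# checksum_file is a hypothetical name; the preference order is an assumption.
checksum_file() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1"        # GNU coreutils, found on most Linux systems
    elif command -v md5 >/dev/null 2>&1; then
        md5 -r "$1"        # BSD and macOS; -r puts the hash first
    else
        cksum "$1"         # POSIX CRC fallback: a weaker check, widely available
    fi
}
```

In every case the first field of the output is the checksum, so later steps can compare that field without caring which tool produced it.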

Can you give us a feel for the highest number of files you will be checking in a particular directory tree, and how much disk workspace you have to store temporary files? I have a technique which does about 20,000 files in 5-6 minutes but which is impractical for, say, 200,000 files.
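That technique is not described here. As a separate illustration, one common way to cut the work on large trees is to checksum only files that share a size with at least one other file, since files of different sizes cannot be identical. The sketch below assumes GNU find (-printf) and GNU md5sum, and file names without tabs, newlines or backslashes; find_dupes_by_size is a made-up name.

```shell
#!/bin/bash
# Sketch: checksum only files whose size matches another file's size.
# find_dupes_by_size is a hypothetical name. Assumes GNU find and md5sum,
# and file names without tabs, newlines or backslashes.
find_dupes_by_size() {
    local dir=${1:-.}
    local name
    # Step 1: list "size<TAB>path" for every regular file.
    find "$dir" -type f -printf '%s\t%p\n' |
    # Step 2: keep only paths whose size occurs more than once.
    awk -F '\t' '
        { count[$1]++; names[$1] = names[$1] $2 "\n" }
        END { for (s in count) if (count[s] > 1) printf "%s", names[s] }
    ' |
    # Step 3: checksum the remaining candidates.
    while IFS= read -r name; do
        md5sum "$name"
    done |
    sort |
    # Step 4: print each file whose checksum was already seen.
    awk 'seen[$1]++ { print substr($0, 35) }'
}
```

How much this saves depends on the data: a tree full of same-sized files gains nothing from the size filter.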

Hmm. Before anybody spends any time on this issue, please post definitive examples of duplicate files and of files which are not duplicates, stating clearly how you define a "duplicate".

Last edited by methyl; 01-26-2011 at 07:24 PM.. Reason: paranoia
 