Directory containing files: print names of the files in the directory that have exactly the same content


 
# 1  
Old 03-22-2017

Given a directory containing say a few thousand files,
please output a list of all the names of the files in the directory that are exactly the same, i.e. have the same contents.
Code:
func(a_directory_name) output -> {"matches": [[fn1, fn2 ...], [fn3, fn4 ...] ... ]}

e.g. func("/home/my/files"), where that directory might contain foo.txt, foo.iso, foo.jpeg, bar.txt, bar.doc, baz.csv, baz.ppt etc. If, say, foo.txt is the same as bar.doc, and foo.iso is the same as baz.csv and baz.ppt, then the output would be:

Code:
{
"matches": [
[
"foo.txt",
"bar.doc"
],
[
"foo.iso",
"baz.csv",
"baz.ppt"
]
]
}

# 2  
Old 03-22-2017
Where exactly are you stuck?
# 3  
Old 03-22-2017
I tried the code below:

Code:
for i in TEST/*;
do
    for a in TEST/*;
    do
        if [[ $i == $a ]];then
            echo "============"
        else
            comp=`comm -3 $i $a`;
            if [[ $comp != "" ]];then
                echo "=============="
            else
                echo "Matches the $i and $a"
            fi
        fi
    done
done

# 4  
Old 03-22-2017
You are comparing every pair twice. How about
Code:
md5sum TEST/* | 
awk '
        {CS[NR] = $1
         FN[NR] = $2
        }
END     {for (i=1; i<=NR; i++)
          for (j=i+1; j<=NR; j++) if (CS[i] == CS[j]) print FN[i] "=" FN[j]
        }
'
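
The pairwise listing above could also be reshaped into the grouped "matches" form the first post asks for. What follows is only a minimal sketch: it assumes GNU md5sum and file names without whitespace, quotes, or backslashes, and the function name dup_json is made up for illustration.

```shell
# Sketch: group files by MD5 checksum and print the groups as JSON.
# dup_json is a hypothetical name, not part of any tool.
# Assumes file names contain no whitespace, quotes, or backslashes.
dup_json() {
    for f in "$1"/*; do
        [ -f "$f" ] && md5sum "$f"      # one "checksum  path" line per regular file
    done |
    awk '
        {
            n = split($2, part, "/")    # strip the directory, keep the base name
            name = "\"" part[n] "\""
            if (count[$1]++)            # checksum seen before: extend its group
                group[$1] = group[$1] ", " name
            else {                      # first sighting: remember input order
                order[++k] = $1
                group[$1] = name
            }
        }
        END {
            printf "{\"matches\": ["
            sep = ""
            for (i = 1; i <= k; i++)
                if (count[order[i]] > 1) {   # only checksums shared by 2+ files
                    printf "%s[%s]", sep, group[order[i]]
                    sep = ", "
                }
            print "]}"
        }
    '
}
```

It would be called as dup_json TEST. Each file is checksummed once, so the work grows with the number of files rather than the number of pairs.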

# 5  
Old 03-22-2017
Another route would be to run the sum command on everything in the directory and pipe the output through sort. If the checksum and block count that sum reports are the same for a pair (trio, etc.) of files, they are very likely identical; since sum is a weak checksum, cmp can confirm a match.
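
A rough sketch of that idea, under some assumptions: cksum (specified by POSIX) is swapped in for sum, file names contain no whitespace, and list_same is a hypothetical function name. Sorting puts equal checksums next to each other, and cmp then confirms each neighbouring candidate pair.

```shell
# Sketch: checksum each regular file, sort so equal checksums become
# adjacent, then confirm neighbouring candidates byte-for-byte with cmp.
# list_same is a hypothetical name; assumes no whitespace in file names.
list_same() {
    prev_key= prev_name=
    for f in "$1"/*; do
        [ -f "$f" ] && cksum "$f"           # prints: CRC  byte-count  name
    done |
    sort -k1,1n -k2,2n |
    while read -r crc size name; do
        # Same CRC and size as the previous line: verify with cmp
        if [ "$crc $size" = "$prev_key" ] && cmp -s "$prev_name" "$name"; then
            echo "$prev_name = $name"
        fi
        prev_key="$crc $size"
        prev_name=$name
    done
}
```

Called as list_same TEST, three identical files a, b and c would be reported as a = b and b = c. Only adjacent lines are compared, so a checksum collision sitting between two identical files could hide their match.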
# 6  
Old 03-22-2017
Or, in pure shell:
Code:
for i in TEST/*
     do for a in $(ls -r TEST/*)
          do    [ "$i" = "$a" ] && break
                cmp -s "$i" "$a" && echo "Matches the $i and $a"
          done
     done
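
A possible variation on the loop above that does not parse ls output, so file names containing spaces should survive. This is a sketch only: compare_all is a hypothetical name, and the directory is assumed to hold only regular files.

```shell
# Sketch: compare every pair exactly once without parsing ls.
# compare_all is a hypothetical name.
compare_all() {
    set -- "$1"/*                 # positional parameters now hold the file list
    while [ "$#" -gt 1 ]; do
        first=$1
        shift                     # remaining parameters are the files after $first
        for other in "$@"; do
            cmp -s "$first" "$other" && echo "Matches the $first and $other"
        done
    done
    return 0
}
```

Called as compare_all TEST, it still does a byte-for-byte cmp on every pair, so for a few thousand files a checksum pass is likely to be faster.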


Last edited by RudiC; 03-22-2017 at 07:14 PM.. Reason: wrong file patterns
# 7  
Old 03-23-2017
Hi.

You may also wish to consider some programs from the class of utilities that deal with the general idea of differences:
Code:
        15) fdupes, rdfind, duff, jdupes find duplicate files

Some details:
Code:
rdfind  finds duplicate files (man)
Path    : /usr/bin/rdfind
Version : 1.3.4
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Help    : probably available with --help
Repo    : Debian 8.7 (jessie) 

fdupes  finds duplicate files in a given set of directories (man)
Path    : /usr/bin/fdupes
Version : 1.51
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Repo    : Debian 8.7 (jessie) 

jdupes  finds and performs actions upon duplicate files (man)
Path    : ~/executable/jdupes
Version : 1.5.1 (2016-11-01)
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)

duff    duplicate file finder (man)
Path    : /usr/bin/duff
Version : 0.5.2
Type    : ELF 64-bit LSB executable, x86-64, version 1 (SYSV ...)
Repo    : Debian 8.7 (jessie)
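
As a hedged illustration of how one of these might be invoked, the sketch below runs fdupes on a directory if it happens to be installed. The wrapper name show_dupes is hypothetical; fdupes itself is simply given the directory as its argument.

```shell
# Sketch: list duplicate sets with fdupes when it is installed.
# show_dupes is a hypothetical wrapper name.
show_dupes() {
    if command -v fdupes >/dev/null 2>&1; then
        fdupes "$1"        # prints each set of identical files, sets separated by blank lines
    else
        echo "fdupes is not installed" >&2
        return 127
    fi
}
```

Called as show_dupes TEST, the output is plain text, so it would still need reshaping to get the JSON form from the first post.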

Best wishes ... cheers, drl