Finding duplicate files in two base directories
Post 302918032 by migurus on Friday 19th of September 2014 09:09:58 PM
You can use SHA-1 checksums to identify identical files: two files with the same checksum have, for practical purposes, the same content. Below is a script that finds and lists identical files in two different directories:

Code:
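DIR1=${1}
DIR2=${2}
TMP1=$(mktemp)
TMP2=$(mktemp)
# Single quotes delay expansion until the trap runs; signals exit via the EXIT trap
trap 'rm -f "$TMP1" "$TMP2"' EXIT
trap 'exit 1' HUP INT QUIT TERM

# Checksum every *.c and *.h file; -exec copes with spaces in file names
find "$DIR1" -type f -name "*.[ch]" -exec shasum {} + > "$TMP1"
find "$DIR2" -type f -name "*.[ch]" -exec shasum {} + > "$TMP2"

# Keep only the checksums that occur more than once
cat "$TMP1" "$TMP2" | awk '{print $1}' | sort | uniq -d |
while read -r sha
do
        # A SHA-1 line is 40 hex digits plus 2 separator characters,
        # so the file name starts at column 43
        grep -h "^$sha" "$TMP1" "$TMP2" | cut -c43-
        echo
done
exit 0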

The echo inside the loop is there only to separate groups of identical files with an empty line, for readability.

This is quick and dirty, with no error checking; it is only meant to illustrate the idea.
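For anyone who wants a little of that missing error checking, here is a minimal sketch of the same idea written as a function. The names find_dups and SHA_CMD are made up for this example, and the fallback to sha1sum is an assumption for systems where shasum is not installed (the two print the same line format). It groups the sorted checksum lines in a single awk pass instead of using temporary files.

```shell
#!/bin/sh
# Assumption: use shasum if present, otherwise sha1sum (same output format).
if command -v shasum >/dev/null 2>&1; then
    SHA_CMD=shasum
else
    SHA_CMD=sha1sum
fi

# find_dups DIR1 DIR2 (hypothetical helper)
# Prints each group of identical *.c / *.h files, one name per line,
# followed by an empty line. Returns 1 if an argument is not a directory.
find_dups() {
    for d in "$1" "$2"; do
        if [ ! -d "$d" ]; then
            echo "not a directory: $d" >&2
            return 1
        fi
    done

    # Sorting puts lines with the same checksum next to each other;
    # awk then prints only the groups that have more than one member.
    find "$1" "$2" -type f -name '*.[ch]' -exec "$SHA_CMD" {} + |
    sort |
    awk '
        { hash = $1; name = substr($0, 43) }   # name starts at column 43
        hash == prev { group = group "\n" name; n++; next }
        { if (n > 1) print group "\n"; prev = hash; group = name; n = 1 }
        END { if (n > 1) print group "\n" }
    '
}
```

Called as find_dups src backup/src, it prints the same kind of blank-line-separated groups as the script above. Like the original, it also reports duplicates that sit inside a single directory.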
 

SHASUM(1)              Perl Programmers Reference Guide              SHASUM(1)

NAME
       shasum - Print or Check SHA Checksums

SYNOPSIS
       Usage: shasum [OPTION] [FILE]...
          or: shasum [OPTION] --check [FILE]
       Print or check SHA checksums.
       With no FILE, or when FILE is -, read standard input.

        -a, --algorithm    1 (default), 224, 256, 384, 512
        -b, --binary       read files in binary mode (default on DOS/Windows)
        -c, --check        check SHA sums against given list
        -p, --portable     read files in portable mode
                               produces same digest on Windows/Unix/Mac
        -t, --text         read files in text mode (default)

       The following two options are useful only when verifying checksums:

        -s, --status       don't output anything, status code shows success
        -w, --warn         warn about improperly formatted SHA checksum lines

        -h, --help         display this help and exit
        -v, --version      output version information and exit

       The sums are computed as described in FIPS PUB 180-2. When checking,
       the input should be a former output of this program. The default mode
       is to print a line with checksum, a character indicating type (`*' for
       binary, `?' for portable, ` ' for text), and name for each FILE.

DESCRIPTION
       The shasum script provides the easiest and most convenient way to
       compute SHA message digests. Rather than writing a program, the user
       simply feeds data to the script via the command line, and waits for
       the results to be printed on standard output. Data can be fed to
       shasum through files, standard input, or both.

       The following command shows how easy it is to compute digests for
       typical inputs such as the NIST test vector "abc":

               perl -e "print qw(abc)" | shasum

       Or, if you want to use SHA-256 instead of the default SHA-1, simply
       say:

               perl -e "print qw(abc)" | shasum -a 256

       Since shasum uses the same interface employed by the familiar sha1sum
       program (and its somewhat outmoded ancestor md5sum), you can install
       this script as a convenient drop-in replacement.

AUTHOR
       Copyright (c) 2003-2008 Mark Shelor <mshelor@cpan.org>.

SEE ALSO
       shasum is implemented using the Perl module Digest::SHA or
       Digest::SHA::PurePerl.

perl v5.12.4                      2013-03-18                         SHASUM(1)
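The two modes described in the man page, printing a digest and checking a saved list with --check, might be used from a script roughly like this. This is only a sketch: sha1_of_string, verify_list and SHA_CMD are names invented for the example, and falling back to sha1sum when shasum is missing is an assumption based on the man page's remark that both share one interface.

```shell
#!/bin/sh
# Assumption: use shasum if present, otherwise sha1sum.
if command -v shasum >/dev/null 2>&1; then
    SHA_CMD=shasum
else
    SHA_CMD=sha1sum
fi

# sha1_of_string STR (hypothetical helper)
# Feeds STR on standard input, with no trailing newline, and prints
# only the hex digest (the first field of the output line).
sha1_of_string() {
    printf '%s' "$1" | "$SHA_CMD" | awk '{print $1}'
}

# verify_list FILE (hypothetical helper)
# FILE must be earlier output of the checksum command. With --status
# nothing is printed; the exit status alone reports success or failure.
verify_list() {
    "$SHA_CMD" --status --check "$1"
}
```

For the NIST test vector, sha1_of_string abc prints a9993e364706816aba3e25717850c26c9cd0d89d. A list for verify_list can be produced with something like shasum *.c > sums.txt and checked later with verify_list sums.txt.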