Finding duplicate files in two base directories


 
# 1  
09-19-2014

Hello all,
I have an assignment to complete by this Monday, and the problem statement is as follows:

Code:
Problem: Find duplicate files (especially .c and .cpp) in two project base directories, with the following requirements:
1. It should be extendable to search multiple base project directories.
2. It must use STL containers.
3. It should be portable to both Linux and Windows.
4. In advanced search mode, it should also compare the contents of the files.

While searching the net, I came across Boost.Filesystem, which is portable across both operating systems.
Friends, please give me some input on this.
Thank you very much in advance.
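For requirements 1 to 3, one possible starting point is C++17's `std::filesystem`, which was standardized from Boost.Filesystem, so `boost::filesystem` offers a very similar interface on older compilers. The sketch below assumes that "duplicate" means the same file name (e.g. `util.cpp`) found under more than one base directory. The names `NameIndex`, `is_source_file`, `index_directory` and `duplicate_groups` are hypothetical, not from any library. A `std::map` from file name to a `std::vector` of paths covers the STL-container requirement, and calling `index_directory` once per base directory lets the search extend to any number of directories.

```cpp
// Sketch (assumes C++17 <filesystem>): index .c/.cpp files by file name
// across any number of base directories and report names seen more than once.
#include <filesystem>
#include <iostream>
#include <map>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical alias: file name -> every path where that name was found.
using NameIndex = std::map<std::string, std::vector<fs::path>>;

// True for the extensions the problem statement asks about.
bool is_source_file(const fs::path& p) {
    const fs::path ext = p.extension();
    return ext == ".c" || ext == ".cpp";
}

// Recursively walk one base directory and add its source files to the index.
// Errors are reported and skipped instead of thrown, so one unreadable
// directory does not abort the whole scan.
void index_directory(const fs::path& root, NameIndex& index) {
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    if (ec) {
        std::cerr << "cannot open " << root << ": " << ec.message() << '\n';
        return;
    }
    for (fs::recursive_directory_iterator end; it != end; it.increment(ec)) {
        if (ec) {
            std::cerr << "error while scanning " << root << ": " << ec.message() << '\n';
            break;
        }
        if (it->is_regular_file(ec) && is_source_file(it->path()))
            index[it->path().filename().string()].push_back(it->path());
    }
}

// Collect every group of paths that share a file name (two or more entries).
std::vector<std::vector<fs::path>> duplicate_groups(const NameIndex& index) {
    std::vector<std::vector<fs::path>> groups;
    for (const auto& entry : index)
        if (entry.second.size() > 1) groups.push_back(entry.second);
    return groups;
}
```

A `main` could call `index_directory(argv[i], index)` for each command-line argument and then print each group returned by `duplicate_groups`. Note that this treats `Util.cpp` and `util.cpp` as different names, which may not be what you want on Windows.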
# 2  
09-19-2014
Do not post classroom or homework problems in the main forums. Homework and coursework questions can only be posted in this forum under special homework rules.

Please review the rules, which you agreed to when you registered, if you have not already done so.

More than likely, posting homework in the main forums has resulted in a forum infraction. If you did not post homework, please explain which company you work for and the nature of the problem you are working on.

If you did post homework in the main forums, please review the guidelines for posting homework and repost.

Thank You.

The UNIX and Linux Forums.
# 3  
09-19-2014
Hello,
It's certainly not a homework assignment; sorry for describing it that way. I did so only to get around the strict restrictions that service-based companies place on their employees. I work as a software engineer at a service-based company, this task was assigned to me, and since I am completely new to STL and Boost, I opted to ask for help here.
# 4  
09-19-2014
You can use SHA-1 checksums to identify identical files. Below is a script that finds and lists files with identical contents in two different directories:

Code:
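#!/bin/sh
DIR1=$1
DIR2=$2
TMP1=$(mktemp) || exit 1
TMP2=$(mktemp) || exit 1
trap 'rm -f "$TMP1" "$TMP2"' EXIT
trap 'exit 1' HUP INT QUIT TERM

# Hash every .c, .cpp and .h file; "-exec ... {} +" copes with spaces in names.
find "$DIR1" -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -exec shasum {} + > "$TMP1"
find "$DIR2" -type f \( -name '*.c' -o -name '*.cpp' -o -name '*.h' \) -exec shasum {} + > "$TMP2"

# Each line is "<40-char sha1>  <path>"; keep the hashes that occur more than once.
cat "$TMP1" "$TMP2" | awk '{print $1}' | sort | uniq -d |
while read -r sha; do
        # Print every file with this hash (cut drops the hash and the 2 separator chars).
        cat "$TMP1" "$TMP2" | grep "^$sha " | cut -c43-
        echo
done
exit 0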

I added
Code:
echo;

only to separate the groups of identical files with an empty line, for readability.

This is quick and dirty, with no error checking etc.; it is only meant to illustrate the idea.
# 5  
09-20-2014
Thanks, Migurus, for your prompt reply, but unfortunately I have to do this in C++.