Find duplicates among 2 directories


 
# 1  
Old 02-24-2019

I have 2 directories,
Code:
/media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04/

Code:
/media/andy/MAXTOR_SDB1/Linux_Files/.

I want to find which files are duplicates so I can delete them from one of those directories.
# 2  
Old 02-25-2019
Hello drew77,

After 100+ posts on UNIX.com, we expect you to show us at least what you have tried in order to solve your own problem. It is always good to include your efforts in a question, as we are all here to learn.

Kindly add your efforts in CODE TAGS and let us know.

Thanks,
R. Singh
# 3  
Old 02-25-2019
And please add a definition of what makes a "duplicate": a common file name? Common metadata, e.g. size or time stamps? Identical contents / checksum?
# 4  
Old 02-25-2019
Quote:
Originally Posted by RudiC
And please add a definition of what makes a "duplicate": a common file name? Common metadata, e.g. size or time stamps? Identical contents / checksum?

Yes, the same file name.



I want to put programs in their own directory, and documents and other changing files in another directory.

--- Post updated at 03:33 AM ---

R. Singh, I did use code tags in my post.


Code:
Code:
/media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04/


Code:
diff /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04 /media/andy/MAXTOR_SDB1/Linux_File

Code:
Only in /media/andy/MAXTOR_SDB1/Linux_Files: Briggs_Stratton_Generator.zip
Only in /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04: Brinkmann_8109415-W.zip
Only in /media/andy/MAXTOR_SDB1/Linux_Files: Brother_2240_Drivers.zip

I do not know what "Only in" means.


I have some HTML files in those directories, and my diff command also showed this (partial list).

Is there a way to not show the internals of those files?


Code:
sonType%3DcheckForloginAndRegister%26WT.z_eCTAid%3Dct1_eml_ViewPlan__ct1_eml_tra_eml_0day%26WT.z_edatesent%3D12082017&reasonCode=-1&appid=TRK_MC_CTA" ADD_DATE="1512750666" LAST_MODIFIED="1512750875" LAST_CHARSET="UTF-8">Log in | UPS Andy77586 Mar...7</A>
---
>         <DT><A HREF="https://beautifultaiwantea.com/collections/white-tea/products/silver-needle" ADD_DATE="1509652385" LAST_MODIFIED="1515190477" ICON_URI="https://beautifultaiwantea.com/favicon.ico" ICON="data:image/png;base64,R0lGODlhAQABAIABAAAAAP///yH5BAEAAAEALAAAAAABAAEAAAICTAEAOw==" LAST_CHARSET="UTF-8">Silver Needle White Tea | Beautiful Taiwan Tea Company</A>


Last edited by RudiC; 02-25-2019 at 05:47 AM..
# 5  
Old 02-25-2019
There is a tool that identifies identical files by comparing their MD5 sums.
Code:
apt install fdupes

List the duplicates:
Code:
fdupes dir1/ dir2/

Interactive mode, which asks which files to remove:
Code:
fdupes -d dir1/ dir2/

The following variant is suitable for use in a script.
In each set of duplicates, only the first file is kept and all the others are deleted. "First" is decided by the sort order (file names first, then directory names!).
A simple way to influence which directory keeps its copy is the -i option. It does not select the directory to keep, but it reverses the sort order, so the first file may then be in the folder you need.
Try
Code:
fdupes -i dir1/ dir2/

and then use
Code:
fdupes -Nd dir1/ dir2/

Before you delete anything, be sure to read the man page for the command and do some test runs.
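
To illustrate the idea behind checksum-based detection without deleting anything, here is a rough sketch. It is not how fdupes is implemented. It assumes md5sum from GNU coreutils (as on Ubuntu) and file names without newlines or backslashes; the function name dup_by_md5 is made up for this example.

```shell
# dup_by_md5 DIR... (hypothetical helper)
# Prints "checksum  path" for every file whose MD5 sum occurs more than once.
dup_by_md5() {
    find "$@" -type f -exec md5sum {} + |
        sort |
        awk '{ sum = $1 }
             # Same checksum as the previous line: print the pair,
             # but print the first member of a group only once.
             sum == prev { if (!shown) print prevline; print; shown = 1 }
             sum != prev { shown = 0 }
             { prev = sum; prevline = $0 }'
}
```

Calling dup_by_md5 dir1/ dir2/ only lists candidates, grouped by checksum. It never removes a file.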
# 6  
Old 02-25-2019
Are you sure that file names are enough? Try
Code:
diff <(ls /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04) <(ls /media/andy/MAXTOR_SDB1/Linux_File)

given your system (which you fail to mention, btw) has a shell that provides "process substitution".
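
If the shell has no process substitution, or if the goal is to list the names present in both directories rather than the differences, a plain loop is one option. This is only a sketch: the function name list_common is made up, it looks at regular files in the top level of each directory only, and it skips hidden files.

```shell
# list_common DIR1 DIR2 (hypothetical helper)
# Prints the name of every regular file that exists in both directories.
list_common() {
    for f in "$1"/*; do
        name=${f##*/}            # strip the directory part
        if [ -f "$f" ] && [ -f "$2/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

For the directories in this thread it would be called as list_common /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04 /media/andy/MAXTOR_SDB1/Linux_Files. Nothing is deleted; the output is just a list to review.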
# 7  
Old 02-25-2019
Quote:
Originally Posted by drew77
Code:
diff /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04 /media/andy/MAXTOR_SDB1/Linux_File

Code:
Only in /media/andy/MAXTOR_SDB1/Linux_Files: Briggs_Stratton_Generator.zip
Only in /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04: Brinkmann_8109415-W.zip
Only in /media/andy/MAXTOR_SDB1/Linux_Files: Brother_2240_Drivers.zip

I do not know what "Only in" means.
diff is a utility that compares text files: you give it two text files and it will tell you the differences between the two. Up to now I didn't know that the GNU version can compare directories too, but obviously it can. I have learned something new today.

To understand how diff works, let us suppose for the moment that it works on lines only (it doesn't). In principle there are three possibilities:

1) a line is present in both files
2) a line is present in file 1 (only) but not in file 2
3) a line is present in file 2 (only) but not in file 1

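This is the situation you have here, with file names in two directories instead of lines in two files: "Only in X: name" means that name exists in directory X but not in the other one. Your output means the two directories will contain the same files once you: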

1) copy Briggs_Stratton_Generator.zip from /media/andy/MAXTOR_SDB1/Linux_Files to /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04
2) copy Brinkmann_8109415-W.zip from /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04 to /media/andy/MAXTOR_SDB1/Linux_Files
3) copy Brother_2240_Drivers.zip also from /media/andy/MAXTOR_SDB1/Linux_Files to /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04
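
As for the question about not showing the internals of files: when given two directories, GNU diff also compares the contents of files that have the same name in both, which is where the HTML lines came from. With the -q option it only reports that such files differ. A small sketch, with a made-up function name (brief_compare):

```shell
# brief_compare DIR1 DIR2 (hypothetical helper)
# Prints "Only in ..." lines and "Files ... differ" lines, never file contents.
brief_compare() {
    status=0
    diff -q "$1" "$2" || status=$?
    # diff exits with 1 when it found differences; only 2 or more is an error.
    [ "$status" -le 1 ]
}
```

Files that have the same name and the same content do not appear in the output at all.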

Notice, though, that two files which have the same name (all the others not mentioned in the output) are not necessarily the same: they could still differ in content. As a first step you would have to compare file sizes too, and even if the sizes are the same the content might be different. You would have to use diff again (this time on the two individual files) to find out.
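
The per-file check can be scripted. The sketch below uses cmp -s instead of diff, because cmp only answers "same or not" and prints nothing, which is all that is needed here. The function name compare_same_names is made up, and only regular files in the top level of each directory are considered.

```shell
# compare_same_names DIR1 DIR2 (hypothetical helper)
# For every file name present in both directories, report whether
# the two files have identical content.
compare_same_names() {
    for f in "$1"/*; do
        name=${f##*/}
        if [ -f "$f" ] && [ -f "$2/$name" ]; then
            if cmp -s "$f" "$2/$name"; then
                printf 'identical: %s\n' "$name"
            else
                printf 'differs: %s\n' "$name"
            fi
        fi
    done
}
```

Only the names reported as identical are safe candidates for deletion from one of the two directories.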

I hope this helps.

bakunin

Last edited by bakunin; 02-25-2019 at 05:59 AM..