Operating Systems > Linux > Ubuntu: Find duplicates among 2 directories
Post 303031237 by RavinderSingh13, Sunday 24 February 2019, 11:04:42 PM
Hello drew77,

After making 100+ posts on UNIX.com, we expect you to show us at least what you have tried in order to solve your own problem. It is always good to include your efforts in your questions, as we are all here to learn.

Kindly add your efforts (wrapped in CODE tags) and let us know.

Thanks,
R. Singh
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Awk to find duplicates in 2nd field

I want to find duplicates in a file based on the 2nd field. I wrote this code: nawk '{a[$2]++} END{for i in a {if (a[i]>1) print i}}' temp but could not find what's wrong with it. Appreciate the help. (5 Replies)
Discussion started by: pinnacle
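For reference, a working form of that one-liner: in awk the for clause needs parentheses, i.e. for (i in a). A minimal sketch against the same temp file:

    awk '{count[$2]++} END {for (key in count) if (count[key] > 1) print key}' temp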

2. Shell Programming and Scripting

Shellscript to find duplicates according to size

I have a folder which in turn has numerous subfolders, all containing PDF files with the same file named in different ways. So I need a script, if one can be written, to find and print the duplicate files (that is, files with the same size) along with their respective paths. So I assume here that same file... (5 Replies)
Discussion started by: deaddevil
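A minimal sketch of that size-grouping idea, assuming GNU find (for -printf) and a placeholder path; equal sizes only suggest duplicates, so a checksum pass would still be needed to confirm:

    find /path/to/folder -type f -name '*.pdf' -printf '%s\t%p\n' |
        awk -F'\t' '{n[$1]++; grp[$1] = grp[$1] $0 "\n"}
                    END {for (s in n) if (n[s] > 1) printf "%s", grp[s]}'

This prints, grouped together, the size and path of every file whose size occurs more than once.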

3. Shell Programming and Scripting

How to check whether 777 permission is set on directories and sub-directories

Hi All, I am an Oracle Apps Tech guy. I have a requirement to find whether 777 permission is set on all folders and sub-folders under APPL_TOP (a folder/directory), with the conditions below: i) the directory names should start with xx..... (like xxau, xxcfi, xxcca, etc.) and exclude the directory... (11 Replies)
Discussion started by: gagan4599
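As a starting point for that requirement (APPL_TOP and the xx prefix come from the question; -perm 0777 matches only directories whose mode is exactly 777, while -perm -0777 would match any mode that includes all those bits):

    find "$APPL_TOP" -type d -name 'xx*' -perm 0777 -print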

4. Shell Programming and Scripting

Find duplicates in the first column of text file

Hello, my text file has input of the form:
abc dft45.xml
ert rt653.xml
abc ert57.xml
I need to write a Perl script/shell script to find duplicates in the first column and write them to a text file of the form...
abc dft45.xml
abc ert57.xml
Can someone help me please? (5 Replies)
Discussion started by: gameboy87
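A common two-pass awk sketch for this: the first pass counts first-column values, the second prints every line whose value occurred more than once (input.txt is a placeholder name):

    awk 'NR == FNR {count[$1]++; next} count[$1] > 1' input.txt input.txt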

5. UNIX for Dummies Questions & Answers

sort and find duplicates for files with no white space

Example data:
5666700842511TAfmoham03151008075205999900000001000001000++
5666700843130MAfmoham03151008142606056667008390315100005001
6666666663130MAfmoham03151008142606056667008390315100005001
I'd like to sort on position 10-14, where the characters are eq "130MA". Then based on positions... (0 Replies)
Discussion started by: mmarshall
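One way to express the positional test, taking the stated positions at face value (awk's substr is 1-indexed, so the offset needs adjusting if the poster was counting from 0; data.txt is a placeholder):

    awk 'substr($0, 10, 5) == "130MA"' data.txt | sort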

6. UNIX for Dummies Questions & Answers

Using grep command to find the pattern of text in all directories and sub-directories.

Hi all, using the grep command, I want to find a text pattern in all directories and sub-directories. E.g., if I want to search for a pattern named "parameter", I used the command: grep -i "param" ../* Is this correct? (1 Reply)
Discussion started by: vinothrajan55
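The ../* glob only reaches one directory level. The -r flag of GNU/BSD grep descends recursively, and -i keeps the match case-insensitive:

    grep -ri "param" /path/to/dir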

7. Shell Programming and Scripting

find numeric duplicates from 300 million lines....

These are numeric IDs:
222932017099186177
222932014385467392
222932017371820032
222932017409556480
I have a text file with 300 million lines as shown above. I want to find the duplicates in this file. Please suggest the quickest way. sort | uniq -d will... (3 Replies)
Discussion started by: pamu
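sort | uniq -d does work at that scale if sort is given room to breathe. A sketch with GNU sort options, where the buffer size and temp directory are placeholders to tune, and LC_ALL=C forces fast byte-wise comparison:

    LC_ALL=C sort -S 4G -T /fast/tmp ids.txt | uniq -d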

8. Shell Programming and Scripting

Find All duplicates based on multiple keys

Hi All,
Input.txt:
123,ABC,XYZ1,A01,IND,I68,IND,NN
123,ABC,XYZ1,A01,IND,I67,IND,NN
998,SGR,St,R834,scot,R834,scot,NN
985,SGR0399,St,R180,T15,R180,T1,YY
985,SGR0399,St,R180,T15,R180,T1,NN
985,SGR0399,St,R180,T15,R180,T1,NN
2943,SGR?99,St,R68,Scot,R77,Scot,YY... (2 Replies)
Discussion started by: unme
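A two-pass awk sketch with a composite key; here the first two comma-separated fields are assumed to form the key, so adjust the field list to whichever columns define a duplicate:

    awk -F',' 'NR == FNR {seen[$1 FS $2]++; next} seen[$1 FS $2] > 1' Input.txt Input.txt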

9. Shell Programming and Scripting

Find duplicates in 2nd & 3rd columns and their IDs

With the format given below, I have been trying to find the IDs of all entries with duplicate names in the 2nd and 3rd columns, and their count (how many times each name is duplicated, if any):
0.237788 Aaban Aahva
0.291066 Aabheer Aahlaad
0.845814 Aabid Aahan
0.152208 Aadam... (6 Replies)
Discussion started by: busyboy
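A sketch that keys on columns 2 and 3 together, then reports each duplicated pair with its count and the IDs (column 1) it appeared with (names.txt is a placeholder):

    awk '{count[$2, $3]++; ids[$2, $3] = ids[$2, $3] " " $1}
         END {for (k in count) if (count[k] > 1) {
                  split(k, p, SUBSEP); print p[1], p[2], count[k] ":" ids[k]}}' names.txt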

10. UNIX for Beginners Questions & Answers

Find duplicates in file with line numbers

Hello All, this is a noob question. I tried searching for the answer, but what I found did not help me. I have a file that can have duplicates:
100
200
300
400
100
150
The number 100 appears twice. I want to find each duplicate along with its line number. Expected... (4 Replies)
Discussion started by: vatigers
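A two-pass sketch: count the values on the first pass, then print the value with its line number wherever a repeated value appears (the file is read twice, so FNR is the line number within the second pass):

    awk 'NR == FNR {count[$1]++; next} count[$1] > 1 {print FNR ": " $1}' file file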
rdfind(1)							      rdfind								 rdfind(1)

NAME
       rdfind - finds duplicate files

SYNOPSIS
       rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION
       rdfind finds duplicate files across and/or within several directories.
       It calculates checksums only when necessary. rdfind runs in O(Nlog(N))
       time, with N being the number of files.

       If two (or more) equal files are found, the program decides which of
       them is the original and the rest are considered duplicates. This is
       done by ranking the files against each other and deciding which has
       the highest rank. See section RANKING for details.

       If you need better control over the ranking than this, you can use a
       preprocessor which sorts the file names in the desired order and then
       run the program using xargs. See the examples below for how to use
       find and xargs in conjunction with rdfind.

       To include files or directories that have names starting with -, use
       rdfind ./- so they are not confused with options.

RANKING
       Given two or more equal files, the one with the highest rank is
       selected to be the original and the rest are duplicates. The rules of
       ranking are given below and are applied from the start until an
       original has been found. Given two files A and B which have equal
       content, the ranking is as follows:

       If A was found while scanning an input argument earlier than B, A is
       higher ranked.

       If A was found at a depth lower than B, A is higher ranked (A is
       closer to the root).

       If A was found earlier than B, A is higher ranked.

       The last rule is needed when two files are found in the same directory
       (obviously not given in separate arguments, otherwise the first rule
       applies) and gives the same order between the files as the operating
       system delivers them while listing the directory. This is operating
       system specific behaviour.

OPTIONS
       Searching options etc.:

       -ignoreempty true|false
              Ignore empty files. Default is true.

       -followsymlinks true|false
              Follow symlinks. Default is false.

       -removeidentinode true|false
              Remove items found which have identical inode and device ID.
              Default is true.

       -checksum md5|sha1
              What type of checksum to use: md5 or sha1. Default is md5.

       Action options:

       -makesymlinks true|false
              Replace duplicate files with symbolic links.

       -makehardlinks true|false
              Replace duplicate files with hard links.

       -makeresultsfile true|false
              Make a results file results.txt (the default) in the current
              directory.

       -outputname name
              Make the results file name "name" instead of the default
              results.txt.

       -deleteduplicates true|false
              Delete (unlink) duplicate files.

       General options:

       -sleep Xms
              Sleep X milliseconds between reading each file, to reduce
              load. Default is 0 (no sleep). Note that only a few values are
              supported at present: 0, 1-5, 10, 25, 50, 100 milliseconds.

       -n, -dryrun
              Display what would have been done; don't actually delete or
              link anything.

       -h, -help, --help
              Display a brief help message.

       -v, -version, --version
              Display the version number.
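       For instance, a cautious first run might pair the dry-run flag with an
       action option to preview what rdfind would do before committing (the
       directory names here are placeholders):

              rdfind -n -makehardlinks true /data/photos /backup/photos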
EXAMPLES
       Search for duplicate files in a home directory and a backup directory:

              rdfind ~ /mnt/backup

       Delete duplicates in a backup directory:

              rdfind -deleteduplicates true /mnt/backup

       Search for duplicate files in directories called foo:

              find . -type d -name foo -print0 | xargs -0 rdfind

FILES
       results.txt (the default name; it can be changed with the option
       -outputname, see above). The results file results.txt will contain one
       row per duplicate file found, along with a header row explaining the
       columns. A text describes why the file is considered a duplicate:

       DUPTYPE_UNKNOWN
              Some internal error.

       DUPTYPE_FIRST_OCCURRENCE
              The file that is considered to be the original.

       DUPTYPE_WITHIN_SAME_TREE
              Files in the same tree (found while processing the same input
              argument as the original).

       DUPTYPE_OUTSIDE_TREE
              The file was found while processing a different input argument
              than the original.
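       A small sketch for post-processing that file, assuming the duptype is
       the first whitespace-separated column and the path the last; the
       header row documents the actual layout, so check it first, and paths
       containing spaces would need a more careful parse:

              awk 'NR > 1 && $1 != "DUPTYPE_FIRST_OCCURRENCE" {print $NF}' results.txt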
ENVIRONMENT

DIAGNOSTICS

EXIT VALUES
       0 on success, nonzero otherwise.

BUGS/FEATURES
       When specifying the same directory twice, rdfind keeps the first one
       encountered as the most important (the original), and the rest as
       duplicates. This might not be what you want.

       The symlink option creates absolute links. This might not be what you
       want. To create relative links instead, you may use the symlinks(2)
       command, which is able to convert absolute links to relative links.

       Older versions unfortunately contained a misspelling of the word
       occurrence. This is corrected since 1.3, which might affect user
       scripts parsing the output file written by rdfind.

       There are lots of enhancements left to do. Please contribute!

SECURITY CONSIDERATIONS
       Avoid manipulating the directories while rdfind is reading. rdfind is
       quite brittle in that case. Especially when deleting or making links,
       rdfind can be subject to a symlink attack. Use with care!

AUTHOR
       Paul Dreik 2006, reachable at rdfind@pauldreik.se
       Rdfind can be found at http://rdfind.pauldreik.se/

       Do you find rdfind useful? Drop me a line! It is always fun to hear
       from people who actually use it and what data collections they run it
       on.

THANKS
       Several persons have helped with suggestions and improvements: Niels
       Moller, Carl Payne and Salvatore Ansani. Thanks also to you who tested
       the program and sent me feedback.

VERSION
       1.3.1 (release date 2012-05-07) svn id: $Id: rdfind.1 766 2012-05-07
       17:26:17Z pauls $

COPYRIGHT
       This program is distributed under GPLv2 or later, at your option.

SEE ALSO
       md5sum(1), find(1), symlinks(2)

May 2012                               1.3.1                         rdfind(1)