Shell Programming and Scripting: List duplicate files based on Name and size
Post 302881795 by ctsgnb on Wednesday, 1 January 2014, 03:13:13 PM
Proceed in two steps:
1. Log each file's size and name to a temporary file, stripping the directory path from the name.
2. Sort that file and print the lines that occur more than once.
Code:
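# GNU find: print "size path" for every regular file; sed then deletes
# everything from the first "/" to the last "/", leaving "size basename"
find /huge_dir -type f -printf "%s %p\n" | sed 's:/.*/::' >/tmp/mytmp
sort /tmp/mytmp | uniq -d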
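`uniq -d` prints each repeated "size name" pair only once, so it tells you which keys repeat but not where the copies live. One way to get from the keys to the actual files is sketched below. It assumes GNU find (for `-printf`) and file names without tabs or newlines; the helper name `list_dup_paths` is hypothetical. It keeps the same two steps: log to a temporary file, then let awk read that file twice.

```shell
#!/bin/sh
# Hypothetical helper: print the path of every regular file under DIR whose
# (size, base name) pair occurs more than once.
# Assumes GNU find (-printf) and no tabs or newlines in file names.
list_dup_paths() {
    tmp=$(mktemp) || return 1
    # Step 1: one line per file: size <TAB> base name <TAB> full path
    find "$1" -type f -printf '%s\t%f\t%p\n' >"$tmp"
    # Step 2: read the same file twice.
    # Pass 1 (NR == FNR) counts each size/name key;
    # pass 2 prints the paths whose key was counted more than once.
    awk -F '\t' '
        NR == FNR           { count[$1 FS $2]++; next }
        count[$1 FS $2] > 1 { print $3 }
    ' "$tmp" "$tmp"
    rm -f "$tmp"
}
```

Usage: `list_dup_paths /huge_dir`. Reading the file twice means awk only keeps one counter per distinct size/name key in memory instead of holding every path.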

Note that for this many files it would be advisable to load the size/name list into a database instead.

Code:
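# %f is the base name (leading directories removed), so sed is not needed
find /huge_dir -type f -printf "%s %f\n" >/tmp/mytmp
# uniq -d prints one copy of each repeated line; it needs sorted input
sort /tmp/mytmp | uniq -d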
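Newline-terminated records break if a file name itself contains a newline. Assuming GNU coreutils, whose `sort -z` and `uniq -z` operate on NUL-terminated records, the same size/name report can be made newline-safe and can skip the temporary file. This is a sketch under that assumption; `list_dup_keys` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print each "size name" pair that occurs more than once
# under DIR, using NUL-terminated records so names containing newlines work.
# Assumes GNU find (-printf) and GNU sort/uniq (-z).
list_dup_keys() {
    find "$1" -type f -printf '%s %f\0' |
        sort -z |      # bring identical "size name" records together
        uniq -zd |     # keep one copy of each record that repeats
        tr '\0' '\n'   # convert to newlines for display only
}
```

Usage: `list_dup_keys /huge_dir`. Drop the final `tr` if the result feeds another NUL-aware tool such as `xargs -0`.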


Last edited by ctsgnb; 01-01-2014 at 05:32 PM. Reason: Remove sed clause (thanks, Rudi)
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Report of duplicate files based on part of the filename

I have files logged in the file system with names in the format filename_ordernumber_date_time, e.g. file_1_12012007_1101.txt, file_2_12022007_1101.txt, file_1_12032007_1101.txt. I need to find all the files that are logged multiple times with the same order number. In the above example, I... (1 Reply)
Discussion started by: sudheshnaiyer

2. Shell Programming and Scripting

Duplicate rows in CSV files based on values

I want to duplicate a row if two or more comma-delimited values are found in a particular column of that row. Input: abc,line one,value1 abc,line two, value1, value2 abc,line three,value1 needs to be converted to: abc,line one,value1 abc,line two, value1 abc,line... (8 Replies)
Discussion started by: Incrediblian

3. UNIX for Dummies Questions & Answers

split files based on size

I have a few txt files in a directory and I need to check their sizes one by one. If any of them is larger than 5 MB, I need to split it in two. Can someone help? Thanks. (6 Replies)
Discussion started by: khanvader

4. Shell Programming and Scripting

Remove duplicate files based on text string?

Hi, I have been struggling with a script for removing duplicate messages from a shared mailbox. I would like to search for duplicate messages based on the “Message-ID” string within the message files. I have managed to find the duplicate “Message-ID” strings and (if I would like) delete... (1 Reply)
Discussion started by: spangberg

5. Shell Programming and Scripting

Deleting files based on their size

I have several files in a folder and I would like to delete the ones that do not contain all the required information, judging by size (let's say 1 KB). Any ideas? (4 Replies)
Discussion started by: Xterra

6. Shell Programming and Scripting

Find duplicate files by file size

Hi! I want to find duplicate files (criterion: file size) in my download folder. I tried it like this: find /Users/frodo/Downloads \! -type d -exec du {} \; | sort > /Users/frodo/Desktop/duplicates_1.txt; cut -f 1 /Users/frodo/Desktop/duplicates_1.txt | uniq -d | grep -hif -... (9 Replies)
Discussion started by: Dirk Einecke

7. Shell Programming and Scripting

Duplicate rows in CSV files based on values

I am new to this forum and this is my first post. I am looking at an old post with exactly the same name. I cannot paste the URL because I do not have 5 posts. My requirement is the exact opposite: I want to get rid of duplicate rows and append the values of columns in those rows ... (10 Replies)
Discussion started by: vbhonde11

8. Shell Programming and Scripting

Delete Files based on size

Hello Community! I'm new to shell programming and this is my first post. I'm trying to write a bash shell script that removes files from a subdirectory. It is called as: rms -{g|l|b} size1 dir, where -g means: remove the file or files in dir that are above size1; -l means: remove the file or files in dir that... (1 Reply)
Discussion started by: BTKBaaMMM

9. Shell Programming and Scripting

Find duplicate based on 'n' fields and mark the duplicate as 'D'

Hi, in a file, I have to mark duplicate records as 'D' and the latest record alone as 'C'. In the file below, I have to identify whether duplicate records exist based on Man_ID, Man_DT, Ship_ID, and I have to mark the record with the latest Ship_DT as "C" and the others as "D" (I have to create... (7 Replies)
Discussion started by: machomaddy

10. Shell Programming and Scripting

Duplicate files and output list

Gents, I have a file like this:
1 1
1 2
2 3
2 4
2 5
3 6
3 7
4 8
5 9
I would like to get something like this:
1 1 2
2 3 4 5
3 6 7
Thanks in advance for your support. (8 Replies)
Discussion started by: jiam912
File::Find::Rule::Procedural(3) 			User Contributed Perl Documentation			   File::Find::Rule::Procedural(3)

NAME
    File::Find::Rule::Procedural - File::Find::Rule's procedural interface

SYNOPSIS
      use File::Find::Rule;
      # find all .pm files, procedurally
      my @files = find(file => name => '*.pm', in => \@INC);

DESCRIPTION
    In addition to the regular object-oriented interface, File::Find::Rule
    provides two subroutines for you to use.

    "find( @clauses )"
    "rule( @clauses )"
        "find" and "rule" can be used to invoke any methods available to the
        OO version. "rule" is a synonym for "find".

        Passing more than one value to a clause is done with an anonymous
        array:

          my $finder = find( name => [ '*.mp3', '*.ogg' ] );

        "find" and "rule" both return a File::Find::Rule instance, unless one
        of the arguments is "in", in which case it returns a list of things
        that match the rule.

          my @files = find( name => [ '*.mp3', '*.ogg' ], in => $ENV{HOME} );

        Please note that "in" will be the last clause evaluated, and so this
        code will search for mp3s regardless of size.

          my @files = find( name => '*.mp3', in => $ENV{HOME}, size => '<2k' );
                                                               ^
                                                               |
                   Clause processing stopped here -------------/

        It is also possible to invert a single rule by prefixing it with "!"
        like so:

          # large files that aren't videos
          my @files = find( file    =>
                            '!name' => [ '*.avi', '*.mov' ],
                            size    => '>20M',
                            in      => $ENV{HOME} );

AUTHOR
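The negated clause in the last example maps onto find(1)'s `!` operator. A rough shell counterpart is sketched below; it is not part of File::Find::Rule. It assumes that `size => '>20M'` means more than 20,000,000 bytes (a decimal M, as used by Number::Compare) and spells the limit in find's `c` (bytes) unit, because find rounds sizes up when given `M`. The helper name `large_non_videos` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: regular files under DIR that are larger than
# 20,000,000 bytes and are NOT named *.avi or *.mov.
# Assumption: '>20M' is read as decimal megabytes; the 'c' (bytes) unit
# avoids the rounding find applies to -size +20M.
large_non_videos() {
    find "$1" -type f \
        ! \( -name '*.avi' -o -name '*.mov' \) \
        -size +20000000c
}
```

Unlike the Perl call, the search root comes first and every test after it is applied, so there is no "in is evaluated last" pitfall to watch for.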
    Richard Clamp <richardc@unixbeard.net>

COPYRIGHT
    Copyright (C) 2003 Richard Clamp. All Rights Reserved.

    This module is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

SEE ALSO
    File::Find::Rule

perl v5.16.2                      2011-09-19
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.