Shell Programming and Scripting: List duplicate files based on Name and size
Post 302881790 by prvnrk on Wednesday 1st of January 2014 01:06:01 PM
List duplicate files based on Name and size

Hello,

I have a huge directory (with millions of files) and need to find duplicates based on BOTH the file name and the file size.

I know about
Code:
fdupes

but it calculates MD5 checksums, which is very time-consuming, and with millions of files it takes forever.

Can anyone please suggest a script or tool that finds duplicates based only on file name and file size? It would also be nice to be able to filter by a minimum file size.
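Not part of the original post: one way to sketch this is with GNU find and awk, grouping files by basename plus byte size and printing only the groups that occur more than once. The -printf option is GNU-specific, and the /tmp/dupdemo paths and MINSIZE value here are made up purely for illustration.

```shell
# Demo sandbox (hypothetical paths, just to have something to scan).
rm -rf /tmp/dupdemo
mkdir -p /tmp/dupdemo/a /tmp/dupdemo/b
printf 'hello' > /tmp/dupdemo/a/report.txt   # 5 bytes
printf 'world' > /tmp/dupdemo/b/report.txt   # 5 bytes: same name + size
printf 'xx'    > /tmp/dupdemo/b/other.txt    # unique name

# Print every file whose basename AND size occur more than once,
# skipping files smaller than MINSIZE bytes (-size +Nc counts exact bytes).
MINSIZE=1
find /tmp/dupdemo -type f -size +"$((MINSIZE - 1))"c -printf '%f\t%s\t%p\n' |
awk -F'\t' '
    { key = $1 FS $2; count[key]++; paths[key] = paths[key] $3 "\n" }
    END { for (k in count) if (count[k] > 1) printf "%s", paths[k] }
'
```

Because no checksum is computed, this only flags likely duplicates: two different files can share a name and size, so verify with cmp(1) or similar before deleting anything.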

Thanks

Last edited by prvnrk; 01-01-2014 at 02:31 PM..
 

srec_binary(5)                      File Formats Manual                      srec_binary(5)

NAME
       srec_binary - binary file format

DESCRIPTION
       It is possible to read and write binary files using srec_cat(1).

   File Holes
       A file hole is a portion of a regular file that contains NUL characters and is not
       stored in any data block on disk.  Holes are a long-standing feature of Unix files.
       For instance, the following Unix command creates a file in which the first bytes
       are a hole:

              $ echo -n "X" | dd of=/tmp/hole bs=1024 seek=6
              $

       Now /tmp/hole has 6,145 characters (6,144 NUL characters plus an X character), yet
       the file occupies just one data block on disk.  File holes were introduced to avoid
       wasting disk space.  They are used extensively by database applications and, more
       generally, by all applications that perform hashing on files.  See
       http://www.oreilly.com/catalog/linuxkernel2/chapter/ch17.pdf for more information.

   Reading
       The size of a binary file is taken from the size of the file on the file system.
       If the file has holes, these will read as blocks of NUL (zero) data, as there is no
       elegant way to detect Unix file holes.  In general, you probably want to use the
       -unfill filter to find and remove large swathes of zero bytes.

   Writing
       In producing a binary file, srec_cat(1) honours the address information and places
       the data into the binary file at the addresses specified in the hex file.  This
       usually results in holes in the file.  Sometimes alarmingly large file sizes are
       reported as a result.  If you are on a brain-dead operating system without file
       holes, then there are going to be real data blocks containing real zero bytes, and
       consuming real amounts of disk space.  Upgrade - I suggest Linux.

       To make a file of the size you expect, use srec_info foo.s19 to find the lowest
       address, then use

              srec_cat foo.s19 -intel -offset -n -o foo.bin -binary

       where n is the lowest address present in the foo.s19 file, as reported by
       srec_info(1).  The negative offset serves to move the data down to have an origin
       of zero.

SEE ALSO
       srec_input(1)
              for a description of the -unfill filter
       srec_examples(1)
              has a section about binary files, and ways of automagically offsetting the
              data back to zero in a single command.

COPYRIGHT
       SRecord version 1.58
       Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008,
       2009, 2010, 2011 Peter Miller

       The SRecord program comes with ABSOLUTELY NO WARRANTY; for details use the
       'SRecord -VERSion License' command.  This is free software and you are welcome to
       redistribute it under certain conditions; for details use the 'SRecord -VERSion
       License' command.

AUTHOR
       Peter Miller
       E-Mail: pmiller@opensource.org.au
       WWW: http://miller.emu.id.au/pmiller/

Reference Manual                         SRecord                             srec_binary(5)
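The file-hole example in the srec_binary(5) text above can be reproduced and checked directly. This sketch is not from the manual: it uses printf instead of echo -n for shell portability, stat -c is GNU coreutils (non-portable), and whether the hole is actually left unallocated depends on the file system.

```shell
# Seek 6 * 1024 = 6144 bytes past the start of a fresh file, then write one byte.
rm -f /tmp/hole
printf 'X' | dd of=/tmp/hole bs=1024 seek=6 2>/dev/null

# Apparent size includes the hole (6144 NULs + "X" = 6145 bytes); %b reports
# the 512-byte blocks actually allocated, which is far smaller on file systems
# that support holes.
stat -c 'apparent=%s bytes, allocated=%b blocks' /tmp/hole
```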
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.