Find duplicate files by file size
Post 302510144 by jim mcnamara on Friday 1st of April 2011 04:03:47 PM
How about cksum? That is far easier to use. It gives the file size, and it also gives a checksum, so you can compare on either one.
This code assumes your cksum implementation gives:
Code:
cksum filename
checksum  filesize filename
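To confirm that your cksum prints the fields in the order the awk script relies on, a quick check like the following may help. It is only a sketch: check_cksum_format is a hypothetical helper name, and it assumes mktemp(1) is available on your system.

```shell
# Sketch: confirm cksum prints "checksum size filename" (hypothetical helper).
# Assumes mktemp(1) is available.
check_cksum_format() {
  tmp=$(mktemp) || return 1
  printf 'abc' > "$tmp"            # a 3-byte test file
  set -- $(cksum "$tmp")           # split cksum's output into $1 $2 $3
  rm -f "$tmp"
  # Expect 3 fields: field 2 is the byte count, field 3 the file name.
  if [ "$#" -eq 3 ] && [ "$2" -eq 3 ] && [ "$3" = "$tmp" ]; then
    echo "cksum format OK"
  else
    echo "unexpected cksum format: $*"
    return 1
  fi
}

check_cksum_format
```

If it reports an unexpected format, adjust the field numbers in the awk script to match your cksum.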

Code:
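cksum /path/to/files/* |
  awk '{ name = $0
         sub(/^[0-9]+ +[0-9]+ +/, "", name)   # file name = rest of line, spaces kept
         if ($2 in arr)                       # $2 is the size; key on $1 " " $2 to also require matching checksums
           print "duplicates", name, arr[$2], "duplicate filesize =", $2
         else
           arr[$2] = name }'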
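Matching on size alone can pair up unrelated files that merely have the same length, which is why the checksum is the other option. A sketch that keys on checksum and size together, and walks subdirectories with find, might look like this. find_dups is a hypothetical name; the sketch assumes the three-field cksum output shown above and file names without embedded newlines.

```shell
# Sketch: report files whose cksum checksum AND size both match.
# find_dups is a hypothetical helper; file names must not contain newlines.
find_dups() {
  find "$1" -type f -exec cksum {} + |
    awk '{
      key = $1 " " $2                          # checksum + size
      name = $0
      sub(/^[0-9]+ +[0-9]+ +/, "", name)       # rest of line = file name (spaces kept)
      if (key in first)
        print "duplicates:", first[key], "and", name, "size =", $2
      else
        first[key] = name
    }'
}
```

Run it as `find_dups /path/to/files`. A matching CRC and size is strong evidence but not proof; running `cmp` on each reported pair settles it byte for byte.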

 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to find duplicate files with find?

Hello all, I would like to search files, and the result needs to be the files that are duplicated. (8 Replies)
Discussion started by: umen

2. Solaris

Command to find out the total size of specific files (spread over the server)

Hi all, on my server there are some specific application files spread throughout the server, in folders, sub-folders and child folders. Please help me: how can I find the total size of these specific files on the server... (3 Replies)
Discussion started by: abhinov

3. Shell Programming and Scripting

Find Duplicate files, not by name

I have a directory with images:
-rw-r--r-- 1 root root 26216 Mar 19 21:00 020109.210001.jpg
-rw-r--r-- 1 root root 21760 Mar 19 21:15 020109.211502.jpg
-rw-r--r-- 1 root root 23144 Mar 19 21:30 020109.213002.jpg
-rw-r--r-- 1 root root 31350 Mar 20 00:45 020109.004501.jpg
-rw-r--r-- 1 root... (2 Replies)
Discussion started by: Ikon

4. Shell Programming and Scripting

Find duplicate files

What utility do you recommend for simply finding all duplicate files among all files? (4 Replies)
Discussion started by: kiasas

5. Shell Programming and Scripting

Find file size difference in two files using awk

Hi, could anyone help me solve this problem? I have two files, "f1" and "f2", each with 2 fields: a) file size and b) file name. The data are almost the same in both files except for a few changed lines and some new additional lines. Now I have to find out and print the output as the difference in the... (3 Replies)
Discussion started by: royalibrahim

6. Shell Programming and Scripting

Remove duplicate lines from a 50 MB file

Hi, please help me write a command to delete duplicate lines from a file. The file is 50 MB. How can I remove duplicate lines from such a big file? (6 Replies)
Discussion started by: vsachan

7. Shell Programming and Scripting

Find duplicate strings in many different files

I have more than 100 files like this:
SVEAVLTGPYGYT 2
SVEGNFEETQY 10
SVELGQGYEQY 28
SVERTGTGYT 6
SVGLADYNEQF 21
SVGQGYEQY 32
SVKTVLGYEQF 2
SVNNEQF 12
SVRDGLTNSPLH 3
SVRRDREGLEQF 11
SVRTSGSYEQY 17
SVSVSGSPLQETQY 78
SVVHSTSPEAF 59
SVVPGNGYT 75 (4 Replies)
Discussion started by: xshang

8. Shell Programming and Scripting

Find duplicate files but with different extensions

Hi! I wonder if anyone can help with this. I have a directory, /xyz, that has the following files:
chsLog.107.20130603.gz
chsLog.115.20130603
chsLog.111.20130603.gz
chsLog.107.20130603
chsLog.115.20130603.gz
As you can see, there are two files that are the same but only with a minor... (10 Replies)
Discussion started by: fretagi

9. Shell Programming and Scripting

Find duplicate rows between files

Hi champs, I have a requirement where I need to compare two files line by line and ignore duplicates. Note: my files are in sorted order. I have tried using the comm command, but it's not working for my scenario. Input file1 srv1..development..employee..empname,empid,empdesg... (1 Reply)
Discussion started by: Selva_2507

10. Shell Programming and Scripting

List duplicate files based on Name and size

Hello, I have a huge directory (with millions of files) and need to find duplicates based on BOTH file name and file size. I know fdupes, but it calculates MD5 hashes, which is very time-consuming and takes forever with millions of files. Can anyone please suggest a script or... (7 Replies)
Discussion started by: prvnrk
cksum(n)						     Cyclic Redundancy Checks							  cksum(n)


NAME
cksum - Calculate a cksum(1) compatible checksum

SYNOPSIS
package require Tcl 8.2
package require cksum ?1.1.2?
::crc::cksum ?-format format? ?-chunksize size? [ -channel chan | -filename file | string ]
::crc::CksumInit
::crc::CksumUpdate token data
::crc::CksumFinal token

DESCRIPTION
This package provides a Tcl implementation of the cksum(1) algorithm based upon information provided in the GNU implementation of this program as part of the GNU Textutils 2.0 package.

COMMANDS
::crc::cksum ?-format format? ?-chunksize size? [ -channel chan | -filename file | string ]
    The command takes string data, a channel or a file name and returns a checksum value calculated using the cksum(1) algorithm. The result is formatted using the format(n) specifier provided, or as an unsigned integer (%u) by default.

OPTIONS
-channel name
    Return a checksum for the data read from a channel. The command will read data from the channel until eof is true. If you need to be able to process events during this calculation, see the PROGRAMMING INTERFACE section.
-filename name
    This is a convenience option that opens the specified file, sets the encoding to binary and then acts as if the -channel option had been used. The file is closed on completion.
-format string
    Return the checksum using an alternative format template.

PROGRAMMING INTERFACE
The cksum package implements the checksum using a context variable to which additional data can be added at any time. This is especially useful in an event-based environment such as a Tk application or a web server package. Data to be checksummed may be handled incrementally during a fileevent handler in discrete chunks. This can improve the interactive nature of a GUI application and can help to avoid excessive memory consumption.

::crc::CksumInit
    Begins a new cksum context. Returns a token ID that must be used for the remaining functions. An optional seed may be specified if required.
::crc::CksumUpdate token data
    Add data to the checksum identified by token. Calling CksumUpdate $token "abcd" is equivalent to calling CksumUpdate $token "ab" followed by CksumUpdate $token "cd". See EXAMPLES.
::crc::CksumFinal token
    Returns the checksum value and releases any resources held by this token. Once this command completes, the token will be invalid. The result is a 32 bit integer value.

EXAMPLES
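Because this package is meant to match cksum(1), the same behavior can be tried from the shell. This sketch assumes the system cksum uses the same algorithm (as the DESCRIPTION states) and prints the checksum as its first field. It shows the default unsigned-integer output, a 0x%X rendering like the -format example, and that feeding the data in two pieces, as CksumUpdate does, gives the same checksum as feeding it whole.

```shell
# Sketch: shell analogues of the Tcl calls, using the cksum(1) command.
# Assumes cksum prints "checksum size" for stdin, checksum first.
whole=$(printf 'Hello, World!' | cksum | awk '{print $1}')

# Two writes into one pipe, like CksumUpdate "Hello, " then "World!":
# cksum only sees the byte stream, so the result matches the single write.
parts=$({ printf 'Hello, '; printf 'World!'; } | cksum | awk '{print $1}')

printf '%u\n' "$whole"       # default unsigned-integer form (%u)
printf '0x%X\n' "$whole"     # same value rendered like -format 0x%X
if [ "$whole" = "$parts" ]; then
  echo "incremental and one-shot checksums match"
fi
```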
% crc::cksum "Hello, World!"
2609532967
% crc::cksum -format 0x%X "Hello, World!"
0x9B8A5027
% crc::cksum -file cksum.tcl
1828321145
% set tok [crc::CksumInit]
% crc::CksumUpdate $tok "Hello, "
% crc::CksumUpdate $tok "World!"
% crc::CksumFinal $tok
2609532967

AUTHORS
Pat Thoyts

BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category crc of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.

SEE ALSO
crc32(n), sum(n)

KEYWORDS
checksum, cksum, crc, crc32, cyclic redundancy check, data integrity, security

COPYRIGHT
Copyright (c) 2002, Pat Thoyts

crc 1.1.2							  cksum(n)
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.