Shellscript to find duplicates according to size
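Here is a minimal sketch of the thread's topic. It assumes a POSIX shell with find(1), cksum(1), awk(1) and sort(1), and file names that contain no newlines. The helper name dup_report is hypothetical. It relies on a behavior documented in the cksum manual page below: cksum prints a CRC, the byte count and the file name for every file. Grouping on the byte count alone lists files of equal size. Grouping on CRC plus size narrows those candidates to files that are very likely identical.

```shell
#!/bin/sh
# dup_report DIR KEY  (hypothetical helper)
#   KEY = size      -> list files whose byte size matches at least one other file
#   KEY = crc+size  -> list files whose CRC and size both match another file
# Output lines are "SIZE<TAB>PATH", sorted numerically by size.
# Assumes file names do not contain newlines.
dup_report() {
    find "$1" -type f -exec cksum {} + |
    awk -v key="$2" '
    {
        name = $0
        sub(/^[0-9]+ [0-9]+ /, "", name)     # drop CRC and size, keep the full path
        k = (key == "size") ? $2 : $1 " " $2
        count[k]++
        keyof[NR] = k
        line[NR] = $2 "\t" name
    }
    END {
        for (i = 1; i <= NR; i++)
            if (count[keyof[i]] > 1) print line[i]
    }' | sort -n
}
```

For example, `dup_report /some/dir size` lists same-size candidates, and `dup_report /some/dir crc+size` keeps only the candidates whose checksums also agree. A matching CRC and size is strong evidence but not proof, so compare candidates with cmp(1) before deleting anything.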

10 More Discussions You Might Find Interesting

1. Solaris

command to find out the total size of specific files (spread over the server)

Hi all, on my server there are some specific application files spread throughout the server, in folders, sub-folders and child folders. Please help me: how can I find the total size of these specific files on the server?
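One way to approach this is sketched below. It assumes only POSIX find(1), sh(1), wc(1) and awk(1), with no GNU-only options such as -printf, since the question is about Solaris. find selects the files by name anywhere below a starting directory, wc -c prints each file's byte count, and awk sums the counts. The helper name total_size and the glob in the usage line are hypothetical.

```shell
#!/bin/sh
# total_size DIR PATTERN  (hypothetical helper)
# Sums the byte sizes of all regular files under DIR whose name matches the
# shell glob PATTERN, e.g. '*.dat'. Error messages from find and wc (for
# unreadable directories or files) are discarded, so those files are skipped.
total_size() {
    find "$1" -type f -name "$2" -exec sh -c '
        for f do
            wc -c < "$f"            # prints only the byte count of one file
        done' sh {} + 2>/dev/null |
    awk '{ s += $1 } END { printf "%.0f\n", s }'
}
```

For example, `total_size / '*.log'` would print the combined size in bytes of every file ending in .log on the server.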

2. Shell Programming and Scripting

shellscript to find a line in between a particular set of lines of a text file

I have a file a.txt; the following is only one portion. I want to search for <branch value="/dev36/AREA/" include="yes"></branch> present between <template_file name="Approve External" path="core/approve/bin" and </template_file>, where the number of lines containing "<branch value= " is increasing ...

3. Shell Programming and Scripting

use shellscript to find the count of a line in a set of lines

I have a file a.xml; a portion of it is given below, but the whole file has the same format. <!DOCTYPE available_templates SYSTEM './available_templates.dtd'> <available_templates> <template_file name="Approve External" path="core/approve/bin" <command_list> <command...

4. Shell Programming and Scripting

ShellScript that emails you size of dir

I have this so far: #!/bin/sh FOLDER='/home'; MAXSIZE='50'; MAILADRES='username@server.com'; if ; then echo "$FOLDER too big" | /usr/sbin/sendmail $MAILADRES echo "test"; fi But I need to figure out how to have it search all the users on the system and then find...

5. Shell Programming and Scripting

find with file size and show the size

Hi all, can the command below be modified so that I get the file size along with the name and path of each file? The command below only gives me the locations of files larger than 100000k, but I want the exact size of each file too. find / -name "*.*" -size +100000k ...
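Here is a portable sketch, assuming POSIX find(1), sh(1), wc(1) and tr(1). It keeps the size test in find and prints each file's exact byte count next to its path. The helper name big_files is hypothetical. The threshold is given in bytes through the POSIX `c` suffix of -size. The question's `k` suffix is a widely supported extension in which one k is 1024 bytes, so the original command corresponds roughly to `big_files / 102400000`. The -name "*.*" filter can be added back to the find call if it is wanted.

```shell
#!/bin/sh
# big_files DIR MINBYTES  (hypothetical helper)
# Prints "BYTES<TAB>PATH" for every regular file under DIR that is larger
# than MINBYTES bytes. "-size +Nc" is the POSIX way to compare in bytes.
big_files() {
    find "$1" -type f -size +"$2"c -exec sh -c '
        for f do
            # wc -c may pad its output with spaces; tr removes them
            printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} +
}
```

Where GNU or BSD find is available, `-exec ls -l {} +` is a simpler alternative that also shows the sizes, in ls's long format.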

6. Shell Programming and Scripting

find a number greater than 1000 in a text file using a shell script

Hi all, I have a file abc.txt which contains some numbers, e.g. abc.txt: 145 566 355. I want to write a shell script such that if any number is greater than 1000, it displays "text file contains a number greater than 1000". Please help me do so. Thanks.
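The following is a short awk sketch. It assumes the numbers are separated by whitespace, as in the example, and the helper name check_over is hypothetical. awk splits each line into fields, so every field can be compared numerically against the limit. The exit status lets a calling script branch on the result.

```shell
#!/bin/sh
# check_over FILE LIMIT  (hypothetical helper)
# Prints a message and returns 0 if any whitespace-separated number in FILE
# is greater than LIMIT; otherwise prints nothing and returns 1.
check_over() {
    awk -v lim="$2" '
    {
        for (i = 1; i <= NF; i++)
            if ($i + 0 > lim + 0) { found = 1; exit }   # stop at the first hit
    }
    END {
        if (found) print "text file contains a number greater than " lim
        exit (found ? 0 : 1)
    }' "$1"
}
```

With the sample data, `check_over abc.txt 1000` prints nothing and returns a non-zero status, so it can be used directly in `if check_over abc.txt 1000; then ...; fi`.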

7. Shell Programming and Scripting

Removing duplicates depending on file size

Hi all, I am working with a huge number of files in a Linux environment and I was trying to filter my data. Here's what my data looks like:

Name                        Size
OLUSDN.gf.gif-1.JPEG        5 kb
LKJFDA01.gf.gif-1.JPEG      3 kb
LKJFDA01.gf.gif-2.JPEG      1 kb
...

8. Shell Programming and Scripting

How to find out whether a shell script running in the background has completed?

Hi all, I need the answer to the question below. 1) How do I find out whether a shell script running in the background has completed? Example: I know the script name, abc.sh, which runs in the background through a cron job. I want to know whether this job is still running or has stopped, how to...

9. UNIX for Beginners Questions & Answers

Find duplicates in file with line numbers

Hello all, this is a noob question. I tried searching for the answer, but what I found did not help me. I have a file that can contain duplicates: 100 200 300 400 100 150. The number 100 appears twice. I want to find the duplicate along with its line numbers. expected...
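Below is a minimal awk sketch. It assumes one value per line, since the question asks for line numbers even though the sample is shown on a single line. The helper name dup_lines is hypothetical. awk counts each distinct line and appends the current line number NR to a per-value list. At the end it reports the values seen more than once.

```shell
#!/bin/sh
# dup_lines FILE  (hypothetical helper)
# For every line that occurs more than once, prints "VALUE: N1,N2,..."
# where N1, N2, ... are the line numbers of its occurrences.
# The order of the reported values is unspecified (awk "for ... in").
dup_lines() {
    awk '
    {
        seen[$0]++
        sep = (seen[$0] > 1) ? "," : ""    # comma only between numbers
        where[$0] = where[$0] sep NR
    }
    END {
        for (v in seen)
            if (seen[v] > 1) print v ": " where[v]
    }' "$1"
}
```

For the sample data written one number per line, `dup_lines file` prints `100: 1,5`.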

10. Ubuntu

Find duplicates among 2 directories

I have 2 directories, /media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04/ /media/andy/MAXTOR_SDB1/Linux_Files/. I want to find which files are duplicates so I can delete them from one of those directories.

LEARN ABOUT OPENDARWIN

sum

CKSUM(1)						    BSD General Commands Manual 						  CKSUM(1)

NAME

     cksum, sum -- display file checksums and block counts

SYNOPSIS

     cksum [-o 1 | 2 | 3] [file ...]
     sum [file ...]

DESCRIPTION

     The cksum utility writes to the standard output three whitespace-separated fields for each input file: a CRC checksum, the total
     number of octets in the file, and the file name.  If no file name is specified, the standard input is used and no file name is written.
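The following small illustration shows the three output fields. It assumes a POSIX shell, and the helper name cksum_fields is hypothetical. read assigns the CRC and the octet count to the first two variables. The rest of the line, the file name, goes to the last one.

```shell
#!/bin/sh
# cksum_fields FILE...  (hypothetical helper)
# Splits each cksum output line into CRC, size in octets, and file name;
# read puts everything after the second field into "name".
cksum_fields() {
    cksum "$@" | while read -r crc size name; do
        printf 'crc=%s size=%s name=%s\n' "$crc" "$size" "$name"
    done
}
```

When cksum reads the standard input, as in `printf hello | cksum`, only the first two fields are printed.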

     The sum utility is identical to the cksum utility, except that it defaults to using historic algorithm 1, as described below.  It is provided
     for compatibility only.

     The options are as follows:

     -o      Use historic algorithms instead of the (superior) default one.

	     Algorithm 1 is the algorithm used by historic BSD systems as the sum(1) algorithm and by historic AT&T System V UNIX systems as the
	     sum(1) algorithm when using the -r option.  This is a 16-bit checksum, with a right rotation before each addition; overflow is
	     discarded.
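Algorithm 1 can be reproduced in a rough shell sketch, assuming POSIX od(1) and awk(1). The name bsd_sum is hypothetical, and this is not the sum(1) source. awk has no portable bit operators, so the one-bit right rotation is written arithmetically. int(ck / 2) shifts right by one bit, and (ck % 2) * 32768 moves the old low bit to bit 15. The block count uses the 1024-byte blocks described below for algorithm 1.

```shell
#!/bin/sh
# bsd_sum FILE  (hypothetical helper)
# Historic algorithm 1: 16-bit checksum, rotate right by one bit before each
# byte is added, overflow discarded. Prints "CHECKSUM BLOCKS" where BLOCKS
# counts 1024-byte blocks, partial blocks rounded up.
bsd_sum() {
    od -An -v -tu1 "$1" | awk '
    {
        for (i = 1; i <= NF; i++) {
            ck = int(ck / 2) + (ck % 2) * 32768   # rotate right within 16 bits
            ck = (ck + $i) % 65536                # add the byte, drop overflow
            n++
        }
    }
    END { printf "%d %d\n", ck, int((n + 1023) / 1024) }'
}
```

For a file containing just `hello`, the sketch prints `8403 1`. Where cksum supports `-o 1`, the checksum can be compared with `cksum -o 1 FILE`.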

	     Algorithm 2 is the algorithm used by historic AT&T System V UNIX systems as the default sum(1) algorithm.  This is a 32-bit checksum,
	     and is defined as follows:

		   s = sum of all bytes;
		   r = s % 2^16 + (s % 2^32) / 2^16;
		   cksum = (r % 2^16) + r / 2^16;
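The formula above can be reproduced in a shell sketch, assuming POSIX od(1) and awk(1). The name sysv_sum is hypothetical, and this is not the sum(1) source. The byte total s is accumulated as a floating-point number in awk, which stays exact up to 2^53. The divisions in the formula are taken as integer divisions. The block count uses the 512-byte blocks described below for algorithm 2.

```shell
#!/bin/sh
# sysv_sum FILE  (hypothetical helper)
# Historic algorithm 2: s = sum of all bytes, then fold s twice into 16 bits.
# Prints "CHECKSUM BLOCKS" where BLOCKS counts 512-byte blocks, rounded up.
sysv_sum() {
    od -An -v -tu1 "$1" | awk '
    { for (i = 1; i <= NF; i++) { s += $i; n++ } }
    END {
        r = s % 65536 + int((s % 4294967296) / 65536)   # r = s % 2^16 + (s % 2^32) / 2^16
        c = (r % 65536) + int(r / 65536)                # cksum = (r % 2^16) + r / 2^16
        printf "%d %d\n", c, int((n + 511) / 512)
    }'
}
```

As a worked example, the bytes of `hello` sum to 104 + 101 + 108 + 108 + 111 = 532. That is below 2^16, so both folds leave it unchanged and sysv_sum prints `532 1`.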

	     Algorithm 3 is what is commonly called the '32bit CRC' algorithm.  This is a 32-bit checksum.

	     Both algorithms 1 and 2 write to the standard output the same fields as the default algorithm, except that the size of the file in
	     bytes is replaced with the size of the file in blocks.  For historic reasons, the block size is 1024 for algorithm 1 and 512 for
	     algorithm 2.  Partial blocks are rounded up.

     The default CRC used is based on the polynomial used for CRC error checking in the networking standard ISO/IEC 8802-3:1989.  The CRC checksum
     encoding is defined by the generating polynomial:

	   G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 +
		x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

     Mathematically, the CRC value corresponding to a given file is defined by the following procedure:

	   The n bits to be evaluated are considered to be the coefficients of a mod 2 polynomial M(x) of degree n-1.  These n bits are the bits
	   from the file, with the most significant bit being the most significant bit of the first octet of the file and the last bit being the
	   least significant bit of the last octet, padded with zero bits (if necessary) to achieve an integral number of octets, followed by one
	   or more octets representing the length of the file as a binary value, least significant octet first.  The smallest number of octets
	   capable of representing this integer is used.

	   M(x) is multiplied by x^32 (i.e., shifted left 32 bits) and divided by G(x) using mod 2 division, producing a remainder R(x) of degree
	   <= 31.

	   The coefficients of R(x) are considered to be a 32-bit sequence.

	   The bit sequence is complemented and the result is the CRC.
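The procedure above can be followed literally in a slow, bit-at-a-time shell sketch. It assumes a shell with 64-bit arithmetic expansion (true of bash, dash and ksh on common platforms) and POSIX od(1). The names posix_crc and _crc_feed are hypothetical. The cksum utility's own calculation is described below as matching the table-lookup method in the Sarwate article. G(x) is written as the constant 0x04C11DB7, one bit per coefficient from x^31 down to x^0, with the x^32 term implicit. Starting the register at zero and XOR-ing each octet into its top byte is equivalent to multiplying M(x) by x^32 and taking the remainder modulo G(x).

```shell
#!/bin/sh
# posix_crc FILE  (hypothetical helper; slow, for illustration only)
# Feeds the file octets, then the length octets (least significant first,
# as few as needed), MSB-first through a 32-bit register reduced by G(x),
# then complements the result. Prints "CRC LENGTH".

_crc_feed() {   # shift one octet ($1) into the global register "crc"
    crc=$(( crc ^ ($1 << 24) ))
    _i=0
    while [ "$_i" -lt 8 ]; do
        if [ $(( crc & 0x80000000 )) -ne 0 ]; then
            crc=$(( ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF ))   # subtract G(x)
        else
            crc=$(( (crc << 1) & 0xFFFFFFFF ))
        fi
        _i=$(( _i + 1 ))
    done
}

posix_crc() {
    crc=0 _len=0
    for _b in $(od -An -v -tu1 "$1"); do   # file octets, first octet first
        _crc_feed "$_b"
        _len=$(( _len + 1 ))
    done
    _n=$_len
    while [ "$_n" -gt 0 ]; do              # length octets, least significant first
        _crc_feed $(( _n & 255 ))
        _n=$(( _n >> 8 ))
    done
    echo "$(( ~crc & 0xFFFFFFFF )) $_len"  # complement, keep 32 bits
}
```

For any file, `posix_crc FILE` should print the same two numbers as `cksum < FILE`. For an empty file, no octets are fed, so the result is the complement of zero: `4294967295 0`.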

DIAGNOSTICS

     The cksum and sum utilities exit 0 on success, and >0 if an error occurs.

SEE ALSO

     md5(1)

     The default calculation is identical to that given in pseudo-code in the following ACM article.

     Dilip V. Sarwate, "Computation of Cyclic Redundancy Checks Via Table Lookup", Communications of the ACM, August 1988.

STANDARDS

     The cksum utility is expected to conform to IEEE Std 1003.2-1992 ("POSIX.2").

HISTORY

     The cksum utility appeared in 4.4BSD.

BSD								  April 28, 1995							       BSD