Define "compare". How similar are we talking here? If "similar" means exactly the same, use checksums.
Matching files will all have the same checksum. AIX cksum example output:
    3995432187 1390 file1.pdf
where 3995432187 is the checksum, 1390 is the file size in bytes, and file1.pdf is the filename. Sorting the output by checksum therefore puts duplicate files next to each other.
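A minimal Python sketch of the same idea, for anyone who wants it inside a script. It swaps in SHA-256 (from the standard `hashlib` module) for cksum's CRC and groups files by digest instead of sorting; `find_duplicates` is a hypothetical helper name, not an existing tool.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical.

    Files are grouped by SHA-256 digest; any digest shared by two or more
    files marks a set of duplicates (the same idea as sorting cksum output).
    """
    groups = defaultdict(list)
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        groups[digest].append(str(p))
    # Keep only digests produced by more than one file.
    return [sorted(g) for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Usage: python find_dups.py file1 file2 ...
    for group in find_duplicates(sys.argv[1:]):
        print(" ".join(group))
```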
Just wanted to point out a gotcha here: even a single extra blank line or space will produce a different cksum. Otherwise, this is fine for finding files that are exactly identical.
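One way around that gotcha, sketched in Python under the assumption that the files are text and that whitespace differences should be ignored: normalize each file before hashing, so files that differ only in spacing or blank lines produce the same digest. `normalized_digest` is a hypothetical name, and SHA-256 stands in for cksum's CRC.

```python
import hashlib
import sys


def normalized_digest(text: str) -> str:
    """SHA-256 of text after whitespace normalization.

    Runs of whitespace inside each line are collapsed to a single space,
    leading/trailing whitespace is stripped, and blank lines are dropped,
    so files that differ only in spacing or empty lines hash the same.
    """
    lines = (" ".join(line.split()) for line in text.splitlines())
    canonical = "\n".join(line for line in lines if line)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Usage: python norm_sum.py file1 file2 ...
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8", errors="replace") as f:
            print(normalized_digest(f.read()), name)
```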
I have many pairs of files that differ only slightly: extra spaces, extra empty lines, and some unreadable characters.
When I compare them with the "diff" command, I see many, many differences.
So I'd like to write a script to compare each pair of files; if 95% of the contents are the same, I will consider them... (2 Replies)
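A rough Python sketch of the 95% check, using the standard `difflib.SequenceMatcher` ratio on whitespace-normalized lines. The line-based comparison, the cleanup rules, and the names `similarity` and `nearly_same` are assumptions for illustration, not what the thread settled on.

```python
import difflib
import sys


def _clean(text):
    """Collapse whitespace within lines and drop blank lines."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return [line for line in lines if line]


def similarity(a: str, b: str) -> float:
    """difflib's ratio 2*M/T over cleaned lines (M matching lines, T total lines)."""
    return difflib.SequenceMatcher(None, _clean(a), _clean(b)).ratio()


def nearly_same(path_a, path_b, threshold=0.95):
    """True if the two files are at least `threshold` similar (default 95%)."""
    # errors="replace" keeps unreadable bytes from aborting the read.
    with open(path_a, encoding="utf-8", errors="replace") as fa, \
         open(path_b, encoding="utf-8", errors="replace") as fb:
        return similarity(fa.read(), fb.read()) >= threshold


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python similar.py fileA fileB
    print(nearly_same(sys.argv[1], sys.argv[2]))
```

Comparing lines is coarse: a single changed character makes the whole line count as different. Passing the cleaned text as one string instead of a list of lines gives a finer-grained ratio, but is slower on large files.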
How do I go about finding a matching entry in a .txt file that is used as a "database", and posting an error saying the entry already exists when we key it in?
I mean post an error saying the... (5 Replies)
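A minimal Python sketch of the duplicate check, assuming the "database" stores one entry per line and that "similar" means identical after trimming surrounding whitespace. `add_entry` and the error message are hypothetical.

```python
from pathlib import Path


def add_entry(db_path, entry: str) -> bool:
    """Append `entry` to a one-entry-per-line text "database".

    Prints an error and returns False if the same entry (ignoring
    surrounding whitespace) is already present; otherwise appends it
    and returns True.
    """
    entry = entry.strip()
    db = Path(db_path)
    text = db.read_text() if db.exists() else ""
    if entry in {line.strip() for line in text.splitlines()}:
        print(f"Error: entry '{entry}' already exists")
        return False
    with db.open("a") as f:
        # Start on a fresh line if the file lacks a trailing newline.
        if text and not text.endswith("\n"):
            f.write("\n")
        f.write(entry + "\n")
    return True
```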
I have a file that has the words I want to find in other files (but let's say I just want to find my words in a single file). Those words are IDs, so if my word is ZZZ4, outputs like aaZZZ4, ZZZ4bb, aaZZZ4bb, ZZ4, ZZZ, ZyZ4, ZZZ4.8 (or anything like that) WON'T BE USEFUL.
I need the whole word... (6 Replies)
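A Python sketch of whole-word ID matching, assuming IDs are separated by whitespace. The lookarounds require whitespace (or the start/end of the line) on both sides, which also rejects ZZZ4.8; a plain `\b` word boundary would not, because "." is not a word character. `find_whole_ids` is a hypothetical name.

```python
import re
import sys


def find_whole_ids(ids, text):
    """Return (line_number, id) pairs where an ID appears as a whole token.

    A whole token here means the ID is bounded by whitespace or by the
    start/end of the line, so aaZZZ4, ZZZ4bb and ZZZ4.8 do not match ZZZ4.
    """
    if not ids:
        return []  # an empty alternation would match empty strings
    pattern = re.compile(
        r"(?<!\S)(?:" + "|".join(map(re.escape, ids)) + r")(?!\S)"
    )
    hits = []
    for n, line in enumerate(text.splitlines(), start=1):
        for m in pattern.finditer(line):
            hits.append((n, m.group()))
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_ids.py ids.txt target.txt
    with open(sys.argv[1]) as f:
        ids = f.read().split()
    with open(sys.argv[2]) as f:
        for n, word in find_whole_ids(ids, f.read()):
            print(f"{n}: {word}")
```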
Hi
I have a directory whose exact name I don't remember; I only know that it starts with "Resp".
Can you please tell me the command to find that directory under the root?
Rajesh (3 Replies)
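The usual shell answer is something like `find / -type d -name 'Resp*'`. The same search as a Python sketch: it walks the tree with `os.walk` and reports directories whose names start with the given prefix. `find_dirs` is a hypothetical name, and matching is case-sensitive, like `-name`.

```python
import os
import sys


def find_dirs(root, prefix):
    """Yield paths of directories under `root` whose name starts with `prefix`.

    Unreadable directories are skipped silently, because os.walk ignores
    errors unless an onerror callback is given.
    """
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            if name.startswith(prefix):
                yield os.path.join(dirpath, name)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_dirs.py / Resp
    for path in find_dirs(sys.argv[1], sys.argv[2]):
        print(path)
```

For a case-insensitive match, compare `name.lower()` with `prefix.lower()` instead.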
Hi,
I have the following files in my $datadir:
SAT_1.txt
SAT_2.txt
BAT_UD.lst
BAT_DD1.lst
DUTT_1.txt
DUTT_la.txt
Expected result:
All of the above files should be listed in $<Filename>_file.lst
Below is my code:
for i in SAT BAT DUTT
do
touch a.lst
cd $datadir (1 Reply)
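A Python sketch of the expected result, assuming each `<prefix>_file.lst` should list the names of the files in `$datadir` that start with `<prefix>_`. The list files go to a separate output directory so that a rerun does not pick up `SAT_file.lst` as a match for `SAT_*`; `write_prefix_lists` and `outdir` are hypothetical.

```python
from pathlib import Path


def write_prefix_lists(datadir, prefixes, outdir):
    """For each prefix, write the sorted names of files in `datadir`
    matching '<prefix>_*' to '<outdir>/<prefix>_file.lst', one per line.

    Returns a dict mapping each prefix to the names written.
    """
    datadir, outdir = Path(datadir), Path(outdir)
    written = {}
    for prefix in prefixes:
        names = sorted(p.name for p in datadir.glob(f"{prefix}_*") if p.is_file())
        (outdir / f"{prefix}_file.lst").write_text(
            "".join(name + "\n" for name in names)
        )
        written[prefix] = names
    return written
```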
Hi,
I need to compare the /etc/passwd files from two servers and extract the users that appear in both files. I sorted the two files on the user ID (UID, 3rd column). I first sorted the files on the username (1st column); however, when I use comm to compare the files there is no... (1 Reply)
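A Python sketch that sidesteps the sorting issue: it parses both files into username-to-UID maps and intersects the usernames, so no particular sort order is needed (comm only works when both inputs are sorted on the lines it compares). Treating "similar" as "same username" and the function names are assumptions; both UIDs are returned so mismatches stand out.

```python
import sys


def parse_passwd(text):
    """Map username -> UID from /etc/passwd-format text (name:pw:UID:...)."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3:
            users[fields[0]] = fields[2]
    return users


def common_users(passwd_a, passwd_b):
    """Return sorted (username, uid_a, uid_b) tuples for users in both files."""
    a, b = parse_passwd(passwd_a), parse_passwd(passwd_b)
    return [(name, a[name], b[name]) for name in sorted(a.keys() & b.keys())]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python common_users.py passwd.server1 passwd.server2
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        for name, uid_a, uid_b in common_users(fa.read(), fb.read()):
            print(name, uid_a, uid_b)
```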
Today I changed the DB and the PHP code and rebuilt the database of similar threads shown at the end of each post, increasing the limit from a maximum of 5 to a maximum of 10 similar threads per post:
It was quite easy to do:
1. Increased the max size of... (17 Replies)
Discussion started by: Neo
cksum(1) User Commands cksum(1)
NAME
cksum - write file checksums and sizes
SYNOPSIS
cksum [file...]
DESCRIPTION
The cksum command calculates and writes to standard output a cyclic redundancy check (CRC) for each input file, and also writes to standard
output the number of octets in each file.
For each file processed successfully, cksum will write in the following format:
"%u %d %s\n", <checksum>, <# of octets>, <path name>
If no file operand was specified, the path name and its leading space will be omitted.
The CRC used is based on the polynomial used for CRC error checking in the referenced Ethernet standard.
The encoding for the CRC checksum is defined by the generating polynomial:
G(x) = x**32 + x**26 + x**23 + x**22 + x**16 + x**12 + x**11 + x**10 + x**8 + x**7 + x**5 + x**4 + x**2 + x + 1
Mathematically, the CRC value corresponding to a given file is defined by the following procedure:
1. The n bits to be evaluated are considered to be the coefficients of a mod 2 polynomial M(x) of degree n-1. These n bits are the bits
from the file, with the most significant bit being the most significant bit of the first octet of the file and the last bit being the
least significant bit of the last octet, padded with zero bits (if necessary) to achieve an integral number of octets, followed by one
or more octets representing the length of the file as a binary value, least significant octet first. The smallest number of octets
capable of representing this integer is used.
2. M(x) is multiplied by x**32 (that is, shifted left 32 bits) and divided by G(x) using mod 2 division, producing a remainder R(x) of
degree <= 31.
3. The coefficients of R(x) are considered to be a 32-bit sequence.
4. The bit sequence is complemented and the result is the CRC.
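The four steps above translate fairly directly into code. Below is a minimal, unoptimized Python sketch (a table-driven version would be much faster); `posix_cksum` and `_shift_in` are hypothetical names, and the whole file is read into memory for simplicity.

```python
import sys

POLY = 0x04C11DB7  # G(x) with the implicit x**32 term dropped


def _shift_in(crc, byte):
    """Feed one octet, most significant bit first, into the 32-bit register."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
        else:
            crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> int:
    """CRC following steps 1-4 of the procedure in this man page."""
    crc = 0
    for byte in data:        # step 1: file octets, MSB of the first octet first
        crc = _shift_in(crc, byte)
    length = len(data)
    while length:            # step 1: length octets, least significant first
        crc = _shift_in(crc, length & 0xFF)   # (a zero length adds none)
        length >>= 8
    # Steps 2-3: the register now holds R(x) = M(x)*x**32 mod G(x).
    return ~crc & 0xFFFFFFFF  # step 4: complement


if __name__ == "__main__":
    # Print "<checksum> <# of octets> <path name>" for each file argument.
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            data = f.read()
        print(f"{posix_cksum(data)} {len(data)} {name}")
```

For the nine bytes `123456789`, `posix_cksum` returns 930766865; for empty input it returns 4294967295, the complement of a zero register.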
OPERANDS
The following operand is supported:
file A path name of a file to be checked. If no file operands are specified, the standard input is used.
USAGE
The cksum command is typically used to quickly compare a suspect file against a trusted version of the same, such as to ensure that files
transmitted over noisy media arrive intact. However, this comparison cannot be considered cryptographically secure. The chances of a
damaged file producing the same CRC as the original are astronomically small; deliberate deception is difficult, but probably not impossible.
Although input files to cksum can be any type, the results need not be what would be expected on character special device files. Since this
document does not specify the block size used when doing input, checksums of character special files need not process all of the data in
those files.
The algorithm is expressed in terms of a bitstream divided into octets. If a file is transmitted between two systems and undergoes any data
transformation (such as moving 8-bit characters into 9-bit bytes or changing "Little Endian" byte ordering to "Big Endian"), identical CRC
values cannot be expected. Implementations performing such transformations may extend cksum to handle such situations.
See largefile(5) for the description of the behavior of cksum when encountering files greater than or equal to 2 Gbyte (2**31 bytes).
ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of cksum: LANG, LC_ALL, LC_CTYPE,
LC_MESSAGES, and NLSPATH.
EXIT STATUS
The following exit values are returned:
0 All files were processed successfully.
>0 An error occurred.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
|       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
+-----------------------------+-----------------------------+
|Availability                 |SUNWcsu                      |
+-----------------------------+-----------------------------+
|Interface Stability          |Standard                     |
+-----------------------------+-----------------------------+
SEE ALSO
sum(1), attributes(5), environ(5), largefile(5), standards(5)
SunOS 5.10 1 Feb 1995 cksum(1)