Define "compare". How similar are we talking here? If "similar" means exactly the same, use checksums.
Matching files will all have the same checksum. AIX cksum example output:
3995432187 1390 file1.pdf
where 3995432187 is the checksum, 1390 is the file size in bytes, and file1.pdf is the filename. Sorting the output by checksum puts identical files on adjacent lines, which is why it finds groups of duplicates.
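As a rough illustration of the sort-by-checksum idea, here is a minimal Python sketch that groups the files under a directory by checksum and size. The names `file_key` and `find_duplicates` are made up for this example, and it uses `zlib.crc32` rather than the cksum(1) algorithm, so the numbers will not match what `cksum` prints; only the grouping matters here.

```python
import zlib
from collections import defaultdict
from pathlib import Path


def file_key(path, chunk_size=65536):
    """Return (crc32, size) for a file, reading it in chunks."""
    crc = 0
    size = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # continue the running CRC
            size += len(chunk)
    return crc, size


def find_duplicates(directory):
    """Return lists of paths that share the same (crc32, size) key."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            groups[file_key(path)].append(path)
    # Keep only keys seen more than once: those are the duplicate candidates.
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates("."):
        print(" ".join(str(p) for p in group))
```

Equal checksums make a match very likely, not certain, so a byte-for-byte check (`cmp`, or Python's `filecmp.cmp(a, b, shallow=False)`) is a sensible last step before deleting anything.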
Just wanted to point out a gotcha here: even a single extra blank line or space will produce a different cksum. Otherwise this is a fine way to find identical files.
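If whitespace-only differences should not count, one possible workaround is to normalize the text before checksumming it. The sketch below is an assumption about what "the same" should mean, not something from this thread: it collapses runs of spaces and tabs, trims each line, and drops blank lines. The name `normalized_crc` is hypothetical, and `zlib.crc32` stands in for cksum(1).

```python
import zlib


def normalized_crc(text: str) -> int:
    """Checksum of text after collapsing whitespace and dropping blank lines."""
    # " ".join(line.split()) trims the line and squeezes inner whitespace runs.
    lines = (" ".join(line.split()) for line in text.splitlines())
    canonical = "\n".join(line for line in lines if line)
    return zlib.crc32(canonical.encode("utf-8"))


# Two texts that differ only in spacing and blank lines get the same value.
print(normalized_crc("a  b\n\n c \n") == normalized_crc("a b\nc\n"))
```

This only hides whitespace noise. Files that differ in actual content still get different checksums, so it does not measure how similar two files are.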
cksum(3tcl) Cyclic Redundancy Checks cksum(3tcl)
NAME
cksum - Calculate a cksum(1) compatible checksum
SYNOPSIS
package require Tcl 8.2
package require cksum ?1.1.3?
::crc::cksum ?-format format? ?-chunksize size? [ -channel chan | -filename file | string ]
::crc::CksumInit
::crc::CksumUpdate token data
::crc::CksumFinal token
DESCRIPTION
This package provides a Tcl implementation of the cksum(1) algorithm based upon information provided in the GNU implementation of this
program as part of the GNU Textutils 2.0 package.
COMMANDS
::crc::cksum ?-format format? ?-chunksize size? [ -channel chan | -filename file | string ]
The command takes string data or a channel or file name and returns a checksum value calculated using the cksum(1) algorithm. The
result is formatted using the format(3tcl) specifier provided or as an unsigned integer (%u) by default.
OPTIONS
-channel name
Return a checksum for the data read from a channel. The command will read data from the channel until the eof is true. If you need
to be able to process events during this calculation see the PROGRAMMING INTERFACE section.
-filename name
This is a convenience option that opens the specified file, sets the encoding to binary and then acts as if the -channel option had
been used. The file is closed on completion.
-format string
Return the checksum using an alternative format template.
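For readers curious what "cksum(1) compatible" involves, the following is a small Python sketch, not the package's Tcl source. It assumes the usual POSIX definition of cksum: a CRC-32 with polynomial 0x04C11DB7, processed most significant bit first, with the input length appended and the result complemented. The names `posix_cksum` and `_feed` are hypothetical. The last lines mimic the default format and the -format option using Python's own % formatting.

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial assumed for POSIX cksum


def _feed(crc: int, byte: int) -> int:
    """Fold one byte into the running CRC, most significant bit first."""
    crc ^= byte << 24
    for _ in range(8):
        if crc & 0x80000000:
            crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
        else:
            crc = (crc << 1) & 0xFFFFFFFF
    return crc


def posix_cksum(data: bytes) -> int:
    """Return a cksum-style checksum of data as an unsigned 32 bit integer."""
    crc = 0
    for byte in data:
        crc = _feed(crc, byte)
    # Append the length, least significant byte first, until it is used up.
    length = len(data)
    while length:
        crc = _feed(crc, length & 0xFF)
        length >>= 8
    return crc ^ 0xFFFFFFFF  # final one's complement


value = posix_cksum(b"Hello, World!")
print("%u" % value)    # unsigned decimal, like the default format
print("0x%X" % value)  # hexadecimal, like -format 0x%X
```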
PROGRAMMING INTERFACE
The cksum package implements the checksum using a context variable to which additional data can be added at any time. This is especially
useful in an event based environment such as a Tk application or a web server package. Data to be checksummed may be handled incrementally
during a fileevent handler in discrete chunks. This can improve the interactive nature of a GUI application and can help to avoid excessive
memory consumption.
::crc::CksumInit
Begins a new cksum context. Returns a token ID that must be used for the remaining functions. An optional seed may be specified if
required.
::crc::CksumUpdate token data
Add data to the checksum identified by token. Calling CksumUpdate $token "abcd" is equivalent to calling CksumUpdate $token "ab"
followed by CksumUpdate $token "cd". See EXAMPLES.
::crc::CksumFinal token
Returns the checksum value and releases any resources held by this token. Once this command completes the token will be invalid. The
result is a 32 bit integer value.
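The init/update/final pattern described above can be sketched outside Tcl as well. The Python class below is an illustration only: `CksumContext` and its method names are invented here and are not part of the cksum package. It assumes the POSIX cksum definition (polynomial 0x04C11DB7, length appended, result complemented) and keeps the running CRC and byte count between calls so that data can arrive in chunks.

```python
def _make_table():
    """Precompute the CRC of every possible top byte (table-driven CRC)."""
    table = []
    for i in range(256):
        crc = i << 24
        for _ in range(8):
            crc = ((crc << 1) ^ 0x04C11DB7 if crc & 0x80000000 else crc << 1) & 0xFFFFFFFF
        table.append(crc)
    return table


_TABLE = _make_table()


class CksumContext:
    """Running cksum-style checksum that accepts data incrementally."""

    def __init__(self):
        self._crc = 0
        self._length = 0

    def update(self, data: bytes) -> None:
        """Add a chunk of data to the checksum."""
        crc = self._crc
        for byte in data:
            crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ byte]
        self._crc = crc
        self._length += len(data)

    def final(self) -> int:
        """Fold in the total length and return the 32 bit result."""
        crc, n = self._crc, self._length
        while n:
            crc = ((crc << 8) & 0xFFFFFFFF) ^ _TABLE[(crc >> 24) ^ (n & 0xFF)]
            n >>= 8
        return crc ^ 0xFFFFFFFF


ctx = CksumContext()
ctx.update(b"Hello, ")
ctx.update(b"World!")
print(ctx.final())
```

Unlike CksumFinal, this sketch does not invalidate the context after `final()`; a real implementation might want to.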
EXAMPLES
% crc::cksum "Hello, World!"
2609532967
% crc::cksum -format 0x%X "Hello, World!"
0x9B8A5027
% crc::cksum -file cksum.tcl
1828321145
% set tok [crc::CksumInit]
% crc::CksumUpdate $tok "Hello, "
% crc::CksumUpdate $tok "World!"
% crc::CksumFinal $tok
2609532967
AUTHORS
Pat Thoyts
BUGS, IDEAS, FEEDBACK
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category crc of
the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for
either package and/or documentation.
SEE ALSO
crc32(3tcl), sum(3tcl)
KEYWORDS
checksum, cksum, crc, crc32, cyclic redundancy check, data integrity, security
CATEGORY
Hashes, checksums, and encryption
COPYRIGHT
Copyright (c) 2002, Pat Thoyts
crc 1.1.3 cksum(3tcl)