I need help with a script which accepts one argument, goes through all the files under a directory, and prints a list of possible duplicate files. As its output, it prints zero or more lines, each one containing a space-separated list of filenames. All the files listed on one line have the same... (1 Reply)
Hi,
Is it possible to remove all duplicate lines from all txt files in a specific folder?
This is too hard for me; maybe someone could help.
Let's say we have a number of text files: 1, 2, 3, ... up to a maximum of 50.
Each text file has lines of text.
I want all lines of all textfiles... (8 Replies)
Hi
I have been struggling with a script for removing duplicate messages from a shared mailbox.
I would like to search for duplicate messages based on the “Message-ID” string within the message files.
I have managed to find the duplicate “Message-ID” strings and (if I would like) delete... (1 Reply)
Hi all.
I am doing continuous backups of mailboxes using rsync.
So whenever a new mail arrives, it is automatically copied to the backup server.
When a new mail arrives it is named xyz:2; when it is read by the email client, an S is appended: xyz:2,S
Eventually, 2 copies of the same file exist on... (7 Replies)
Hello,
I wrote a basic script that works; however, I was wondering if it could be sped up. I am comparing files over ssh to remove the file from the source server directory if a match occurs. Please advise me on my mistakes.
#!/bin/bash
for file in `ls /export/home/podcast2/"$1" ` ; do
... (5 Replies)
Dear All,
I have multiple files, each holding a number of records with more than 10 columns. Some column values are duplicated, and I want to remove these duplicate values from the files.
Duplicate values may occur across different files; all the files are in a single directory.
Need help to... (3 Replies)
Hello again, I want to remove all duplicate blocks of XML code in a file. This is an example:
input:
<string-array name="threeItems">
<item>item1</item>
<item>item2</item>
<item>item3</item>
</string-array>
<string-array name="twoItems">
<item>item1</item>
<item>item2</item>... (19 Replies)
Hi,
In a directory, e.g. ~/corpus, there are a lot of files and subdirectories. Some of the files are named:
12345___PP___0902___AA.txt
12346___PP___0902___AA.txt
12347___PP___0902___AA.txt
The number of files varies. I need to keep the highest (12347___PP___0902___AA.txt) and remove... (5 Replies)
So, I have text files,
one "fail.txt"
And one
"color.txt"
I now want to use a command line (DOS) to remove ANY line that is PRESENT IN BOTH from each text file.
Afterwards there shall be no duplicate lines. (1 Reply)
TARGET_DIR='/media/andy/MAXTOR_SDB1/Ubuntu_Mate_18.04/'
REGEX='[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}:[0-9]{2}' # regular expression that matches the output of: date '+%Y-%m-%d_%H:%M'
LATEST_FILE="$(ls "$TARGET_DIR" | egrep "^${REGEX}$" | tail -1)"
find "$TARGET_DIR" ! -name "$LATEST_FILE" -type f -regextype egrep -regex... (7 Replies)
Discussion started by: drew77
cksum
cksum(1) User Commands cksum(1)
NAME
cksum - write file checksums and sizes
SYNOPSIS
cksum [file]...
DESCRIPTION
The cksum command calculates and writes to standard output a cyclic redundancy check (CRC) for each input file, and also writes to standard
output the number of octets in each file.
For each file processed successfully, cksum will write in the following format:
"%u %d %s\n", <checksum>, <# of octets>, <path name>
If no file operand was specified, the path name and its leading space will be omitted.
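To make the output format concrete, here is a minimal sketch that splits one line of cksum output into its three fields. The function name parse_cksum_line and the labels it prints are hypothetical, chosen only for illustration. It assumes the checksum and octet count never contain spaces, so everything after the second field is treated as the path name (which is empty when cksum read standard input).

```shell
# Hypothetical helper: split one line of cksum output into fields.
# Sets the plain (non-local) variables sum, rest, size and path.
parse_cksum_line() {
    sum=${1%% *}            # text before the first space: the checksum
    rest=${1#* }            # everything after the first space
    size=${rest%% *}        # next field: the number of octets
    case $rest in
        *" "*) path=${rest#* } ;;   # remainder is the path name, spaces kept
        *)     path= ;;             # no path name: input came from stdin
    esac
    printf 'checksum=%s octets=%s path=%s\n' "$sum" "$size" "$path"
}
```

It could be called as `parse_cksum_line "$(cksum somefile)"`, where somefile stands for any readable file.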
The CRC used is based on the polynomial used for CRC error checking in the referenced Ethernet standard.
The encoding for the CRC checksum is defined by the generating polynomial:
G(x) = x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
Mathematically, the CRC value corresponding to a given file is defined by the following procedure:
1. The n bits to be evaluated are considered to be the coefficients of a mod 2 polynomial M(x) of degree n-1. These n bits are the
bits from the file, with the most significant bit being the most significant bit of the first octet of the file and the last bit
being the least significant bit of the last octet, padded with zero bits (if necessary) to achieve an integral number of octets,
followed by one or more octets representing the length of the file as a binary value, least significant octet first. The
smallest number of octets capable of representing this integer is used.
2. M(x) is multiplied by x^32 (that is, shifted left 32 bits) and divided by G(x) using mod 2 division, producing a remainder R(x)
of degree <= 31.
3. The coefficients of R(x) are considered to be a 32-bit sequence.
4. The bit sequence is complemented and the result is the CRC.
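The four steps above can be sketched in shell, one bit at a time. This is an illustration only, not how cksum itself is implemented: the names posix_crc and crc_feed are hypothetical, the script assumes a shell with at least 64-bit arithmetic, and it reads the file through od, so it is far slower than cksum on anything but small files.

```shell
# Hypothetical helpers; they use plain (non-local) variables crc, len, n, bit.

# Fold one octet (decimal 0-255) into the running remainder "crc",
# most significant bit first.
crc_feed() {
    crc=$(( crc ^ ($1 << 24) ))
    bit=0
    while [ "$bit" -lt 8 ]; do
        if [ $(( crc & 0x80000000 )) -ne 0 ]; then
            # 0x04C11DB7 holds the coefficients of G(x) below x^32.
            crc=$(( ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF ))
        else
            crc=$(( (crc << 1) & 0xFFFFFFFF ))
        fi
        bit=$(( bit + 1 ))
    done
}

# Print "<checksum> <# of octets>" for the file named by $1.
posix_crc() {
    crc=0
    len=0
    # Step 1: the octets of the file, in order (od prints them as decimals).
    for octet in $(od -An -v -t u1 < "$1"); do
        crc_feed "$octet"
        len=$(( len + 1 ))
    done
    # Step 1, continued: the length, least significant octet first,
    # using as few octets as possible (none for an empty file).
    n=$len
    while [ "$n" -gt 0 ]; do
        crc_feed $(( n & 0xFF ))
        n=$(( n >> 8 ))
    done
    # Steps 2 and 3 happen inside crc_feed; step 4 complements the result.
    printf '%s %s\n' $(( ~crc & 0xFFFFFFFF )) "$len"
}
```

For an empty file no octets are fed in at all, so the remainder stays 0 and its complement gives the output `4294967295 0`.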
OPERANDS
The following operand is supported:
file A path name of a file to be checked. If no file operands are specified, the standard input is used.
USAGE
The cksum command is typically used to quickly compare a suspect file against a trusted version of the same, such as to ensure that files
transmitted over noisy media arrive intact. However, this comparison cannot be considered cryptographically secure. The chances of a damaged
file producing the same CRC as the original are astronomically small; deliberate deception is difficult, but probably not impossible.
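A minimal sketch of that comparison, assuming both the suspect and the trusted file are readable locally. The function name same_cksum is hypothetical. Both files are fed on standard input so that cksum omits the path name and the two output lines can be compared as plain strings.

```shell
# Hypothetical helper: succeed (exit status 0) when two files have the
# same CRC and the same number of octets according to cksum.
same_cksum() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}
```

A call such as `same_cksum suspect.bin trusted.bin && echo "checksums match"` (file names are placeholders) reports a match; as noted above, a match is strong evidence against accidental damage but not against deliberate tampering.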
Although input files to cksum can be any type, the results need not be what would be expected on character special device files. Since this
document does not specify the block size used when doing input, checksums of character special files need not process all of the data in
those files.
The algorithm is expressed in terms of a bitstream divided into octets. If a file is transmitted between two systems and undergoes any data
transformation (such as moving 8-bit characters into 9-bit bytes or changing "Little Endian" byte ordering to "Big Endian"), identical CRC
values cannot be expected. Implementations performing such transformations may extend cksum to handle such situations.
See largefile(5) for the description of the behavior of cksum when encountering files greater than or equal to 2 Gbyte (2^31 bytes).
ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of cksum: LANG, LC_ALL, LC_CTYPE,
LC_MESSAGES, and NLSPATH.
EXIT STATUS
The following exit values are returned:
0 All files were processed successfully.
>0 An error occurred.
ATTRIBUTES
See attributes(5) for descriptions of the following attributes:
+-----------------------------+-----------------------------+
| ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+-----------------------------+-----------------------------+
|Availability |SUNWcsu |
+-----------------------------+-----------------------------+
|Interface Stability |Standard |
+-----------------------------+-----------------------------+
SEE ALSO
digest(1), sum(1), bart(1M), attributes(5), environ(5), largefile(5), standards(5)
SunOS 5.11 1 Feb 1995 cksum(1)