Why check for duplicate files if you can avoid producing them in the first place? Try
This little script keeps a cyclic buffer, LCNT lines deep (here: 10), of the lines encountered and, when the search pattern matches, prints the LCNT buffered lines, the matching line, and the next LCNT lines. Caveat: if the pattern is matched again BEFORE those trailing lines have all been printed, their output stops and the cycle starts anew with printing the buffer. You may redirect the results, right within awk itself, to individual files corresponding to the originals.
The name of each file, adorned with BOL and EOL anchors (^ and $), is recorded in a "control file" when the file is first encountered, so that file will never be processed again. Feel free to put the "control file" anywhere else. One small drawback: you have to touch the "control file" once before the first run to make sure it exists.
The list of files presented to awk is the directory contents as listed by ls, with the already processed files removed by grep's -v option. The empty /dev/null file serves as a dummy argument so that awk does not read from the terminal / stdin when no new files exist, that is, when every file in the directory has already been processed.
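The script itself is not reproduced here, so the following is only a minimal sketch of the approach as described, not the original code. The function name ctx, the variable names, the sample file a.log, and the choice to empty the buffer after each match are all assumptions made for illustration. It also assumes LCNT is at least 1 and that file names contain no whitespace.

```shell
#!/bin/sh
# Hypothetical reconstruction of the idea described above, not the original script.
# usage: ctx PATTERN LCNT CONTROL_FILE [FILE...]
ctx() {
    pat=$1; lcnt=$2; ctlfile=$3; shift 3
    awk -v PAT="$pat" -v LCNT="$lcnt" -v CTL="$ctlfile" '
        FNR == 1 {                              # first line of a new input file
            print ("^" FILENAME "$") >> CTL     # record the anchored name in the control file
            n = 0; after = 0                    # start with an empty buffer for this file
        }
        $0 ~ PAT {                              # match: print buffer, the line, then LCNT more
            for (i = (n > LCNT ? n - LCNT : 0); i < n; i++) print buf[i % LCNT]
            print
            n = 0                               # assumption: empty the buffer after printing it
            after = LCNT                        # a new match restarts the trailing count
            next
        }
        after > 0 { print; after--; next }      # trailing context lines
        { buf[n++ % LCNT] = $0 }                # everything else goes into the cyclic buffer
    ' "$@" /dev/null                            # /dev/null keeps awk away from stdin
}

# Demo with made-up data: one log file, pattern HIT, two lines of context.
demo=$(mktemp -d)
ctl=$(mktemp)                                   # creating it plays the role of "touch"
printf '%s\n' l1 l2 l3 l4 HIT l6 l7 l8 > "$demo/a.log"

first=$(cd "$demo" && ctx HIT 2 "$ctl" $(ls | grep -v -f "$ctl"))
second=$(cd "$demo" && ctx HIT 2 "$ctl" $(ls | grep -v -f "$ctl"))

printf 'first run:\n%s\n' "$first"
printf 'second run: [%s]\n' "$second"
```

With this sample data the first run prints l3, l4, HIT, l6 and l7, and the second run prints nothing because a.log is already listed in the control file. Note that a dot in a recorded name such as ^a.log$ still acts as a regex wildcard for grep, so very similar file names could be skipped by mistake.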
pcregrep
PCREGREP(1) General Commands Manual PCREGREP(1)
NAME
pcregrep - a grep with Perl-compatible regular expressions.
SYNOPSIS
pcregrep [-Vcfhilnrsvx] pattern [file] ...
DESCRIPTION
pcregrep searches files for character patterns, in the same way as other grep commands do, but it uses the PCRE regular expression library
to support patterns that are compatible with the regular expressions of Perl 5. See pcre(3) for a full description of syntax and semantics.
If no files are specified, pcregrep reads the standard input. By default, each line that matches the pattern is copied to the standard
output, and if there is more than one file, the file name is printed before each line of output. However, there are options that can change
how pcregrep behaves.
Lines are limited to BUFSIZ characters. BUFSIZ is defined in <stdio.h>. The newline character is removed from the end of each line before
it is matched against the pattern.
OPTIONS
-V Write the version number of the PCRE library being used to the standard error stream.
-c Do not print individual lines; instead just print a count of the number of lines that would otherwise have been printed. If several
files are given, a count is printed for each of them.
-ffilename
Read patterns from the file, one per line, and match all patterns against each line. There is a maximum of 100 patterns. Trailing
white space is removed, and blank lines are ignored. An empty file contains no patterns and therefore matches nothing.
-h Suppress printing of filenames when searching multiple files.
-i Ignore upper/lower case distinctions during comparisons.
-l Instead of printing lines from the files, just print the names of the files containing lines that would have been printed. Each
file name is printed once, on a separate line.
-n Precede each line by its line number in the file.
-r If any file is a directory, recursively scan the files it contains. Without -r a directory is scanned as a normal file.
-s Work silently, that is, display nothing except error messages. The exit status indicates whether any matches were found.
-v Invert the sense of the match, so that lines which do not match the pattern are now the ones that are found.
-x Force the pattern to be anchored (it must start matching at the beginning of the line) and in addition, require it to match the
entire line. This is equivalent to having ^ and $ characters at the start and end of each alternative branch in the regular
expression.
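A few of the options above can be tried with a short sketch like the one below. The sample data is made up, and the fallback to plain grep when pcregrep is not installed is an assumption of this sketch; for these simple patterns the same-named grep options behave alike.

```shell
#!/bin/sh
# Use pcregrep when available; otherwise fall back to grep (assumption of this sketch).
if command -v pcregrep >/dev/null 2>&1; then GREP=pcregrep; else GREP=grep; fi

sample=$(mktemp)
printf '%s\n' 'this is a test' '2342' 'This is a test' '34343' > "$sample"

count=$("$GREP" -c 'test' "$sample")                 # -c: print only the number of matching lines
kept=$("$GREP" -v -i 'this is a test' "$sample")     # -v -i: keep non-matching lines, ignoring case
digits=$("$GREP" -n -x '[0-9]*' "$sample")           # -x: whole line must match; -n: add line numbers

printf 'count: %s\n' "$count"
printf 'kept:\n%s\n' "$kept"
printf 'digits:\n%s\n' "$digits"
```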
SEE ALSO
pcre(3), Perl 5 documentation
DIAGNOSTICS
Exit status is 0 if any matches were found, 1 if no matches were found, and 2 for syntax errors or inaccessible files (even if matches were
found).
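The three exit statuses can be observed with a sketch along these lines. The file contents are made up, and falling back to grep when pcregrep is missing is an assumption; common grep implementations use the same 0/1/2 convention for these cases.

```shell
#!/bin/sh
# Use pcregrep when available; otherwise fall back to grep (assumption of this sketch).
if command -v pcregrep >/dev/null 2>&1; then GREP=pcregrep; else GREP=grep; fi

f=$(mktemp)
printf 'alpha\nbeta\n' > "$f"

st_match=0;   "$GREP" 'alpha' "$f"         >/dev/null 2>&1 || st_match=$?     # a line matches
st_nomatch=0; "$GREP" 'gamma' "$f"         >/dev/null 2>&1 || st_nomatch=$?   # nothing matches
st_error=0;   "$GREP" 'alpha' "$f.missing" >/dev/null 2>&1 || st_error=$?     # file does not exist

printf 'match=%s nomatch=%s error=%s\n' "$st_match" "$st_nomatch" "$st_error"
```

Capturing the status with `|| var=$?` keeps the sketch usable in scripts that run under `set -e`.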
AUTHOR
Philip Hazel <ph10@cam.ac.uk>
Last updated: 15 August 2001
Copyright (c) 1997-2001 University of Cambridge.
PCREGREP(1)