Visit Our UNIX and Linux User Community

Top Forums UNIX for Beginners Questions & Answers Advise to print lines before and after patterh match and checking and removing duplicate files Post 303046126 by RudiC on Friday 24th of April 2020 05:18:42 PM
Old 04-24-2020
Why check for duplicate files if you can avoid producing them in the first place? Try
Code:
$ touch filesdone
$ awk -vLCNT=10 -vPAT="CORRUPTION DETECTED" '
BEGIN           {LCNT++
                }
FNR == 1        {PR = 0
                 print "^" FILENAME "$" >> "filesdone"
                }
                {T[FNR%LCNT] = $0
                }
$0 ~ PAT        {print ""
                 PR = FNR + LCNT
                 for (i=1; i<LCNT; i++) print T[(FNR+i)%LCNT]
                }
FNR < PR
' $(ls $DIR_PATH/alert_${sid}* | grep -vf filesdone) /dev/null

This little script keeps an LCNT (here: 10) deep cyclic buffer of the lines encountered, and, if the search pattern is matched, prints these buffered LCNT lines, the actual line, and LCNT lines to come. Caveat: if the pattern is encountered again BEFORE the latter have been printed, they will stop, and the cycle starts anew with printing the buffer. You may redirect - immediately in awk itself - the results to individual files belonging to the originals.

The actual file name, when first encountered, adorned with BOL and EOL anchors, is retained in a, say, "control file" and will never be treated again. Feel free to put the "control file" anywhere else. Little drawback: you have to touch the "control file" once before the first run to make sure it exists.
The list of files presented to awk is the lsed directory contents with the "already done files" removed by grep's -v option. The /dev/null empty file serves as a dummy to avoid awk reading from terminal / stdin when no new files exist, and all old files fall victim to this procedure.


Give it a shot and report back.

Last edited by RudiC; 04-24-2020 at 07:25 PM..
This User Gave Thanks to RudiC For This Post:
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Removing duplicate lines ignore case

hi, I have the following input in file: abc ab a AB b c a C B When I use uniq -u file,the out put file is: abc ab AB c v B C (17 Replies)
Discussion started by: hellsd
17 Replies

2. UNIX for Dummies Questions & Answers

removing duplicate lines from a file

Hi, I am trying to remove duplicate lines from a file. For example the contents of example.txt is: this is a test 2342 this is a test 34343 this is a test 43434 and i want to remove the "this is a test" lines only and end up with the numbers in the file, that is, end up with: 2342... (4 Replies)
Discussion started by: ocelot
4 Replies

3. Shell Programming and Scripting

removing duplicate blank lines

Hi, how to remove the blank lines from the file only If we have more than one blank line. thanks rameez (8 Replies)
Discussion started by: rameezrajas
8 Replies

4. Shell Programming and Scripting

removing the duplicate lines in a file

Hi, I need to concatenate three files in to one destination file.In this if some duplicate data occurs it should be deleted. eg: file1: ----- data1 value1 data2 value2 data3 value3 file2: ----- data1 value1 data4 value4 data5 value5 file3: ----- data1 value1 data4 value4 (3 Replies)
Discussion started by: Sharmila_P
3 Replies

5. Shell Programming and Scripting

Removing duplicates from string (not duplicate lines)

please help me in getting following: Input Desired output x="foo" foo x="foo foo" foo x="foo foo" foo x="foo abc foo" foo abc x="foo foo1 foo2" foo foo1 foo2 I need to remove duplicated from string.. (8 Replies)
Discussion started by: vickylife
8 Replies

6. Shell Programming and Scripting

Removing Duplicate Lines per Section

Hello, I am in need of removing duplicate lines from within a file per section. File: ABC1 012345 header ABC2 7890-000 ABC3 012345 Header Table ABC4 ABC5 593.0000 587.4800 ABC5 593.5000 587.6580 <= dup need to remove ABC5 593.5000 ... (5 Replies)
Discussion started by: petersf
5 Replies

7. Shell Programming and Scripting

removing duplicate lines while maintaing coherence with second file

So I have two files. The first file, file1.txt, has lines of numbers separated by commas. file1.txt 10,2,30,50 22,6,3,15,16,100 73,55 78,40,33,30,11 73,55 99,82,85 22,6,3,15,16,100 The second file, file2.txt, has sentences. file2.txt "the cat is fat" "I like eggs" "fish live in... (6 Replies)
Discussion started by: adrunknarwhal
6 Replies

8. Shell Programming and Scripting

Removing a block of duplicate lines from a file

Hi all, I have a file with the data 1 abc 2 123 3 ; 4 rao 5 bell 6 ; 7 call 8 abc 9 123 10 ; 11 rao 12 bell 13 ; (10 Replies)
Discussion started by: raosr020
10 Replies

9. UNIX for Dummies Questions & Answers

Removing a set of Duplicate lines from a file

Hi, How do i remove a set of duplicate lines from a file. My file contains the lines: abc def ghi abc def ghi jkl mno pqr jkl mno (1 Reply)
Discussion started by: raosr020
1 Replies

10. UNIX for Beginners Questions & Answers

Advise on how to print range of lines above and below a number?

Hi, I have attached an output file which is some kind of database file mapping. It is basically like an allocation mapping of a tablespace and its datafile/s. The output is generated by the SQL script that I found from 401 Authorization Required Excerpts of the file are as below: ... (2 Replies)
Discussion started by: newbie_01
2 Replies
ZIPGREP(1L)															       ZIPGREP(1L)

NAME
zipgrep - search files in a ZIP archive for lines matching a pattern SYNOPSIS
zipgrep [egrep_options] pattern file[.zip] [file(s) ...] [-x xfile(s) ...] DESCRIPTION
zipgrep will search files within a ZIP archive for lines matching the given string or pattern. zipgrep is a shell script and requires egrep(1) and unzip(1L) to function. Its output is identical to that of egrep(1). ARGUMENTS
pattern The pattern to be located within a ZIP archive. Any string or regular expression accepted by egrep(1) may be used. file[.zip] Path of the ZIP archive. (Wildcard expressions for the ZIP archive name are not supported.) If the literal filename is not found, the suffix .zip is appended. Note that self-extracting ZIP files are supported, as with any other ZIP archive; just specify the .exe suffix (if any) explicitly. [file(s)] An optional list of archive members to be processed, separated by spaces. If no member files are specified, all members of the ZIP archive are searched. Regular expressions (wildcards) may be used to match multiple members: * matches a sequence of 0 or more characters ? matches exactly 1 character [...] matches any single character found inside the brackets; ranges are specified by a beginning character, a hyphen, and an end- ing character. If an exclamation point or a caret (`!' or `^') follows the left bracket, then the range of characters within the brackets is complemented (that is, anything except the characters inside the brackets is considered a match). (Be sure to quote any character that might otherwise be interpreted or modified by the operating system.) [-x xfile(s)] An optional list of archive members to be excluded from processing. Since wildcard characters match directory separators (`/'), this option may be used to exclude any files that are in subdirectories. For example, ``zipgrep grumpy foo *.[ch] -x */*'' would search for the string ``grumpy'' in all C source files in the main directory of the ``foo'' archive, but none in any subdirectories. Without the -x option, all C source files in all directories within the zipfile would be searched. OPTIONS
All options prior to the ZIP archive filename are passed to egrep(1). SEE ALSO
egrep(1), unzip(1L), zip(1L), funzip(1L), zipcloak(1L), zipinfo(1L), zipnote(1L), zipsplit(1L) URL
The Info-ZIP home page is currently at http://www.info-zip.org/pub/infozip/ or ftp://ftp.info-zip.org/pub/infozip/ . AUTHORS
zipgrep was written by Jean-loup Gailly. Info-ZIP 20 April 2009 ZIPGREP(1L)

Featured Tech Videos

All times are GMT -4. The time now is 02:34 PM.
Unix & Linux Forums Content Copyright 1993-2021. All Rights Reserved.
Privacy Policy