Advise to print lines before and after patterh match and checking and removing duplicate files

04-24-2020

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Why check for duplicate files if you can avoid producing them in the first place? Try

Code:

$ touch filesdone
$ awk -vLCNT=10 -vPAT="CORRUPTION DETECTED" '
BEGIN           {LCNT++
                }
FNR == 1        {PR = 0
                 print "^" FILENAME "$" >> "filesdone"
                }
                {T[FNR%LCNT] = $0
                }
$0 ~ PAT        {print ""
                 PR = FNR + LCNT
                 for (i=1; i<LCNT; i++) print T[(FNR+i)%LCNT]
                }
FNR < PR
' $(ls $DIR_PATH/alert_${sid}* | grep -vf filesdone) /dev/null

This little script keeps an LCNT (here: 10) deep cyclic buffer of the lines encountered, and, if the search pattern is matched, prints these buffered LCNT lines, the actual line, and LCNT lines to come. Caveat: if the pattern is encountered again BEFORE the latter have been printed, they will stop, and the cycle starts anew with printing the buffer. You may redirect - immediately in awk itself - the results to individual files belonging to the originals.

The actual file name, when first encountered, adorned with BOL and EOL anchors, is retained in a, say, "control file" and will never be treated again. Feel free to put the "control file" anywhere else. Little drawback: you have to touch the "control file" once before the first run to make sure it exists.
The list of files presented to awk is the lsed directory contents with the "already done files" removed by grep's -v option. The /dev/null empty file serves as a dummy to avoid awk reading from terminal / stdin when no new files exist, and all old files fall victim to this procedure.

Give it a shot and report back.

Last edited by RudiC; 04-24-2020 at 06:25 PM..

This User Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

04-26-2020

Registered User

295, 6

Join Date: May 2009

Last Activity: 7 May 2020, 5:18 PM EDT

Posts: 295

Thanks Given: 62

Thanked 6 Times in 6 Posts

Hi Rudic

Your suggestion is really cool, it does like what you said where it skips the ones that has been parsed before as per the filesdone control file. I tested it and rename one of the file and re-run the same awk and it only does the one that it has not work on before.

It works on Linux but not on Solaris. On Solaris it gives error

Code:

grep: illegal option -- f
Usage: grep [-c|-l|-q] -bhinsvw pattern file . . .
awk: syntax error near line 9
awk: bailing out near line 9

I also tried using /usr/xpg4/bin/grep

Code:

awk: syntax error near line 9
awk: bailing out near line 9

The only problem with this approach is that while most of the alert_${sid}* are final, one of them isn't. So there will be several alert_${sid}.log.YYYYMMDDHHMM and one current log that is named alert_${sid}.log. So

Code:

$(ls $DIR_PATH/alert_${sid}* | grep -vf filesdone) /dev/null

should parse the others once but should always be parsing alert_${sid}.log. If such is the case, then the search for the current log may or may not always be a duplicate since the CORRUPT string may or may not appear. Not sure if am explaining it correctly, sorry

newbie_01

View Public Profile for newbie_01

Find all posts by newbie_01

UNIX for Beginners Questions & Answers

Advise to print lines before and after patterh match and checking and removing duplicate files

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Advise on how to print range of lines above and below a number?

Discussion started by: newbie_01

2. UNIX for Dummies Questions & Answers

Removing a set of Duplicate lines from a file

Discussion started by: raosr020

3. Shell Programming and Scripting

Removing a block of duplicate lines from a file

Discussion started by: raosr020

4. Shell Programming and Scripting

removing duplicate lines while maintaing coherence with second file

Discussion started by: adrunknarwhal

5. Shell Programming and Scripting

Removing Duplicate Lines per Section

Discussion started by: petersf

6. Shell Programming and Scripting

Removing duplicates from string (not duplicate lines)

Discussion started by: vickylife

7. Shell Programming and Scripting

removing the duplicate lines in a file

Discussion started by: Sharmila_P

8. Shell Programming and Scripting

removing duplicate blank lines

Discussion started by: rameezrajas

9. UNIX for Dummies Questions & Answers

removing duplicate lines from a file

Discussion started by: ocelot

10. UNIX for Dummies Questions & Answers

Removing duplicate lines ignore case

Discussion started by: hellsd