Visit Our UNIX and Linux User Community

Top Forums UNIX for Beginners Questions & Answers Advise to print lines before and after patterh match and checking and removing duplicate files Post 303046121 by newbie_01 on Friday 24th of April 2020 01:06:12 PM
Old 04-24-2020
Advise to print lines before and after patterh match and checking and removing duplicate files

Hi,


I have a script that search log files for the string CORRUPT and I then print 10 lines before and after the pattern match. Let's call this pattern_match.ksh



First I do a
Code:
grep -in "CORRUPTION DETECTED" $DIR_PATH/alert_${sid}* > ${tmpfile_00}.${sid}

which gives me the list of files that has the string "CORRUPTION DETECTED" in them


Then using a while loop, I do something like below. Ignore the ..., am just showing the part where I print the matching pattern and lines before and after the pattern match.



Code:
   while read line
   do
      ALERTLOG=`echo $line | awk -F":" '{ print $1 }'`
      str_found=`echo $line | awk -F":" '{ print $2 }'`
      let str_before=${str_found}-10
      let str_after=${str_found}+10
...
...
      sed -n "${str_before},${str_after}p" ${ALERTLOG} > ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.tmp.CURRENT
      echo

      count=`ls -l ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out.* 2>/dev/null | wc -l | awk '{ print $1 }'`



      if [[ $count = 0 ]] ; then
         let next=${count}+1
         cp -p ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.tmp.CURRENT ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out.${next}
      else
...
...
...
          cp -p ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.tmp.CURRENT ${WORK_DIR}/${thisSCRIPT}.${thisSERVER}.${sid}.out.${next}
         fi
      fi

...
...
...


  done < ${tmpfile_00}.${sid}


So, at the moment, it is doing what I am after, that is, so now I have extracts of files that contain the "CORRUPTION DETECTED" string with +/- 10 lines from the pattern match.


This is similar to
Code:
awk 'c-->0;$0~s{if(b)for(c=b+1;c>1;c--)print r[(NR-c+1)%b];print;c=a}b{r[NR%b]=$0}' b=3 a=5 s="abcd"

from Print lines before and after pattern match Unfortunately, I don't have the nawk/gawk that I needed to use it.


There is also the sed one liner example
Code:
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h

but unfortunately I can't get the proper syntax to get it to print more lines before the pattern match. I know how to print more lines after the pattern match but using several
Code:
n;p;

. Is there a short version for sed if you want to do
Code:
n;p;n;p;n;p;n;p;n;p;n;p;n;p;n;p;n;p;n;p;

which is to print 10 lines after the match Smilie



I don't have the grep version also that will allow me to grep and print lines before and after match, i.e.the
Code:
grep -A1 -B1

thingy.


Hence, I end up doing a grep -in and then doing +/- and sed -n is a long and crude way of doing what I am after but I don't know of any other way of doing it the way I can understand it Smilie I am having a hard time understanding the sed and awk one-liners. Also, my method makes it simpler for if I want to print more than +/- 10 lines, I simply change the lines that do the +/- section.



However, there are flaws to my script as always Smilie

  1. If for example the log file is small that it only has 10 lines for example, the sed -n "${str_before},${str_after}p" will then give error. I can't find a way of getting sed to check for valid line numbers to do a sed on, is there?
  2. Because the files that I am doing grep on doesn't get deleted until after a month or so, and I run this corruption check script daily, I do end up with several duplicate files named differently.
How do I check and remove duplicate files that are named differently? I used the following script and running md5sum. Script is name x.ksh at the moment, will change it later :-)


Sample run of the x.ksh script with some example log files is as below:



Code:
$: ls -1 *log*
log.1
log.10
log.11
log.12
log.13
log.14
log.15
log.16
log.17
log.18
log.2
log.3
log.4
log.5
log.6
log.7
log.8
log.9
$: md5sum *log*
c931703fc30e4b98c0352029dca44573  log.1
d92e2c0237a6e575287f10c1a86f4353  log.10
c931703fc30e4b98c0352029dca44573  log.11
d92e2c0237a6e575287f10c1a86f4353  log.12
c931703fc30e4b98c0352029dca44573  log.13
d92e2c0237a6e575287f10c1a86f4353  log.14
c931703fc30e4b98c0352029dca44573  log.15
d92e2c0237a6e575287f10c1a86f4353  log.16
c931703fc30e4b98c0352029dca44573  log.17
d92e2c0237a6e575287f10c1a86f4353  log.18
d92e2c0237a6e575287f10c1a86f4353  log.2
c931703fc30e4b98c0352029dca44573  log.3
d92e2c0237a6e575287f10c1a86f4353  log.4
c931703fc30e4b98c0352029dca44573  log.5
d92e2c0237a6e575287f10c1a86f4353  log.6
c931703fc30e4b98c0352029dca44573  log.7
d92e2c0237a6e575287f10c1a86f4353  log.8
c931703fc30e4b98c0352029dca44573  log.9
$: ./x.ksh
$: ls -1 *log*
log.1
log.2
$: cat x.ksh
#!/bin/ksh
#

#ls -1 *log*

md5sum *log* | sort > tmp.00
md5sum *log* | awk '{ print $1 }' | sort | uniq > tmp.01

while read md5
do
   grep "^${md5}" tmp.00 | awk '{ print $2 }' | sort | sort -n -t. -k2 | awk 'NR>1 { print }' | xargs rm
done < tmp.01

rm tmp.00
rm tmp.01


Is there any other way of checking for duplicate files? At the moment, I run pattern_match.ksh and then call x.ksh from there. My question is, is there a way to check for duplicate files 'immediately' instead of how am doing it at the moment running x.ksh?


For example, if I already have files log.1 to log.50 and they have different checksum meaning they are all different files, non-duplicated. Then the sed/pattern_match.ksh generates file log.51, I want to be able to check log.51 against log.1 to log.50 that it isn't a duplicate of any of them. Or is this already exactly what my x.ksh script is doing and am just over-complicating stuff Smilie I hope I am explaining this correctly.



Anyway, please advise. Thanks in advance.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Removing duplicate lines ignore case

hi, I have the following input in file: abc ab a AB b c a C B When I use uniq -u file,the out put file is: abc ab AB c v B C (17 Replies)
Discussion started by: hellsd
17 Replies

2. UNIX for Dummies Questions & Answers

removing duplicate lines from a file

Hi, I am trying to remove duplicate lines from a file. For example the contents of example.txt is: this is a test 2342 this is a test 34343 this is a test 43434 and i want to remove the "this is a test" lines only and end up with the numbers in the file, that is, end up with: 2342... (4 Replies)
Discussion started by: ocelot
4 Replies

3. Shell Programming and Scripting

removing duplicate blank lines

Hi, how to remove the blank lines from the file only If we have more than one blank line. thanks rameez (8 Replies)
Discussion started by: rameezrajas
8 Replies

4. Shell Programming and Scripting

removing the duplicate lines in a file

Hi, I need to concatenate three files in to one destination file.In this if some duplicate data occurs it should be deleted. eg: file1: ----- data1 value1 data2 value2 data3 value3 file2: ----- data1 value1 data4 value4 data5 value5 file3: ----- data1 value1 data4 value4 (3 Replies)
Discussion started by: Sharmila_P
3 Replies

5. Shell Programming and Scripting

Removing duplicates from string (not duplicate lines)

please help me in getting following: Input Desired output x="foo" foo x="foo foo" foo x="foo foo" foo x="foo abc foo" foo abc x="foo foo1 foo2" foo foo1 foo2 I need to remove duplicated from string.. (8 Replies)
Discussion started by: vickylife
8 Replies

6. Shell Programming and Scripting

Removing Duplicate Lines per Section

Hello, I am in need of removing duplicate lines from within a file per section. File: ABC1 012345 header ABC2 7890-000 ABC3 012345 Header Table ABC4 ABC5 593.0000 587.4800 ABC5 593.5000 587.6580 <= dup need to remove ABC5 593.5000 ... (5 Replies)
Discussion started by: petersf
5 Replies

7. Shell Programming and Scripting

removing duplicate lines while maintaing coherence with second file

So I have two files. The first file, file1.txt, has lines of numbers separated by commas. file1.txt 10,2,30,50 22,6,3,15,16,100 73,55 78,40,33,30,11 73,55 99,82,85 22,6,3,15,16,100 The second file, file2.txt, has sentences. file2.txt "the cat is fat" "I like eggs" "fish live in... (6 Replies)
Discussion started by: adrunknarwhal
6 Replies

8. Shell Programming and Scripting

Removing a block of duplicate lines from a file

Hi all, I have a file with the data 1 abc 2 123 3 ; 4 rao 5 bell 6 ; 7 call 8 abc 9 123 10 ; 11 rao 12 bell 13 ; (10 Replies)
Discussion started by: raosr020
10 Replies

9. UNIX for Dummies Questions & Answers

Removing a set of Duplicate lines from a file

Hi, How do i remove a set of duplicate lines from a file. My file contains the lines: abc def ghi abc def ghi jkl mno pqr jkl mno (1 Reply)
Discussion started by: raosr020
1 Replies

10. UNIX for Beginners Questions & Answers

Advise on how to print range of lines above and below a number?

Hi, I have attached an output file which is some kind of database file mapping. It is basically like an allocation mapping of a tablespace and its datafile/s. The output is generated by the SQL script that I found from 401 Authorization Required Excerpts of the file are as below: ... (2 Replies)
Discussion started by: newbie_01
2 Replies

Featured Tech Videos

All times are GMT -4. The time now is 02:41 PM.
Unix & Linux Forums Content Copyright 1993-2021. All Rights Reserved.
Privacy Policy