removing certain paragraphs for matching patterns


 
# 1  
Old 07-18-2008

Hi,
I have a log file made up of paragraphs like these:

Switch not possible Error code 1234
Process number 678

Log not available Error code 567
Process number 874
.....
......
......


Now I create an exception file like this.

cat text.exp
Error code 1234
Process number 874


What I want is to read my original file but exclude any paragraph that contains one of the keywords listed in the exception file. How can I do that?
# 2  
Old 07-18-2008
Code:
$ cat file1
Switch not possible Error code 1234
Process number 678

Log not available Error code 567
Process number 874

Log not available Error code 333
Process number 34

Log not available Error code 33334234
Process number 012

Log not available Error code 333 hello
Process number 012

Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok

Code:
$ cat text.exp 
Error code 1234
Process number 874
Error code 333

Code:
$ ./script.sh 
Log not available Error code 33334234
Process number 012

Log not available Error code 567
Process number 8743434

Code:
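$ cat script.sh 
#!/bin/bash
awk '
  BEGIN{
        RS=""; ORS="\n\n"       # paragraph mode: blank lines separate records
  }
  NR == FNR{
        # text.exp is read as one paragraph; keep one pattern per line.
        # Save the count: length(array) is not supported by every awk.
        n = split($0, arr, "\n")
        next
  }
  {
        str = $0 "\n"
        for(i=1; i <= n; i++){
          # the pattern must be followed by whitespace (or the end of the paragraph)
          regx = arr[i] "[[:space:]]"
          if(str ~ regx) break
        }
        if(i > n) print $0
  }
' text.exp file1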

Hope this helps!

.Aaron
# 3  
Old 07-19-2008
I am not sure how it works, but it looks like this could help me. Thank you for such a quick turnaround. I will check the awk script and see how it works.

One question: can't I do the same thing with a while read loop? Something like this...

cat tem.exp | while read exception
do
cat file | grep -iv $exception
done


I know the above code is not right, but that's just the basic idea. I don't know how to remove a paragraph in the first place. And then I need to do it in a loop so that I process each line in the exception file.
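
The while read idea can be made to work if each pass drops whole paragraphs instead of single lines. Below is a minimal sketch, assuming a POSIX shell and awk. The function name remove_paras is made up for illustration, and the match is a plain substring test, so "Error code 333" would also drop a paragraph containing "Error code 33334234".

```shell
# Hypothetical helper, not from the thread: remove_paras EXCEPTIONS LOGFILE
remove_paras() {
    exceptions=$1
    current=$(cat "$2")
    # One pass per exception line; each pass removes matching paragraphs.
    while IFS= read -r exception; do
        [ -n "$exception" ] || continue      # skip empty exception lines
        # RS= puts awk in paragraph mode (records separated by blank lines).
        # Note: awk -v interprets backslash escapes in the value of pat.
        current=$(printf '%s\n' "$current" |
            awk -v RS= -v ORS='\n\n' -v pat="$exception" 'index($0, pat) == 0')
    done < "$exceptions"
    printf '%s\n' "$current"
}
```

It would be called as remove_paras text.exp file. It rereads the remaining text once per exception, so it is simple rather than fast.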
# 4  
Old 07-19-2008
Hi.

This is a long solution, but modular for generality. It consists primarily of 2 perl scripts. The first reads paragraphs and creates single lines of them, using a character for the newline (I used "%"). The second perl script does the opposite, takes the long "%"-embedded lines, and creates separate lines.

With those two on the outside, we can manipulate the file as we wish with line-oriented *nix tools. In this demonstration a grep is used to eliminate paragraphs (long lines) matching the phrases you wish.

Here are the perl scripts:
Code:
#!/usr/bin/perl

# @(#) p1       Demonstrate paragraphs into lines, substitution for newline.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;

my($FAKE_RS) = "%";

# read paragraphs
$/ = "\n\n";

while ( <> ) {
        s/\n/$FAKE_RS/msg;
        print "$_\n";
}

exit(0);

and:
Code:
#!/usr/bin/perl

# @(#) p2       Demonstrate read lines, substituting newline.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;

my($FAKE_RS) = "%";

while ( <> ) {
        s/$FAKE_RS$FAKE_RS/\n/msg;
        s/$FAKE_RS/\n/msg;
        print "$_";
}

exit(0);

These can be driven by a shell script:
Code:
#!/bin/bash -

# @(#) s1       Demonstrate pipeline for paragraph matching.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1)
set -o nounset
echo

FILE1=data1
FILE2=data2

echo
echo " Data file $FILE1:"
cat $FILE1

echo
echo " Data file $FILE2:"
cat $FILE2

echo
echo " Results:"
./p1 $FILE1 |
tee t1 |
grep -v -f $FILE2 |
tee t2 |
./p2

exit 0

Producing:
Code:
% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0


 Data file data1:
Switch not possible Error code 1234
Process number 678

Log not available Error code 567
Process number 874

Log not available Error code 333
Process number 34

Log not available Error code 33334234
Process number 012

Log not available Error code 333 hello
Process number 012

Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok

Log not available Error code 999
Process number 777 missing

 Data file data2:
Error code 1234
Error code 333
Process number 874%

 Results:
Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok

Log not available Error code 999
Process number 777 missing

The patterns to be excluded could contain a "%" to force exclusion of patterns ending at the point corresponding to a newline, as with pattern 3. If the character "%" occurs in either data file, then changes would need to be made to the perl scripts and the pattern data file. However, the manipulation is generally unaware that it is operating on paragraphs -- it sees everything as just a line, a long line to be sure, but just a line. This places the complexity outside the scope of the real operation you wish to perform.

The intermediate files t1 and t2 may be viewed to see in more detail how the process works. The tee commands may be removed when desired -- just delete the lines, that's why they are on separate lines in the pipeline.

The paragraphs must be separated by truly empty lines: no spaces, TABs, etc. are allowed, only a newline.
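
Where perl is not installed, the same flatten / filter / unflatten pipeline can be approximated with awk. This is only a sketch, assuming a POSIX awk and that "%" does not occur in the data. The function names flatten and unflatten are made up for illustration and stand in for p1 and p2.

```shell
# Hypothetical stand-in for p1: one output line per paragraph, "%" for newline.
flatten() {
    # RS="" reads paragraphs; FS="\n" makes each line a field.
    # $1 = $1 rebuilds the record with OFS, and a trailing "%" marks the end.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "%" } { $1 = $1; print $0 OFS }' "$@"
}

# Hypothetical stand-in for p2: turn every "%" back into a newline.
unflatten() {
    # The final "%" plus the newline added by print leaves a blank separator line.
    awk '{ gsub(/%/, "\n"); print }' "$@"
}
```

The pipeline then becomes flatten data1 | grep -v -f data2 | unflatten. Each flattened line ends in "%", so a pattern such as "Process number 874%" still means "874 at the end of a line".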

See man pages for details ... cheers, drl
# 5  
Old 07-21-2008
awk:

Code:
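nawk '
# Assumes every paragraph is exactly two lines followed by one blank line,
# and compares only the last field of each line with the exceptions.
NR == FNR { arr[$NF]++; next }          # first file: remember the last field of each exception
FNR % 3 == 1 {                          # first line of a paragraph
        flag = ($NF in arr)
        t = $0
        next
}
FNR % 3 == 2 {                          # second line of a paragraph
        if (!flag && !($NF in arr)) { print t; print $0 }
        flag = 0
}
' text.exp file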


perl:

Code:
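use strict;
use warnings;

# Collect the last word of each exception line (e.g. "1234", "874").
my %skip;
open my $exc, '<', 'text.exp' or die "text.exp: $!";
while (<$exc>) {
	my @words = split ' ';
	$skip{ $words[-1] } = 1 if @words;
}
close $exc;

# Assumes every paragraph is exactly two lines plus one blank separator line.
my ($first, @res);
open my $log, '<', 'file' or die "file: $!";
while (<$log>) {
	next if $. % 3 == 0;                # blank separator line
	my @words = split ' ';
	my $hit = @words && exists $skip{ $words[-1] };
	if ($. % 3 == 1) {                  # first line: remember it, or mark the paragraph as skipped
		$first = $hit ? undef : $_;
	}
	elsif (defined $first && !$hit) {   # second line: keep the pair only if neither line matched
		push @res, $first, $_;
	}
}
close $log;
print @res;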


Last edited by summer_cherry; 07-21-2008 at 04:57 AM..
# 6  
Old 07-22-2008
Thank you so much. This was really easy with so many options.
# 7  
Old 07-22-2008
The first awk script runs well on one server, but on another it reports a syntax error on line 12, where the regx assignment occurs.

The second (nawk) script has a different problem. It runs fine, but it prints all the lines that don't match the pattern. I wanted to drop the entire paragraph that contains a matching pattern.

I can't use the perl scripts, since perl is not available on every server.
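
One way around both problems might be an awk-only filter that avoids regular expressions altogether. The following is a sketch, not a tested fix for the server in question. It assumes a POSIX awk (nawk, gawk, mawk) with user-defined functions, a non-empty exception file, and paragraphs separated by truly empty lines. The names filter_paragraphs and hit are made up for illustration.

```shell
# Hypothetical helper, not from the thread: filter_paragraphs EXCEPTIONS LOGFILE
filter_paragraphs() {
    awk '
    # Return 1 if pat occurs in text and is followed by whitespace or the end.
    function hit(text, pat,    pos, c) {
        while ((pos = index(text, pat)) > 0) {
            c = substr(text, pos + length(pat), 1)
            if (c == "" || c == " " || c == "\t" || c == "\n") return 1
            text = substr(text, pos + 1)    # keep looking after this occurrence
        }
        return 0
    }
    # First file: one literal exception per line (read with the default RS).
    NR == FNR { if ($0 != "") exc[++n] = $0; next }
    # Second file: RS is empty by now, so each record is a whole paragraph.
    {
        for (i = 1; i <= n; i++)
            if (hit($0, exc[i])) next
        print
    }
    ' "$1" RS= ORS='\n\n' "$2"
}
```

Called as filter_paragraphs text.exp file1, it keeps a paragraph only when no exception occurs in it as a whole phrase. The RS= and ORS= operands switch awk to paragraph mode between the two files, and the exceptions are compared literally, so characters such as "." or "[" in them need no escaping.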

Last edited by kaushys; 07-22-2008 at 06:23 PM..