The UNIX and Linux Forums  


#1 kaushys, 07-18-2008
removing certain paragraphs for matching patterns

Hi,
I have a log file that contains paragraphs like these:

Switch not possible Error code 1234
Process number 678

Log not available Error code 567
Process number 874
.....
......
......


Now I create an exception file like this.

cat text.exp
Error code 1234
Process number 874


What I want is to read my original file but exclude any paragraph that contains any of the keywords listed in the exception file. How can I do that?
#2 yunccll, 07-18-2008
Code:
$ cat file1
Switch not possible Error code 1234
Process number 678

Log not available Error code 567
Process number 874

Log not available Error code 333
Process number 34

Log not available Error code 33334234
Process number 012

Log not available Error code 333 hello
Process number 012

Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok
Code:
$ cat text.exp 
Error code 1234
Process number 874
Error code 333
Code:
$ ./script.sh 
Log not available Error code 33334234
Process number 012

Log not available Error code 567
Process number 8743434
Code:
$ cat script.sh 
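#!/bin/bash
# RS="" puts awk in paragraph mode: each blank-line-separated block is one record.
awk '
  BEGIN {
        RS = ""; ORS = "\n\n"
  }
  # First file (text.exp): one paragraph, split into one pattern per line.
  NR == FNR {
        n = split($0, arr, "\n")
        next
  }
  # Second file: print the paragraph only if no pattern matches it.
  {
        str = $0 "\n"                  # so a pattern at the very end can still match
        for (i = 1; i <= n; i++) {
          regx = arr[i] "[[:space:]]"  # pattern must be followed by whitespace/newline
          if (str ~ regx) break
        }
        if (i > n) print $0
  }
' text.exp file1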
Hope this helps!

Aaron
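yunccll's script builds a regular expression from each exception line, so characters such as ".", "*" or "[" in text.exp would be treated as regex operators rather than literal text. Below is a minimal sketch of a literal-text variant, assuming a POSIX awk (gawk, mawk or nawk). It keeps the same rule that a pattern only counts when it is followed by whitespace or the end of the paragraph. The names filter_paragraphs_fixed and has_word are hypothetical; the "RS=" operand switches awk to paragraph mode only for the data file.

```shell
#!/bin/bash
# Sketch (hypothetical names): drop every blank-line-separated paragraph of
# DATA that contains a line of EXCEPTIONS as LITERAL text followed by
# whitespace or the end of the paragraph.
filter_paragraphs_fixed() {
    local exceptions=$1 data=$2
    awk '
        # True if literal p occurs in s followed by blank, tab, newline or end.
        function has_word(s, p,    pos, off, c) {
            off = 0
            while ((pos = index(substr(s, off + 1), p)) > 0) {
                off += pos
                c = substr(s, off + length(p), 1)
                if (c == "" || c ~ /[ \t\n]/) return 1
            }
            return 0
        }
        # Exceptions file, read line by line: one literal pattern per non-empty line.
        FILENAME == ARGV[1] { if ($0 != "") pat[++n] = $0; next }
        # Data file, read with RS set to empty: one paragraph per record.
        {
            for (i = 1; i <= n; i++)
                if (has_word($0, pat[i])) next
            print $0 "\n"
        }
    ' "$exceptions" RS= "$data"
}
```

Usage: filter_paragraphs_fixed text.exp file1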
#3 kaushys, 07-19-2008
I am not sure how it works, but it looks like it could help me. Thank you for the quick turnaround. I will go through the awk script to see how it works.

Out of curiosity, couldn't I use a while read ... do loop to do the same thing? Something like this:

while read -r exception
do
    grep -iv -- "$exception" file
done < text.exp


I know the code above is not right; it is just the basic idea. I don't know how to remove a paragraph in the first place, and I would then need to do it in a loop so that every line of the exception file is applied.
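A while read loop can work if it collects lines into a paragraph first and only tests the whole paragraph when it reaches an empty line (or the end of the file). The sketch below assumes bash and a grep that supports -q, -i, -F and -f (GNU and BSD grep both do). Unlike yunccll's awk script it uses plain substring matching, so "Error code 333" would also drop a paragraph containing "Error code 33334234", and an empty line in the exception file would match every paragraph. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical names): the "while read" idea applied per paragraph.

# Print a collected paragraph unless it contains any exception string
# (fixed strings, case-insensitive like the grep -i in the question).
emit_if_clean() {
    local para=$1 exceptions=$2
    if [ -z "$para" ]; then
        return 0                    # nothing collected (e.g. repeated blank lines)
    fi
    if ! printf '%s' "$para" | grep -qiF -f "$exceptions"; then
        printf '%s\n' "$para"       # para already ends in a newline: adds a blank line
    fi
}

filter_paragraphs_loop() {
    local exceptions=$1 data=$2 line para=""
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            emit_if_clean "$para" "$exceptions"
            para=""
        else
            para+="$line"$'\n'
        fi
    done < "$data"
    emit_if_clean "$para" "$exceptions"   # final paragraph
}
```

Usage: filter_paragraphs_loop text.exp file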
#4 drl, 07-19-2008
Hi.

This is a long solution, but it is modular and general. It consists primarily of two perl scripts. The first reads paragraphs and turns each one into a single line, replacing every newline with a marker character (I used "%"). The second does the opposite: it takes the long "%"-embedded lines and splits them back into separate lines.

With those two at either end of a pipeline, we can manipulate the file as we wish with ordinary line-oriented *nix tools. In this demonstration, grep -v eliminates the paragraphs (now long lines) that match your phrases.

Here are the perl scripts:
Code:
#!/usr/bin/perl

# @(#) p1       Demonstrate paragraphs into lines, substitution for newline.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;

my($FAKE_RS) = "%";

# read paragraphs
$/ = "\n\n";

while ( <> ) {
        s/\n/$FAKE_RS/msg;
        print "$_\n";
}

exit(0);
and:
Code:
#!/usr/bin/perl

# @(#) p2       Demonstrate read lines, substituting newline.

use warnings;
use strict;

my($debug);
$debug = 0;
$debug = 1;

my($FAKE_RS) = "%";

while ( <> ) {
        s/$FAKE_RS$FAKE_RS/\n/msg;
        s/$FAKE_RS/\n/msg;
        print "$_";
}

exit(0);
These can be driven by a shell script:
Code:
#!/bin/bash -

# @(#) s1       Demonstrate pipeline for paragraph matching.

echo
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version =o $(_eat $0 $1)
set -o nounset
echo

FILE1=data1
FILE2=data2

echo
echo " Data file $FILE1:"
cat $FILE1

echo
echo " Data file $FILE2:"
cat $FILE2

echo
echo " Results:"
./p1 $FILE1 |
tee t1 |
grep -v -f $FILE2 |
tee t2 |
./p2

exit 0
Producing:
Code:
% ./s1

(Versions displayed with local utility "version")
Linux 2.6.11-x1
GNU bash 2.05b.0


 Data file data1:
Switch not possible Error code 1234
Process number 678

Log not available Error code 567
Process number 874

Log not available Error code 333
Process number 34

Log not available Error code 33334234
Process number 012

Log not available Error code 333 hello
Process number 012

Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok

Log not available Error code 999
Process number 777 missing

 Data file data2:
Error code 1234
Error code 333
Process number 874%

 Results:
Log not available Error code 567
Process number 8743434

Log not available Error code 567
Process number 874 ok

Log not available Error code 999
Process number 777 missing
An exclusion pattern can end with "%" to force a match that ends at a line break, as the third pattern in data2 (Process number 874%) does. If "%" can occur in either data file, a different marker character must be chosen in both perl scripts and in the pattern file. Otherwise the filtering step is unaware that it is operating on paragraphs: it sees each paragraph as just a line, a long one to be sure, but just a line. This keeps the complexity outside the operation you actually want to perform.

The intermediate files t1 and t2 can be viewed to see in more detail how the process works. The tee commands can be removed when no longer wanted; that is why each one is on its own line in the pipeline, so you can simply delete those lines.
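The flatten / filter / unflatten idea does not depend on perl. Below is a minimal sketch of the same pipeline using awk in paragraph mode, assuming "%" never appears in the data. Each paragraph becomes one line ending in "%", so a pattern such as "Process number 874%" still anchors at a line break. Unlike p1, which splits on the literal "\n\n", awk with RS="" treats a run of several empty lines as a single separator. The names flatten, unflatten and filter_flat are hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical names) of the p1 | grep -v -f | p2 pipeline in awk.

flatten() {
    # Paragraph mode: every newline in a paragraph, plus the one ending its
    # last line, becomes "%"; one output line per paragraph.
    awk 'BEGIN { RS = "" } { gsub(/\n/, "%"); print $0 "%" }' "$@"
}

unflatten() {
    # Each "%" back to a newline; print adds one more, which leaves a blank
    # line after every paragraph.
    awk '{ gsub(/%/, "\n"); print }'
}

# First argument: pattern file; remaining arguments: data files.
filter_flat() {
    local exceptions=$1; shift
    flatten "$@" | grep -v -f "$exceptions" | unflatten
}
```

Usage: filter_flat data2 data1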

Paragraphs must be separated by truly empty lines: a separator line that contains spaces, TABs or any other character will not be recognized; it must consist of a newline only.
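If the data might contain separator lines with stray spaces or TABs, one workaround is to blank such lines before the first stage. Below is a minimal sketch, assuming a POSIX sed, where [[:blank:]] matches space and TAB; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical name): turn lines holding only spaces/TABs into truly
# empty lines, so paragraph splitters (p1, awk with RS="") see them as separators.
normalize_blank_lines() {
    sed 's/^[[:blank:]]*$//' "$@"
}
```

Usage: normalize_blank_lines data1 | ./p1 | grep -v -f data2 | ./p2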

See man pages for details ... cheers, drl
#5 summer_cherry, 07-21-2008
awk:

Code:
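nawk '
# First file (text.exp): remember the last word of each exception line.
NR == FNR { arr[$NF]++; next }
# Data file: assumes every paragraph is exactly 2 lines plus 1 blank line.
FNR % 3 == 1 {                 # first line of a paragraph
        flag = ($NF in arr)
        t = $0
        next
}
FNR % 3 == 2 {                 # second line of a paragraph
        if (flag || ($NF in arr)) next
        print t; print $0; print ""
        next
}
# FNR % 3 == 0 is the blank separator line: no rule matches, so it is skipped
' text.exp file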
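The nawk script above relies on FNR%3, so it assumes every paragraph is exactly two lines followed by exactly one blank line, and it compares only the last word of each line with the last word of each exception line. Below is a sketch of the same last-word rule in awk paragraph mode, which drops the fixed-layout assumption. Each paragraph is split into lines explicitly because, with RS set to empty, newline always acts as a field separator, so $NF would only see the last word of the last line. The function name is hypothetical.

```shell
#!/bin/bash
# Sketch (hypothetical name): exclude a paragraph of any length when the last
# word of any of its lines equals the last word of some exception line.
filter_by_last_word() {
    local exceptions=$1 data=$2
    awk '
        # Exceptions file, line by line: remember the last word of each line.
        FILENAME == ARGV[1] { if (NF) bad[$NF] = 1; next }
        # Data file, read with RS set to empty: one paragraph per record.
        {
            nl = split($0, lines, "\n")
            for (i = 1; i <= nl; i++) {
                nw = split(lines[i], w, " ")     # default whitespace splitting
                if (nw && (w[nw] in bad)) next
            }
            print $0 "\n"
        }
    ' "$exceptions" RS= "$data"
}
```

Usage: filter_by_last_word text.exp file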

perl:

Code:
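#!/usr/bin/perl
# Same assumptions as the nawk version: each paragraph is exactly two lines
# plus one blank line, and a line is excluded when its last word equals the
# last word of some line in text.exp.
use strict;
use warnings;

my %hash;
open my $exp, '<', 'text.exp' or die "text.exp: $!";
while (<$exp>) {
	my @arr = split ' ', $_;
	$hash{ $arr[-1] } = 1 if @arr;      # last word of each exception line
}
close $exp;

my (@res, $line, $skip);
open my $fh, '<', 'file' or die "file: $!";
while (<$fh>) {
	my @temp = split ' ', $_;
	my $t = @temp ? $temp[-1] : '';
	if ($. % 3 == 1) {                  # first line of a paragraph
		$line = $_;
		$skip = exists $hash{$t};
	}
	elsif ($. % 3 == 2) {               # second line of a paragraph
		push @res, $line, $_, "\n" unless $skip || exists $hash{$t};
	}
	# $. % 3 == 0 is the blank separator line: nothing to do
}
close $fh;
print for @res;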

Last edited by summer_cherry; 07-21-2008 at 04:57 AM..
#6 kaushys, 07-22-2008
Thank you so much. This was really easy with so many options.