Extract if pattern matches


 
# 15  
Old 10-22-2007
Quote:
Originally Posted by Raynon
Hi Summer,

Thanks for your code. But we do not know the value of the 2nd field (which is XXX) in the first place, so your code can't be applied here.

Hi GhostDog,

I have a little problem here. I have added some more data to my input file (highlighted in blue).
If the 2nd field of the last occurrence of this pattern " ** abc ccc cc cc cc cc 2007 " does not start with " XX ", then the output below should be produced (that is, only the very last block that matches the pattern is printed).

Can you help ?

Input:

wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1
** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1


Output:

** abc ccc cc cc cc cc 2007
0001 k= 1
0002 k= 1
0003 k= 1
Hi GhostDog,

It seems that I am pretty close to my target.
But there's still a constraint. If the line " ** abc ccc ccc cc cc ccc 2007 " occurs more than 2 times, every block from the 2nd onwards will be output because of these 2 statements:
occur++;
if (occur > 1) print;

Is there any way I could find out the final value of the " occur " variable and make sure that only the last occurrence is printed?

Code:
FNR==NR&&/^\*\*/{line=$2; CODE = substr ($2,1,2); next}

FNR != NR && $0 ~ line {
      print 
      flag=1
     }
     flag == 1 && $0 ~ /^\*\*/ && CODE == "XX"{ 
       if($2 !~ line) flag=0
     }
     flag == 1 && $2 == "k="{print}


FNR != NR && $2 ~ line && CODE != "XX"  {
      flag=2;
      occur++;
      if (occur > 1)  print;
     }
      flag==2 && occur > 1 && $2 == "k=" { print }


Last edited by Raynon; 10-22-2007 at 05:42 AM..
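
On finding the final value of "occur": since the file is already read twice, one option is to count the matching "**" headers during the first pass and then, on the second pass, print only the block whose running count equals that total. This is only a minimal sketch, assuming the pattern to match is the 2nd field of the "**" line. The function name print_last_block and the variables pat, total and show are made up for illustration and do not come from the code above.

```shell
# Hypothetical helper: print only the last block whose "**" header has
# the given 2nd field, plus the "k=" lines that follow it.
# Usage: print_last_block FILE PATTERN
print_last_block() {
    awk -v pat="$2" '
        # Pass 1: count the "**" headers whose 2nd field equals pat.
        FNR == NR {
            if ($1 == "**" && $2 == pat) total++
            next
        }
        # Pass 2: every header closes the current block; the block is
        # reopened only when this header is occurrence number "total".
        $1 == "**" {
            show = ($2 == pat && ++occur == total)
            if (show) print
            next
        }
        # Inside the chosen block, keep only the "k=" lines.
        show && $2 == "k="
    ' "$1" "$1"
}
```

It would be called as: print_last_block file abc. Because the total is known before the second pass starts, it should not matter whether the pattern occurs twice or twenty times.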
# 16  
Old 10-22-2007
So "XXX" is actually what you want to get?
Code:
awk 'FNR==NR&&/^\*\*/&&$2=="XXX"{line=$2;next}
     FNR!=NR&&$0~line{
      print 
      f=1
     }
     f&&$0~/^\*\*/{ 
       if($2 !~ line) f=0
     }
     f&&$2=="k="{print}
' "file" "file"

output:
Code:
# ./testnew.sh
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1

# 17  
Old 10-22-2007
Let me illustrate with 2 examples.

Scenario 1:
Here the last occurrence of the pattern is " ** XXX ccc ccc cc cc ccc 2007 ". In this pattern, the 2nd field is " XXX ". Since the first 2 characters of the 2nd field are " XX ", it will print out all occurrences of this pattern and the lines containing " k= " that appear after each one.

Input:
wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1

Output:

** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1

Scenario 2:
Here the last occurrence of the pattern is " ** abc ccc cc cc cc cc 2007 ". In this pattern, the 2nd field is " abc ". Since the first 2 characters of the 2nd field DO NOT match " XX ", it will print out only the last occurrence of this pattern and the lines containing " k= " that appear after it.

Input:

wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1
** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1

Output:

** abc ccc cc cc cc cc 2007
0001 k= 1
0002 k= 1
0003 k= 1


My code below actually does the trick, but if " ** abc ccc cc cc cc cc 2007 " appears more than 2 times in the input file, it no longer works. Please help me.

Code:
FNR==NR&&/^\*\*/{line=$2; CODE = substr ($2,1,2); next}

FNR != NR && $0 ~ line {
      print 
      flag=1
     }
     flag == 1 && $0 ~ /^\*\*/ && CODE == "XX"{ 
       if($2 !~ line) flag=0
     }
     flag == 1 && $2 == "k="{print}


FNR != NR && $2 ~ line && CODE != "XX"  {
      flag=2;
      occur++;
      if (occur > 1)  print;
     }
      flag==2 && occur > 1 && $2 == "k=" { print }

# 18  
Old 10-22-2007
If you run the amended code I posted in #16, what happens?
# 19  
Old 10-22-2007
Hi.
Quote:
Originally Posted by Raynon
Since first 2 characters of the 2nd field DOES NOT match " XX " ...
I thought that I understood the requirements about matching the second field of the "**" lines, but now you're writing about specifically matching (only) 2 characters. This is confusing to me ... cheers, drl
# 20  
Old 10-22-2007
Hi GhostDog,

It all depends on what the input is, as I mentioned earlier.

Your code produces the output below when given the input (file2) from Scenario 2.
This output would be correct if the input were file1 from Scenario 1.

% nawk -f awking file2 file2
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1


But I would expect the output below for the input (file2) from Scenario 2.

** abc ccc cc cc cc cc 2007
0001 k= 1
0002 k= 1
0003 k= 1
# 21  
Old 10-29-2007
Hi GhostDog,

I am not sure if you understand my requirement. Anyway, the code below works. It's just that I only want to print the very last block if the condition fits. The code in blue needs some adjustment: it assumes that the block matching the pattern occurs twice, but we can't make that assumption since the block can appear more than twice in the input.
The block I am referring to is the one below, in purple.
Can you help?

** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1



FNR != NR && $2 ~ line && CODE != "XX" {
      flag=2;
      occur++;
      if (occur > 1) print;
     }
     flag==2 && occur > 1 && $2 == "k=" { print }



Code:
FNR==NR&&/^\*\*/{line=$2; CODE = substr ($2,1,2); next}

FNR != NR && $0 ~ line {
      print 
      flag=1
     }
     flag == 1 && $0 ~ /^\*\*/ && CODE == "XX"{ 
       if($2 !~ line) flag=0
     }
     flag == 1 && $2 == "k="{print}


FNR != NR && $2 ~ line && CODE != "XX"  {
      flag=2;
      occur++;
      if (occur > 1)  print;
     }
      flag==2 && occur > 1 && $2 == "k=" { print }
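
One possible way to cover both scenarios without assuming how many times a block appears is to record, on the first pass, the 2nd field and the line number of the last "**" header, and then decide on the second pass which headers open a printable block. The following is a sketch under those assumptions rather than a tested replacement for the script above. The function name extract_blocks and the variables key, last and show are invented for illustration.

```shell
# Hypothetical helper covering both scenarios.
# Usage: extract_blocks FILE
extract_blocks() {
    awk '
        # Pass 1: remember the 2nd field and the line number of the
        # last "**" header in the file.
        FNR == NR {
            if ($1 == "**") { key = $2; last = FNR }
            next
        }
        # Pass 2: each header decides whether its block is printed.
        $1 == "**" {
            if (substr(key, 1, 2) == "XX")
                show = ($2 == key)      # Scenario 1: every block with that 2nd field
            else
                show = (FNR == last)    # Scenario 2: only the very last block
            if (show) print
            next
        }
        # Inside a printed block, keep only the "k=" lines.
        show && $2 == "k="
    ' "$1" "$1"
}
```

Called as extract_blocks file2, this should print only the final " ** abc " block for the Scenario 2 input, and both " ** XXX " blocks for the Scenario 1 input. Comparing line numbers instead of counting occurrences avoids needing the "occur" variable at all.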
