Extract if pattern matches

10-22-2007

Quote:

Originally Posted by Raynon

Hi Summer,

Thanks for your code, but we do not know the value of the 2nd field (which is XXX) in advance, so your code can't be applied here.

Hi GhostDog,

I have a small problem here. I have added some more data to my input file (originally highlighted in blue).
If the 2nd field of the last occurrence of the pattern " ** abc ccc cc cc cc cc 2007 " does not start with " XX ", then the output below is expected (that is, only the very last block that matches the pattern is printed).

Can you help?

Input:

wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1
** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1

Output:

** abc ccc cc cc cc cc 2007
0001 k= 1
0002 k= 1
0003 k= 1

Hi GhostDog,

It seems that I am pretty close to my target.
But there's still a constraint: if the term " ** abc ccc cc cc cc cc 2007 " occurs more than 2 times, every block from the 2nd one onwards is printed, because of these 2 statements:
occur++;
if (occur > 1) print;
Is there any way I could find out the final value of the " occur " variable and make sure that only the last occurrence is printed?

Code:

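# Pass 1: for every "**" line, store its 2nd field and its first 2 characters
# (the values from the last "**" line are the ones that remain).
FNR == NR && /^\*\*/ { line = $2; CODE = substr($2, 1, 2); next }

# Pass 2: print any line that matches the key and start a block.
FNR != NR && $0 ~ line {
    print
    flag = 1
}
# When CODE is "XX", a header with a different 2nd field ends the block.
flag == 1 && $0 ~ /^\*\*/ && CODE == "XX" {
    if ($2 !~ line) flag = 0
}
# Print "k=" lines while flag is 1.
flag == 1 && $2 == "k=" { print }

# When CODE is not "XX", count the matching headers; output starts at the 2nd one.
FNR != NR && $2 ~ line && CODE != "XX" {
    flag = 2
    occur++
    if (occur > 1) print
}
flag == 2 && occur > 1 && $2 == "k=" { print }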
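One way to get the final value of `occur`, sketched here assuming a POSIX awk such as the `nawk` used later in this thread, is to count the headers during the first pass (when the whole file has already been seen) and compare against that count in the second pass. This sketch covers only the non-"XX" case. The helper name `last_block` and the file `sample.txt` are hypothetical, and it compares `$2` exactly instead of using the regex `$2 ~ line`.

```shell
#!/bin/sh
# Hypothetical helper: print only the LAST block whose "**" header has the
# same 2nd field as the last "**" header in the file.
last_block() {
  awk '
    # Pass 1 (FNR == NR): remember the 2nd field of the last "**" line and
    # count how many "**" lines carry each 2nd field.
    FNR == NR { if (/^\*\*/) { line = $2; total[$2]++ }; next }

    # Pass 2: a block is "live" only when its header carries the target
    # field AND this is its final occurrence (occur == total).
    /^\*\*/ { live = ($2 == line && ++occur == total[line]) }

    # Inside the live block, print the header and the "k=" lines.
    live && (/^\*\*/ || $2 == "k=")
  ' "$1" "$1"
}

# Hypothetical sample: the "** abc" header occurs three times.
cat > sample.txt <<'EOF'
** abc ccc cc cc cc cc 2007
0001 k= 1
** XXX ccc ccc cc cc ccc 2007
0002 k= 1
** abc ccc cc cc cc cc 2007
0003 k= 1
** abc ccc cc cc cc cc 2007
0004 k= 1
wwwwww
0005 k= 1
EOF

last_block sample.txt
```

With this sample only the third `** abc` block is printed (its header plus `0004 k= 1` and `0005 k= 1`); the earlier `** abc` headers are skipped because `occur` has not yet reached `total["abc"]`, no matter how many times the header repeats.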

Last edited by Raynon; 10-22-2007 at 05:42 AM.

Raynon

10-22-2007

So "XXX" is actually what you want to get?

Code:

awk 'FNR==NR&&/^\*\*/&&$2=="XXX"{line=$2;next}
     FNR!=NR&&$0~line{
      print 
      f=1
     }
     f&&$0~/^\*\*/{ 
       if($2 !~ line) f=0
     }
     f&&$2=="k="{print}
' "file" "file"

output:

Code:

# ./testnew.sh
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1
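The script above hard-codes `$2=="XXX"`, while the key is not known in advance. A minimal sketch of the Scenario 1 behaviour that instead takes the key from the last `**` line of the file follows; the helper name `all_blocks` and the sample file `s1.txt` are hypothetical. It also switches the block on or off at every header with an exact comparison on `$2`, rather than the regex `$0~line`, which would match `XXX` anywhere on a line.

```shell
#!/bin/sh
# Hypothetical helper: print every block whose "**" header has the same
# 2nd field as the LAST "**" header in the file.
all_blocks() {
  awk '
    # Pass 1: the key is the 2nd field of the last "**" line.
    FNR == NR { if (/^\*\*/) key = $2; next }

    # Pass 2: each header turns printing on (same key) or off (other key).
    /^\*\*/ { on = ($2 == key) }

    # While a block is on, print its header and its "k=" lines.
    on && (/^\*\*/ || $2 == "k=")
  ' "$1" "$1"
}

# Hypothetical sample input.
cat > s1.txt <<'EOF'
0999 k= 1
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
** abc ccc cc cc cc cc 2007
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
EOF

all_blocks s1.txt
```

Lines before the first header (such as `0999 k= 1`) are never printed, because `on` is still unset there.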

ghostdog74

10-22-2007

Let me illustrate with 2 examples.

Scenario 1:
Here the last occurrence of the pattern is " ** XXX ccc ccc cc cc ccc 2007 ", whose 2nd field is " XXX ". Since the first 2 characters of the 2nd field are " XX ", the script should print every occurrence of such lines, together with the lines containing " k= " that follow each of them.

Input:
wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1

Output:

** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1

Scenario 2:
Here the last occurrence of the pattern is " ** abc ccc cc cc cc cc 2007 ", whose 2nd field is " abc ". Since the first 2 characters of the 2nd field do not match " XX ", it should print only the last occurrence of this pattern and the lines containing " k= " that follow it.

Input:

wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1
** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1

Output:

** abc ccc cc cc cc cc 2007
0001 k= 1
0002 k= 1
0003 k= 1

My code below actually does the trick, but if " ** abc ccc cc cc cc cc 2007 " appears more than 2 times in the input file, it no longer works. Please help.

Code:

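# Pass 1: for every "**" line, store its 2nd field and its first 2 characters
# (the values from the last "**" line are the ones that remain).
FNR == NR && /^\*\*/ { line = $2; CODE = substr($2, 1, 2); next }

# Pass 2: print any line that matches the key and start a block.
FNR != NR && $0 ~ line {
    print
    flag = 1
}
# When CODE is "XX", a header with a different 2nd field ends the block.
flag == 1 && $0 ~ /^\*\*/ && CODE == "XX" {
    if ($2 !~ line) flag = 0
}
# Print "k=" lines while flag is 1.
flag == 1 && $2 == "k=" { print }

# When CODE is not "XX", count the matching headers; output starts at the 2nd one.
FNR != NR && $2 ~ line && CODE != "XX" {
    flag = 2
    occur++
    if (occur > 1) print
}
flag == 2 && occur > 1 && $2 == "k=" { print }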
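Both scenarios can probably be handled by a single decision taken at each `**` header, sketched below assuming a POSIX awk: the first pass records the 2nd field of the last header (`key`) and the number of headers (`n`); the second pass prints every block with that key when it starts with "XX", and otherwise only the block under header number `n`. Counting headers in the first pass removes the assumption that the block occurs only twice. The helper name `extract` and the file names are hypothetical; the input data is copied from the two scenarios above.

```shell
#!/bin/sh
# Hypothetical helper covering both scenarios:
#   key starts with "XX" -> print ALL blocks whose header has that 2nd field;
#   otherwise            -> print only the LAST block.
extract() {
  awk '
    # Pass 1: key = 2nd field of the last "**" line, n = number of "**" lines.
    FNR == NR { if (/^\*\*/) { key = $2; n++ }; next }

    # Pass 2: at each header, decide whether the block below it is printed.
    /^\*\*/ {
      hdr++
      on = (substr(key, 1, 2) == "XX") ? ($2 == key) : (hdr == n)
    }

    # Print the header and "k=" lines of every selected block.
    on && (/^\*\*/ || $2 == "k=")
  ' "$1" "$1"
}

# Scenario 1 input, as given in the thread.
cat > scenario1.txt <<'EOF'
wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1
EOF

# Scenario 2 input: Scenario 1 plus a final "** abc" block.
cp scenario1.txt scenario2.txt
cat >> scenario2.txt <<'EOF'
** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1
EOF

echo "--- Scenario 1"; extract scenario1.txt
echo "--- Scenario 2"; extract scenario2.txt
```

For Scenario 1 this prints the two `** XXX` blocks with `0001` to `0005`; for Scenario 2 it prints `** abc ccc cc cc cc cc 2007` followed by `0001 k= 1`, `0002 k= 1` and `0003 k= 1`, which are the outputs listed in the two scenarios.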

Raynon

10-22-2007

If you run the amended code I posted in #16, what happens?

ghostdog74

10-22-2007

Hi.

Quote:

Originally Posted by Raynon

Since first 2 characters of the 2nd field DOES NOT match " XX " ...

I thought that I understood the requirements about matching the second field of the "**" lines, but now you're writing about specifically matching (only) 2 characters. This is confusing to me ... cheers, drl
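drl's point can be made concrete: "the 2nd field is `XXX`", "the first 2 characters are `XX`", and the regex matches `$0 ~ line` / `$2 ~ line` used in the scripts above are different tests. A small sketch, with a hypothetical helper `classify` and made-up 2nd fields, prints the result of each test (1 = true, 0 = false).

```shell
#!/bin/sh
# Hypothetical demo of four different tests on the 2nd field of a "**" line:
#   prefix   : substr($2, 1, 2) == "XX"   first 2 characters are "XX"
#   anchored : $2 ~ /^XX/                 same test, as an anchored regex
#   exact    : $2 == "XXX"                whole field equals "XXX"
#   anywhere : $2 ~ /XX/                  "XX" anywhere in the field (unanchored)
classify() {
  awk '{
    printf "%s prefix=%d anchored=%d exact=%d anywhere=%d\n", $2,
      (substr($2, 1, 2) == "XX"), ($2 ~ /^XX/), ($2 == "XXX"), ($2 ~ /XX/)
  }'
}

printf '%s\n' '** XXX ccc' '** XXY ccc' '** abc ccc' '** aXXb ccc' | classify
```

Only `XXX` passes all four tests; `XXY` passes the prefix test but not the exact one, and `aXXb` passes only the unanchored regex, which is the kind of match `$2 ~ line` performs when `line` holds a plain string.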

drl

10-22-2007

Hi GhostDog,

It all depends on what the input is, as I mentioned earlier.

Your code produces the output below when using the input (file2) from scenario 2.
This output would be correct for the input (file1) from scenario 1.

% nawk -f awking file2 file2
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1

But I would expect the following for the input (file2) from scenario 2:

** abc ccc cc cc cc cc 2007
0001 k= 1
0002 k= 1
0003 k= 1

Raynon

10-29-2007

Hi GhostDog,

I am not sure you understood my requirement. Anyway, the code below works; it's just that I only want to print the very last block when the condition fits. The code originally highlighted in blue needs some adjustment: it assumes that the block matching the pattern occurs only twice, but we can't make that assumption, since the block can appear more than twice in the input.
The block I am referring to is the one below (originally in purple).
Can you help?

** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1

FNR != NR && $2 ~ line && CODE != "XX" {
    flag = 2
    occur++
    if (occur > 1) print
}
flag == 2 && occur > 1 && $2 == "k=" { print }

Code:

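# Pass 1: for every "**" line, store its 2nd field and its first 2 characters
# (the values from the last "**" line are the ones that remain).
FNR == NR && /^\*\*/ { line = $2; CODE = substr($2, 1, 2); next }

# Pass 2: print any line that matches the key and start a block.
FNR != NR && $0 ~ line {
    print
    flag = 1
}
# When CODE is "XX", a header with a different 2nd field ends the block.
flag == 1 && $0 ~ /^\*\*/ && CODE == "XX" {
    if ($2 !~ line) flag = 0
}
# Print "k=" lines while flag is 1.
flag == 1 && $2 == "k=" { print }

# When CODE is not "XX", count the matching headers; output starts at the 2nd one.
FNR != NR && $2 ~ line && CODE != "XX" {
    flag = 2
    occur++
    if (occur > 1) print
}
flag == 2 && occur > 1 && $2 == "k=" { print }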
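For the non-"XX" case alone there is also a one-pass approach that needs no occurrence count at all: keep only the current block in a buffer, start a fresh buffer at every `**` header, and print the buffer in the `END` block, so whichever block comes last is the one printed. This is a minimal sketch with a hypothetical helper `last_block_1pass` and sample file `blocks.txt`. The "XX" case still needs the key from the last header, which is only known at the end of the file, so the two-pass form is probably simpler there.

```shell
#!/bin/sh
# Hypothetical one-pass helper: print only the very last "**" block
# (its header plus the "k=" lines that follow it).
last_block_1pass() {
  awk '
    # A new header discards the previously buffered block.
    /^\*\*/ { buf = $0; next }

    # Append "k=" lines, but only once a header has been seen.
    $2 == "k=" && buf != "" { buf = buf "\n" $0 }

    # At end of input, the buffer holds the last block.
    END { if (buf != "") print buf }
  ' "$1"
}

# Hypothetical sample: the same "** abc" header occurs three times.
cat > blocks.txt <<'EOF'
0999 k= 1
** abc ccc cc cc cc cc 2007
0001 k= 1
** abc ccc cc cc cc cc 2007
0002 k= 1
** abc ccc cc cc cc cc 2007
0001 k= 1
wwwwww
0002 k= 1
0003 k= 1
EOF

last_block_1pass blocks.txt
```

The output is the block quoted above: the `** abc` header followed by `0001 k= 1`, `0002 k= 1` and `0003 k= 1`. The trade-off is that the file is read once, but the last block has to fit in memory as a single string.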

Raynon
