Extract if pattern matches


 
# 1  
Old 10-19-2007

Hi All,

I have the input below. I tried the awk code below, but it does not seem to work. Can anybody help?
My idea is to find the 2nd field of the last line that matches the pattern " ** XXX ccc ccc cc cc ccc 2007 ". In this case, the 2nd field is " XXX ". With this "XXX" term stored in a variable, I want to print every line whose 2nd field is "XXX", together with the subsequent lines that contain " k= ". The expected output is listed after the input.

Input:

wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1


Output:

** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1


My AWK code:
Code:
$NF == "2007" && $1 == "**" && NF == "8" {Field2 = $2}

$1 == "**" && $8 == "2007" && $2 == Field2   {
print ;
flag = 1;
next;
}
flag == 1 && $2 ~ /k=/ {print}

$1 == "**" && $8 == "2007" && $2 != Field2 {flag = 0}

# 2  
Old 10-19-2007
Hi All,

Actually, my main problem is assigning the 2nd field of the last line that follows the pattern " ** XXX ccc ccc cc cc ccc 2007 " to a variable. I could have done it in an END block as shown below, but that does not help because I need to use the Field2 variable in the lines of code that follow. Can anybody help?

{$NF == "2007" && $1 == "**" && NF == "8"} END {Field2 = $2}
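One way around this, sketched here as an assumption rather than something posted in the thread, is to keep every line in an array while reading and to do all of the printing inside END, where the last key is finally known. The function name extract_last_key is hypothetical, and the header test (8 fields, first field "**", last field "2007") is taken from the code in post #1.

```shell
#!/bin/sh
# Single-pass sketch: buffer every line, remember field 2 of the LAST
# header line, then filter inside the END block.
# extract_last_key is a hypothetical helper name; "$1" is the input file.
extract_last_key() {
  awk '
    { line[NR] = $0 }                                    # keep every line for later
    $1 == "**" && NF == 8 && $NF == "2007" { key = $2 }  # the last match wins
    END {
      for (i = 1; i <= NR; i++) {
        n = split(line[i], f, " ")                       # re-split the saved line
        if (n == 8 && f[1] == "**" && f[8] == "2007") {
          keep = (f[2] == key)                           # header decides the flag
          if (keep) print line[i]
        } else if (keep && line[i] ~ /k=/) {
          print line[i]                                  # k= line inside a wanted block
        }
      }
    }
  ' "$1"
}
```

Called as extract_last_key input, this reads the file only once, at the cost of holding the whole file in memory.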
# 3  
Old 10-19-2007
Hi.

I often think that awk's automatic read can get in the way (perhaps it just gets in the way of my thought processes).

Here's a perl script that produces your specified output from the given input:
Code:
#!/usr/bin/perl

# @(#) p1       Demonstrate extraction after pattern match.

use warnings;
use strict;

my ($debug);
$debug = 1;
$debug = 0;

my ($lines) = 0;

my (@a);

# Read until ** XXX, then turn over control to function to scan
# for other pattern.

while (<>) {
  $lines++;
  chomp;
  @a = split;
  if ( $a[0] eq "**" && $a[1] eq "XXX" ) {
    print " Found XXX line at $.\n" if $debug;
    print "$_\n";
    last if not extract_k();
  }
}

print STDERR " ( Lines read: $lines )\n";

# Extract k= lines until line with "**".

sub extract_k {
  my (@a);
  while (<>) {
    chomp();
    @a = split;
    return 1 if $a[0] eq "**";    # not EOF
    print "$_\n" if /k=/;
  }
  return 0;                       # EOF
}

exit(0);

Running against data in file data1:
Code:
% ./p1 data1
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1
 ( Lines read: 13 )

This makes the assumption that the ** lines alternate; more work will be necessary if that's wrong ... cheers, drl
# 4  
Old 10-19-2007
Hi drl,

I got the following output.
I renamed your perl code to "myperl" and the input file to "input",
but there seems to be a problem. Can you give some guidance?

Code:
$ perl myperl input
 ( Lines read: 24 )

# 5  
Old 10-19-2007
Code:
awk 'FNR==NR&&/^\*\*/{line=$2;next}
     FNR!=NR&&$0~line{
      print 
      f=1
     }
     f&&$0~/^\*\*/{ 
       if($2 !~ line) f=0
     }
     f&&$2=="k="{print}
' "file" "file"

Output:
Code:
# ./test.sh
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1
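The script above uses regex matches ($0~line and $2 !~ line), so any line that contains the key text anywhere would also match. A possible stricter variant, offered only as a sketch, compares field 2 exactly. The function name two_pass_exact is hypothetical, and the variant treats any line whose first field is "**" as a header, which is an assumption about the data.

```shell
#!/bin/sh
# Two-pass variant with exact comparisons on field 2.
# two_pass_exact is a hypothetical helper name; "$1" is the input file,
# which awk is asked to read twice.
two_pass_exact() {
  awk '
    NR == FNR { if ($1 == "**") key = $2; next }  # pass 1: remember the last key
    $1 == "**" { f = ($2 == key) }                # pass 2: each header sets the flag
    f && ($1 == "**" || $2 == "k=")               # print wanted headers and k= lines
  ' "$1" "$1"
}
```

Usage would be two_pass_exact file, with the file name given once because the function passes it to awk twice.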

# 6  
Old 10-20-2007
Hi, Raynon.

No, I cannot reproduce your failed result with perl code p1. I have amended and extended the code (calling it p2), added a few lines to the data to make sure that consecutive ** XXX line series will be handled, and ran it as you did:
Code:
#!/usr/bin/perl

# @(#) p2       Demonstrate extraction after pattern match.

use warnings;
use strict;

my ($debug);
$debug = 1;
$debug = 0;

our ($lines) = 0;

my (@a);

# Read until ** XXX, then turn over control to function to scan
# for other pattern.

while (<>) {
  $lines++;
  chomp;
  @a = split;
  if ( $a[0] eq "**" && $a[1] eq "XXX" ) {
    print " Found XXX line at $.\n" if $debug;
    print "$_\n";

    # last if not extract_k();
    $_ = extract_k();
    if ( not $_ ) {
      last;
    }
    else {
      print " cycling with line $. ", $_ if $debug;
      redo;
    }
  }
}

print STDERR " ( Lines read: $lines )\n";

# Extract k= lines until line with "**".

sub extract_k {
  our ($lines);
  my (@a);
  while (<>) {
    $lines++;
    chomp();
    @a = split;
    return "$_\n" if $a[0] eq "**";    # not EOF
    print "$_\n" if /k=/;
  }
  return 0;                            # EOF
}

exit(0);

Producing:
Code:
% perl p2 data2
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1
** XXX ccc ccc cc cc ccc 2007
0006 k= 1
0007 k= 1
 ( Lines read: 32 )

If you cannot get my code to work, then it looks like the awk script from ghostdog74 will work -- and it's far shorter than the perl code.

Best wishes ... cheers, drl
# 7  
Old 10-20-2007
Quote:
Originally Posted by ghostdog74
Code:
awk 'FNR==NR&&/^\*\*/{line=$2;next}
     FNR!=NR&&$0~line{
      print 
      f=1
     }
     f&&$0~/^\*\*/{ 
       if($2 !~ line) f=0
     }
     f&&$2=="k="{print}
' "file" "file"

Output:
Code:
# ./test.sh
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1

Hi GhostDog,

Your code works!
But I don't really understand the FNR==NR condition. Can you help me understand it?
Also, why does this awk code need the same input file twice?
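For what it is worth, NR counts records across all input files, while FNR restarts at 1 for each file. FNR==NR is therefore true only while awk reads the first file named on the command line. Naming the same file twice gives awk a first pass to find the last "**" key and a second pass to print, since on a single sequential read it cannot yet know which header is the last one. The small sketch below just prints both counters; show_passes is a hypothetical helper name and the argument is any small text file.

```shell
#!/bin/sh
# Show how NR and FNR evolve when awk reads the same file twice.
# show_passes is a hypothetical helper name; "$1" is any small text file.
show_passes() {
  awk '{ printf "NR=%d FNR=%d pass=%d %s\n", NR, FNR, (NR == FNR ? 1 : 2), $0 }' "$1" "$1"
}
```

For a two-line file, the first two output lines report pass=1 (NR and FNR are equal) and the last two report pass=2 (NR keeps counting while FNR starts over).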