Extract if pattern matches


 
# 1  
Old 10-19-2007

Hi All,

I have the input below. I tried the awk code further down, but it's not working. Can anybody help?
My idea is to find the 2nd field of the last line that matches the pattern " ** XXX ccc ccc cc cc ccc 2007 ". In this case the 2nd field is "XXX". With "XXX" stored in a variable, I want to print every line whose 2nd field is "XXX", plus the lines containing "k=" that follow each of them, up to the next "**" line. The expected output is listed after the input.

Input:

wwwwww
0999 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
wwwwww
0001 k= 1
wwwwww
0002 k= 1
** abc ccc cc cc cc cc 2007
wwwwww
0001 k= 1
wwwwww
0002 k= 1
wwwwww
wwwwww
0003 k= 1
wwwwww
** XXX ccc ccc cc cc ccc 2007
wwwwww
0003 k= 1
wwwwww
0004 k= 1
0005 k= 1


Output:

** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1


My AWK code:
Code:
$NF == "2007" && $1 == "**" && NF == "8" {Field2 = $2}

$1 == "**" && $8 == "2007" && $2 == Field2   {
print ;
flag = 1;
next;
}
flag == 1 && $2 ~ /k=/ {print}

$1 == "**" && $8 == "2007" && $2 != Field2 {flag = 0}

# 2  
Old 10-19-2007
Hi All,

Actually, my main problem is assigning the 2nd field of the last line that matches the pattern " ** XXX ccc ccc cc cc ccc 2007 ". I could do that in an END block, as shown below, but by then it is too late: I need the Field2 variable while the lines are being processed. Can anybody help?

{$NF == "2007" && $1 == "**" && NF == "8"} END {Field2 = $2}
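
A possible way around this, as a minimal sketch rather than a definitive answer: keep every line in an array while reading, remember the most recent matching 2nd field, and do the printing in the END block, where the last occurrence is known. The function name extract_last_key and the variable names are made up for illustration, and the approach assumes the input fits in memory.

```shell
# Hypothetical helper: buffers the whole input, then filters it in END.
extract_last_key() {
  awk '
    { line[NR] = $0 }                                    # keep every line for the END phase
    $1 == "**" && NF == 8 && $NF == "2007" { key = $2 }  # overwritten each time, so the last match wins
    END {
      for (i = 1; i <= NR; i++) {
        n = split(line[i], f, " ")          # re-split the stored line into fields
        if (f[1] == "**") {
          # Header line: printing stays on only if it carries the remembered key.
          show = (n == 8 && f[8] == "2007" && f[2] == key)
          if (show) print line[i]
        } else if (show && line[i] ~ /k=/) {
          print line[i]                     # k= line inside a wanted block
        }
      }
    }
  ' "$@"
}
```

Called as `extract_last_key input`, this prints the "** XXX" lines and the k= lines that follow them for the sample input above.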
# 3  
Old 10-19-2007
Hi.

I often think that awk's automatic read can get in the way (perhaps it just gets in the way of my thought processes).

Here's a perl script that produces your specified output from the given input:
Code:
#!/usr/bin/perl

# @(#) p1       Demonstrate extraction after pattern match.

use warnings;
use strict;

my ($debug);
$debug = 1;
$debug = 0;

my ($lines) = 0;

my (@a);

# Read until ** XXX, then turn over control to function to scan
# for other pattern.

while (<>) {
  $lines++;
  chomp;
  @a = split;
  if ( $a[0] eq "**" && $a[1] eq "XXX" ) {
    print " Found XXX line at $.\n" if $debug;
    print "$_\n";
    last if not extract_k();
  }
}

print STDERR " ( Lines read: $lines )\n";

# Extract k= lines until line with "**".

sub extract_k {
  my (@a);
  while (<>) {
    chomp();
    @a = split;
    return 1 if $a[0] eq "**";    # not EOF
    print "$_\n" if /k=/;
  }
  return 0;                       # EOF
}

exit(0);

Running against data in file data1:
Code:
% ./p1 data1
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1
 ( Lines read: 13 )

This makes the assumption that the ** lines alternate; more work will be necessary if that's wrong ... cheers, drl
# 4  
Old 10-19-2007
Hi drl,

I got the following output.
I saved your perl code as "myperl" and the input file as "input".
But there seems to be a problem: only the line count is printed. Can you give me some guidance?

Code:
$ perl myperl input
 ( Lines read: 24 )

# 5  
Old 10-20-2007
Code:
awk 'FNR==NR&&/^\*\*/{line=$2;next}
     FNR!=NR&&$0~line{
      print 
      f=1
     }
     f&&$0~/^\*\*/{ 
       if($2 !~ line) f=0
     }
     f&&$2=="k="{print}
' "file" "file"

Output:
Code:
# ./test.sh
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1
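
For anyone wondering why "file" is listed twice: awk then reads it two times in one run. NR counts records across all inputs, while FNR restarts at 1 for each file, so FNR==NR is true only during the first read. The script above uses that first read to remember the 2nd field of the last "**" line, and the second read to print. A small sketch that makes the counters visible (show_passes is a hypothetical helper name, and it assumes a non-empty file):

```shell
# Hypothetical demo: label each record with the pass it belongs to.
show_passes() {
  # The same file is given twice, so awk reads it in two passes.
  awk '{
    pass = (FNR == NR) ? "pass1" : "pass2"   # FNR == NR holds only for the first file argument
    printf "%s NR=%d FNR=%d %s\n", pass, NR, FNR, $0
  }' "$1" "$1"
}
```

For a two-line file, the first two output lines are tagged pass1 with NR equal to FNR, and the next two are tagged pass2 with FNR back at 1 while NR keeps counting.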

# 6  
Old 10-20-2007
Hi, Raynon.

No, I cannot reproduce your failed result with perl code p1. I have amended and extended the code (calling it p2), added a few lines to the data to make sure that consecutive ** XXX line series will be handled, and ran it as you did:
Code:
#!/usr/bin/perl

# @(#) p2       Demonstrate extraction after pattern match.

use warnings;
use strict;

my ($debug);
$debug = 1;
$debug = 0;

our ($lines) = 0;

my (@a);

# Read until ** XXX, then turn over control to function to scan
# for other pattern.

while (<>) {
  $lines++;
  chomp;
  @a = split;
  if ( $a[0] eq "**" && $a[1] eq "XXX" ) {
    print " Found XXX line at $.\n" if $debug;
    print "$_\n";

    # last if not extract_k();
    $_ = extract_k();
    if ( not $_ ) {
      last;
    }
    else {
      print " cycling with line $. ", $_ if $debug;
      redo;
    }
  }
}

print STDERR " ( Lines read: $lines )\n";

# Extract k= lines until line with "**".

sub extract_k {
  our ($lines);
  my (@a);
  while (<>) {
    $lines++;
    chomp();
    @a = split;
    return "$_\n" if $a[0] eq "**";    # not EOF
    print "$_\n" if /k=/;
  }
  return 0;                            # EOF
}

exit(0);

Producing:
Code:
% perl p2 data2
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1
** XXX ccc ccc cc cc ccc 2007
0006 k= 1
0007 k= 1
 ( Lines read: 32 )

If you cannot get my code to work, then it looks like the awk script from ghostdog74 will work -- and it's far shorter than the perl code.

Best wishes ... cheers, drl
# 7  
Old 10-20-2007
Quote:
Originally Posted by ghostdog74
Code:
awk 'FNR==NR&&/^\*\*/{line=$2;next}
     FNR!=NR&&$0~line{
      print 
      f=1
     }
     f&&$0~/^\*\*/{ 
       if($2 !~ line) f=0
     }
     f&&$2=="k="{print}
' "file" "file"

Output:
Code:
# ./test.sh
** XXX ccc ccc cc cc ccc 2007
0001 k= 1
0002 k= 1
** XXX ccc ccc cc cc ccc 2007
0003 k= 1
0004 k= 1
0005 k= 1

Hi GhostDog,

Your code works!
But I don't really understand the FNR==NR condition. Can you help me understand it?
Also, why does the input file need to be given twice to this awk command?
