awk extract strings matching multiple patterns

10-13-2013

Hi,

I wasn't quite sure how to title this one! Here goes:

I have some partially parsed log files that I now need to extract information from. Because of their original format and the processing they have already been through, I can't make any assumptions about the number of fields or the exact format. All I know is that I can look for certain patterns. An extract of the source is:

Code:

Job <1>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes; 
Job <2>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn069>, -- The CPU time is 10 seconds. MEM: 1 Gbytes; 
Job <3>, Job Name <BLAH>,  MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn049>, ;-- The CPU time is 13 seconds. MEM: 2 Gbytes; 
Job <4>, Job Name <BLAH>,  Status <RUN>,  Command <-- The CPU time is 76 seconds. MEM: 3 Gbytes; 
Job <7>, Job Name <BLAH>,  Stat us <RUN>,  Command <-- The CPU time is 49 seconds. MEM: 1014 Mbytes; 
Job <8>, Job Name <BLAH> , Status <RUN>, -- MEMLIMIT 10 G Fri Oct 11 22:13:19: Started on <cn014>;-- The CPU time is 12 seconds. MEM: 391 Mbytes; 
Job <9>, Job Name <BLAH>,  Status <RUN >,  Command <: Started on <cn026>,-- The CPU time is 71 seconds. MEM: 13 Mbytes; 
Job <10>, Job Name <BLAH>,  Sta tus <RUN>,  Command <#!/bi-- MEMLIMIT 22 G  Started on <cn064>, -- The CPU time is 25 seconds. MEM: 12 Gbytes;

I want to extract based on:

Code:

Started on <____>,
MEMLIMIT __ G
MEM: ___ bytes;

The first line example being:

Code:

MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes;

Each line may contain all, some or none of the above. My ideal output based on the above would be something like:

Code:

Started: cn035 MEMLIMIT: 10 G MEM: 1 G
Started: cn069 MEMLIMIT: 10 G MEM: 1 G 
etc
etc

(ideally, if there is no MEMLIMIT found on a line for example):
Started: cn026 MEMLIMIT: 0 G MEM: 13 M

I've messed around with gsub in awk to extract a single instance but couldn't work out how to select on multiple patterns...

Any help as always would be appreciated!

Last edited by Scrutinizer; 10-13-2013 at 06:38 AM.. Reason: additional code tags

chrissycc

10-13-2013

Like this?

Code:

sed -n 's/.*\(MEMLIMIT [^ ]* [^ ]*\).*Started on <\([^>]*\).*\(MEM: [^ ]* .\).*/Started: \2 \1 \3/p' file

Scrutinizer
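
A quick way to check what this substitution prints is to feed it two of the sample lines, one with all three patterns and one without MEMLIMIT. This is only a sketch: `sed_extract` is a made-up wrapper name, not something from the thread. The expression needs MEMLIMIT, "Started on" and MEM: to all be present, in that order, so a line missing any of them is not printed.

```shell
# Hypothetical wrapper around the sed command above, so it can be fed test lines.
sed_extract() {
  sed -n 's/.*\(MEMLIMIT [^ ]* [^ ]*\).*Started on <\([^>]*\).*\(MEM: [^ ]* .\).*/Started: \2 \1 \3/p' "$@"
}

# First line has all three patterns, second line has no MEMLIMIT.
printf '%s\n' \
  'Job <1>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes; ' \
  'Job <9>, Job Name <BLAH>,  Status <RUN >,  Command <: Started on <cn026>,-- The CPU time is 71 seconds. MEM: 13 Mbytes; ' |
  sed_extract
# prints only: Started: cn035 MEMLIMIT 10 G MEM: 1 G
```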

10-13-2013

Thanks for that, Scrutinizer - so very close to what I need! If I've got it right, it only prints a line when all three patterns are found. Ideally it would print every line with one or more matches:

Code:

Started: cn026 MEMLIMIT: 0 G MEM: 13 M

or just blank rather than 0 G on the MEMLIMIT. Basically every entry _should_ have a 'Started on' and a MEM:, but not necessarily a MEMLIMIT

Last edited by chrissycc; 10-13-2013 at 07:46 AM.. Reason: correction

chrissycc
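
One possible way to get that output with a plain POSIX awk is to look for each pattern on its own with match() and fall back to a default when it is missing. This is a minimal sketch, not a tested production script: the names `extract_fields` and `grab` are invented here, and the defaults ("-" for a missing host or MEM, "0 G" for a missing MEMLIMIT) are assumptions based on the output requested above.

```shell
# Hypothetical helper name; reads the files given as arguments, or stdin.
extract_fields() {
  awk '
    # Return the first match of the regex string re in s, or an empty string.
    function grab(s, re) {
      return match(s, re) ? substr(s, RSTART, RLENGTH) : ""
    }
    {
      host  = grab($0, "Started on <[^>]*>")
      limit = grab($0, "MEMLIMIT [0-9]+ [kMG]")
      mem   = grab($0, "MEM: [0-9]+ [kMG]")

      # Skip lines on which none of the three patterns was found.
      if (host == "" && limit == "" && mem == "") next

      # Strip the fixed text around each value.
      gsub(/^Started on <|>$/, "", host)
      sub(/^MEMLIMIT /, "", limit)
      sub(/^MEM: /, "", mem)

      # Defaults for missing values (assumed, see the note above the code).
      if (host == "")  host  = "-"
      if (limit == "") limit = "0 G"
      if (mem == "")   mem   = "-"

      printf "Started: %s MEMLIMIT: %s MEM: %s\n", host, limit, mem
    }
  ' "$@"
}
```

Run it as `extract_fields file`. The patterns are passed to grab() as strings rather than /regex/ literals, because a regex literal handed to a user-defined function is evaluated as a match against $0 instead of being passed through.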

10-13-2013

If you are OK with a Perl solution, put this into "script.pl":

Code:

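#!/usr/bin/perl
use strict;
use warnings;

# Open the file named on the command line; stop with a message if that fails.
open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
while (my $line = <$in>) {
  chomp $line;
  # Only report lines that contain a "Started on <host>" entry.
  next unless $line =~ /Started on <([^>]+)/;
  my $started = $1;
  # MEMLIMIT is optional; fall back to 0 when it is absent.
  my $memlimit = $line =~ /MEMLIMIT (\d+) G/ ? $1 : 0;
  # Take everything after "MEM: " up to the semicolon; empty if there is no MEM entry.
  my $mem = $line =~ /MEM: ([^;]+)/ ? $1 : '';
  print "Started: $started MEMLIMIT: $memlimit G MEM: $mem\n";
}
close $in;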

Then run: perl script.pl file

bartus11

10-13-2013

straightforward awk:

Code:

awk     'match ($0, /Started on/)       {C++; X=substr ($0, RSTART+RLENGTH,10); gsub (/^.*<|>.*$/, "", X)}
         match ($0, /MEMLIMIT/)         {C++; Y=substr ($0, RSTART+RLENGTH,10); gsub (/^ |[^kMG]*$/, "", Y)}
         match ($0, /MEM:/)             {C++; Z=substr ($0, RSTART+RLENGTH,10);  sub (/[bB].*$/, "", Z)}
         C >=2                          {printf "Started: %s MEMLIMIT: %6s MEM: %6s\n", X, Y, Z}
                                        {C=X=Y=Z=0}
        ' file
Started: cn035 MEMLIMIT:   10 G MEM:    1 G
Started: cn069 MEMLIMIT:      0 MEM:    1 G
Started: cn049 MEMLIMIT:   10 G MEM:    2 G
Started: cn014 MEMLIMIT:   10 G MEM:  391 M
Started: cn026 MEMLIMIT:      0 MEM:   13 M
Started: cn064 MEMLIMIT:   22 G MEM:   12 G

RudiC

10-13-2013

As RudiC pointed out, the following only works on Solaris:

Code:

/usr/xpg4/bin/awk '{
started=$0; if (!sub(".*Started on <([^>]*).*","\1",started)) started="-"
memlimit=$0; if (!sub(".*MEMLIMIT ([^ ]* [^ ;]*).*","\1",memlimit)) memlimit="-"
mem=$0; if (!sub(".*MEM: ([^ ]* [^ ;]*).*","\1",mem)) mem="-"
printf "Started on: %-8s MEMLIMIT: %-8s MEM: %-8s\n",started,memlimit,mem
}' file
Started on: cn035    MEMLIMIT: 10 G     MEM: 1 Gbytes
Started on: cn069    MEMLIMIT: 10 G     MEM: 1 Gbytes
Started on: cn049    MEMLIMIT: 10 G     MEM: 2 Gbytes
Started on: -        MEMLIMIT: -        MEM: 3 Gbytes
Started on: -        MEMLIMIT: -        MEM: 1014 Mbytes
Started on: cn014    MEMLIMIT: 10 G     MEM: 391 Mbytes
Started on: cn026    MEMLIMIT: -        MEM: 13 Mbytes
Started on: cn064    MEMLIMIT: 22 G     MEM: 12 Gbytes

Last edited by MadeInGermany; 10-13-2013 at 01:47 PM..

MadeInGermany

10-13-2013

@MadeInGermany: Which awk version are you using? Mine (mawk) treats "\1" as the "\001" character.

RudiC
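
For awks where "\1" in the replacement of sub() is not a backreference, the same capture idea can probably be emulated with match() plus substr(). The sketch below is an assumption-laden rewrite, not MadeInGermany's code: `extract_capture` and `capture` are invented names, and the prefix passed to capture() has to be literal text with no regex metacharacters so that its length equals the length of the matched prefix. Unlike the greedy `.*` in the sub() version, match() picks the leftmost occurrence, which makes no difference on the sample lines.

```shell
# Hypothetical helper name; reads the files given as arguments, or stdin.
extract_capture() {
  awk '
    # Emulate sub(".*prefix(value).*", "\1"): find prefix followed by value
    # and return only the value part, or dflt when there is no match.
    # prefix must be literal text (no regex metacharacters).
    function capture(s, prefix, value, dflt) {
      if (!match(s, prefix value)) return dflt
      return substr(s, RSTART + length(prefix), RLENGTH - length(prefix))
    }
    {
      started  = capture($0, "Started on <", "[^>]*", "-")
      memlimit = capture($0, "MEMLIMIT ", "[^ ]* [^ ;]*", "-")
      mem      = capture($0, "MEM: ", "[^ ]* [^ ;]*", "-")
      printf "Started on: %-8s MEMLIMIT: %-8s MEM: %-8s\n", started, memlimit, mem
    }
  ' "$@"
}
```

Run it as `extract_capture file`. It keeps the same value patterns and printf format as the Solaris version above.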
