awk extract strings matching multiple patterns


 
# 1  
Old 10-13-2013

Hi,

I wasn't quite sure how to title this one! Here goes:

I have some partially parsed log files that I now need to extract information from. Because of their original layout and the processing they have already been through, I can't make any assumptions about the number of fields or the exact format. All I know is that I can look for certain patterns. An extract of the source is:

Code:
Job <1>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes; 
Job <2>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn069>, -- The CPU time is 10 seconds. MEM: 1 Gbytes; 
Job <3>, Job Name <BLAH>,  MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn049>, ;-- The CPU time is 13 seconds. MEM: 2 Gbytes; 
Job <4>, Job Name <BLAH>,  Status <RUN>,  Command <-- The CPU time is 76 seconds. MEM: 3 Gbytes; 
Job <7>, Job Name <BLAH>,  Stat us <RUN>,  Command <-- The CPU time is 49 seconds. MEM: 1014 Mbytes; 
Job <8>, Job Name <BLAH> , Status <RUN>, -- MEMLIMIT 10 G Fri Oct 11 22:13:19: Started on <cn014>;-- The CPU time is 12 seconds. MEM: 391 Mbytes; 
Job <9>, Job Name <BLAH>,  Status <RUN >,  Command <: Started on <cn026>,-- The CPU time is 71 seconds. MEM: 13 Mbytes; 
Job <10>, Job Name <BLAH>,  Sta tus <RUN>,  Command <#!/bi-- MEMLIMIT 22 G  Started on <cn064>, -- The CPU time is 25 seconds. MEM: 12 Gbytes;

I want to extract based on:

Code:
Started on <____>,
MEMLIMIT __ G
MEM: ___ bytes;

For example, the relevant part of the first line is:
Code:
MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes;

Each line may contain all, some or none of the above. My ideal output based on the above would be something like:

Code:
Started: cn035 MEMLIMIT: 10 G MEM: 1 G
Started: cn069 MEMLIMIT: 10 G MEM: 1 G 
etc
etc

Ideally, if no MEMLIMIT is found on a line, the output would be, for example:
Started: cn026 MEMLIMIT: 0 G MEM: 13 M

I've messed around with gsub in awk to extract a single instance but couldn't work out how to select on multiple patterns...

Any help as always would be appreciated!

Last edited by Scrutinizer; 10-13-2013 at 06:38 AM.. Reason: additional code tags
# 2  
Old 10-13-2013
Like this?
Code:
sed -n 's/.*\(MEMLIMIT [^ ]* [^ ]*\).*Started on <\([^>]*\).*\(MEM: [^ ]* .\).*/Started: \2 \1 \3/p' file

# 3  
Old 10-13-2013
Thanks for that, Scrutinizer. It is very close to what I need! If I've understood it correctly, it only prints a line when all three patterns are found. Ideally it would print every line with one or more matches:

Code:
Started: cn026 MEMLIMIT: 0 G MEM: 13 M

or just blank rather than 0 G on the MEMLIMIT. Basically every entry _should_ have a 'Started on' and a MEM:, but not necessarily a MEMLIMIT

Last edited by chrissycc; 10-13-2013 at 07:46 AM.. Reason: correction
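
A minimal sketch of one way to make the MEMLIMIT part optional in sed, assuming (as in the sample data) that MEMLIMIT, when present, comes before "Started on". The first expression handles lines carrying all three patterns; the bare `t` then skips the rest of the script, so the second expression only sees lines where the first one failed and fills in 0 G. The function name extract_with_default is made up for illustration and is not from the thread.

```shell
#!/bin/sh
# extract_with_default: hypothetical helper name, for illustration only.
# Assumption: MEMLIMIT (if any) appears before "Started on" on the line.
extract_with_default() {
  sed -n \
    -e 's/.*MEMLIMIT \([^ ]* [^ ]*\).*Started on <\([^>]*\).*MEM: \([^ ]* .\).*/Started: \2 MEMLIMIT: \1 MEM: \3/p' \
    -e t \
    -e 's/.*Started on <\([^>]*\).*MEM: \([^ ]* .\).*/Started: \1 MEMLIMIT: 0 G MEM: \2/p' \
    "$@"
}
```

Run it as `extract_with_default file`. Lines without a "Started on" entry are still skipped, because both expressions require it.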
# 4  
Old 10-13-2013
If you are OK with a Perl solution, put this into "script.pl":
Code:
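#!/usr/bin/perl
use strict;
use warnings;

# Read the log file named on the command line.
open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $line = <$in>) {
  chomp $line;
  # Only report lines that carry a "Started on <host>" entry.
  next unless $line =~ /Started on <([^>]+)/;
  my $started = $1;
  # MEMLIMIT is optional; default to 0 when it is absent.
  my $memlimit = $line =~ /MEMLIMIT (\d+) G/ ? $1 : 0;
  # Leave MEM empty rather than reuse a stale capture when it is absent.
  my $mem = $line =~ /MEM: ([^;]+)/ ? $1 : '';
  print "Started: $started MEMLIMIT: $memlimit G MEM: $mem\n";
}
close $in;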

Then run: perl script.pl file
# 5  
Old 10-13-2013
straightforward awk:
Code:
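awk '
  # C counts the patterns found on this line; X, Y, Z hold the extracted values.
  match($0, /Started on/) {C++; X = substr($0, RSTART+RLENGTH, 10); gsub(/^.*<|>.*$/, "", X)}
  match($0, /MEMLIMIT/)   {C++; Y = substr($0, RSTART+RLENGTH, 10); gsub(/^ |[^kMG]*$/, "", Y)}
  match($0, /MEM:/)       {C++; Z = substr($0, RSTART+RLENGTH, 10);  sub(/[bB].*$/, "", Z)}
  # Print only when at least two of the three patterns matched.
  C >= 2                  {printf "Started: %s MEMLIMIT: %6s MEM: %6s\n", X, Y, Z}
  # Reset everything before the next line.
                          {C = X = Y = Z = 0}
' file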
Started: cn035 MEMLIMIT:   10 G MEM:    1 G
Started: cn069 MEMLIMIT:      0 MEM:    1 G
Started: cn049 MEMLIMIT:   10 G MEM:    2 G
Started: cn014 MEMLIMIT:   10 G MEM:  391 M
Started: cn026 MEMLIMIT:      0 MEM:   13 M
Started: cn064 MEMLIMIT:   22 G MEM:   12 G

# 6  
Old 10-13-2013
As RudiC pointed out, the following only works on Solaris:
Code:
/usr/xpg4/bin/awk '{
started=$0; if (!sub(".*Started on <([^>]*).*","\1",started)) started="-"
memlimit=$0; if (!sub(".*MEMLIMIT ([^ ]* [^ ;]*).*","\1",memlimit)) memlimit="-"
mem=$0; if (!sub(".*MEM: ([^ ]* [^ ;]*).*","\1",mem)) mem="-"
printf "Started on: %-8s MEMLIMIT: %-8s MEM: %-8s\n",started,memlimit,mem
}' file
Started on: cn035    MEMLIMIT: 10 G     MEM: 1 Gbytes
Started on: cn069    MEMLIMIT: 10 G     MEM: 1 Gbytes
Started on: cn049    MEMLIMIT: 10 G     MEM: 2 Gbytes
Started on: -        MEMLIMIT: -        MEM: 3 Gbytes
Started on: -        MEMLIMIT: -        MEM: 1014 Mbytes
Started on: cn014    MEMLIMIT: 10 G     MEM: 391 Mbytes
Started on: cn026    MEMLIMIT: -        MEM: 13 Mbytes
Started on: cn064    MEMLIMIT: 22 G     MEM: 12 Gbytes


Last edited by MadeInGermany; 10-13-2013 at 01:47 PM..
# 7  
Old 10-13-2013
@MadeInGermany: What awk version are you using? Mine (mawk) treats "\1" as the "\001" character.
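
Since "\1" back-references in sub() are not portable, here is a hedged sketch of the same idea using only match(), RSTART, RLENGTH and substr(), which POSIX awk provides. The names extract_fields and grab are made up for illustration. One difference from the Solaris version: match() finds the leftmost occurrence of a pattern, whereas the leading ".*" in post #6 picks the last one; on the sample data each pattern appears at most once per line, so this should not matter there.

```shell
#!/bin/sh
# extract_fields: hypothetical helper name, for illustration only.
extract_fields() {
  awk '
    # grab() is a hypothetical helper: find the leftmost match of re in line,
    # strip the leading label (prefix) from it, and return the remainder.
    # Returns "-" when re does not match at all.
    function grab(line, re, prefix,    s) {
      if (!match(line, re)) return "-"
      s = substr(line, RSTART, RLENGTH)
      sub(prefix, "", s)
      return s
    }
    {
      started  = grab($0, "Started on <[^>]*",     "^Started on <")
      memlimit = grab($0, "MEMLIMIT [^ ]+ [^ ;]+", "^MEMLIMIT ")
      mem      = grab($0, "MEM: [^ ]+ [^ ;]+",     "^MEM: ")
      printf "Started on: %-8s MEMLIMIT: %-8s MEM: %-8s\n", started, memlimit, mem
    }
  ' "$@"
}
```

Run it as `extract_fields file`. Every input line produces one output line, with "-" standing in for any pattern that was not found.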