awk extract strings matching multiple patterns Post: 302863219


Shell Programming and Scripting: Post 302863219 by chrissycc on Sunday 13th of October 2013 05:13:18 AM


Hi,

I wasn't quite sure how to title this one! Here goes:

I have some partially parsed log files that I now need to extract information from. Because of their original layout, and because they have already been partially processed, I can't make any assumptions about the number of fields or the exact format. All I know is that I can look for certain patterns. An extract of the original source is:

Code:

Job <1>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes; 
Job <2>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn069>, -- The CPU time is 10 seconds. MEM: 1 Gbytes; 
Job <3>, Job Name <BLAH>,  MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn049>, ;-- The CPU time is 13 seconds. MEM: 2 Gbytes; 
Job <4>, Job Name <BLAH>,  Status <RUN>,  Command <-- The CPU time is 76 seconds. MEM: 3 Gbytes; 
Job <7>, Job Name <BLAH>,  Stat us <RUN>,  Command <-- The CPU time is 49 seconds. MEM: 1014 Mbytes; 
Job <8>, Job Name <BLAH> , Status <RUN>, -- MEMLIMIT 10 G Fri Oct 11 22:13:19: Started on <cn014>;-- The CPU time is 12 seconds. MEM: 391 Mbytes; 
Job <9>, Job Name <BLAH>,  Status <RUN >,  Command <: Started on <cn026>,-- The CPU time is 71 seconds. MEM: 13 Mbytes; 
Job <10>, Job Name <BLAH>,  Sta tus <RUN>,  Command <#!/bi-- MEMLIMIT 22 G  Started on <cn064>, -- The CPU time is 25 seconds. MEM: 12 Gbytes;

I want to extract based on:

Code:

Started on <____>,
MEMLIMIT __ G
MEM: ___ bytes;

The first line example being:

Code:

MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes;

Each line may contain all, some or none of the above. My ideal output based on the above would be something like:

Code:

Started: cn035 MEMLIMIT: 10 G MEM: 1 G
Started: cn069 MEMLIMIT: 10 G MEM: 1 G 
etc
etc

(ideally, if no MEMLIMIT is found on a line, it would default to 0, for example):
Started: cn026 MEMLIMIT: 0 G MEM: 13 M

I've messed around with gsub in awk to extract a single instance but couldn't work out how to select on multiple patterns...

Any help as always would be appreciated!
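For illustration, here is a minimal sketch of one way the extraction described above might be done in POSIX awk, using match() with RSTART and RLENGTH rather than gsub. It is not taken from the thread. The function name extract_job_info and the helper grab are hypothetical, and several behaviours are assumptions: the trailing comma after "Started on <...>" is not required (one sample line ends in ";" instead), a missing host prints as "-", a missing MEM prints as "0 M", and lines matching none of the three patterns are skipped.

```shell
#!/bin/sh
# extract_job_info: hypothetical helper name, not from the original post.
# Reads log lines from stdin, or from any files passed as arguments.
extract_job_info() {
    awk '
    # Return the first substring of s that matches regex re, or "" if none.
    function grab(s, re) {
        if (match(s, re)) return substr(s, RSTART, RLENGTH)
        return ""
    }
    {
        # Look for each pattern independently, anywhere on the line.
        host  = grab($0, "Started on <[^>]*>")
        limit = grab($0, "MEMLIMIT +[0-9]+ +[A-Za-z]")
        mem   = grab($0, "MEM: +[0-9]+ +[A-Za-z]bytes")

        # Assumption: a line with none of the three patterns is skipped.
        if (host == "" && limit == "" && mem == "") next

        # Strip the fixed text around each value, or fall back to a default.
        if (host != "") {
            sub(/^Started on </, "", host)
            sub(/>$/, "", host)
        } else {
            host = "-"        # assumed placeholder for a missing host
        }
        if (limit != "") {
            sub(/^MEMLIMIT +/, "", limit)
        } else {
            limit = "0 G"     # default requested in the post
        }
        if (mem != "") {
            sub(/^MEM: +/, "", mem)
            sub(/bytes$/, "", mem)
        } else {
            mem = "0 M"       # assumed default for a missing MEM value
        }

        print "Started: " host " MEMLIMIT: " limit " MEM: " mem
    }' "$@"
}
```

Called as `extract_job_info jobs.log` (jobs.log being a placeholder file name), the first sample line should give `Started: cn035 MEMLIMIT: 10 G MEM: 1 G`, and the Job <9> line, which has no MEMLIMIT, should give `Started: cn026 MEMLIMIT: 0 G MEM: 13 M`. Because each pattern is searched for separately, their order on the line and the number of fields do not matter.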

Last edited by Scrutinizer; 10-13-2013 at 06:38 AM.. Reason: additional code tags



