Post 302863219 by chrissycc on Sunday 13th of October 2013 05:13:18 AM
awk extract strings matching multiple patterns

Hi,

I wasn't quite sure how to title this one! Here goes:

I have some partially parsed log files that I now need to extract information from. Because of their original layout and the processing they have already been through, I can't make any assumptions about the number of fields or the exact format. All I know is that I can look for certain patterns. An extract of the source is:

Code:
Job <1>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes; 
Job <2>, Job Name <BLAH>, Queue-- MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn069>, -- The CPU time is 10 seconds. MEM: 1 Gbytes; 
Job <3>, Job Name <BLAH>,  MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn049>, ;-- The CPU time is 13 seconds. MEM: 2 Gbytes; 
Job <4>, Job Name <BLAH>,  Status <RUN>,  Command <-- The CPU time is 76 seconds. MEM: 3 Gbytes; 
Job <7>, Job Name <BLAH>,  Stat us <RUN>,  Command <-- The CPU time is 49 seconds. MEM: 1014 Mbytes; 
Job <8>, Job Name <BLAH> , Status <RUN>, -- MEMLIMIT 10 G Fri Oct 11 22:13:19: Started on <cn014>;-- The CPU time is 12 seconds. MEM: 391 Mbytes; 
Job <9>, Job Name <BLAH>,  Status <RUN >,  Command <: Started on <cn026>,-- The CPU time is 71 seconds. MEM: 13 Mbytes; 
Job <10>, Job Name <BLAH>,  Sta tus <RUN>,  Command <#!/bi-- MEMLIMIT 22 G  Started on <cn064>, -- The CPU time is 25 seconds. MEM: 12 Gbytes;

I want to extract based on:

Code:
Started on <____>,
MEMLIMIT __ G
MEM: ___ bytes;

For example, the relevant part of the first line is:
Code:
MEMLIMIT 10 G Fri Oct 11 09:55:48: Started on <cn035>, -- The CPU time is 12 seconds. MEM: 1 Gbytes;

Each line may contain all, some or none of the above. My ideal output based on the above would be something like:

Code:
Started: cn035 MEMLIMIT: 10 G MEM: 1 G
Started: cn069 MEMLIMIT: 10 G MEM: 1 G 
etc
etc

Ideally, if a pattern such as MEMLIMIT is not found on a line, a default value would be printed in its place, for example:
Started: cn026 MEMLIMIT: 0 G MEM: 13 M

I've messed around with gsub in awk to extract a single instance but couldn't work out how to select on multiple patterns...

Any help as always would be appreciated!
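
One possible approach, as a minimal sketch rather than a definitive answer: run awk's match() once per pattern and cut the matched text out with substr(), instead of relying on field positions. The helper name extract_fields, the "-" placeholder for a missing host and the "0 M" default for a missing MEM are my own assumptions; only the "0 G" default for MEMLIMIT comes from the desired output in the post. The sketch also assumes the trailing comma after "Started on <...>" is optional, since one sample line ends that field with a semicolon instead.

```shell
#!/bin/sh
# extract_fields is a hypothetical helper name, not something from the thread.
# It reads log lines from the named files (or stdin) and prints one summary per line.
extract_fields() {
    awk '
    # Return the leftmost substring of s matching regex re, or "" if none.
    function grab(s, re) {
        if (match(s, re)) {
            return substr(s, RSTART, RLENGTH)
        }
        return ""
    }
    {
        # Host name between the angle brackets after "Started on".
        host = grab($0, "Started on <[^>]*>")
        if (host != "") {
            sub(/^Started on </, "", host)
            sub(/>$/, "", host)
        } else {
            host = "-"          # assumed placeholder for a missing host
        }

        # Memory limit: the number plus its unit letter.
        limit = grab($0, "MEMLIMIT [0-9]+ [A-Za-z]")
        if (limit != "") {
            sub(/^MEMLIMIT /, "", limit)
        } else {
            limit = "0 G"       # default taken from the desired output in the post
        }

        # Memory used: the number plus the first letter of the unit ("Gbytes" -> "G").
        mem = grab($0, "MEM: [0-9]+ [A-Za-z]bytes")
        if (mem != "") {
            sub(/^MEM: /, "", mem)
            sub(/bytes$/, "", mem)
        } else {
            mem = "0 M"         # assumed default for a missing MEM
        }

        print "Started: " host " MEMLIMIT: " limit " MEM: " mem
    }' "$@"
}

# Demo on the Job <9> sample line, which has no MEMLIMIT:
printf '%s\n' 'Job <9>, Job Name <BLAH>,  Status <RUN >,  Command <: Started on <cn026>,-- The CPU time is 71 seconds. MEM: 13 Mbytes; ' | extract_fields
# prints: Started: cn026 MEMLIMIT: 0 G MEM: 13 M
```

Because each pattern is searched for independently, the order of the fields on a line and the number of fields do not matter. A line containing none of the patterns still produces a row made up entirely of defaults; if such lines should be dropped instead, a check that all three grab() results are empty could skip the print.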

Last edited by Scrutinizer; 10-13-2013 at 06:38 AM.. Reason: additional code tags
 
