[SOLVED] awk: matching degenerate patterns


 
# 1  
Old 09-26-2012

Hi Folks,

I have two arrays, stored in the files 'a' and 'b':

a:
Code:
aaa     bbb     ccc     ddd
ddd     aaa     bbb     ccc
ddd     ccc     aaa     bbb

b:
Code:
aaa     bbb     ccc
aaa     ccc     bbb
bbb     aaa     ccc
ccc     bbb     aaa

I want to compare, row by row, a(c1:c4) to b(c1:c3). If the elements of 'b' match those of 'a', then I want to check the order and grab only those matches in 'b' that maintain the order seen in 'a'. (There will be ambiguity, but I have a straightforward method for filtering once this step is done.)

By way of illustration...

in 'b'

aaa bbb ccc

matches

aaa bbb ccc ddd

in 'a', but doesn't match

ddd ccc aaa bbb

I've managed matching using a version of the code below, but I'm stuck on how to further restrict the matches by requiring that the order of the three in 'b' follow that in 'a'.

Thanks guys...

Code:
awk -v f2=a '
    BEGIN {
        while( (getline<f2) > 0 )   # read and collect records from f2
        {
            key = $2;
            ki = kidx[key]++;        # track number of duplicate keys (0 based)
            k2rec[key,ki] = $0;      # save unique record by key and dup count
        }
        close( f2 );
    }

    {
        key = $1;
        for( i = 0; i < kidx[key]; i++ )          # for each duplicate of key
            printf( "%s\t%s\n", k2rec[key,i], $0 );   # print f2 record, followed by current f1 record
    }
' <b


Last edited by Scrutinizer; 09-26-2012 at 12:06 PM.. Reason: code tags for data sample too
# 2  
Old 09-26-2012
You can use the NR==FNR trick to avoid writing your own read loop for the first file. FNR is the line number within the current file and NR is the total number of lines read so far across all files, so the two are equal only while awk is reading the very first file. (Or, I suppose, if the very first file turns out to be empty.) So while in the first file, handle the line and then call next so that the code that follows doesn't try to use it too.

Code:
awk 'NR==FNR { K[$1,$2,$3]=$0; next } ($1 SUBSEP $2 SUBSEP $3) in K' a b
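To see the two counters side by side, here is a minimal sketch. The function name show_counters and the throwaway file names are made up for illustration; only NR, FNR and the ternary operator are standard awk.

```shell
# Hypothetical helper: print NR and FNR for every line of the given files.
show_counters() {
    awk '{ print "NR=" NR, "FNR=" FNR, (NR == FNR ? "first file" : "later file") }' "$@"
}

# Throwaway sample data (file names are illustrative only).
tmp=$(mktemp -d)
printf 'x\ny\n' > "$tmp/one"
printf 'z\n'    > "$tmp/two"
show_counters "$tmp/one" "$tmp/two"
# NR=1 FNR=1 first file
# NR=2 FNR=2 first file
# NR=3 FNR=1 later file
rm -rf "$tmp"
```

FNR restarts at 1 for the second file while NR keeps counting, which is why the NR==FNR block runs only for the first file named on the command line.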

This User Gave Thanks to Corona688 For This Post:
# 3  
Old 09-26-2012
Quote:
Originally Posted by Corona688
[...]

Code:
awk 'NR==FNR { K[$1,$2,$3]=$0; next } ($1 SUBSEP $2 SUBSEP $3) in K' a b

Just for your info,

Code:
($1 SUBSEP $2 SUBSEP $3)

and

Code:
($1, $2, $3)

are equivalent when used as an array subscript.

So you could use:
Code:
($1, $2, $3) in K
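A small self-contained check of that equivalence, as a sketch. The function name subsep_demo and the literal key values are placeholders, not part of the original posts.

```shell
# Hypothetical demo: a comma-separated subscript is stored under the
# same key as the parts joined with SUBSEP.
subsep_demo() {
    awk 'BEGIN {
        K["aaa", "bbb", "ccc"] = 1                  # comma list as subscript
        key = "aaa" SUBSEP "bbb" SUBSEP "ccc"       # the same key, built by hand
        if (key in K)                   print "explicit SUBSEP key found"
        if (("aaa", "bbb", "ccc") in K) print "comma-list key found"
    }'
}

subsep_demo
# explicit SUBSEP key found
# comma-list key found
```

The parenthesised comma list is only meaningful in front of in (or inside the square brackets of a subscript); elsewhere, build the key with SUBSEP explicitly.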

These 2 Users Gave Thanks to radoulov For This Post:
# 4  
Old 09-26-2012
Thanks for your response, I am grateful for your time.

The last section of the code I included
Code:
printf( "%s\t%s\n", k2rec[key,i], $0 );   # print f2 record, followed by current f1 record

prints the entire matched record for 'b' and the entire record for 'a' on the same line. Is there a way to recapture the information from 'a' so that I can print in this same manner?

So the result would be
Code:
aaa     bbb     ccc     ddd     aaa     bbb     ccc

Thanks.

Last edited by heecha; 09-26-2012 at 01:51 PM.. Reason: added formatting for code and data
# 5  
Old 09-26-2012
That's why I saved $0 in K:
Code:
awk 'NR==FNR { K[$1,$2,$3]=$0; next } ($1,$2,$3) in K { print K[$1,$2,$3], $0 }' a b
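A quick check of this approach on the thread's sample data, as a sketch. The wrapper name join_leading_triples is made up, the sample files are assumed to be tab-separated, and the two records are joined with a tab, 'a' record first, to mirror the requested output.

```shell
# Hypothetical wrapper; arguments are the 'a' file, then the 'b' file.
join_leading_triples() {
    awk 'NR == FNR { K[$1, $2, $3] = $0; next }              # first file: index by columns 1-3
         ($1, $2, $3) in K { print K[$1, $2, $3] "\t" $0 }   # second file: a record, tab, b record
    ' "$1" "$2"
}

# Sample data from the first post (tab-separated here by assumption).
tmp=$(mktemp -d)
printf 'aaa\tbbb\tccc\tddd\nddd\taaa\tbbb\tccc\nddd\tccc\taaa\tbbb\n' > "$tmp/a"
printf 'aaa\tbbb\tccc\naaa\tccc\tbbb\nbbb\taaa\tccc\nccc\tbbb\taaa\n' > "$tmp/b"
join_leading_triples "$tmp/a" "$tmp/b"
rm -rf "$tmp"
```

On this data only one line comes out: the 'a' record aaa bbb ccc ddd followed by the 'b' record aaa bbb ccc. The other three rows of 'b' do not equal the first three columns of any row of 'a'.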

This User Gave Thanks to Corona688 For This Post:
# 6  
Old 09-26-2012
Ah, yes, now I see!

Thanks for your help, much appreciated.
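
Note that the one-liners in this thread compare the three columns of 'b' against the first three columns of 'a' only, so a row of 'a' such as ddd aaa bbb ccc, which contains aaa bbb ccc in order but not at the start, is not reported. If the order-preserving match described in the first post is wanted anywhere in the row, one possible sketch is below. It is not from the thread: the function name ordered_matches is made up, tab-joined output is an assumption, and it treats "maintains the order" as "the fields of 'b' appear in 'a' left to right, possibly with other fields in between".

```shell
# Hypothetical helper; arguments are the 'a' file, then the 'b' file.
ordered_matches() {
    awk '
        NR == FNR { A[++n] = $0; next }           # first file: remember every record
        NF {
            for (r = 1; r <= n; r++) {
                m = split(A[r], f)                # fields of the stored record
                j = 1                             # next field of the current record to find
                for (i = 1; i <= m && j <= NF; i++)
                    if (f[i] == $j) j++           # found it, look for the following field
                if (j > NF) print A[r] "\t" $0    # every field was found, in order
            }
        }
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'aaa\tbbb\tccc\tddd\nddd\taaa\tbbb\tccc\nddd\tccc\taaa\tbbb\n' > "$tmp/a"
printf 'aaa\tbbb\tccc\naaa\tccc\tbbb\nbbb\taaa\tccc\nccc\tbbb\taaa\n' > "$tmp/b"
ordered_matches "$tmp/a" "$tmp/b"
rm -rf "$tmp"
```

On the sample data this pairs aaa bbb ccc with the first two rows of 'a' and rejects ddd ccc aaa bbb, in line with the illustration in the first post. Every row of 'b' is compared against every row of 'a', so this is slower than the keyed lookup on large files.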
 