awk help to search columns in two files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting awk help to search columns in two files
# 1  
Old 08-03-2010
awk help to search columns in two files

Hello,

I'm trying to compare multiple columns between two files. I would like to use columns 1,2 from file1 and search file2 in columns 2,3 for any matches. If they match, then print columns 2,3 from file2.

file1
Code:
11:22:33:44:55:66|2010-07-25 12:30|xyz
22:22:22:22:22:22|2010-07-25 01:00|abc
33:33:33:33:33:33|2010-07-25 07:15|def

file2
Code:
111|22:22:22:22:22:22|2010-07-25 01:00|abc
222|11:22:33:44:55:66|2010-07-11 11:25|xyz
333|22:22:22:22:22:22|2010-07-25 03:00|abc
444|11:22:33:44:55:66|2010-07-25 12:30|def


Desired Result:
Code:
11:22:33:44:55:66|2010-07-25 12:30
22:22:22:22:22:22|2010-07-25 01:00

The awk commands that I'm using doesn't seem to work:
Code:

Code:
awk -F\| 'FILENAME=="file1"{A[$1$2]=$1$2}
FILENAME=="file2"{if(A[$1$2]==$2$3){print$2"|"$3}}' file1 file2

Any help is greatly appreciated!
# 2  
Old 08-04-2010
Your basic problem is that you are testing for A[$1$2] when file 2, and you should be testing for A[$2$3].

This is a bit simpler as it doesn't keep redundant information. You only need to know that you saw the fields in file one, so setting the value to 1, rather than keeping the data in those fields is all that is necessary.

Code:
awk -F\| '
        FILENAME=="file1" {
                A[$1,$2] = 1
                next;
        }

        FILENAME=="file2" {
                if( A[$2,$3] )
                {
                        print $2"|"$3
                }
        }
' file1 file2

Also, a personal preference of mine to separate the items used as the index with a comma when it makes sense. This prevents "abc|def" in file one from matching "ab|cdef" in file 2.

Hope this is what you needed.
This User Gave Thanks to agama For This Post:
# 3  
Old 08-04-2010
Agama, thank you very much for your help and useful tips. As you can see, I'm fairly new to using awk but I'm beginning to realize how powerful a tool it really is. Please let me know if you know of any good 'awk' resources- this is a tool I would definitely like to know better Smilie

---------- Post updated at 06:30 PM ---------- Previous update was at 05:42 PM ----------

Actually, I have one followup question. How would the commands look if I wanted to print $1,$2 from file1 that were not in $2,$3 of file2?
# 4  
Old 08-04-2010
You are most welcome! I remember all too well some of my early frustrations with awk and am more than happy to pass my experience along to others.

If you like real, paper, books, the O'Reilly - Sed & Awk 2Nd Edition book O'Reilly - Sed & Awk 2Nd Edition is decent. If you like reading html/pdf the doc linked from this page seems good: The GNU Awk User's Guide Personally, I've gotten by with just the manual pages.

Quote:
Originally Posted by pinseeker
Actually, I have one followup question. How would the commands look if I wanted to print $1,$2 from file1 that were not in $2,$3 of file2?
With a few small changes, your programme can print the fields from the first file that weren't found in the second.

Code:
# print data from file 1 that is not in file 2
awk -F\| '
        FILENAME=="file1" {
                a[$1"|"$2] = 1;
                next;
        }

        FILENAME=="file2" {
                a[$2"|"$3] = 0;

        }

        END {
                for( x in a )
                        if( a[x] )              # still set, not seein in file 2
                                print x;
        }
' file1 file2

A couple of things to note...
1) I use lowercase variable names -- uppercase imples an awk maintained/generated variable like NF, NR or FILENAME.

2) I've changed the index to be a single string of both fields with a vertical bar. This still prevents abc|def in one file from being matched with ab|cdef in another, and lets me print the index out nicely without any extra logic to mash it up.
# 5  
Old 08-05-2010
Thank you once again - the updated program works perfectly! Just as important, thank you for the references and the explanations in the code. I have been struggling with this for the past several days and other methods I have tried have taken too long to complete (file1/file2 in my case are a few hundred thousand to several million lines in length). When I first tried AWK to perform this task, it seemed fast enough, but I just couldn't get the code right to produce the right results.....best regards.
# 6  
Old 08-06-2010
Hi.

See awk.info Awk.info for pointers to many additional resources for awk. including a wiki, FAQ, blog, man pages, usenet group, books, etc.

Best wishes ... cheers, drl
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to use "awk" to print columns from different files in separate columns?

Hi, I'm trying to copy and paste the sixth column from a bunch of files into a single file having each column pasted in separate columns (and not one after each other in just one column.) I tried this code but works only partially because it copied and pasted 50 rows of each column... (6 Replies)
Discussion started by: Frastra
6 Replies

2. Shell Programming and Scripting

Merge columns from two files using awk

I have two csv files : say a.csv, b.csv a.csv looks like this : property1,property2,100 property3,property4,200 In a.csv, the combination of column1 and column2 will be unique b.csv looks like this property1,property2, 300, t1 property1,property2, 400,t2 property3, property4,800,t1... (2 Replies)
Discussion started by: Lakshmikumari
2 Replies

3. Shell Programming and Scripting

Compare 2 columns of files awk

hello everybody I have 2 files the file1 has 10 columns and the form: ... 110103 0802 1.16 38 20.16 22 1.21 8.77 0.00 20 120103 0832 23.40 38 22.10 21 46.35 10.17 0.00 28 120103 1413 45.00 38 24.50 21 48.85 7.89 0.00 38 130103 1112 23.40 38 22.10 21 48.85 ... (5 Replies)
Discussion started by: phaethon
5 Replies

4. Shell Programming and Scripting

Help with awk Matching columns from two files

Hello, I have two files as following: #bin chrom chromStart chromEnd name score strand observed 585 chr2 29442 29443 rs4637157 0 + C/T 585 chr2 33011 33012 rs13423995 0 + A/G 585 chr2 34502 34503 rs13386087 0 + ... (2 Replies)
Discussion started by: Homa
2 Replies

5. Shell Programming and Scripting

Compare columns in two different files using awk

Hi, I want to compare the columns of two files excluding column 2 from both the files. I tried this awk command. awk -F":" 'NR==FNR{++a;next} !(a)' file1.txt file2.txt . Example: File1.txt 123:09-15-2011:abc:123456 123:09-15-2011:abc:234567 123:09-15-2011:abc:345678 ... (5 Replies)
Discussion started by: shell_newbie
5 Replies

6. Shell Programming and Scripting

AWK: Comparing two columns from two different files

Hi - I have two files as follows: File 1: chr5 118464905 118465027 ENST00000514151 utr5 0 + chr5 118464903 118465118 ENST00000504031 utr5 0 + chr5 118468826 118469180 ENST00000504031 utr5 0 + chr5 118469920 118470084 ... (14 Replies)
Discussion started by: polsum
14 Replies

7. UNIX for Dummies Questions & Answers

Using awk to get columns from different files

How can I use awk to create a new file that has 2 columns, each colums comes form a different file. example: I need column 3 from file1 and column 5 from file2 to make file3. (3 Replies)
Discussion started by: cosmologist
3 Replies

8. Shell Programming and Scripting

awk/sed search lines in file1 matching columns in file2

Hi All, as you can see I'm pretty new to this board. :D I'm struggling around with small script to search a few fields in another file. Basically I have file1 looking like this: 15:38:28 sz:10001 pr:14.16 15:38:28 sz:10002 pr:18.41 15:38:29 sz:10003 pr:19.28 15:38:30 sz:10004... (1 Reply)
Discussion started by: floripoint
1 Replies

9. Shell Programming and Scripting

awk adding columns from different files

Hi, I have two files and I need to add column 3 of file1 to column 3 of file 2 > file3 I also need to repeat for column 4. Thanks (1 Reply)
Discussion started by: dsstamps
1 Replies

10. Shell Programming and Scripting

Need help in AWK;Search String and rearrange columns

Hi AWK Experts, file1.txt contains: 29b11b820ddcc:-|OHad.perWrk|spn_id=AH111|spn_ordtyp=MY_REQ|msg_typ=ah.ntf.out|spn_ordid=928176|spn_nid=3|msg_strt=1175615334703|msg_que=oput|diff=371|17:48:55,074|17:48:55,084|10 file2.txt contains:... (2 Replies)
Discussion started by: spring_buck
2 Replies
Login or Register to Ask a Question