Comparing 2 files with awk and updating 2nd file


 
# 1  
Old 06-19-2010

file1: (unique files)
Code:
1    /pub/atomicbk/catalog/catalog.gif 693
2    /pub/atomicbk/catalog/home.gif 813
3    /pub/atomicbk/catalog/logo2.gif 12871
4    /pub/atomicbk/catalog/sleazbk.html 18338

file2: (duplicate filenames allowed)
Code:
28/Aug/1995:00:00:38 1 /pub/atomicbk/catalog/home.gif 813
28/Aug/1995:00:00:39 1 /pub/atomicbk/catalog/catalog.gif 693
28/Aug/1995:00:00:40 1 /pub/atomicbk/catalog/logo2.gif 12871
28/Aug/1995:00:00:41 1 /pub/atomicbk/catalog/logo2.gif 12871
28/Aug/1995:00:00:42 1 /pub/atomicbk/catalog/sleazbk.html 18338
28/Aug/1995:00:00:43 1 /pub/atomicbk/catalog/catalog.gif 693

I have two files. file1 lists unique files: the 1st field is the FileID, the 2nd is the FileName, and the 3rd is the FileSize. file2 contains the timestamp, operation type, FileName, and FileSize, in that order.

I need to match the filenames between the two files. Where a FileName in file2 matches one in file1, I need to add a new column to file2 that stores the FileID (the 1st column of file1).

The resulting file2 should look like this (the new column is at the front):
Code:
2 28/Aug/1995:00:00:38 1 /pub/atomicbk/catalog/home.gif 813
1 28/Aug/1995:00:00:39 1 /pub/atomicbk/catalog/catalog.gif 693
3 28/Aug/1995:00:00:40 1 /pub/atomicbk/catalog/logo2.gif 12871
3 28/Aug/1995:00:00:41 1 /pub/atomicbk/catalog/logo2.gif 12871
4 28/Aug/1995:00:00:42 1 /pub/atomicbk/catalog/sleazbk.html 18338
1 28/Aug/1995:00:00:43 1 /pub/atomicbk/catalog/catalog.gif 693

I will be running this on very large files (around 5,000 lines in file1 and 900,000 to 1,000,000 lines in file2), so I need it to run as fast as possible. I've been struggling with this one, so I hope someone can help.

Thank you in advance!

I was thinking that maybe I could first sort file2 by the 3rd column (sort -k 3,3 file2) and then compare col2 in file1 with col3 in file2. For the comparison to walk both files in step, file1 would need to be sorted by col2 as well. Here is the pseudocode that I haven't been able to implement in awk.

Sort file2 by col3 in ascending order (sort -k 3,3 file2)

Code:
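// i indexes lines of file2, j indexes lines of file1 (both sorted by FileName)
while (i <= lastline_file2 && j <= lastline_file1) {
    if (file2[i].col3 == file1[j].col2) {
        file2[i].newcol = file1[j].col1
        i++    // next line in file2; stay on j because file2 allows duplicates
    } else if (file2[i].col3 > file1[j].col2) {
        j++    // next line in file1
    } else {
        i++    // this file2 line has no FileID in file1
    }
}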

Sort file2 back by timestamp (assuming the new column was added at the beginning, the timestamp is now col2):
Code:
sort -k 2,2 file2
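
The sort-and-merge idea described above can be sketched with the standard sort and join utilities instead of a hand-written loop. This is only a sketch under assumptions: the sample data is copied from this post, while the temporary directory, the .sorted file names, and the merged variable are illustrative choices that do not appear in the thread.

```shell
#!/bin/sh
# Sort-merge sketch: sort both files on FileName, join them, restore timestamp order.
# LC_ALL=C makes sort and join agree on one byte-wise collation order.
LC_ALL=C; export LC_ALL

tmp=$(mktemp -d)    # scratch directory for this demo only

cat > "$tmp/file1" <<'EOF'
1    /pub/atomicbk/catalog/catalog.gif 693
2    /pub/atomicbk/catalog/home.gif 813
3    /pub/atomicbk/catalog/logo2.gif 12871
4    /pub/atomicbk/catalog/sleazbk.html 18338
EOF

cat > "$tmp/file2" <<'EOF'
28/Aug/1995:00:00:38 1 /pub/atomicbk/catalog/home.gif 813
28/Aug/1995:00:00:39 1 /pub/atomicbk/catalog/catalog.gif 693
28/Aug/1995:00:00:40 1 /pub/atomicbk/catalog/logo2.gif 12871
28/Aug/1995:00:00:41 1 /pub/atomicbk/catalog/logo2.gif 12871
28/Aug/1995:00:00:42 1 /pub/atomicbk/catalog/sleazbk.html 18338
28/Aug/1995:00:00:43 1 /pub/atomicbk/catalog/catalog.gif 693
EOF

# join needs both inputs sorted on the join field (FileName).
# -b ignores the run of blanks that precedes the FileName in file1.
sort -b -k2,2 "$tmp/file1" > "$tmp/file1.sorted"
sort -b -k3,3 "$tmp/file2" > "$tmp/file2.sorted"

# Match file1 field 2 against file2 field 3.
# -o prints the FileID (1.1) followed by the four original file2 fields.
# The final sort puts the lines back in timestamp order (field 2 of the output).
merged=$(join -1 2 -2 3 -o 1.1,2.1,2.2,2.3,2.4 \
    "$tmp/file1.sorted" "$tmp/file2.sorted" | sort -k2,2)

printf '%s\n' "$merged"
rm -rf "$tmp"
```

Two caveats apply. Sorting the large file twice is extra work compared with a single pass over it, so this is probably not the fastest option. Also, sort -k2,2 orders the timestamps as plain text, which matches chronological order only while all entries share the same date, as they do in the sample.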

Maybe there is a better way, but I'm not that great at awk yet.

Thanks!

Last edited by Scott; 06-20-2010 at 08:07 AM.. Reason: Code tags, please...
# 2  
Old 06-19-2010
Hi

Code:
awk 'NR==FNR{a[$2]=$1;next} $3 in a{print a[$3], $0}' file1 file2

Guru.
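
For readers new to awk, the one-liner in this post can be unpacked as follows. This is a sketch that reuses the sample data from post #1. The array name id, the temporary directory, and the result variable are illustrative choices and are not part of the original answer.

```shell
#!/bin/sh
# Commented version of the two-file awk lookup, run on the sample data from the thread.
tmp=$(mktemp -d)    # scratch directory for this demo only

cat > "$tmp/file1" <<'EOF'
1    /pub/atomicbk/catalog/catalog.gif 693
2    /pub/atomicbk/catalog/home.gif 813
3    /pub/atomicbk/catalog/logo2.gif 12871
4    /pub/atomicbk/catalog/sleazbk.html 18338
EOF

cat > "$tmp/file2" <<'EOF'
28/Aug/1995:00:00:38 1 /pub/atomicbk/catalog/home.gif 813
28/Aug/1995:00:00:39 1 /pub/atomicbk/catalog/catalog.gif 693
28/Aug/1995:00:00:40 1 /pub/atomicbk/catalog/logo2.gif 12871
28/Aug/1995:00:00:41 1 /pub/atomicbk/catalog/logo2.gif 12871
28/Aug/1995:00:00:42 1 /pub/atomicbk/catalog/sleazbk.html 18338
28/Aug/1995:00:00:43 1 /pub/atomicbk/catalog/catalog.gif 693
EOF

result=$(awk '
    # NR counts records across all inputs, FNR restarts for each file,
    # so NR == FNR holds only while the first file (file1) is being read.
    # Store FileName -> FileID and skip the remaining rules.
    NR == FNR { id[$2] = $1; next }

    # Second file (file2): if the FileName in field 3 is a known key,
    # print the FileID followed by the unchanged input line.
    $3 in id  { print id[$3], $0 }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the roughly 5,000 entries of file1 are held in memory, and file2 is read once from start to finish, so no sorting is needed and the original line order is preserved. Note that file2 lines whose FileName is missing from file1 are dropped. To keep them, the second rule could be written as `{ print (($3 in id) ? id[$3] : "-"), $0 }`, where the "-" placeholder is an assumption and not something the thread specifies. The NR == FNR test also assumes file1 is not empty.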
# 3  
Old 06-20-2010
Wow. 1 line. What can I say? That's awesome. Thanks for your help!