Awk or Perl - to selectively merge two files.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Awk or Perl - to selectively merge two files.
# 1  
Old 07-04-2012
Awk or Perl - to selectively merge two files.

I have two files, these have to be selectively merged into two other files. In addition there will require to be a edit to the last field, where the date format is changed.

The first file is a csv file with around 300k lines the data is now nearly 20 years old and I have been asked to move this to a more up to date format, it could probably be done using awk or perl and I'm quite happy to admit that it is probably beyond my limited scripting ability.


Code:
N1,GULFSTREAM AEROSPACE,G-IV,1071,FEDERAL AVIATION ADMINISTRATION,10/26/92
N1A,GOODYEAR,GZ-20A,4117,GOODYEAR TIRE & RUBBER CO,2/24/92
N1AA,CESSNA,414A,414A0482,RAGSDALE INVESTMENTS INC,4/17/91
N1AB,BRITISH AEROSPACE,BAE 125 SERIES 1000,NA1000,ALEXANDER & BALDWIN INC,6/30/92
N1AC,EAA SPORT BIPLANE,HOME BUILT,3138,EAA AVIATION FOUNDATION INC,3/31/92

The second file is a lookup file with around 75K lines, the example below has been chosen to fit.

Code:
3870219,GOODYEAR,GZ-20
3870220,GOODYEAR,GZ-20A
3980110,GULFSTREAM AEROSPACE,G1159B
3980115,GULFSTREAM AEROSPACE,G-IV
3980116,GULFSTREAM AEROSPACE,G-V
3980117,GULFSTREAM AEROSPACE,G-IV SP
2072414,CESSNA,172F
2073414,CESSNA,210D
2075907,CESSNA,414A
2075908,CESSNA,414
05605PE,CREATURA JOHN A,EAA SPORT BI P2
33601K9,EAA SPORT BIPLANE,HOME BUILT
33602TR,HARTZELL,EAA SPORT BIPLANE P2
3360339,ARLAND WARREN/DAVE,EAA SPORT BIPLANE

Each line in file one, should have an exact match on two fields in file two - however some experimentation has shown that this is not the case. So in order to see how bad the data actually is I also would need a file where there is no match.

So the expected output would be for likes with a match, field2 matches field2 and field3 matches field3. Replace field2 and field3 in file1 with field1 from file2

Code:
N1,3980115,1071,FEDERAL AVIATION ADMINISTRATION,1992-10-26
N1A,3870220,4117,GOODYEAR TIRE & RUBBER CO,1992-02-24
N1AA,2075907,414A0482,RAGSDALE INVESTMENTS INC,1991-04-17
N1AC,33601K9,3138,EAA AVIATION FOUNDATION INC,1992-03-31

Lines that have no match should just be output to an other file, without any modification.


Code:
N1AB,BRITISH AEROSPACE,BAE 125 SERIES 1000,NA1000,ALEXANDER & BALDWIN INC,6/30/92

If there is anyone out there who feels like a stab at this I'd be gratefull for any help or even just a pointer.

Regards

Dave
# 2  
Old 07-04-2012
This should give you matched data in "match" and data without match in "no_match". It doesn't modify date field.
Code:
awk -F, -vOFS="," 'NR==FNR{a[$2","$3]=$1;next}$2","$3 in a{$2=a[$2","$3];$3=$4;$4=$5;$5=$6;NF=5;print;next}{print $0 > "no_match"}' lookup.csv data.csv > match

This User Gave Thanks to bartus11 For This Post:
# 3  
Old 07-04-2012
Slightly tweaked version of bartus11's script (which assumes the number of fields in the data file to be 6):

Code:
awk -F, 'BEGIN{OFS=FS} NR==FNR{a[$2$3]=$1;next} $2$3 in a{$2=a[$2$3];for(i=3;i<NF;i++) $i=$(i+1);NF-=1;print > "match";next} {print $0 > "no_match"}' lookup.csv data.csv

This User Gave Thanks to elixir_sinari For This Post:
# 4  
Old 07-04-2012
Another slight tweak:
Code:
awk -F, '{n=$2 "," $3} NR==FNR{A[n]=$1; next} !(n in A){print > "nomatch"; next} sub(n,A[n])' file2 file1 > match


Last edited by Scrutinizer; 07-04-2012 at 10:01 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 5  
Old 07-04-2012
Hi Guys,

Thanks for the replies, I have just tried the first two. The output from the awk that bartus gave me is as follows;

Code:
[~/data/incomming]
(12:05:23)-(davem)-(1319)-> cat no_match
3870219,GOODYEAR,GZ-20
3980110,GULFSTREAM AEROSPACE,G1159B
3980116,GULFSTREAM AEROSPACE,G-V
3980117,GULFSTREAM AEROSPACE,G-IV SP
2072414,CESSNA,172F
2073414,CESSNA,210D
2075908,CESSNA,414
05605PE,CREATURA JOHN A,EAA SPORT BI P2
33602TR,HARTZELL,EAA SPORT BIPLANE P2
3360339,ARLAND WARREN/DAVE,EAA SPORT BIPLANE

[~/data/incomming]
(12:05:26)-(davem)-(1319)-> cat match
3870220,N1A,,,
3980115,N1,,,
2075907,N1AA,,,
33601K9,N1AC,,,

From elixir the output is exactly as I wanted it to look - like;

Code:
[~/data/incomming]
(13:02:09)-(davem)-(1325)-> cat match
N1,3980115,1071,FEDERAL AVIATION ADMINISTRATION,10/26/92
N1A,3870220,4117,GOODYEAR TIRE & RUBBER CO,2/24/92
N1AA,2075907,414A0482,RAGSDALE INVESTMENTS INC,4/17/91
N1AC,33601K9,3138,EAA AVIATION FOUNDATION INC,3/31/92

[~/data/incomming]
(13:02:15)-(davem)-(1326)-> cat no_match
N1AB,BRITISH AEROSPACE,BAE 125 SERIES 1000,NA1000,ALEXANDER & BALDWIN INC,6/30/92

Have yet to checkout your bit of awk scruitinizer, but I will. Thanks again to you all - you probably saved me a lot of grief.

Regards

Dave
# 6  
Old 07-04-2012
With Bartus' suggestion, I think you would need to switch the order of your input files..
# 7  
Old 07-04-2012
Hi Bartus,

Just had a Doh moment - transposed the file names in the command - it works spot on - just going to bang head on wall for a few minutes.

Regards

Dave
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk merge two files

file1 AAA3:WWW1:DDD1:XXX8:DDD2:XXX9 AAA6:WWW2:FFF1:XXX130:FFF1:XXX104:FFF1:XXX16 AAA7:WWW3:ZZZ1:XXX4:ZZZ2:XXX5:ZZZ3:XXX6:ZZZ4:XXX7file2 XXX8:EEE1:EEE2 XXX9:KKK1:KKK2 XXX130:OOO1:OOO2 XXX104:PPP1:PPP2 XXX16:RRR1:RRR1 XXX4:UUU1:UUU2 XXX5:III1:III2 XXX7:JJJ1:JJJ2Result... (2 Replies)
Discussion started by: vikus
2 Replies

2. Shell Programming and Scripting

merge two files with awk

I have two file like follows. I want to merge them according the first field of file1. The separator of file1 is tab, while the second one is ",". I tried some code, but can't get the results. File1: AABB 6072 28 5922 BBCC 316 147 162 CCDD 907 71 231 File2: CCDD,hTRBV12-4,hTRBJ2-3,319895... (7 Replies)
Discussion started by: xshang
7 Replies

3. Shell Programming and Scripting

How to selectively suppress perl output?

The following perl statement in a bash script consists of two substatements. I intend the first perl substatement (the assignment with glob) to get input from the preceding bash pipe, and the second perl substatement (the foreach loop) to output back to bash. However, the first perl substatement... (7 Replies)
Discussion started by: LessNux
7 Replies

4. Shell Programming and Scripting

Merge files using AWK

I want to merge data from 2nd file to 1st file based on 1st column File1 ==== data1,12,comp1 data1,13,comp2 data3,14,, File2 ==== data1,11,host1,lit data2,11,host2,lit3 data3,11,host3,lit4 Required Ouput (5 Replies)
Discussion started by: greycells
5 Replies

5. Shell Programming and Scripting

How to merge 2 files based on delimeter in perl?

Hi, I have 2 files. a.txt & b.txt # a.txt contains the following text. apple grapes # b.txt contains the following text. banana pine My question is. (1 Reply)
Discussion started by: vanitham
1 Replies

6. Shell Programming and Scripting

Merge 2 csv files with awk

I have 2 files pipe delimted and want to merge them based on a key e.g file 1 123$aaa$yyy$zzz 345$xab$yzy$zyz 456$sss$ttt$foo 799$aaa$ggg$dee file 2 123$hhh 345$ddd 456$xxx 888$zzz so if the key is the first field, and the result should be the common key between file 1 and 2 (6 Replies)
Discussion started by: loloAix
6 Replies

7. Shell Programming and Scripting

perl merge two files by the time sequence

Hi Guys, i have two files: fileA: 20090611 00:00:11 20090611 00:00:11 20090611 00:00:24 20090611 00:01:10 20090611 07:13:00 fileB: 20090611 00:00:01 20090611 00:00:12 20090611 00:00:24 20090611 00:01:12 20090611 09:13:00 want to make two files into a single file, but follow the... (14 Replies)
Discussion started by: jimmy_y
14 Replies

8. Shell Programming and Scripting

Merge files of differrent size with one field common in both files using awk

hi, i am facing a problem in merging two files using awk, the problem is as stated below, file1: A|B|C|D|E|F|G|H|I|1 M|N|O|P|Q|R|S|T|U|2 AA|BB|CC|DD|EE|FF|GG|HH|II|1 .... .... .... file2 : 1|Mn|op|qr (2 Replies)
Discussion started by: shashi1982
2 Replies

9. Shell Programming and Scripting

merge two files using awk

Hi Guys, I wonder whether is possible to merge two files using awk. I have two files one with 7 columns and another one with 9 columns and the first column on both files is identical so will be my key to merge the files. Any ideas.Thanks in advance. Harby. (2 Replies)
Discussion started by: hariza
2 Replies

10. Shell Programming and Scripting

Merge two files in windows using perl script

Hi I want to merge two or more files using perl in windows only(Just like Paste command in Unix script) . How can i do this.Is ther any single command to do this? Thanks Kunal (1 Reply)
Discussion started by: kunal_dixit
1 Replies
Login or Register to Ask a Question