Matching column value from 2 different file using awk and append value from different column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Matching column value from 2 different file using awk and append value from different column
# 1  
Old 09-13-2017
Matching column value from 2 different file using awk and append value from different column

Hi,

I have 2 csv files.

a.csv
Code:
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0
11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012



-------------------------------------------------------------

b.csv
Code:
"10.200.6.69_NE=3148723,S=3,CP=-1,PP=16||1|",14SKA043_1352_UNSUTARA_RTN950A_IDU1-3-ISV3-1(TO_14SKA015_UNS),SURAKARTA
"10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|",20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),No_Location
"10.200.6.69_NE=3147673,S=1,CP=-1,PP=16||1|",14PWT044_1587_TELUK_RTN310-1-SHXA2-1(TO_14PWT010_PURWOKERTOSLT),BANYUMAS


both file as thousand of lines.

I need to match column 2 from both file, and if they match, append the value of column 1 from file b to file a so it becomes like this

file c
Code:
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012,"10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"

I have tried the following but to no avail

Code:
awk -FS=',' 'NR==FNR { a[$2]=$1; next } $2 in a {print $0,a[$13] b.csv a.csv}'

as I understand it, it should read column 2 of file b and store column 1 into the associated array of a, and then match with column 2 of file a, and if match was found, print the row of column a, and also append the array a value to the end of row. I now think I am definitely wrong Smilie. Sorry really newbie here. There is no error but as I tested it, it did print and append the output but it appends at the middle of the row and overriding the row value and also for all line in a.csv, it appends the same value, as if it did not match at all.

Code:
awk -FS=',' 'NR==FNR { a[$2]=$1; next } $2 in a {print $0,a[$13] }' b.csv a.csv > c.csv


HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location

Many thanks for the kind helps gents!


Moderator's Comments:
Mod Comment Please use CODE tags correctly, for data as well, as required by forum rules!

Last edited by RudiC; 09-14-2017 at 05:45 AM.. Reason: Added CODE tags.
# 2  
Old 09-14-2017
It's by sheer coincidence that you get anything from that code snippet. The field separator should be set to comma, but due to an incorrect syntax it isn't. The a array holds one single element only, indexed by the empty string, and is overwritten repeatedly until the first file's last line is encountered. As $13 doesn't exist in file 2 and thus is empty, that only element of a is printed. You seem to want to ignore the commas inside the double quotes, so the entire quoted string will be considered field 1? If so, they need to be replaced upfront, and then converted back after the operations.
Would this come close to what you need:
Code:
awk -F\" '{for (i=2; i<=NF; i+=2) gsub (/,/, "\001", $i)} 1' OFS=\" file2 |
awk -F',' 'NR==FNR {a[$2]=$1; next } {print $0, a[$2] }' - file1 |
awk -F\" '{for (i=2; i<=NF; i+=2) gsub ("\001", ",", $i)} 1' OFS=\" 
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG 027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012 "10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG 027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012 
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012

This User Gave Thanks to RudiC For This Post:
# 3  
Old 09-14-2017
Hi, Yes!, I realized this today that its the double quote causing the problem. I used your solution and got this result

Code:
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
 "10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
 "10.200.6.69_NE=3147930,S=13,CP=-1,PP=16||1|"),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 
"10.200.6.69_NE=3149110,S=11,CP=-1,PP=16||1|"195.675,20LMG027-20LMG011,99.9995,202.6952012

Image

The original line was like this
Image

Code:
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0
11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012

It seems to have appended to the beginning of '2nd,4th and 6th' line rather than at end of line 1,2 and 3

ImageI am very sure when I vi this file, theres only 3 line. Is awk treating it as 6 line instead as it appends the array hash value to the end of the first line, it moved to beggining of 2nd line? Im confused Smilie


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 09-14-2017 at 08:03 AM.. Reason: Added CODE tags.
# 4  
Old 09-14-2017
Your result is a bit surprising as the proposal has been tested and found to work, as you can see in post#2.
What you show in post#3 lets me suspect you have DOS line terminators (^M, 0x0D, \r, <CR>) in your file. Eliminate those and try again.
This User Gave Thanks to RudiC For This Post:
# 5  
Old 09-14-2017
Yes that was it!. Thank you sooo much!.SmilieSmilie. Would you be so kind to explain the command please whenever you're free Smilie. I would really highly appreciate it!. Smilie

Last edited by tententen; 09-14-2017 at 08:32 AM..
# 6  
Old 09-14-2017
Code:
awk -F\" '                                      # set field sep. to "
        {for (i=2; i<=NF; i+=2) \               # in every second (= inside quotes) field
         gsub (/,/, "\001", $i)                 # replace the comma with ^A (a placeholder)
        }
1                                               # print the modified line
' OFS=\" file2 |                                # set output field sep. to " (important!)
awk -F',' '                                     # set FS to comma
NR==FNR {a[$2] = $1                             # save $1 (text to append) to array indexed by $2
         next                                   # stop processing the actual line; goto next
        }
        {print $0, a[$2] }                      # print file2's line, followed by the app. txt, if exists 
' - file1 |                                     # work on input stream, i.e. stdin ("-", piped in) and file1
awk -F\" '                                      # undo comma replacements as in step 1
        {for (i=2; i<=NF; i+=2) \
                 gsub ("\001", ",", $i)
        }
1
' OFS=\"

This User Gave Thanks to RudiC For This Post:
# 7  
Old 09-16-2017
Sorry quick question. Somehow, there are some rows which has double quotes. Seems like the ones with double quotes was ignored. I just noticed. Is there a way to also handle double quotes?

Sorry and many thanks! Smilie
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Compare 1st column from 2 file and if match print line from 1st file and append column 7 from 2nd

hi I have 2 file with more than 10 columns for both 1st file apple,0,0,0...... orange,1,2,3..... mango,2,4,5..... 2nd file apple,2,3,4,5,6,7... orange,2,3,4,5,6,8... watermerlon,2,3,4,5,6,abc... mango,5,6,7,4,6,def.... (1 Reply)
Discussion started by: tententen
1 Replies

2. Shell Programming and Scripting

awk script to append suffix to column when column has duplicated values

Please help me to get required output for both scenario 1 and scenario 2 and need separate code for both scenario 1 and scenario 2 Scenario 1 i need to do below changes only when column1 is CR and column3 has duplicates rows/values. This inputfile can contain 100 of this duplicated rows of... (1 Reply)
Discussion started by: as7951
1 Replies

3. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates.No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Replies

4. Shell Programming and Scripting

Matching column then append to existing File as new column

Good evening I have the below requirements, as I am not an experts in Linux/Unix and am looking for your ideas how I can do this. I have file called file1 and file2. I need to get the second column which is text1_random_alphabets and find that in file 2, if it's exists then print the 3rd... (4 Replies)
Discussion started by: mychbears
4 Replies

5. Shell Programming and Scripting

Solution for replacement of 4th column with 3rd column in a file using awk/sed preserving delimters

input "A","B","C,D","E","F" "S","T","U,V","W","X" "AA","BB","CC,DD","EEEE","FFF" required output: "A","B","C,D","C,D","F" "S", T","U,V","U,V","X" "AA","BB","CC,DD","CC,DD","FFF" tried using awk but double quotes not preserving for every field. any help to solve this is much... (5 Replies)
Discussion started by: khblts
5 Replies

6. Shell Programming and Scripting

How to awk or grep the last column in file when date on column contains spaces?

Hi have a large spreadsheet which has 4 columns APM00111803814 server_2 96085 Corp IT Desktop and Apps APM00111803814 server_2 96085 Corp IT Desktop and Apps APM00111803814 server_2 96034 Storage Mgmt Team APM00111803814 server_2 96152 GWP... (6 Replies)
Discussion started by: kieranfoley
6 Replies

7. Shell Programming and Scripting

Find lines with matching column 1 value, retain only the one with highest value in column 2

I have a file like: I would like to find lines lines with duplicate values in column 1, and retain only one based on two conditions: 1) keep line with highest value in column 3, 2) if column 3 values are equal, retain the line with the highest value in column 4. Desired output: I was able to... (3 Replies)
Discussion started by: pathunkathunk
3 Replies

8. Shell Programming and Scripting

Append column using awk/nawk

Is there any way I can achieve this? Considering test1 : a 1 2 3 4 b 2 3 4 5 c 12 1232 14 1 d 10 13 4 5 e 1 5 6 9 1 And test to be some string : qw twe tew we qw I want something like this : a 1 2 qw 4 b 2 3 twe 5 (5 Replies)
Discussion started by: aksijain
5 Replies

9. UNIX for Dummies Questions & Answers

append column and row header to a file in awk script.

Hi! Is there a way to append column and row header to a file in awk script. For example if I have Jane F 39 manager Carlos M 40 system administrator Sam F 20 programmer and I want it to be # name gend age occup 1 Jane F 39 manager 2 Carlos M ... (4 Replies)
Discussion started by: FUTURE_EINSTEIN
4 Replies

10. Shell Programming and Scripting

Awk: Need help replacing a specific column in a file by part of a column in another file

Hi, I have two input files as File1 : ABC:client1:project1 XYZ:client2-aa:project2 DEF:client4:proj File2 : client1:W-170:xx client2-aa:WT-04:yy client4:L-005A:zz Also, array of valid values can be hardcoded like Output : ABC:W:project1 XYZ:WT:project2 (1 Reply)
Discussion started by: aa2601
1 Replies
Login or Register to Ask a Question