Matching column value from 2 different file using awk and append value from different column

09-13-2017

Registered User

35, 0

Join Date: Dec 2008

Last Activity: 17 March 2019, 5:04 AM EDT

Posts: 35

Thanks Given: 3

Thanked 0 Times in 0 Posts

Matching column value from 2 different file using awk and append value from different column

Hi,

I have 2 csv files.

a.csv

Code:

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0
11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012

-------------------------------------------------------------

b.csv

Code:

"10.200.6.69_NE=3148723,S=3,CP=-1,PP=16||1|",14SKA043_1352_UNSUTARA_RTN950A_IDU1-3-ISV3-1(TO_14SKA015_UNS),SURAKARTA
"10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|",20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),No_Location
"10.200.6.69_NE=3147673,S=1,CP=-1,PP=16||1|",14PWT044_1587_TELUK_RTN310-1-SHXA2-1(TO_14PWT010_PURWOKERTOSLT),BANYUMAS

both file as thousand of lines.

I need to match column 2 from both file, and if they match, append the value of column 1 from file b to file a so it becomes like this

file c

Code:

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012,"10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"

I have tried the following but to no avail

Code:

awk -FS=',' 'NR==FNR { a[$2]=$1; next } $2 in a {print $0,a[$13] b.csv a.csv}'

as I understand it, it should read column 2 of file b and store column 1 into the associated array of a, and then match with column 2 of file a, and if match was found, print the row of column a, and also append the array a value to the end of row. I now think I am definitely wrong

. Sorry really newbie here. There is no error but as I tested it, it did print and append the output but it appends at the middle of the row and overriding the row value and also for all line in a.csv, it appends the same value, as if it did not match at all.

Code:

awk -FS=',' 'NR==FNR { a[$2]=$1; next } $2 in a {print $0,a[$13] }' b.csv a.csv > c.csv


HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 "10.200.6.69_NE=3147286,S=4,CP=-1,PP=16||1|",03CMS006_1315_CIANDA_RTN980_1-4-ISV3-1(IF to 03TSK004 M2 BYPASSTSK),No_Location

Many thanks for the kind helps gents!

Moderator's Comments:

Please use CODE tags correctly, for data as well, as required by forum rules!

Last edited by RudiC; 09-14-2017 at 05:45 AM.. Reason: Added CODE tags.

tententen

View Public Profile for tententen

Find all posts by tententen

09-14-2017

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

It's by sheer coincidence that you get anything from that code snippet. The field separator should be set to comma, but due to an incorrect syntax it isn't. The a array holds one single element only, indexed by the empty string, and is overwritten repeatedly until the first file's last line is encountered. As $13 doesn't exist in file 2 and thus is empty, that only element of a is printed. You seem to want to ignore the commas inside the double quotes, so the entire quoted string will be considered field 1? If so, they need to be replaced upfront, and then converted back after the operations.
Would this come close to what you need:

Code:

awk -F\" '{for (i=2; i<=NF; i+=2) gsub (/,/, "\001", $i)} 1' OFS=\" file2 |
awk -F',' 'NR==FNR {a[$2]=$1; next } {print $0, a[$2] }' - file1 |
awk -F\" '{for (i=2; i<=NF; i+=2) gsub ("\001", ",", $i)} 1' OFS=\" 
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG 027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012 "10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG 027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012 
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012

This User Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

09-14-2017

Registered User

35, 0

Join Date: Dec 2008

Last Activity: 17 March 2019, 5:04 AM EDT

Posts: 35

Thanks Given: 3

Thanked 0 Times in 0 Posts

Hi, Yes!, I realized this today that its the double quote causing the problem. I used your solution and got this result

Code:

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
 "10.200.6.69_NE=3147930,S=11,CP=-1,PP=16||1|"),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
 "10.200.6.69_NE=3147930,S=13,CP=-1,PP=16||1|"),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0 
"10.200.6.69_NE=3149110,S=11,CP=-1,PP=16||1|"195.675,20LMG027-20LMG011,99.9995,202.6952012

The original line was like this

Code:

HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-11-ISV3-1(to LAMONGAN_M),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG011_DEKET_1296_RTN-980_IDU-1-13-ISV3-1(to LAMONGAN_S),East_Java,20LMG011_DEKET_1296_RTN-980_IDU-1,20LMG011,20LMG
027_1287_LAMONGAN_RTN980_IDU1,20LMG027,1+1(HSB),195.675,20LMG011-20LMG027,99.9995,202.6952012
HUAWEI,20LMG027_1287_LAMONGAN_RTN980_IDU1-11-ISV3-1(to DEKET_M),East_Java,20LMG027_1287_LAMONGAN_RTN980_IDU1,20LMG027,20LMG0
11_DEKET_1296_RTN-980_IDU-1,20LMG011,1+1(HSB),195.675,20LMG027-20LMG011,99.9995,202.6952012

It seems to have appended to the beginning of '2nd,4th and 6th' line rather than at end of line 1,2 and 3

I am very sure when I vi this file, theres only 3 line. Is awk treating it as 6 line instead as it appends the array hash value to the end of the first line, it moved to beggining of 2nd line? Im confused

Moderator's Comments:

Please use CODE tags as required by forum rules!

Last edited by RudiC; 09-14-2017 at 08:03 AM.. Reason: Added CODE tags.

tententen

View Public Profile for tententen

Find all posts by tententen

09-14-2017

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Your result is a bit surprising as the proposal has been tested and found to work, as you can see in post#2.
What you show in post#3 lets me suspect you have DOS line terminators (^M, 0x0D, \r, <CR>) in your file. Eliminate those and try again.

This User Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

09-14-2017

Registered User

35, 0

Join Date: Dec 2008

Last Activity: 17 March 2019, 5:04 AM EDT

Posts: 35

Thanks Given: 3

Thanked 0 Times in 0 Posts

Yes that was it!. Thank you sooo much!.

. Would you be so kind to explain the command please whenever you're free

. I would really highly appreciate it!.

Last edited by tententen; 09-14-2017 at 08:32 AM..

tententen

View Public Profile for tententen

Find all posts by tententen

09-14-2017

Registered User

15,129, 5,008

Join Date: Jul 2012

Last Activity: 4 May 2020, 4:31 PM EDT

Location: Aachen, Germany

Posts: 15,129

Thanks Given: 735

Thanked 5,008 Times in 4,483 Posts

Code:

awk -F\" '                                      # set field sep. to "
        {for (i=2; i<=NF; i+=2) \               # in every second (= inside quotes) field
         gsub (/,/, "\001", $i)                 # replace the comma with ^A (a placeholder)
        }
1                                               # print the modified line
' OFS=\" file2 |                                # set output field sep. to " (important!)
awk -F',' '                                     # set FS to comma
NR==FNR {a[$2] = $1                             # save $1 (text to append) to array indexed by $2
         next                                   # stop processing the actual line; goto next
        }
        {print $0, a[$2] }                      # print file2's line, followed by the app. txt, if exists 
' - file1 |                                     # work on input stream, i.e. stdin ("-", piped in) and file1
awk -F\" '                                      # undo comma replacements as in step 1
        {for (i=2; i<=NF; i+=2) \
                 gsub ("\001", ",", $i)
        }
1
' OFS=\"

This User Gave Thanks to RudiC For This Post:

RudiC

View Public Profile for RudiC

Find all posts by RudiC

09-16-2017

Registered User

35, 0

Join Date: Dec 2008

Last Activity: 17 March 2019, 5:04 AM EDT

Posts: 35

Thanks Given: 3

Thanked 0 Times in 0 Posts

Sorry quick question. Somehow, there are some rows which has double quotes. Seems like the ones with double quotes was ignored. I just noticed. Is there a way to also handle double quotes?

Sorry and many thanks!

tententen

View Public Profile for tententen

Find all posts by tententen

Shell Programming and Scripting

Matching column value from 2 different file using awk and append value from different column

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Compare 1st column from 2 file and if match print line from 1st file and append column 7 from 2nd

Discussion started by: tententen

2. Shell Programming and Scripting

awk script to append suffix to column when column has duplicated values

Discussion started by: as7951

3. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Discussion started by: as7951

4. Shell Programming and Scripting

Matching column then append to existing File as new column

Discussion started by: mychbears

5. Shell Programming and Scripting

Solution for replacement of 4th column with 3rd column in a file using awk/sed preserving delimters

Discussion started by: khblts

6. Shell Programming and Scripting

How to awk or grep the last column in file when date on column contains spaces?

Discussion started by: kieranfoley

7. Shell Programming and Scripting

Find lines with matching column 1 value, retain only the one with highest value in column 2

Discussion started by: pathunkathunk

8. Shell Programming and Scripting

Append column using awk/nawk

Discussion started by: aksijain

9. UNIX for Dummies Questions & Answers

append column and row header to a file in awk script.

Discussion started by: FUTURE_EINSTEIN

10. Shell Programming and Scripting

Awk: Need help replacing a specific column in a file by part of a column in another file

Discussion started by: aa2601