Linux - Join 2 csv files with common key


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Linux - Join 2 csv files with common key
# 1  
Old 02-02-2016
Linux - Join 2 csv files with common key

Hi,

I am trying to join 2 csv files, to create a 3rd output file with the joined data.

Below is an example of my Input Data:

Input File 1

Code:
NAME, FAV_FOOD, FAV_DRINK, ID, GENDER
Bob, Fish, Coke, 1, M
Lisa, Rice, Water, 2, F
Jenny, Noodle, Tea, 3, F
Ken, Pizza, Coffee, 4, M
Lisa, Cake, Milk, 2, F
Ken, Sandwich, Juice, 4, M

Input File 2

Code:
NAME, ID, AGE, COUNTRY
Lisa, 2, F, 22, UK 
Ken, 4, M, 12, Iceland
Jenny, 3, F, 18, France
Bob, 1, M, 31, Ireland

* The Keys to join are 'Name' and 'ID';
* The joined file should look like the below:

Output File

Code:
NAME, FAV_FOOD, FAV_DRINK, ID, GENDER, AGE, COUNTRY
Bob, Fish, Coke, 1, M, 31, Ireland
Lisa, Rice, Water, 2, F, 22, UK 
Jenny, Noodle, Tea, 3, F, 18, France
Ken, Pizza, Coffee, 4, M, 12, Iceland
Lisa, Cake, Milk, 2, F, 22, UK 
Ken, Sandwich, Juice, 4, M, 12, Iceland

Please could you suggest how I can perform this efficiently, as I will be joining large volumes of data from both files.

Many Thanks.

Last edited by RichZR; 02-02-2016 at 09:56 AM.. Reason: Amending Quotes to Code
# 2  
Old 02-02-2016
Any attempts/ideas/thoughts from your side?

---------- Post updated at 14:50 ---------- Previous update was at 14:49 ----------

And, better use code tags for data, in lieu of quote tags...
# 3  
Old 02-02-2016
Hello RichZR,

Adding one more query to Rudi's questions. I can see like fields are not always constant in Input_file2 eg. Ken, 4, M, , 12, Iceland and other lines have 5 fields only. So is it the case or it is a typo, could you please confirm on same. Also code tags as per forum rules for Inputs/commands/codes which you use in your posts.

Thanks,
R. Singh
# 4  
Old 02-02-2016
Quote:
Originally Posted by RavinderSingh13
Hello RichZR,

Adding one more query to Rudi's questions. I can see like fields are not always constant in Input_file2 eg. Ken, 4, M, , 12, Iceland and other lines have 5 fields only. So is it the case or it is a typo, could you please confirm on same. Also code tags as per forum rules for Inputs/commands/codes which you use in your posts.

Thanks,
R. Singh
Hi Ravinder,

Apologies - That was a typo from my side. I have amended this now in my original post. Also amended the Quote section to a Code Section.

---------- Post updated at 02:03 PM ---------- Previous update was at 01:58 PM ----------

Quote:
Originally Posted by RudiC
Any attempts/ideas/thoughts from your side?

---------- Post updated at 14:50 ---------- Previous update was at 14:49 ----------

And, better use code tags for data, in lieu of quote tags...
Hi Rudi,

I have found the following examples within the forum which I am sure could be tailored towards my requirement, however I am not massively experienced within Awk, to breakdown the following examples into sections I can understand to tweak:

Example 1:

Code:
awk 'FNR==NR{A[$1]=$1;next} ($1 in A){print $0 FS A[$1]} !($1 in A){print $0}' file1 FS=, file2

Example 2:

Code:
awk 'FNR==NR{T[$1]=$3 FS $4; next} {print $0, T[$1]}' file2 file1

Cheers.
# 5  
Old 02-02-2016
Hello RichZR,

Assuming you have missed a field in Input_file2 too, so here I have added field GENDER field into Input_file2 in very first line as follows.
Code:
cat Input_file2
NAME, ID, GENDER, AGE, COUNTRY
Lisa, 2, F, 22, UK
Ken, 4, M, 12, Iceland
Jenny, 3, F, 18, France
Bob, 1, M, 31, Ireland

Could you please try following and let me know if this helps you.
Code:
awk -F"," 'FNR==NR{A[$1 FS $2]=$4 FS $5;next} ($1 FS $(NF-1) in A){print $0 FS A[$1 FS $(NF-1)]}' Input_file2 Input_file1

Output will be as follows.
Code:
NAME, FAV_FOOD, FAV_DRINK, ID, GENDER, AGE, COUNTRY
Bob, Fish, Coke, 1, M, 31, Ireland
Lisa, Rice, Water, 2, F, 22, UK
Jenny, Noodle, Tea, 3, F, 18, France
Ken, Pizza, Coffee, 4, M, 12, Iceland
Lisa, Cake, Milk, 2, F, 22, UK
Ken, Sandwich, Juice, 4, M , 12, Iceland

EDIT: Wanted to add point here if any field is NOT present from Input_file1 which is present in Input_file2 it is going to leave that line.
Also adding an explanation here so that it will be good for you in case you get some same kind of requirement in future, hope this helps.
Code:
awk -F","                       ######## Making field separator as comma(,) here.
'FNR==NR{                       ######## Putting condition FNR==NR which will only be TRUE when first file(Input_file2) in this example will be TRUE, where FNR is a awk's variable which will be REST each time a new file is being read and NR will be keep on increasing till the all files being read completely. NR(Number of records)
A[$1 FS $2]=$4 FS $5;           ######## Making an array named A whose index is $1 and field separator and value of $2, then assigning A's value to $4 and $5's value, where $1 is first field of line, $2 is 2nd field, $3 is 3rd field and $4 is fourth field etc.
next}                           ######## here I am using next statement which tells awk NOT to process further statements.
($1 FS $(NF-1) in A){           ######## Now while second file(Input_file1) in this example is being read if $1 field separator and 2nd last column($(NF-1)) if these values are present as an index into array A. then perform the following statements.
print $0 FS A[$1 FS $(NF-1)]    ######## Now print the value of complete line with value of array A whose index is $1 field seprator $(NF-1).
}' Input_file2 Input_file1      ######## mentioning here both of the input files.

Thanks,
R. Singh

Last edited by RavinderSingh13; 02-02-2016 at 10:39 AM.. Reason: Adding an explanatoin for solution here for user's help.
These 2 Users Gave Thanks to RavinderSingh13 For This Post:
# 6  
Old 02-02-2016
On top of what RavinderSingh13 said, spaces aren't used consistently as well, esp. in the end of lines. So, don't expect a perfect result, either. Howsoever, try
Code:
awk '   
FNR == NR       {T[$1,$2] = $0
                 sub ("^" $1 FS $2 FS $3, "", T[$1,$2])
                 next
                }
$1,$4 in T      {print $0  T[$1,$4]
                }
' FS=, file2 file1
NAME, FAV_FOOD, FAV_DRINK, ID, GENDER, COUNTRY
Bob, Fish, Coke, 1, M , 31, Ireland
Lisa, Rice, Water, 2, F , 22, UK
Jenny, Noodle, Tea, 3, F , 18, France
Ken, Pizza, Coffee, 4, M , 12, Iceland
Lisa, Cake, Milk, 2, F , 22, UK
Ken, Sandwich, Juice, 4, M , 12, Iceland

These 2 Users Gave Thanks to RudiC For This Post:
# 7  
Old 02-02-2016
Hi RudiC ,
kindly explain below highlighted part , as without using it we are getting the same result.
sub ("^" $1 FS $2 FS $3, "", T[$1,$2])
Thanks,
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Merge selective columns from files based on common key

Hi, I am trying to selectively merge two files based on keys reported in the 1st column. File1: #file1-header1 file1-header2 111 qwe rtz uio 198 asd fgh jkl 165 yxc 789 poi uzt rew 89 lkj File2: #file2-header2 file2-header2 165 ghz nko2 ... (2 Replies)
Discussion started by: dovah
2 Replies

2. Shell Programming and Scripting

Merging files with common IDs without JOIN

Hi, I am trying to merge information across 2 files. The first file is a "master" file, with all IDS. File 2 contains a subset of IDs of those in File 1. I would like to match up individuals in File 1 and File 2, and add information in File 2 to that of File 1 if they appear. However, if an... (3 Replies)
Discussion started by: hubleo
3 Replies

3. UNIX for Dummies Questions & Answers

How to join 2 .txt files based on a common column?

Hi all, I'm trying to join two .txt file tab delimitated based on a common column. File 1 transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct comp1000201_c0_seq1 comp1000201_c0 337 183.51 0.00 0.00 0.00 0.00 comp1000297_c0_seq1 ... (1 Reply)
Discussion started by: alisrpp
1 Replies

4. Shell Programming and Scripting

Join two files with common and range identifiers

I have a problem joining two files. The first file abc.txt has 10k lines and has lots of fields but two fields fff1 and ppp1 to merge by. The second file xyz.txt is a master file with 1k lines and lots of fields but three fields to merge by fff1; rrr1 and qqq1. The two files need to be merged... (9 Replies)
Discussion started by: cfiles2012
9 Replies

5. UNIX for Dummies Questions & Answers

How to use the the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies

6. UNIX for Dummies Questions & Answers

how to join two files using "Join" command with one common field in this problem?

file1: Toronto:12439755:1076359:July 1, 1867:6 Quebec City:7560592:1542056:July 1, 1867:5 Halifax:938134:55284:July 1, 1867:4 Fredericton:751400:72908:July 1, 1867:3 Winnipeg:1170300:647797:July 15, 1870:7 Victoria:4168123:944735:July 20, 1871:10 Charlottetown:137900:5660:July 1, 1873:2... (2 Replies)
Discussion started by: mindfreak
2 Replies

7. Shell Programming and Scripting

join files based on a common field

Hi experts, Would you please help me with this? I have several files and I need to join the forth field of them based on the common first field. here's an example... first file: 280346 39.88 -75.08 547.8 280690 39.23 -74.83 538.7 280729 40.83 -75.08 499.2 280907 40.9 -74.4 507.8... (5 Replies)
Discussion started by: GoldenFire
5 Replies

8. Shell Programming and Scripting

Join multiple files based on 1 common column

I have n files (for ex:64 files) with one similar column. Is it possible to combine them all based on that column ? file1 ax100 20 30 40 ax200 22 33 44 file2 ax100 10 20 40 ax200 12 13 44 file2 ax100 0 0 4 ax200 2 3 4 (9 Replies)
Discussion started by: quincyjones
9 Replies

9. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies

10. Shell Programming and Scripting

Join 3 files using key column in a mapping file

I'm new of UNIX shell scripting. I'm recently generating a excel report in UNIX(file with delimiter is fine). How should I make a script to do it? 1 file to join comes from output of one UNIX command, the second from another UNIX command, and third from a database query. The key columes of all... (7 Replies)
Discussion started by: bigsmile
7 Replies
Login or Register to Ask a Question