Joining files using awk not extracting all columns from File 2


 
# 8  
04-08-2016
Hi rdrtx1/Ravinder,

Would you mind suggesting the same code for files with "," as the delimiter? I tried the code below, but it didn't work:

Code:
 awk 'BEGIN { FS = ","} NR==FNR{l=$1; $1=""; A[l]=$0;next}{print$0 (A[$1]?A[$1]:FS "missing")}' file2 file1

Below are the files:

File1

Code:
1,AA
2,BB
3,CC
4,DD

File 2

Code:
1,IND,100,200,300
2,AUS,400,500,600
5,USA,700,800,900

# 9  
04-08-2016
Hello venkat_reddy,

You can fix the code from your most recent post by setting OFS as well as FS, as follows.
Code:
awk 'BEGIN { FS = OFS = ","} NR==FNR{l=$1; $1=""; A[l]=$0;next}{print $0 (A[$1]?A[$1]:FS "missing")}' file2 file1

Above code will give following output.
Code:
1,AA,IND,100,200,300
2,BB,AUS,400,500,600
3,CC,missing
4,DD,missing
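Why the OFS change matters, shown as a small demonstration on one row from File 2 (the variable names are only illustrative): assigning to any field, here `$1=""`, makes awk rebuild `$0` by joining the fields with OFS. With only FS set, OFS is still the default single space, so the stored value loses its commas and the original attempt would print lines like `1,AA IND 100 200 300`.

```shell
#!/bin/sh
# Assigning to a field ($1 = "") makes awk rebuild $0, joining fields with OFS.
# Only FS set: OFS is still the default single space, so the commas disappear.
with_default_ofs=$(echo '1,IND,100,200,300' | awk 'BEGIN { FS = "," } { $1 = ""; print }')
# FS and OFS both set to ",": the rebuilt record keeps its commas.
with_comma_ofs=$(echo '1,IND,100,200,300' | awk 'BEGIN { FS = OFS = "," } { $1 = ""; print }')
printf '[%s]\n[%s]\n' "$with_default_ofs" "$with_comma_ofs"
# prints:
# [ IND 100 200 300]
# [,IND,100,200,300]
```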

However, your very first post asked for a different output format, so the following may help if you need that instead.
Code:
awk 'BEGIN { FS = OFS = ","} NR==FNR{l=$1; $1=""; A[l]=$2;next}{print $0 FS (A[$1]?A[$1]:"missing")}' file2 file1

Output will be as follows.
Code:
1,AA,IND
2,BB,AUS
3,CC,missing
4,DD,missing
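A slightly more defensive variant of the second command, offered as a sketch rather than a replacement: it stores column 2 directly (the `$1=""` step isn't needed for that) and tests `($1 in A)` instead of the stored value, because a value such as `0` or an empty column can evaluate as false and would then be reported as "missing". Key 6 below is a hypothetical row added only to show that case; the files are written to a temporary directory.

```shell
#!/bin/sh
# Sample data from post #8, plus one hypothetical key (6) whose column 2 is "0".
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' 1,AA 2,BB 3,CC 4,DD 6,EE > "$tmp/file1"
printf '%s\n' 1,IND,100,200,300 2,AUS,400,500,600 5,USA,700,800,900 6,0,1,2,3 > "$tmp/file2"

# Keep column 2 of file2 per key, then test key membership with "in" rather
# than the stored value, so "0" or an empty column 2 is not treated as missing.
result=$(awk 'BEGIN { FS = OFS = "," }
  NR == FNR { A[$1] = $2; next }
  { print $0, (($1 in A) ? A[$1] : "missing") }' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"
# prints 1,AA,IND / 2,BB,AUS / 3,CC,missing / 4,DD,missing / 6,EE,0 (one per line)
```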

Hope this helps. If it doesn't fit your requirement, please post more sample input files with the expected output and full details of your requirement, in code tags.

Thanks,
R. Singh
# 10  
04-08-2016
Hi Ravinder,

I have two files, File1 and File2, both with "," as the delimiter. They have one column in common (the first column).

File 1

Code:
1,AAA
2,BBB
3,CCC
4,DDD

File 2

Code:
1,IND,SL,BAN
2,AUS,ENG,SA
5,USA,CAN,WI

Now I want to do a left outer join of File1 with File2.

Below is the expected output:

Code:
1,AAA,IND,SL,BAN
2,BBB,AUS,ENG,SA
3,CCC,,,
4,DDD,,,

Apologies for changing my requirement from the initial one.
Thanks in advance
# 11  
04-08-2016
Hello venkat_reddy,

Could you please try the following and let me know if it helps?
Code:
awk 'BEGIN{FS=","} FNR==NR{A[$1]=$0;Q=NF;next} ($1 in A){print A[$1];next} !($1 in A){for(j=NF;j<=Q;j++){W=W?W" "FS:FS};print $0 W;W=""}' OFS=, file2 file1

Output will be as follows.
Code:
1,IND,SL,BAN
2,AUS,ENG,SA
3,CC, , ,
4,DD, , ,

If you don't need spaces between the commas, the following may help.
Code:
awk 'BEGIN{FS=","} FNR==NR{A[$1]=$0;Q=NF;next} ($1 in A){print A[$1];next} !($1 in A){for(j=NF;j<=Q;j++){W=W FS};print $0 W;W=""}' OFS=, file2 file1

Output will be as follows.
Code:
1,IND,SL,BAN
2,AUS,ENG,SA
3,CC,,,
4,DD,,,

Hope this helps you.

Thanks,
R. Singh
# 12  
04-08-2016
Hi Ravinder,

That works with the sample files I provided earlier. However, when I tried the code with my actual files, it does not work; I see some weird output.

File 1

Code:
1,P,PP,PPP,20160407,2016-04-07 13:55:00,2016-04-07 19:00:25
2,K,KK,KKK,20160407,2016-04-07 13:59:00,2016-04-07 19:00:25
3,C,CC,CCC,20160407,2016-04-07 23:06:30,2016-04-07 23:10:35
4,L,LL,LLL,20160407,2016-04-07 18:05:00,2016-04-08 00:30:31
5,M,MM,MMM,20160407,2016-04-08 03:08:00,2016-04-08 03:08:48
6,N,NN,NNN,20160407,2016-04-08 00:31:04,2016-04-08 00:31:04
8,O,OO,OOO,20160407,2016-04-07 21:25:00,2016-04-07 23:30:24
9,K,KK,KKK,20160407,2016-04-07 20:13:32,2016-04-07 20:32:35
10,S,SS,SSS,20160407,2016-04-07 04:00:27,2016-04-07 18:25:54
11,T,TT,TTT,20160407,2016-04-07 11:13:47,2016-04-07 18:27:05
12,R,RR,RRR,20160407,2016-04-07 03:36:24,2016-04-07 18:27:39
12,R,RR,RRR,20160407,2016-04-07 16:18:01,2016-04-07 18:27:39

File 2

Code:
1,07-APR-16 07.12.40.839372 PM,PP,1
2,07-APR-16 07.30.14.092718 PM,KK,2
3,07-APR-16 11.21.17.934001 PM,CC,3
4,08-APR-16 12.44.24.451781 AM,LL,4
5,08-APR-16 03.30.41.496570 AM,MM,13
6,08-APR-16 02.56.20.942287 AM,NN,11
7,08-APR-16 02.06.57.768181 AM,XX,6
8,08-APR-16 01.16.25.267521 AM,OO,5
10,08-APR-16 02.21.08.688799 AM,SS,7
11,08-APR-16 04.45.32.112525 AM,TT,14
12,08-APR-16 02.37.31.112826 AM,RR,9

Below is the Code I used:

Code:
awk 'BEGIN{FS=","} FNR==NR{A[$1]=$0;Q=NF;next} ($1 in A){print A[$1];next} !($1 in A){for(j=NF;j<=Q;j++){W=W FS};print $0 W;W=""}' OFS=,  file2 file1

Below is the output:

Code:
1,07-APR-16 07.12.40.839372 PM,PP,1
2,07-APR-16 07.30.14.092718 PM,KK,2
3,07-APR-16 11.21.17.934001 PM,CC,3
4,08-APR-16 12.44.24.451781 AM,LL,4
5,08-APR-16 03.30.41.496570 AM,MM,13
6,08-APR-16 02.56.20.942287 AM,NN,11
8,08-APR-16 01.16.25.267521 AM,OO,5
9,K,KK,KKK,20160407,2016-04-07 20:13:32,2016-04-07 20:32:35
10,08-APR-16 02.21.08.688799 AM,SS,7
11,08-APR-16 04.45.32.112525 AM,TT,14
12,08-APR-16 02.37.31.112826 AM,RR,9
12,08-APR-16 02.37.31.112826 AM,RR,9

I'm not sure what's causing this. Could it be the duplicate values in column 1 of File 1?
Or is it the timestamp values in the data?

Please suggest
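Neither the duplicate keys nor the timestamps are the cause. With FS="," the spaces inside the timestamps are ordinary characters, and a duplicate key in File1 (such as 12) simply matches the same File2 line twice. The output comes from two parts of the command itself: the `($1 in A){print A[$1];next}` rule prints the stored File2 line on its own, so matched rows never include File1's columns, and the padding loop `for(j=NF;j<=Q;j++)` only runs while `j<=Q`, which never holds when File1 has more fields than File2. The short sketch below uses a hypothetical `pad_count` helper to count that loop's iterations for both file shapes.

```shell
#!/bin/sh
# pad_count NF Q: number of commas appended by the padding loop from post #12,
#   for (j = NF; j <= Q; j++) W = W FS
# NF is the File1 line's field count; Q is the field count of the last File2 line read.
pad_count() {
  awk -v nf="$1" -v q="$2" 'BEGIN { n = 0; for (j = nf; j <= q; j++) n++; print n }'
}
pad_count 2 4   # 2-field File1 (post #10 sample) vs 4-field File2: prints 3
pad_count 7 4   # 7-field File1 (post #12 data) vs 4-field File2: prints 0, so row 9 gets no padding
```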
# 13  
04-08-2016
Hello venkat_reddy,

You haven't told us which input file has more, fewer, or the same number of fields. When posting, you can keep the same format as your actual files and mask any sensitive data. Could you please try the following and let me know if it helps? Here I am assuming File1 will always have more fields than File2.
Code:
awk -F"," 'FNR==NR{A[$1]=$0;Q=NF;next} ($1 in A){print A[$1];next} !($1 in A){for(j=NF;j>=Q;j--){W=W FS};print $0 W;W=""}' OFS=,  File2 File1

Output will be as follows.
Code:
1,07-APR-16 07.12.40.839372 PM,PP,1
2,07-APR-16 07.30.14.092718 PM,KK,2
3,07-APR-16 11.21.17.934001 PM,CC,3
4,08-APR-16 12.44.24.451781 AM,LL,4
5,08-APR-16 03.30.41.496570 AM,MM,13
6,08-APR-16 02.56.20.942287 AM,NN,11
8,08-APR-16 01.16.25.267521 AM,OO,5
9,K,KK,KKK,20160407,2016-04-07 20:13:32,2016-04-07 20:32:35,,,,
10,08-APR-16 02.21.08.688799 AM,SS,7
11,08-APR-16 04.45.32.112525 AM,TT,14
12,08-APR-16 02.37.31.112826 AM,RR,9
12,08-APR-16 02.37.31.112826 AM,RR,9

For File1 lines with no match in File2, it appends one comma for each field position counting down from File1's field count to File2's field count, i.e. (fields in File1) - (fields in File2) + 1 commas. In the example above it printed 4 commas, because File2 has 4 fields (in all rows) and File1 has 7. This assumption is based on your very first post. Hope this helps; if your requirement is different, please let us know.

Thanks,
R. Singh
# 14  
04-08-2016
Thanks for the response

File1 will have more columns than File2. Both have columns of various formats, e.g. CHAR, DATE, TIMESTAMP, and both use "," as the delimiter.

I tried the code you gave, but unfortunately it does not produce the output I'm expecting.

Below is the expected output for the same files you used above:

Code:
1,P,PP,PPP,20160407,2016-04-07 13:55:00,2016-04-07 19:00:25,07-APR-16 07.12.40.839372 PM,PP,1
2,K,KK,KKK,20160407,2016-04-07 13:59:00,2016-04-07 19:00:25,07-APR-16 07.30.14.092718 PM,KK,2
3,C,CC,CCC,20160407,2016-04-07 23:06:30,2016-04-07 23:10:35,07-APR-16 11.21.17.934001 PM,CC,3
4,L,LL,LLL,20160407,2016-04-07 18:05:00,2016-04-08 00:30:31,08-APR-16 12.44.24.451781 AM,LL,4
5,M,MM,MMM,20160407,2016-04-08 03:08:00,2016-04-08 03:08:48,08-APR-16 03.30.41.496570 AM,MM,13
6,N,NN,NNN,20160407,2016-04-08 00:31:04,2016-04-08 00:31:04,08-APR-16 02.56.20.942287 AM,NN,11
8,O,OO,OOO,20160407,2016-04-07 21:25:00,2016-04-07 23:30:24,08-APR-16 01.16.25.267521 AM,OO,5
9,K,KK,KKK,20160407,2016-04-07 20:13:32,2016-04-07 20:32:35,,,
10,S,SS,SSS,20160407,2016-04-07 04:00:27,2016-04-07 18:25:54,08-APR-16 02.21.08.688799 AM,SS,7
11,T,TT,TTT,20160407,2016-04-07 11:13:47,2016-04-07 18:27:05,08-APR-16 04.45.32.112525 AM,TT,14
12,R,RR,RRR,20160407,2016-04-07 03:36:24,2016-04-07 18:27:39,08-APR-16 02.37.31.112826 AM,RR,9
12,R,RR,RRR,20160407,2016-04-07 16:18:01,2016-04-07 18:27:39,08-APR-16 02.37.31.112826 AM,RR,9


I need all the data from File 1, along with the data from File 2 wherever the join attribute (column 1 in both files) matches, i.e. a left outer join. Where there is no match, I need null (empty) values for the columns extracted from File 2.


I hope my requirement is clear.

Thanks
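For the requirement above, here is a minimal sketch of a left outer join in awk (not code posted in the thread; `left_join` is a hypothetical helper name). It assumes plain comma-separated files with no quoted fields containing commas, and that File2 is not empty, since the `NR == FNR` test relies on the first file having at least one line. Instead of printing File2's line for a match, it appends File2's columns 2 onward to the unchanged File1 line; for keys missing from File2 it appends one empty field per File2 non-key column. The usage part writes a few rows from post #12 to a temporary directory.

```shell
#!/bin/sh
# left_join FILE1 FILE2: left outer join on column 1 of two comma-separated files.
# Every FILE1 line is printed in its original order (duplicate keys included).
# Matched lines get FILE2's columns 2..NF appended; unmatched lines get one
# empty field per FILE2 non-key column. Assumes no quoted fields with commas.
left_join() {
  awk 'BEGIN { FS = OFS = ","; n = 0 }
    NR == FNR {                        # first file read (FILE2): the lookup table
      rest = ""
      for (i = 2; i <= NF; i++) rest = rest OFS $i
      A[$1] = rest
      if (NF - 1 > n) n = NF - 1       # widest FILE2 row, minus the key column
      next
    }
    ($1 in A) { print $0 A[$1]; next } # matched: FILE1 line plus FILE2 columns
    {                                  # unmatched: pad with empty fields
      pad = ""
      for (i = 1; i <= n; i++) pad = pad OFS
      print $0 pad
    }' "$2" "$1"
}

# Usage with a few rows from post #12:
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' \
  '1,P,PP,PPP,20160407,2016-04-07 13:55:00,2016-04-07 19:00:25' \
  '9,K,KK,KKK,20160407,2016-04-07 20:13:32,2016-04-07 20:32:35' \
  '12,R,RR,RRR,20160407,2016-04-07 03:36:24,2016-04-07 18:27:39' \
  '12,R,RR,RRR,20160407,2016-04-07 16:18:01,2016-04-07 18:27:39' > "$tmp/File1"
printf '%s\n' \
  '1,07-APR-16 07.12.40.839372 PM,PP,1' \
  '7,08-APR-16 02.06.57.768181 AM,XX,6' \
  '12,08-APR-16 02.37.31.112826 AM,RR,9' > "$tmp/File2"
result=$(left_join "$tmp/File1" "$tmp/File2")
printf '%s\n' "$result"
```

With these rows the output lines match the corresponding lines of the expected output above: row 9 ends in `,,,`, and both File1 lines for key 12 receive File2's columns. The spaces inside the timestamps need no special handling because "," is the only separator.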