Joining files using awk not extracting all columns from File 2


 
# 15  
04-09-2016
Quote:
Originally Posted by venkat_reddy
... ... ...

Hope you got my requirement.

Thanks
It is hard to get your requirements when you change them every time you add a new post to this thread. And, the requirements still are not clear.

Until post #12 in this thread, we didn't know the type of the key field in your input files (and, in post #12, it still isn't stated; we just have to assume that the field is numeric rather than a string, because the sample data shown is out of order for string values). Until post #12, we also did not know that the key field in your input files could not only be missing from either file but could also be duplicated in file 1.

Can key values also be duplicated in file 2?

If there can be duplicates in both files, what output is supposed to be produced? For example, if a key appears twice in file 1 and three times in file 2, are there supposed to be six output records (a cross product of the input records)? Or are the first records with that key from each file matched, the second records with that key matched, and the unmatched record with that key from file 2 written with empty values for the missing file 1 record data?
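To make the first (cross product) reading concrete, here is a minimal awk sketch using two tiny hypothetical files, f1.csv and f2.csv, with made-up data in which key 3 appears twice in the first file and three times in the second. It prints six lines (2 times 3) for key 3, and the unmatched key 5 gets one empty field. It only illustrates the cross-product interpretation, not the pairwise one.

```sh
#!/bin/sh
# Sketch: "cross product" handling of duplicate keys.
# f1.csv / f2.csv and their contents are hypothetical sample data.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 1,A 3,B 3,C 5,D > "$tmp/f1.csv"
printf '%s\n' 1,x 3,p 3,q 3,r > "$tmp/f2.csv"

out=$(awk '
BEGIN { FS = OFS = "," }
FNR == NR {                  # f2.csv: remember every record for each key
	k = $1
	$1 = ""                  # drop the key; $0 now starts with a comma
	n[k]++
	rec[k, n[k]] = $0
	next
}
($1 in n) {                  # f1.csv: one output line per f2.csv match
	for (i = 1; i <= n[$1]; i++)
		print $0 rec[$1, i]
	next
}
{ print $0 "," }             # no match: keep the line, add an empty field
' "$tmp/f2.csv" "$tmp/f1.csv")

printf '%s\n' "$out"
```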

I think we are all assuming that both input files have been sorted on your key field (and the examples shown support this assumption), but your stated requirements leave this as an assumption, not something that anyone trying to meet your requirements can depend upon.
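Rather than relying on that assumption, the sort order can be checked up front. A small sketch in awk, assuming comma-separated files with a one-line header and a numeric key in field 1; check_sorted is a hypothetical helper name and the two sample files are made up. Equal adjacent keys (the duplicates allowed in file 1) count as sorted.

```sh
#!/bin/sh
# Sketch: verify, instead of assuming, that a CSV file is sorted
# numerically on field 1 (header line skipped). Sample data is made up.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Hypothetical helper: exit status 0 if the key never decreases.
check_sorted() {
	awk -F, '
	FNR == 1 { next }                        # skip the header line
	seen && $1 + 0 < prev { bad = 1; exit }  # key went down: not sorted
	{ prev = $1 + 0; seen = 1 }
	END { exit bad + 0 }
	' "$1"
}

printf '%s\n' EmpID,EmpName 1,AAA 3,CCC 3,DDD 8,GGG > "$tmp/sorted.csv"
printf '%s\n' EmpID,EmpName 1,AAA 8,GGG 3,CCC > "$tmp/unsorted.csv"

for f in "$tmp/sorted.csv" "$tmp/unsorted.csv"; do
	if check_sorted "$f"; then
		echo "$(basename "$f"): sorted"
	else
		echo "$(basename "$f"): NOT sorted"
	fi
done
```

This should report sorted.csv as sorted and unsorted.csv as NOT sorted.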
# 16  
04-10-2016
@Don Cragun:

Sorry for changing the requirement from the initial one. Key values can be duplicated only in file 1, not in file 2. Also, the key values in both file 1 and file 2 are sorted numerically. If you are familiar with database joins, I'm looking for a left outer join of File 1 with File 2, not a cross product. Here are some simple examples that should give you a better understanding of my requirement:

File 1

Code:
EmpID,EmpName,DeptID,EmpSal
1,AAA,10,100
2,BBB,20,200
3,CCC,30,300
3,DDD,40,400
5,EEE,50,500
6,FFF,60,600
8,GGG,70,700

File 2:

Code:
EmpID,EmpDOB,EmpHireDate,Active
1,Jun-04-1986,2012-03-23 12:40:00 PM,Y
2,Apr-12-1991,2010-05-12 08:50:00 PM,N
3,Dec-31-1978,2010-01-08 12:00:00 AM,Y
6,Mar-09-1989,2010-05-08 06:45:00 PM,N

Output:

Code:
EmpID,EmpName,DeptID,EmpSal,EmpID,EmpDOB,EmpHireDate,Active
1,AAA,10,100,Jun-04-1986,2012-03-23 12:40:00 PM,Y
2,BBB,20,200,Apr-12-1991,2010-05-12 08:50:00 PM,N
3,CCC,30,300,Dec-31-1978,2010-01-08 12:00:00 AM,Y
3,DDD,40,400,Dec-31-1978,2010-01-08 12:00:00 AM,Y
5,EEE,50,500,,
6,FFF,60,600,Mar-09-1989,2010-05-08 06:45:00 PM,N
8,GGG,70,700,,

Thanks in advance
# 17  
04-10-2016
OK. So just to be sure I understand:
  1. You have one employee with EmpID 3 who is currently using two names (CCC and DDD), is currently working in two departments (30 and 40), is drawing separate salaries from both departments, and was hired for both jobs at midnight on the morning of January 8, 2010, when this person was 31 years old?
  2. And you have two employees whom you are (or, maybe, were) paying relatively high salaries, and you don't know when they were born, when they were hired, or whether they are still showing up for work?
# 18  
04-11-2016
Noting that the requested output in post #16 in this thread:
Quote:
Code:
EmpID,EmpName,DeptID,EmpSal,EmpID,EmpDOB,EmpHireDate,Active
1,AAA,10,100,Jun-04-1986,2012-03-23 12:40:00 PM,Y
2,BBB,20,200,Apr-12-1991,2010-05-12 08:50:00 PM,N
3,CCC,30,300,Dec-31-1978,2010-01-08 12:00:00 AM,Y
3,DDD,40,400,Dec-31-1978,2010-01-08 12:00:00 AM,Y
5,EEE,50,500,,
6,FFF,60,600,Mar-09-1989,2010-05-08 06:45:00 PM,N
8,GGG,70,700,,

has eight fields in the header line (including two occurrences of EmpID), seven fields in the lines where the EmpID appears in both input files, and six fields in the lines where the given EmpID does not appear in File 2, and making the not-too-wild assumptions that EmpID should appear only once in the output header line and that all output lines should contain seven fields, you might want to try something like:
Code:
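awk '
BEGIN {	FS = OFS = ","
}
NR == 1 {	# File 2 header: one empty field per non-key File 2 column
	for(i = 2; i <= NF; i++)
		nm = nm OFS
}
FNR == NR {	# File 2 (header included): save the data after the key
	id = $1
	$1 = ""		# $0 is rebuilt with a leading comma
	d[id] = $0
	next
}
{	# File 1: append the saved File 2 data, or empty fields if no match
	print $0 (($1 in d) ? d[$1] : nm)
}' "File 2" "File 1"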

which produces the output:
Code:
EmpID,EmpName,DeptID,EmpSal,EmpDOB,EmpHireDate,Active
1,AAA,10,100,Jun-04-1986,2012-03-23 12:40:00 PM,Y
2,BBB,20,200,Apr-12-1991,2010-05-12 08:50:00 PM,N
3,CCC,30,300,Dec-31-1978,2010-01-08 12:00:00 AM,Y
3,DDD,40,400,Dec-31-1978,2010-01-08 12:00:00 AM,Y
5,EEE,50,500,,,
6,FFF,60,600,Mar-09-1989,2010-05-08 06:45:00 PM,N
8,GGG,70,700,,,

from your two sample input files. Is this reasonably close to the output you want?

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
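Because post #16 says both files are sorted numerically on EmpID and keys are unique in File 2, a streaming merge join is another option. This is a sketch of that swapped-in technique, not the method used above: it reads File 2 one record at a time with getline instead of loading it into an array. The sample files are recreated in a temporary directory, and next2, k2, r2, and eof2 are hypothetical names. On the post #16 sample data it produces the same output shown above.

```sh
#!/bin/sh
# Sketch: streaming merge (left outer) join. ASSUMES both files are
# sorted numerically on field 1 and keys are unique in File 2.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' EmpID,EmpName,DeptID,EmpSal 1,AAA,10,100 2,BBB,20,200 \
    3,CCC,30,300 3,DDD,40,400 5,EEE,50,500 6,FFF,60,600 8,GGG,70,700 \
    > "$tmp/File 1"
printf '%s\n' EmpID,EmpDOB,EmpHireDate,Active \
    '1,Jun-04-1986,2012-03-23 12:40:00 PM,Y' \
    '2,Apr-12-1991,2010-05-12 08:50:00 PM,N' \
    '3,Dec-31-1978,2010-01-08 12:00:00 AM,Y' \
    '6,Mar-09-1989,2010-05-08 06:45:00 PM,N' > "$tmp/File 2"

awk -v f2="$tmp/File 2" '
function next2(    line, c) {          # load the next File 2 record into k2/r2
    if ((getline line < f2) > 0) {
        c = index(line, ",")
        k2 = substr(line, 1, c - 1)    # key
        r2 = substr(line, c)           # rest, with its leading comma
        return 1
    }
    eof2 = 1                           # end of File 2 (or read error)
    return 0
}
BEGIN {
    FS = OFS = ","
    next2(); hdr2 = r2                 # File 2 header minus EmpID
    nm = hdr2; gsub(/[^,]/, "", nm)    # one empty field per File 2 column
    next2()                            # first File 2 data record
}
FNR == 1 { print $0 hdr2; next }       # merged header line
{
    while (!eof2 && k2 + 0 < $1 + 0)   # skip File 2 keys absent from File 1
        next2()
    print $0 ((!eof2 && k2 + 0 == $1 + 0) ? r2 : nm)
}' "$tmp/File 1" > "$tmp/out.csv"

cat "$tmp/out.csv"
```

The array version above is simpler and does not care about input order; the merge version only pays off when File 2 is too large to keep in memory, and it silently drops matches if either file is not sorted as assumed.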
# 19  
04-12-2016
@Don Cragun:

The header should have 7 columns. I forgot to remove EmpID from column 5. The header should be as below:

Code:
EmpID,EmpName,DeptID,EmpSal,EmpDOB,EmpHireDate,Active

Sincere apologies for causing some confusion around this.

Anyway, I have run the code you gave, and the output is in line with what I'm expecting.

Thanks much for the help, even though my requirements were not communicated clearly.

Last edited by venkat_reddy; 04-12-2016 at 01:01 PM. Reason: forgot code brackets