Merge two csv files using column name

07-24-2015

Registered User

11, 0

Join Date: Jun 2014

Last Activity: 11 January 2018, 9:36 PM EST

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

Merge two csv files using column name

Hi all,

I have two separate csv files(comma delimited) file 1 and file 2.

File 1 contains

Code:

PAN,NAME,Salary
AAAAA5467D,Raj,50000
AAFAC5467D,Ram,60000
BDCFA5677D,Kumar,90000

File 2 contains

Code:

PAN,NAME,Dept,Salary
ASDFG6756T,Karthik,ABC,450000
QWERT8765Y,JAX,CDR,780000
MKIOU8712T,Swathi,TRY,87900

I need to join these two files based on the column name(i.e) using PAN,NAME etc..and if extra columns is available it needs to be created in the output file.
My output needs to be like this,

File 3

Code:

PAN,NAME,DEPT,Salary
AAAAA5467D.Raj,,50000
AAFAC5467D.Ram,,60000
BDCFA5677D.Kumar,,90000
ASDFG6756T,Karthik,ABC,450000
QWERT8765Y,JAX,CDR,780000
MKIOU8712T,Swathi,TRY,87900

Please help me with the unix command..Thanks in advance

Last edited by Don Cragun; 07-25-2015 at 12:18 AM.. Reason: Add CODE tags, again.

Nivas

View Public Profile for Nivas

Find all posts by Nivas

07-25-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

What have you tried?

What operating system and shell are you using?

What tools do you want to use?

If File 1 contained the additional line:

Code:

MKIOU8712T,Swathi,80000

how would that be combined with the line:

Code:

MKIOU8712T,Swathi,TRY,87900

from File 2?

Is there some field that will always be present in every input file?

Is there some field or combination of fields that is to be used a way to match lines from two files? Or, all all lines from all input files to be considered independent from all other input files and the only change in the output from the input lines is to add empty fields to lines from input files that don't have one or more of the headings from the combined list of input files?

Are you always processing two input files to produce one output file, or can there be more than two input files?

What determines the order of the fields in the output file? If you're processing File 1 and File 2 in that order, why couldn't the output field order be:

Code:

PAN,NAME,Salary,Dept

instead of:

Code:

PAN,NAME,DEPT,Salary

and why did the heading you showed in File 3 change Dept (from File 2) to DEPT?

If one input file has the heading line: a,b,c,d and another input file has the heading line: D,C,A,B, are there supposed to be four fields in the output file or eight? (I.e., are heading labels case sensitive?)

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

07-25-2015

Registered User

11, 0

Join Date: Jun 2014

Last Activity: 11 January 2018, 9:36 PM EST

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

Don,

I am using HP unix operating system..In my two csv files, the values exist in file 1 will not exist in file 2..Both are independent.

I want to merge these two files in a single file..but have to join between these files using column name.. I have to achieve like full outer join in rdbms.

But I have to achieve using shell scripts..

I need only the four fields in the output not eight fields and there are only two input files.

The output can be in any order even like this

Code:
PAN,NAME,DEPT,Salary..

If you require more inputs, please let me know.
Thanks in advance

Nivas

View Public Profile for Nivas

Find all posts by Nivas

07-25-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

Cute puzzle... Note that the output you said you wanted uses an output field separator of period (instead of comma) between the 1st two output fields for lines read from File 1. I assume that was unintentional and that you really wanted commas separating all output fields. To make processing easier, since you said that field headers are to be treated as case insensitive, the following code prints all output file headings using upper case letters.

Note that although this was tested with two input files, it will work with zero or more input files. And, to make your life easier, I strongly encourage you to avoid putting spaces in filenames.

You could try something like:

Code:

awk '
BEGIN {	FS = OFS = ","
	for(i = 1; i < ARGC; i++) {
		if((getline < ARGV[i]) > 0) {
			for(j = 1; j <= NF; j++) {
				$j = toupper($j)
				if(!($j in hdr)) {
					order[++hc] = hdr[$j] = $j
					printf("%s%s", fs, $j)
					fs = OFS
				}
			}
			close(ARGV[i])
		}
	}
	print ""
}
FNR == 1 {
	for(i = 1; i <= NF; i++)
		$i = toupper($i)
	for(i = 1; i <= hc; i++) {
		for(j = 1; j <= NF; j++)
			if($j == order[i])
				break
		out[i] = j
	}
	next
}
{	for(i = 1; i <= hc; i++)
		printf("%s%s", $out[i], i != hc ? OFS : "\n")
}' "File 1" "File 2"

With your sample input files, it produces the output:

Code:

PAN,NAME,SALARY,DEPT
AAAAA5467D,Raj,50000,
AAFAC5467D,Ram,60000,
BDCFA5677D,Kumar,90000,
ASDFG6756T,Karthik,450000,ABC
QWERT8765Y,JAX,780000,CDR
MKIOU8712T,Swathi,87900,TRY

If you reverse the order of the input files you feed this program, you would get the output:

Code:

PAN,NAME,DEPT,SALARY
ASDFG6756T,Karthik,ABC,450000
QWERT8765Y,JAX,CDR,780000
MKIOU8712T,Swathi,TRY,87900
AAAAA5467D,Raj,,50000
AAFAC5467D,Ram,,60000
BDCFA5677D,Kumar,,90000

If someone else wants to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

07-25-2015

Registered User

11, 0

Join Date: Jun 2014

Last Activity: 11 January 2018, 9:36 PM EST

Posts: 11

Thanks Given: 0

Thanked 0 Times in 0 Posts

Don,

Thank you so much for your response...But I am getting an output like this

Code:

PAN,NAME,SALARY
       SALARY,DEPT

AAAAA5467D,Raj,50000
		
AAFAC5467D,Ram,60000
		
BDCFA5677D,Kumar,90000

ASDFG6756T,Karthik,,450000,ABC
QWERT8765Y,JAX,,780000,CDR
MKIOU8712T,Swathi,,87900,TRY

Awaiting for your response

Moderator's Comments:

PLEASE use CODE tags when displaying all sample input, output, and code samples.

Last edited by Don Cragun; 07-25-2015 at 04:10 PM.. Reason: Add CODE tags, again.

Nivas

View Public Profile for Nivas

Find all posts by Nivas

07-25-2015

Registered User

12,315, 4,560

Join Date: Jul 2012

Last Activity: 22 November 2019, 4:29 PM EST

Location: San Jose, CA, USA

Posts: 12,315

Thanks Given: 952

Thanked 4,560 Times in 3,818 Posts

The obvious guess would be that one or both of your input files is in DOS format (with <carriage-return><line-feed> line terminators) instead of UNIX format (with <newline> line terminators). To confirm, show us the output from the command:

Code:

for f in "File 1" "File 2"; do echo "File: $f";head -n 2 "$f"|od -cb;done

Don Cragun

View Public Profile for Don Cragun

Find all posts by Don Cragun

UNIX for Dummies Questions & Answers

Merge two csv files using column name

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Merge the three csv files as one according to first coloumn.

Discussion started by: Raghuram717

2. Shell Programming and Scripting

Need awk or Shell script to compare Column-1 of two different CSV files and print if column-1 matche

Discussion started by: TestPractice

3. Shell Programming and Scripting

Compare and merge two big CSV files

Discussion started by: SirMannu

4. Shell Programming and Scripting

I am trying to merge all csv files from source path into 1 file

Discussion started by: cplusplus1

5. Shell Programming and Scripting

Merge CSV files

Discussion started by: rk4k

6. Shell Programming and Scripting

Merge *.csv files, each in separate sheets

Discussion started by: frhling

7. UNIX for Advanced & Expert Users

merge two column multiple files into one

Discussion started by: mary271

8. Shell Programming and Scripting

Merge CSV files and create a column with the filename from the original file

Discussion started by: fransanchezoria

9. Shell Programming and Scripting

Merge 2 CSV files using sed

Discussion started by: NewToSed

10. Shell Programming and Scripting

Merge 2 csv files with awk

Discussion started by: loloAix