Compare two csv files by two colums and create third file combining data from them.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Compare two csv files by two colums and create third file combining data from them.
# 1  
Old 02-15-2009
Question Compare two csv files by two colums and create third file combining data from them.

I've got two large csv text table files with different number of columns each.
I have to compare them based on first two columns and create resulting file
that would in case of matched first two columns include all values from first one and all values (except first two colums) from second one. I searched forums and as well google - and was unable to find solution for that issue.

file1 example:

123 234 678 974 943
345 55 567 21 357
456 732 583 61 75

file2 example:

123 234 582 699 23 11 935
22 467 215 376 87 54
456 732 32 14

Expected resulting file:

123 234 678 974 943 582 699 23 11 935
456 732 583 61 75 32 14

just to compare I've been trying to use following code in awk, that I found in google:

awk 'NR==FNR {++a[$1,$2];next} a[$1,$2]' file1 file2 > file3

but I am stuck with understanding how to add data from second file for matched line... Smilie
# 2  
Old 02-15-2009
You can work on something like this:

Code:
awk  'NR==FNR { a[$1 FS $2]=$0; next } $1 FS $2 in a { printf a[$1 FS $2]; sub($1 FS $2,""); print }' file1 file2 > file3

# 3  
Old 02-15-2009
Code:
nawk '{
if (NR==FNR)
	_[$1$2]=$0
else
{
	if (_[$1$2]!=""){
		printf("%s",_[$1$2])
		for(i=3;i<=NF;i++)
			printf(" %s",$i)
		printf "\n"
	}
}
}' file1 file2


Last edited by Franklin52; 02-16-2009 at 10:40 AM.. Reason: adding code tags
# 4  
Old 02-16-2009
rubin, summer_cherry,

Thank you very much both Smilie - your code help me to resolve issue that I had. If it is possible - could you please comment your code - to get better understanding how this script works... I am still learning awk.
# 5  
Old 02-16-2009
Code:
NR==FNR {         # while in the first file, 
a[$1 FS $2]=$0    #+load array a indexed by first field, space ( default FS ), second field;  
next              #+then jump to the next record, without testing the following actions for the current record.  }
$1 FS $2 in a {   # Here starts the processing of file2. If field 1, space, and field 2 are found as indexes in array a, 
printf a[$1FS$2]  #+print the whole record ($0) of file 1 saved above in the array a ; 
sub($1 FS $2,"")  # substitute field 1, field 2 ( including the space in between ) with null string ;
print             # print the whole current record ( what's left ) of the second file.  }

# 6  
Old 02-16-2009
Hi.

You can also combine the first 2 fields to treat them as a single entity, use join (a sort is probably necessary), finally divide the first field into 2:
Code:
#!/usr/bin/env bash

# @(#) s2       Demonstrate join on 2 adjacent fields.

echo
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
echo "(Versions displayed with local utility \"version\")"
version >/dev/null 2>&1 && version "=o" $(_eat $0 $1) sed join
set -o nounset
echo

FILE1=data1
FILE2=data2

echo " Data file $FILE1:"
cat $FILE1

echo
echo " Data file $FILE2:"
cat $FILE2

# Pre-process input / sort, join / post-process output.
echo
echo " Results:"
sed 's/ /_/' $FILE1 | sort -k1,1 >t1
sed 's/ /_/' $FILE2 | sort -k1,1 >t2

join -j 1 t1 t2 | sed 's/_/ /'

echo
echo " Expected output:"
cat expected-output.txt

exit 0

Producing:
Code:
$ ./s2

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: SunOS, 5.10, i86pc
GNU bash 3.00.16
sed - ( /usr/xpg4/bin/sed Aug 9 2005 )
join - ( /usr/bin/join Jan 22 2005 )

 Data file data1:
123 234 678 974 943
345 55 567 21 357
456 732 583 61 75

 Data file data2:
123 234 582 699 23 11 935
22 467 215 376 87 54
456 732 32 14

 Results:
123 234 678 974 943 582 699 23 11 935
456 732 583 61 75 32 14

 Expected output:
123 234 678 974 943 582 699 23 11 935
456 732 583 61 75 32 14

cheers, drl
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

How to compare 2 files and create a result file with unmatched lines from first file.?

HI, I have 2 text files. file1 and file2. file1.txt (There are no duplicates in this file) 1234 3232 4343 3435 6564 6767 1213 file2.txt 1234,wq,wewe,qwqw 1234,as,dfdf,dfdf 4343,asas,sdds,dsds 6767,asas,fdfd,fdffd I need to search each number in file1.txt in file2.txt's 1st... (6 Replies)
Discussion started by: Little
6 Replies

2. Shell Programming and Scripting

Compare 2 files of csv file and match column data and create a new csv file of them

Hi, I am newbie in shell script. I need your help to solve my problem. Firstly, I have 2 files of csv and i want to compare of the contents then the output will be written in a new csv file. File1: SourceFile,DateTimeOriginal /home/intannf/foto/IMG_0713.JPG,2015:02:17 11:14:07... (8 Replies)
Discussion started by: refrain
8 Replies

3. Shell Programming and Scripting

Compare 2 colums in two files

Hi , I am trying to write a part in shell script. I have two files with one column each and would like to output it to a new file giving the records which are different.Pls help experts. File1 Column name 11 12 13 14 18 File2 Column name 11 12 14 17 19 (2 Replies)
Discussion started by: Rossdba
2 Replies

4. UNIX for Dummies Questions & Answers

How to create a .csv file from 2 different .txt files?

Hi, I need to create a .csv file from information that i have in two different tab delimited .txt file. I just want to select some of the columns of each .txt file and paste them into a .cvs file. My files look like: File 1 transcript_id Seq. Description Seq. Length ... (2 Replies)
Discussion started by: alisrpp
2 Replies

5. Shell Programming and Scripting

Compare 2 csv files in ksh and o/p the difference in a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 The... (7 Replies)
Discussion started by: Naresh101
7 Replies

6. Shell Programming and Scripting

Compare two CSV files and put the difference in third file with line no,field no and diff value.

I am having two csv files i need to compare these files and the output file should have the information of the differences at the field level. For Example, File 1: A,B,C,D,E,F 1,2,3,4,5,6 File 2: A,C,B,D,E,F 1,2,4,5,5,6 out put file: (12 Replies)
Discussion started by: karingulanagara
12 Replies

7. UNIX for Dummies Questions & Answers

Need help combining txt files w/ multiple lines into csv single cell - also need data merge

:confused:Hello -- i just joined the forums. I am a complete noob -- only about 1 week into learning how to program anything... and starting with linux. I am working in Linux terminal. I have a folder with a bunch of txt files. Each file has several lines of html code. I want to combine... (2 Replies)
Discussion started by: jetsetter
2 Replies

8. Shell Programming and Scripting

How to create a CSV File by reading fields from separate files

SHELL SCRIPT Hi, I have 3 separate files within a folder. Every File contains data in a single column like File1 contains data mayank sushant dheeraj File2 contains DSA_AT MG_AT FLAT_09 File3 contains data 123123 232323 (2 Replies)
Discussion started by: mayanksargoch
2 Replies

9. Shell Programming and Scripting

Merging files to create CSV file

Hi, I have different files of the same type, as: Time: 100 snr: 88 perf: 10 other: 222 Each of these files are created periodically. What I need to do is to merge all of them into one but having the following form: (2 Replies)
Discussion started by: Ravendark
2 Replies

10. Shell Programming and Scripting

Need to compare two csv files values and write into another csv file

Hi all, Am new to scripting. So i just need your ideas to help me out. Here goes my requirement. I have two csv files 1.csv 2.csv abc,1.24 abc,1 def,2.13 def,1 I need to compare the first column of 1.csv with 2.csv and if matches then need to compare... (2 Replies)
Discussion started by: chinnahyd
2 Replies
Login or Register to Ask a Question