combining 2 files with more than one match in second file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting combining 2 files with more than one match in second file
# 1  
Old 12-04-2007
combining 2 files with more than one match in second file

Hello,

I am attempting to combine two files where the second file can have more than one match with the lookup field (node) in the first file, onto one line of the output file. Also alerting if a lookup was not found in file2

=-=-=-=-=-=-=
Example of file1
node,type
=-=-=-=-=-=-=
bob,232
fred,202
jim,202
pat,232
sam,232
ted,118
tim,118

......
(up to around 15,000 lines)


=-=-=-=-=-=-=-=
Example of file2
node,serialnumber
=-=-=-=-=-=-=-=

alan,TEW9873FG
bob,XVT81283WS
fred,YHW81342BB
fred,YHW83281BA
fred,YHW98746CR
jake,QWE9896AQ
pat,LKJ55432GH
sam,QER8984UH
sam,QER6536WE
sam,QER2343ET
sid,ASW9093TW
ted,OPW4324TY
val,RWT6492OK
.....
(up to around 20,000 lines)

=-=-=-=-=-=-=-=-=
Output file required
=-=-=-=-=-=-=-=-=
bob,232,XVT81283WS
fred,202,YHW81342BB;YHW83281BA;YHW98746CR
jim,202,NO_SERIAL_FOUND
pat,232,LKJ55432GH
sam,232,QER8984UH;QER6536WE;QER2343ET
ted,118,OPW4324TY
tim,118,NO_SERIAL_FOUND

I have managed to combine lines using two read file loops in ksh but this will only give me a single line match between file1 file2. Also tried a
join -t',' which will give me results of nodes with more than one serial on a newline per serial (and sometimes not able find some of the matches at all)

Any advice how to tackle this would much appreciated,
Kind rgds,
John.
# 2  
Old 12-04-2007
Try....
Code:
$ head file?
==> file1 <==
bob,232
fred,202
jim,202
pat,232
sam,232
ted,118
tim,118

==> file2 <==
alan,TEW9873FG
bob,XVT81283WS
fred,YHW81342BB
fred,YHW83281BA
fred,YHW98746CR
jake,QWE9896AQ
pat,LKJ55432GH
sam,QER8984UH
sam,QER6536WE
sam,QER2343ET

$ awk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=(a[$1]?a[$1] ";":"") $2;next}$3=(a[$1]?a[$1]:"NO_SERIAL_FOUND")' file2 file1
bob,232,XVT81283WS
fred,202,YHW81342BB;YHW83281BA;YHW98746CR
jim,202,NO_SERIAL_FOUND
pat,232,LKJ55432GH
sam,232,QER8984UH;QER6536WE;QER2343ET
ted,118,OPW4324TY
tim,118,NO_SERIAL_FOUND

# 3  
Old 12-04-2007
awk

hi,

Up to now, it seems the most difficult one for me. And it really took me much time to find out the solution. Hope this can help you.

code:
Code:
nawk 'BEGIN{
FS=","
n=1
}

function isExist(k)
{
	flag=0
	for ( i in name)
	{	if(name[i]==k)
			flag=i
	}
	return flag
}

{
if (NR==FNR)
{
	name[n]=$1
	con[n]=$2
	n++
}
else
{
	if(isExist($1)!=0)
		con[isExist($1)]=sprintf("%s,%s",con[isExist($1)],$2)
}
}
END{
for (i=0;i<n;i++)
if (index(con[i],",")!=0)
	print name[i]","con[i]
else
	print name[i]","con[i]",NO_SERIAL_FOUND"
}' file1 file2

# 4  
Old 12-04-2007
Here goes using awk:

File1:
Quote:
$ cat f1.txt
bob,232
fred,202
jim,202
pat,232
sam,232
ted,118
tim,118
File2:
Quote:
$ cat f2.txt
ialan,TEW9873FG
bob,XVT81283WS
fred,YHW81342BB
fred,YHW83281BA
fred,YHW98746CR
jake,QWE9896AQ
pat,LKJ55432GH
sam,QER8984UH
sam,QER6536WE
sam,QER2343ET
sid,ASW9093TW
ted,OPW4324TY
val,RWT6492OK
Script:
Quote:
$ cat merge.awk
BEGIN {
first=FILENAME
while (getline > 0) {
split($0, a, ",");
x=a[1]; y=a[2]
if (FILENAME == first) {
arr[x] = $0
} else {
if (brr[x] == "") {
brr[x] = y
} else {
brr[x] = brr[x]";"y
}
}
}
}

END {
for (i in arr) {
if(brr[i] == "") {
xrr[i] = arr[i]",NO_SERIAL_FOUND"
} else {
xrr[i] = arr[i]","brr[i]
}
print xrr[i]
}
}
Output:
Quote:
$ awk -f merge.awk f1.txt f2.txt
bob,232,XVT81283WS
ted,118,OPW4324TY
fred,202,YHW81342BB;YHW83281BA;YHW98746CR
jim,202,NO_SERIAL_FOUND
sam,232,QER8984UH;QER6536WE;QER2343ET
pat,232,LKJ55432GH
tim,118,NO_SERIAL_FOUND
HTH
# 5  
Old 12-04-2007
Truely Awsome!!!

Ygor and summer_cherry

Wow that works like a treat!

How does it work with such a condensed amount of awk? I was tying myself in knots with all those nested read file loops in ksh.

As I'm going to need to reuse this output file and combine it with a 3rd 4th ...6th file etc

ie.
Now I have

=-=-=-=-=
Output file
=-=-=-=-=
bob,232,XVT81283WS
fred,202,YHW81342BB;YHW83281BA;YHW98746CR
jim,202,NO_SERIAL_FOUND
pat,232,LKJ55432GH
sam,232,QER8984UH;QER6536WE;QER2343ET
ted,118,OPW4324TY
tim,118,NO_SERIAL_FOUND

=-=-=-=-=-=-=-=-=
and input file
node,mac_address
=-=-=-=-=-=-=-=-=
bob,0845.5632.7e7a
bob,0845.5632.7e85
jake,9853.bb43.c823
jim,no mac address
sam,ea4c.512b.3462
ted,2782.34ac.eb21
tim,904b.0042.ad04
tim,904b.0042.ada1
tim,904b.0042.adc3

=-=-=-=-=-=-=-=-=
Combined output file
=-=-=-=-=-=-=-=-=
bob,232,XVT81283WS,0845.5632.7e7a;0845.5632.7e85
fred,202,YHW81342BB;YHW83281BA;YHW98746CR,NO_MAC_FOUND
jim,202,NO_SERIAL_FOUND,no mac address
pat,232,LKJ55432GH,NO_MAC_FOUND
sam,232,QER8984UH;QER6536WE;QER2343ET,ea4c.512b.3462
ted,118,OPW4324TY,2782.34ac.eb21
tim,118,NO_SERIAL_FOUND,904b.0042.ad04;904b.0042.ada1;904b.0042.adc3

Many thanks for all your help on this, I will continue try and decipher the wizardry of the awk in your replies.
Kind rgds,
John.
# 6  
Old 12-04-2007
Might be simpler in 2 steps

Hello
Just thinking I may well be now making this unnecessarily complicated as I could combine.

file1 with file2 to output_file1

file1 with file3 to output_file2

file1 with file4 to output_file3

then join output_file1 to output_file2 etc....

will need to understand syntax in awk though

nawk 'BEGIN{FS=OFS=","}NR==FNR{a[$1]=(a[$1]?a[$1] ";":"") $2;next}$3=(a[$1]?a[$1]:"NO_SERIAL_FOUND")'

Again many thanks to all of you who have been looking into this Smilie
Kind rgds,
John.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Combining certain columns of multiple files into one file

Hello Unix gurus, I have a large number of files (say X) each containing two columns of data and the same number of rows. I would like to combine these files to create a unique merged file containing X columns corresponding to the second column of each file (with a bonus of having the first... (3 Replies)
Discussion started by: ksennin
3 Replies

2. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

3. Shell Programming and Scripting

Combining files(every 15 min) as one file(hourly)

Hello, My system is generating two files every 15 minutes and file names are given automatically as below. (98,99,89,90 are the sequence numbers) File1_09242013131016_000000098 File1_09242013131516_000000099 File2_09242013124212_000000089 File2_09242013124712_000000090 I want to combine... (6 Replies)
Discussion started by: phoenex11
6 Replies

4. Shell Programming and Scripting

Combining columns from multiple files into one single output file

Hi, I have 3 files with one column value as shown File: a.txt ------------ Data_a1 Data_a2 File2: b.txt ------------ Data_b1 Data_b2 Data_b3 Data_b4 File3: c.txt ------------ Data_c1 Data_c2 Data_c3 Data_c4 Data_c5 (6 Replies)
Discussion started by: vfrg
6 Replies

5. Shell Programming and Scripting

Combining multiple column files into one with file name as first row

Hello All, I have several column files like this $cat a_b_s1.xls 1wert 2tg 3asd 4asdf 5asdf $cat c_d_s2.xls 1wert 2tg 3asd 4asdf 5asdf desired put put $cat combined.txt s1 s2 (2 Replies)
Discussion started by: avatar_007
2 Replies

6. UNIX for Dummies Questions & Answers

Need Help in reading N days files from a Directory & combining the files

Hi All, Request your expertise in tackling one requirement in my project,(i dont have much expertise in Shell Scripting). The requirement is as below, 1) We store the last run date of a process in a file. When the batch run the next time, it should read this file, get the last run date from... (1 Reply)
Discussion started by: dsfreddie
1 Replies

7. Shell Programming and Scripting

Combining columns from multiple files to one file

I'm trying to combine colums from multiple file to a single file but having some issues, appreciate your help. The filenames are the same except for the extension, path1.m0 --------- a b c d e f g h i path1.m1 --------- m n o p q r s t u File names are path1.m The... (3 Replies)
Discussion started by: rkmca
3 Replies

8. UNIX for Dummies Questions & Answers

Assistance with combining, sorting and saving multi files into one new file

Good morning. I have a piece of code that is currently taking multiple files and using the CAT.exe command to combine into one file that is then sorted in reverse order based on the 3rd field of the file, then displayed on screen. I am trying to change this so that the files are being combined into... (4 Replies)
Discussion started by: jaacmmason
4 Replies

9. UNIX for Dummies Questions & Answers

Combining lines of files to new file

Hi everybody. I have a number of files that i would like to combine. however not concatenating, but rather extract lines from the files. Example: File1 ------ File2 ------File3 ... line11 ---- line21 ---- line31 ... line12 ---- line22 ---- line32 ... line13 ... (3 Replies)
Discussion started by: kabbo
3 Replies

10. Shell Programming and Scripting

Compare two csv files by two colums and create third file combining data from them.

I've got two large csv text table files with different number of columns each. I have to compare them based on first two columns and create resulting file that would in case of matched first two columns include all values from first one and all values (except first two colums) from second one. I... (5 Replies)
Discussion started by: agb2008
5 Replies
Login or Register to Ask a Question