Merging two files by comparing three fields


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Merging two files by comparing three fields
# 1  
Old 06-06-2009
MySQL Merging two files by comparing three fields

Hi Experts,

I need your timely help. I have a problem with merging two files. Here my situation :

Here I have to compare first three fields from FILE1 with FILE2. If they are equal, I have to append the remaining values from FILE2 with FILE1 to create the output.

FILE1:
Class Category Item Field1 Field2 Field3
Class1 Sports Ball 11 12 13
Class2 Academic Bat 21 22 23
Class3 Academic Pen 31 32 33
Class4 Gift Birthday 41 42 43

FILE2:
Class Category Item Field4 Field5
Class1 Sports Ball 14 15
Class2 Academic Bat 24 25
Class3 Academic Pen 34 35
Class5 Books Maths 54 55

OUTPUT FILE (FILE3) :

Class Category Item Field1 Field2 Field3 Field4 Field5
Class1 Sports Ball 11 12 13 14 15
Class2 Academic Bat 21 22 23 24 25
Class3 Academic Pen 31 32 33 34 35
Class4 Gift Birthday 41 42 43 0 0


Your valuable suggestions are highly appreciated. Thankyou very much.

PS : If it contains only one field to compare from both file, "Join" command would help to create the output.

join -a1 -e "0" FILE1 FILE2

Last edited by Hunter85; 06-06-2009 at 05:12 PM..
# 2  
Old 06-06-2009
One way to do it with perl:

Code:
$
$ cat file1
Class1 Sports Ball 11 12 13
Class2 Academic Bat 21 22 23
Class3 Academic Pen 31 32 33
Class4 Gift Birthday 41 42 43
$
$ cat file2
Class1 Sports Ball 14 15
Class2 Academic Bat 24 25
Class3 Academic Pen 34 35
Class5 Books Maths 54 55
$
$ perl -ne 'BEGIN {open(F,"file2"); while(<F>){split; $x{$_[0].":".$_[1].":".$_[2]}=" $_[3] $_[4]"} close(F)}
>   { chomp; split; $y=$_[0].":".$_[1].":".$_[2]; print $_,defined $x{$y}?$x{$y}:" 0 0","\n" }' file1
Class1 Sports Ball 11 12 13 14 15
Class2 Academic Bat 21 22 23 24 25
Class3 Academic Pen 31 32 33 34 35
Class4 Gift Birthday 41 42 43 0 0
$
$

tyler_durden
# 3  
Old 06-08-2009
Tyler_durden : You are a cool man. Got o/p with small concern.
One doubt...

If the fields contain spaces in their values i,e

if class1 is "Class One"
class 2 is "Class Two-2"

OR Birthday may be "Birth Day"

the script is not working. But I need the script to be working as the same even if the fieldnames contains spaces.

To remove space from the field I may use : sed 's/ //g'

Guide me how to use this inside perl. Thankyou very much.

Last edited by Hunter85; 06-10-2009 at 10:07 AM..
# 4  
Old 06-10-2009
Quote:
Originally Posted by Hunter85
...
If the fields contain spaces in their values i,e

if class1 is "Class One"
class 2 is "Class Two-2"

OR Birthday may be "Birth Day"

the script is not working. But I need the script to be working as the same even if the fieldnames contains spaces.
...
If the field name contains spaces, then they are essentially two fields and not one field. The way the perl script worked earlier was:

(1) Loop through file2 and create a hash with key-value pairs as follows:

Code:
%x = ("Class1 Sports Ball",   "11 12 13",
      "Class2 Academic Bat",  "21 22 23",
      "Class3 Academic Pen",  "31 32 33",
      "Class4 Gift Birthday", "41 42 43");

(2) Now loop through file2, tokenize the input line, form the key using the first 3 tokens, print then entire line and then print the value of the key for this line. If value doesn't exist print " 0 0".

Now, for step (1) above, the hash key is formed using this expression:

Code:
$_[0].":".$_[1].":".$_[2]

So, for this input line of file2:

Code:
Class1 Sports Ball 11 12 13

the values would get assigned as follows:

Code:
$_[0] = "Class1"
$_[1] = "Sports"
$_[2] = "Ball"

But for a line like this in file2:

Code:
Class One Sports Ball 11 12 13

the first three values would get assigned as follows:

Code:
$_[0] = "Class"
$_[1] = "One"
$_[2] = "Sports"

Now, this would work if:

(a) the keys remain unique in both the files, and

(b) you tweak the script so that the key values are:

Code:
" $_[4] $_[5]"

instead of this:

Code:
" $_[3] $_[4]"

(You push the tokens one place while forming the key value because of that extra token.)

Here's the example of the revised code for the revised data:

Code:
$
$ cat file1
Class One Sports Ball 14 15
Class Two Academic Bat 24 25
Class Three Academic Pen 34 35
Class Four Books Maths 54 55
$
$ cat file2
Class One Sports Ball 11 12 13
Class Two Academic Bat 21 22 23
Class Three Academic Pen 31 32 33
Class Four Gift Birthday 41 42 43
$
$ perl -ne 'BEGIN {open(F,"file2");
>                  while(<F>){split; $x{$_[0].":".$_[1].":".$_[2]}=" $_[4] $_[5]"}
>                  close(F)}
>   { chomp; split;
>     $y=$_[0].":".$_[1].":".$_[2]; print $_,defined $x{$y}?$x{$y}:" 0 0","\n"
>   }' file1
Class One Sports Ball 14 15 11 12
Class Two Academic Bat 24 25 21 22
Class Three Academic Pen 34 35 31 32
Class Four Books Maths 54 55 0 0
$
$

Of course, I hope you could see the limitation of this approach.

You must know how many tokens would be created after the split and how they would be divided into keys and values.

As long as you know that, and are able to create keys and values consistently (after splitting), it might work well for your files.

But if you are thinking that that script would work for both these sets of files:

Code:
SET 1:
$
$ cat file1
Class1 Sports Ball 11 12 13
Class2 Academic Bat 21 22 23
Class3 Academic Pen 31 32 33
Class4 Gift Birthday 41 42 43
$
$ cat file2
Class1 Sports Ball 14 15
Class2 Academic Bat 24 25
Class3 Academic Pen 34 35
Class5 Books Maths 54 55
$
$

and

Code:
SET 2:
$
$ cat file1
Class One Sports Ball 14 15
Class Two Academic Bat 24 25
Class Three Academic Pen 34 35
Class Four Books Maths 54 55
$
$ cat file2
Class One Sports Ball 11 12 13
Class Two Academic Bat 21 22 23
Class Three Academic Pen 31 32 33
Class Four Gift Birthday 41 42 43
$

then you are mistaken; because tokens 1, 2 and 3 form the UNIQUE key in the first set of files, whereas tokens 1, 2, 3 and 4 form the UNIQUE key in the second set of files.

For cases like these, you may want to use a regex to split and create key-value pairs for hash.

Hope that helps,
tyler_durden
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Comparing two files by two matching fields

Long time listener first time poster. Hope someone can advise. I have two files, 1000+ lines in each, two fields in each file. After performing a sort, what is the best way to find exact matches where field $1 and $2 in file1 are also present in file2 on the same line, then output only those... (6 Replies)
Discussion started by: bstaff
6 Replies

2. UNIX for Advanced & Expert Users

Need urgent help in comparing two fields in two files

Hi all, I have two files as below. I need to compare field 2 of file 1 against field 1 of file 2 and field 5 of file 1 against filed 2 of file 2. If both matches , then create a result file 1 with first file data and if not matches , then create file with first fie data. Please help me in... (1 Reply)
Discussion started by: sivarajb
1 Replies

3. Shell Programming and Scripting

Comparing two files using four fields

Dear All, I want to compare File1 and File2 (Separated by spaces) using four fields (Column 1,2,4,5). Logic: If column 1 and 2 of File1 and File2 match exactly and if the File2 has the same characters as any of the characters present in column 4 and 5 of file1 then those lines of file1 and file2... (6 Replies)
Discussion started by: NamS
6 Replies

4. Shell Programming and Scripting

Comparing two files using four fields

I want to compare File1 and File2 (Separated by spaces) using four fields (Column 1,2,4,5). Logic: If column 1 and 2 of File1 and File2 match exactly and if the File2 has the same characters as any of the characters present in column 4 and 5 of file1 then those lines of file1 and file2 are... (1 Reply)
Discussion started by: NamS
1 Replies

5. UNIX for Dummies Questions & Answers

Comparing and merging two text files

Hey everybody, I am new here and already a question to ask, I just recently started some bioinformatic work for my PhD so I am slowly learning Anyway, here is my problem, I have two text files, one contains the complete data file with 43000 genes and their read counts for all my samples... (1 Reply)
Discussion started by: ant55
1 Replies

6. Programming

comparing two fields from two different files in AWK

Hi, I have two files formatted as following: File 1: (user_num_ID , realID) (the NR here is 41671) 1 cust_034_60 2 cust_80_91 3 cust_406_4 .. .. File 2: (realID , clusterNumber) (total NR here is 1000) cust_034_60 2 cust_406_4 3 .. .. (11 Replies)
Discussion started by: amarn
11 Replies

7. Shell Programming and Scripting

Problem in comparing 2 fields from 2 files

I've 2 files. Need to compare File1.Field1,File1.Field2 with File2.Field1,File2.Field2. If matches then create a new file. File1 10 A|ADB|967143.24|1006101.5 3E HK|DHB|24294.76|242513.89 ABN ACU|ADB|22104.69|51647.14 ABN BU|DBA|39137.14|109128.38 ABN|ADB|64466.89|167936.55 ABOC... (2 Replies)
Discussion started by: buster
2 Replies

8. Shell Programming and Scripting

Comparing fields in two files

Hi, i want to compare two files by one field say $3 in file1 needs to compare with $2 in file2. sample file1 - reqd_charge_code 2263881188,24570896,439 2263881964,24339077,439 2263883220,22619162,228 2263884224,24631840,442 2263884246,22612161,442 sample file2 - rg_j ... (2 Replies)
Discussion started by: raghavendra.cse
2 Replies

9. Shell Programming and Scripting

Comparing two files and replacing fields

I have two files with ids and email addresses. File 2 cotains a subset of the records in file 1. The key field is the first field containing the id. file 1: 123|myadr@abc.com 456|myadr2@abc.com 789|myadr3@abc.com file 2: 456|adr456@xyz.com Where the record appears in the second... (3 Replies)
Discussion started by: tltroy
3 Replies

10. Shell Programming and Scripting

merging fields from 2 different files.

File 1 3337304 2 4 DH.ER@TORONTO.CA 20080504 04622 3337305 2 4 A@C.COM 20080504 04622 3337306 2 4 JO@NET.NET 20080504 04622 3337307 2 4 L@GMAIL.COM 20080504 05344 2479201 2 2 ORY@YAHOO.COM 20080504 05344 File 2 ... (5 Replies)
Discussion started by: rudoraj
5 Replies
Login or Register to Ask a Question