Sponsored Content
Top Forums Shell Programming and Scripting Merging two files by comparing three fields Post 302324437 by durden_tyler on Wednesday 10th of June 2009 05:18:57 PM
Old 06-10-2009
Quote:
Originally Posted by Hunter85
...
If the fields contain spaces in their values i,e

if class1 is "Class One"
class 2 is "Class Two-2"

OR Birthday may be "Birth Day"

the script is not working. But I need the script to be working as the same even if the fieldnames contains spaces.
...
If the field name contains spaces, then they are essentially two fields and not one field. The way the perl script worked earlier was:

(1) Loop through file2 and create a hash with key-value pairs as follows:

Code:
%x = ("Class1 Sports Ball",   "11 12 13",
      "Class2 Academic Bat",  "21 22 23",
      "Class3 Academic Pen",  "31 32 33",
      "Class4 Gift Birthday", "41 42 43");

(2) Now loop through file2, tokenize the input line, form the key using the first 3 tokens, print then entire line and then print the value of the key for this line. If value doesn't exist print " 0 0".

Now, for step (1) above, the hash key is formed using this expression:

Code:
$_[0].":".$_[1].":".$_[2]

So, for this input line of file2:

Code:
Class1 Sports Ball 11 12 13

the values would get assigned as follows:

Code:
$_[0] = "Class1"
$_[1] = "Sports"
$_[2] = "Ball"

But for a line like this in file2:

Code:
Class One Sports Ball 11 12 13

the first three values would get assigned as follows:

Code:
$_[0] = "Class"
$_[1] = "One"
$_[2] = "Sports"

Now, this would work if:

(a) the keys remain unique in both the files, and

(b) you tweak the script so that the key values are:

Code:
" $_[4] $_[5]"

instead of this:

Code:
" $_[3] $_[4]"

(You push the tokens one place while forming the key value because of that extra token.)

Here's the example of the revised code for the revised data:

Code:
$
$ cat file1
Class One Sports Ball 14 15
Class Two Academic Bat 24 25
Class Three Academic Pen 34 35
Class Four Books Maths 54 55
$
$ cat file2
Class One Sports Ball 11 12 13
Class Two Academic Bat 21 22 23
Class Three Academic Pen 31 32 33
Class Four Gift Birthday 41 42 43
$
$ perl -ne 'BEGIN {open(F,"file2");
>                  while(<F>){split; $x{$_[0].":".$_[1].":".$_[2]}=" $_[4] $_[5]"}
>                  close(F)}
>   { chomp; split;
>     $y=$_[0].":".$_[1].":".$_[2]; print $_,defined $x{$y}?$x{$y}:" 0 0","\n"
>   }' file1
Class One Sports Ball 14 15 11 12
Class Two Academic Bat 24 25 21 22
Class Three Academic Pen 34 35 31 32
Class Four Books Maths 54 55 0 0
$
$

Of course, I hope you could see the limitation of this approach.

You must know how many tokens would be created after the split and how they would be divided into keys and values.

As long as you know that, and are able to create keys and values consistently (after splitting), it might work well for your files.

But if you are thinking that that script would work for both these sets of files:

Code:
SET 1:
$
$ cat file1
Class1 Sports Ball 11 12 13
Class2 Academic Bat 21 22 23
Class3 Academic Pen 31 32 33
Class4 Gift Birthday 41 42 43
$
$ cat file2
Class1 Sports Ball 14 15
Class2 Academic Bat 24 25
Class3 Academic Pen 34 35
Class5 Books Maths 54 55
$
$

and

Code:
SET 2:
$
$ cat file1
Class One Sports Ball 14 15
Class Two Academic Bat 24 25
Class Three Academic Pen 34 35
Class Four Books Maths 54 55
$
$ cat file2
Class One Sports Ball 11 12 13
Class Two Academic Bat 21 22 23
Class Three Academic Pen 31 32 33
Class Four Gift Birthday 41 42 43
$

then you are mistaken; because tokens 1, 2 and 3 form the UNIQUE key in the first set of files, whereas tokens 1, 2, 3 and 4 form the UNIQUE key in the second set of files.

For cases like these, you may want to use a regex to split and create key-value pairs for hash.

Hope that helps,
tyler_durden
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

merging fields from 2 different files.

File 1 3337304 2 4 DH.ER@TORONTO.CA 20080504 04622 3337305 2 4 A@C.COM 20080504 04622 3337306 2 4 JO@NET.NET 20080504 04622 3337307 2 4 L@GMAIL.COM 20080504 05344 2479201 2 2 ORY@YAHOO.COM 20080504 05344 File 2 ... (5 Replies)
Discussion started by: rudoraj
5 Replies

2. Shell Programming and Scripting

Comparing two files and replacing fields

I have two files with ids and email addresses. File 2 cotains a subset of the records in file 1. The key field is the first field containing the id. file 1: 123|myadr@abc.com 456|myadr2@abc.com 789|myadr3@abc.com file 2: 456|adr456@xyz.com Where the record appears in the second... (3 Replies)
Discussion started by: tltroy
3 Replies

3. Shell Programming and Scripting

Comparing fields in two files

Hi, i want to compare two files by one field say $3 in file1 needs to compare with $2 in file2. sample file1 - reqd_charge_code 2263881188,24570896,439 2263881964,24339077,439 2263883220,22619162,228 2263884224,24631840,442 2263884246,22612161,442 sample file2 - rg_j ... (2 Replies)
Discussion started by: raghavendra.cse
2 Replies

4. Shell Programming and Scripting

Problem in comparing 2 fields from 2 files

I've 2 files. Need to compare File1.Field1,File1.Field2 with File2.Field1,File2.Field2. If matches then create a new file. File1 10 A|ADB|967143.24|1006101.5 3E HK|DHB|24294.76|242513.89 ABN ACU|ADB|22104.69|51647.14 ABN BU|DBA|39137.14|109128.38 ABN|ADB|64466.89|167936.55 ABOC... (2 Replies)
Discussion started by: buster
2 Replies

5. Programming

comparing two fields from two different files in AWK

Hi, I have two files formatted as following: File 1: (user_num_ID , realID) (the NR here is 41671) 1 cust_034_60 2 cust_80_91 3 cust_406_4 .. .. File 2: (realID , clusterNumber) (total NR here is 1000) cust_034_60 2 cust_406_4 3 .. .. (11 Replies)
Discussion started by: amarn
11 Replies

6. UNIX for Dummies Questions & Answers

Comparing and merging two text files

Hey everybody, I am new here and already a question to ask, I just recently started some bioinformatic work for my PhD so I am slowly learning Anyway, here is my problem, I have two text files, one contains the complete data file with 43000 genes and their read counts for all my samples... (1 Reply)
Discussion started by: ant55
1 Replies

7. Shell Programming and Scripting

Comparing two files using four fields

I want to compare File1 and File2 (Separated by spaces) using four fields (Column 1,2,4,5). Logic: If column 1 and 2 of File1 and File2 match exactly and if the File2 has the same characters as any of the characters present in column 4 and 5 of file1 then those lines of file1 and file2 are... (1 Reply)
Discussion started by: NamS
1 Replies

8. Shell Programming and Scripting

Comparing two files using four fields

Dear All, I want to compare File1 and File2 (Separated by spaces) using four fields (Column 1,2,4,5). Logic: If column 1 and 2 of File1 and File2 match exactly and if the File2 has the same characters as any of the characters present in column 4 and 5 of file1 then those lines of file1 and file2... (6 Replies)
Discussion started by: NamS
6 Replies

9. UNIX for Advanced & Expert Users

Need urgent help in comparing two fields in two files

Hi all, I have two files as below. I need to compare field 2 of file 1 against field 1 of file 2 and field 5 of file 1 against filed 2 of file 2. If both matches , then create a result file 1 with first file data and if not matches , then create file with first fie data. Please help me in... (1 Reply)
Discussion started by: sivarajb
1 Replies

10. Shell Programming and Scripting

Comparing two files by two matching fields

Long time listener first time poster. Hope someone can advise. I have two files, 1000+ lines in each, two fields in each file. After performing a sort, what is the best way to find exact matches where field $1 and $2 in file1 are also present in file2 on the same line, then output only those... (6 Replies)
Discussion started by: bstaff
6 Replies
All times are GMT -4. The time now is 02:48 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy