Join two files with common and range identifiers


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Join two files with common and range identifiers
# 1  
Old 07-23-2012
Join two files with common and range identifiers

I have a problem joining two files. The first file abc.txt has 10k lines and has lots of fields but two fields fff1 and ppp1 to merge by. The second file xyz.txt is a master file with 1k lines and lots of fields but three fields to merge by fff1; rrr1 and qqq1.

The two files need to be merged by fff1 and whenever ppp1 lies between rrr1 and qqq1. So multiple lines from abc.txt with meet this criteria and data from xyz.txt will be copied whenever fff1 matches for the two files and ppp1 from abc.txt lies between rrr1 and qqq1 from xyz.txt.

I hope this is clear. I would welcome any suggestions and open to any script as long as it is efficient since the actual files are millions of lines. Thanks for your help.
# 2  
Old 07-23-2012
Quote:
Originally Posted by cfiles2012
I have a problem joining two files. The first file abc.txt has 10k lines and has lots of fields but two fields fff1 and ppp1 to merge by. The second file xyz.txt is a master file with 1k lines and lots of fields but three fields to merge by fff1; rrr1 and qqq1.

The two files need to be merged by fff1 and whenever ppp1 lies between rrr1 and qqq1. So multiple lines from abc.txt with meet this criteria and data from xyz.txt will be copied whenever fff1 matches for the two files and ppp1 from abc.txt lies between rrr1 and qqq1 from xyz.txt.

I hope this is clear. I would welcome any suggestions and open to any script as long as it is efficient since the actual files are millions of lines. Thanks for your help.
Please post sample input and output so that someone can help you promptly and easily.

Simply writing in plain words will be a pain. Thanks
# 3  
Old 07-23-2012
sample files

Here are two sample files

abc.txt

name fff1 ppp1
sam 1 12
pam 1 14
jen 2 8
trout 2 6
pat 3 12
don 3 16


xyz.txt

id fff1 qqq1 rrr1
234 1 10 15
456 2 4 7
324 2 8 10
334 3 10 20

The resulting file should be abcxyz.txt

name fff1 ppp1 id qqq1 rrrr1
sam 1 12 234 10 15
pam 1 14 234 10 15
jen 2 8 324 8 10
trout 2 6 456 4 7
pat 3 12 334 10 20
don 3 16 334 10 20
# 4  
Old 07-23-2012
awk

Hi,
welcome to unix.com!!!
Try this one,
Code:
awk 'FNR==NR{a[$2]=$3;v[$2]=$0;next;}{if(a[$2] &&a[$2]>=$3 && a[$2]<=$4){print v[$2],$0;}}' abc.txt xyz.txt

Cheers,
Ranga:-)

Last edited by rangarasan; 07-23-2012 at 04:43 PM..
# 5  
Old 07-23-2012
Quote:
Originally Posted by rangarasan
Code:
awk 'FNR==NR{a[$2]=$3;v[$2]=$0;next;}{if(a[$2] &&a[$2]>=$3 && a[$2]<=$4){print v[$2],$0;}}' abc.txt xyz.txt

Cheers,
Ranga:-)
That solution won't work. Multiple records in abc.txt can have the same $2. Your assignments to a and v are both clobbering.

Regards,
Alister
# 6  
Old 07-23-2012
Yes, i haven't noticed that.
Good catch. Thank you!!
But the requirement is
Code:
The two files need to be merged by fff1 and whenever ppp1 lies betweenrrr1 and qqq1.

right? The only man can answer for this. Its user.
Please confirm us.
Cheers,
Ranga:-)
# 7  
Old 07-23-2012
Code:
join -12 -22 -o1.1,1.2,1.3,2.1,2.3,2.4 abc.txt xyz.txt | awk '$3>=$5 && $3<=$6' > abcxyz.txt

That will work if both files are sorted on fff1 and if the header line at the top of the sample data files isn't present in the actual data.

Regards,
Alister
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Linux - Join 2 csv files with common key

Hi, I am trying to join 2 csv files, to create a 3rd output file with the joined data. Below is an example of my Input Data: Input File 1 NAME, FAV_FOOD, FAV_DRINK, ID, GENDER Bob, Fish, Coke, 1, M Lisa, Rice, Water, 2, F Jenny, Noodle, Tea, 3, F Ken, Pizza, Coffee, 4, M Lisa,... (7 Replies)
Discussion started by: RichZR
7 Replies

2. Shell Programming and Scripting

Merging files with common IDs without JOIN

Hi, I am trying to merge information across 2 files. The first file is a "master" file, with all IDS. File 2 contains a subset of IDs of those in File 1. I would like to match up individuals in File 1 and File 2, and add information in File 2 to that of File 1 if they appear. However, if an... (3 Replies)
Discussion started by: hubleo
3 Replies

3. UNIX for Dummies Questions & Answers

How to join 2 .txt files based on a common column?

Hi all, I'm trying to join two .txt file tab delimitated based on a common column. File 1 transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct comp1000201_c0_seq1 comp1000201_c0 337 183.51 0.00 0.00 0.00 0.00 comp1000297_c0_seq1 ... (1 Reply)
Discussion started by: alisrpp
1 Replies

4. UNIX for Dummies Questions & Answers

How to use the the join command to join multiple files by a common column

Hi, I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column). I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies

5. UNIX for Dummies Questions & Answers

how to join two files using "Join" command with one common field in this problem?

file1: Toronto:12439755:1076359:July 1, 1867:6 Quebec City:7560592:1542056:July 1, 1867:5 Halifax:938134:55284:July 1, 1867:4 Fredericton:751400:72908:July 1, 1867:3 Winnipeg:1170300:647797:July 15, 1870:7 Victoria:4168123:944735:July 20, 1871:10 Charlottetown:137900:5660:July 1, 1873:2... (2 Replies)
Discussion started by: mindfreak
2 Replies

6. Shell Programming and Scripting

join files based on a common field

Hi experts, Would you please help me with this? I have several files and I need to join the forth field of them based on the common first field. here's an example... first file: 280346 39.88 -75.08 547.8 280690 39.23 -74.83 538.7 280729 40.83 -75.08 499.2 280907 40.9 -74.4 507.8... (5 Replies)
Discussion started by: GoldenFire
5 Replies

7. Web Development

Perl join two files by "common" column

Hello; I am posting to get any help on my code that I have been struggling for some time. The project is to join two files each with 80k~180k rows. I want to merge them together by the shared common column. The problem of the shared column is partially matching, not exactly the same. File1:... (5 Replies)
Discussion started by: yifangt
5 Replies

8. Shell Programming and Scripting

Join multiple files based on 1 common column

I have n files (for ex:64 files) with one similar column. Is it possible to combine them all based on that column ? file1 ax100 20 30 40 ax200 22 33 44 file2 ax100 10 20 40 ax200 12 13 44 file2 ax100 0 0 4 ax200 2 3 4 (9 Replies)
Discussion started by: quincyjones
9 Replies

9. Shell Programming and Scripting

"Join" or "Merge" more than 2 files into single output based on common key (column)

Hi All, I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations: 1. I am restrained to 2 input files only. 2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies

10. Shell Programming and Scripting

List of common identifiers

Hi all, I have 4 file and I want to find the common identifier in each file. For example: FILE1 goat door bear cat FILE2 goat moose dog cat FILE3 goat yak tiger (6 Replies)
Discussion started by: phil_heath
6 Replies
Login or Register to Ask a Question