07-23-2012
Join two files with common and range identifiers
I have a problem joining two files. The first file abc.txt has 10k lines and has lots of fields but two fields fff1 and ppp1 to merge by. The second file xyz.txt is a master file with 1k lines and lots of fields but three fields to merge by fff1; rrr1 and qqq1.
The two files need to be merged by fff1 and whenever ppp1 lies between rrr1 and qqq1. So multiple lines from abc.txt with meet this criteria and data from xyz.txt will be copied whenever fff1 matches for the two files and ppp1 from abc.txt lies between rrr1 and qqq1 from xyz.txt.
I hope this is clear. I would welcome any suggestions and open to any script as long as it is efficient since the actual files are millions of lines. Thanks for your help.
10 More Discussions You Might Find Interesting
1. Shell Programming and Scripting
Hi all,
I have 4 file and I want to find the common identifier in each file. For example:
FILE1
goat
door
bear
cat
FILE2
goat
moose
dog
cat
FILE3
goat
yak
tiger (6 Replies)
Discussion started by: phil_heath
6 Replies
2. Shell Programming and Scripting
Hi All,
I have working (Perl) code to combine 2 input files into a single output file using the join function that works to a point, but has the following limitations:
1. I am restrained to 2 input files only.
2. Only the "matched" fields are written out to the "matched" output file and... (1 Reply)
Discussion started by: Katabatic
1 Replies
3. Shell Programming and Scripting
I have n files (for ex:64 files) with one similar column. Is it possible to combine them all based on that column ?
file1
ax100 20 30 40
ax200 22 33 44
file2
ax100 10 20 40
ax200 12 13 44
file2
ax100 0 0 4
ax200 2 3 4 (9 Replies)
Discussion started by: quincyjones
9 Replies
4. Web Development
Hello;
I am posting to get any help on my code that I have been struggling for some time. The project is to join two files each with 80k~180k rows. I want to merge them together by the shared common column. The problem of the shared column is partially matching, not exactly the same.
File1:... (5 Replies)
Discussion started by: yifangt
5 Replies
5. Shell Programming and Scripting
Hi experts,
Would you please help me with this?
I have several files and I need to join the forth field of them based on the common first field.
here's an example...
first file:
280346 39.88 -75.08 547.8
280690 39.23 -74.83 538.7
280729 40.83 -75.08 499.2
280907 40.9 -74.4 507.8... (5 Replies)
Discussion started by: GoldenFire
5 Replies
6. UNIX for Dummies Questions & Answers
file1:
Toronto:12439755:1076359:July 1, 1867:6
Quebec City:7560592:1542056:July 1, 1867:5
Halifax:938134:55284:July 1, 1867:4
Fredericton:751400:72908:July 1, 1867:3
Winnipeg:1170300:647797:July 15, 1870:7
Victoria:4168123:944735:July 20, 1871:10
Charlottetown:137900:5660:July 1, 1873:2... (2 Replies)
Discussion started by: mindfreak
2 Replies
7. UNIX for Dummies Questions & Answers
Hi,
I have 20 tab delimited text files that have a common column (column 1). The files are named GSM1.txt through GSM20.txt. Each file has 3 columns (2 other columns in addition to the first common column).
I want to write a script to join the files by the first common column so that in the... (5 Replies)
Discussion started by: evelibertine
5 Replies
8. UNIX for Dummies Questions & Answers
Hi all,
I'm trying to join two .txt file tab delimitated based on a common column.
File 1
transcript_id gene_id length effective_length expected_count TPM FPKM IsoPct
comp1000201_c0_seq1 comp1000201_c0 337 183.51 0.00 0.00 0.00 0.00
comp1000297_c0_seq1 ... (1 Reply)
Discussion started by: alisrpp
1 Replies
9. Shell Programming and Scripting
Hi, I am trying to merge information across 2 files. The first file is a "master" file, with all IDS. File 2 contains a subset of IDs of those in File 1.
I would like to match up individuals in File 1 and File 2, and add information in File 2 to that of File 1 if they appear. However, if an... (3 Replies)
Discussion started by: hubleo
3 Replies
10. Shell Programming and Scripting
Hi,
I am trying to join 2 csv files, to create a 3rd output file with the joined data.
Below is an example of my Input Data:
Input File 1
NAME, FAV_FOOD, FAV_DRINK, ID, GENDER
Bob, Fish, Coke, 1, M
Lisa, Rice, Water, 2, F
Jenny, Noodle, Tea, 3, F
Ken, Pizza, Coffee, 4, M
Lisa,... (7 Replies)
Discussion started by: RichZR
7 Replies
JOIN(1) General Commands Manual JOIN(1)
NAME
join - relational database operator
SYNOPSIS
join [-an] [-e s] [-o list] [-tc] file1 file2
DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 is `-', the standard
input is used.
File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in
each line.
There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally con-
sists of the common field, then the rest of the line from file1, then the rest of the line from file2.
Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are dis-
carded.
These options are recognized:
-an In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.
-e s Replace empty output fields by string s.
-o list
Each output line comprises the fields specified in list, each element of which has the form n.m, where n is a file number and m is a
field number.
-tc Use character c as a separator (tab character). Every appearance of c in a line is significant.
SEE ALSO
sort(1), comm(1), awk(1).
BUGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort.
The conventions of join, sort, comm, uniq, look and awk(1) are wildly incongruous.
7th Edition April 29, 1985 JOIN(1)