Join two files with common and range identifiers

07-23-2012

I have a problem joining two files. The first file, abc.txt, has 10k lines and many fields, two of which are used for the merge: fff1 and ppp1. The second file, xyz.txt, is a master file with 1k lines and many fields, three of which are used for the merge: fff1, rrr1 and qqq1.

The two files need to be merged on fff1 whenever ppp1 lies between rrr1 and qqq1. Multiple lines from abc.txt will meet this criterion, so the data from xyz.txt should be copied to every abc.txt line where fff1 matches in both files and ppp1 from abc.txt lies between rrr1 and qqq1 from xyz.txt.

I hope this is clear. I would welcome any suggestions and am open to any script, as long as it is efficient, since the actual files are millions of lines long. Thanks for your help.

cfiles2012
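
One possible way to approach the range join described above is to load the smaller master file into memory, grouped by fff1 and sorted by range start, and then stream the larger file past it one line at a time. The Python sketch below is only an illustration under several assumptions that the post does not confirm: fields are whitespace-separated, fff1 is the first column in both files, ppp1 is the second column of abc.txt, rrr1 and qqq1 are the second and third columns of xyz.txt, the values are numeric, and "between" includes both end points. The function names `load_master` and `range_join` and the column constants are hypothetical.

```python
import bisect
from collections import defaultdict

# Assumed column positions (0-based); the post does not state them.
ABC_KEY, ABC_POS = 0, 1            # abc.txt: fff1, ppp1
XYZ_KEY, XYZ_LO, XYZ_HI = 0, 1, 2  # xyz.txt: fff1, rrr1, qqq1


def load_master(xyz_lines):
    """Group master rows by fff1 and sort each group by range start (rrr1)."""
    index = defaultdict(list)
    for line in xyz_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        index[fields[XYZ_KEY]].append(
            (float(fields[XYZ_LO]), float(fields[XYZ_HI]), fields)
        )
    starts = {}
    for key, rows in index.items():
        rows.sort(key=lambda row: row[0])
        starts[key] = [row[0] for row in rows]
    return index, starts


def range_join(abc_lines, xyz_lines):
    """Yield abc fields followed by xyz fields for every matching pair."""
    index, starts = load_master(xyz_lines)
    for line in abc_lines:
        fields = line.split()
        if not fields:
            continue
        key = fields[ABC_KEY]
        if key not in index:
            continue  # no master row shares this fff1
        pos = float(fields[ABC_POS])
        rows = index[key]
        # Only rows whose start is <= pos can contain it; binary search
        # finds how many such rows there are.
        upto = bisect.bisect_right(starts[key], pos)
        for i in range(upto):
            lo, hi, master = rows[i]
            if pos <= hi:
                yield fields + master
```

To use it, pass two open file objects, for example `range_join(open("abc.txt"), open("xyz.txt"))`, and write each yielded list out with `"\t".join(row)`. Only the master file is held in memory, so the larger file can be millions of lines long. Overlapping ranges are handled, since every candidate row is checked, and the fff1 column appears twice in each output row because both lines are copied whole.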
