Merge two files based on a 3rd key file Post: 302486144

Post 302486144 by anurag.singh on Friday 7th of January 2011 07:10:58 AM

Post #3 (by myself):

1. Set the field separator to the pipe character (|).
2. If the file currently being read is "from_file", store each record ($0) in array a, indexed by the 1st field of that record, and move to the next line. (This step runs for every line in from_file, and only for from_file.)
3. If the file currently being read is "to_file", store each record ($0) in array b, indexed by the 1st field of that record, and move to the next line. (This step runs for every line in to_file, and only for to_file.)
4. Set the field separator to the tilde (~) character. (By this time both from_file and to_file have been processed, and the steps below run only for the 3rd file, i.e. key_file.)
5. If array a has a non-null value at index $1 (the 1st field in key_file), overwrite the 1st field of this line with that value from array a.
6. If array b has a non-null value at index $2 (the 2nd field in key_file), overwrite the 2nd field of this line with that value from array b.
7. Print the lines of key_file (modified as per steps 5 and 6) with ~ as the output field separator.
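
The steps above could be sketched roughly as follows. This is only an illustration, not the exact command from post #3: the sample data is made up (the real file layouts are not shown here), and switching the separator with `FS='~' OFS='~'` placed before key_file on the command line is my assumption about one portable way to do step 4.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data (not from the original thread).
printf '%s\n' 'k1|alpha|one' 'k2|beta|two'     > from_file
printf '%s\n' 't1|gamma|three' 't2|delta|four' > to_file
printf '%s\n' 'k1~t2' 'k2~zz'                  > key_file

merged=$(awk -F'|' '
  # Steps 2 and 3: remember whole records, keyed by their 1st field.
  FILENAME == "from_file" { a[$1] = $0; next }
  FILENAME == "to_file"   { b[$1] = $0; next }
  # Steps 5 to 7: only key_file lines reach this block.
  {
    if (a[$1] != "") $1 = a[$1]   # replace 1st field when a match exists
    if (b[$2] != "") $2 = b[$2]   # replace 2nd field when a match exists
    print                         # fields are rejoined with OFS (~)
  }
' from_file to_file FS='~' OFS='~' key_file)   # step 4: FS/OFS change before key_file

printf '%s\n' "$merged"
```

With this sample data the output is `k1|alpha|one~t2|delta|four` followed by `k2|beta|two~zz`. The second line shows that a key with no match (zz) is left as it was.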

Post #4 (by Scrutinizer): [If more than one file is passed to awk, FNR gives the line number within the current file only, whereas NR gives the line number counted from the start of the input (as if all the files were one file).]

1. Set the field separator to the pipe character (|).
2. Remove all trailing spaces from the current line. [Not done in the earlier post.]
3. If the file currently being read is "fromfile" (NR==FNR is true only for the 1st file passed to the command), store the line in array A with the 1st field as the index (as was done earlier) and also increment m (so m holds the number of lines processed so far).
4. If the file currently being read is "tofile", store the line in array B with the 1st field as the index (as was done earlier). [NR-FNR==m is true ONLY for the 2nd file, because m ends up as the number of lines in fromfile.]
5. Set the field separator to the tilde (~) character. (It is set after the 1st and 2nd files have been processed.)
6. While processing the lines of the 3rd file, print the array A value at the index of the 1st field and the array B value at the index of the 2nd field, with ~ as the output field separator.
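
A rough sketch of the NR/FNR approach described above, again with made-up sample data. The name keyfile, the exact `sub()` pattern, and the `FS='~' OFS='~'` command-line assignments are my assumptions, not necessarily what post #4 used.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data; some lines carry trailing spaces on purpose.
printf '%s\n' 'k1|alpha   ' 'k2|beta'  > fromfile
printf '%s\n' 't1|gamma' 't2|delta  '  > tofile
printf '%s\n' 'k1~t2' 'k2~t1  '        > keyfile

result=$(awk -F'|' '
  { sub(/ +$/, "") }                        # step 2: strip trailing spaces
  NR == FNR     { A[$1] = $0; m++; next }   # step 3: 1st file only
  NR - FNR == m { B[$1] = $0; next }        # step 4: 2nd file only
  { print A[$1], B[$2] }                    # step 6: 3rd file, joined with OFS
' fromfile tofile FS='~' OFS='~' keyfile)   # step 5: FS/OFS change before keyfile

printf '%s\n' "$result"
```

With this sample data the output is `k1|alpha~t2|delta` followed by `k2|beta~t1|gamma`. Note one difference from the post #3 approach: here a key with no match prints as an empty field, because the array value is printed directly instead of overwriting the field only when a match exists.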

Both posts do pretty much the same thing, just implemented in different ways.
