01-07-2011
Post #3 (by myself):
- Set the field separator to the pipe character (|).
- If the file being read is "from_file", store each record ($0) in array a, indexed by the 1st field of that record, and move on to the next line. (This step runs for every line of from_file, and only for from_file.)
- If the file being read is "to_file", store each record ($0) in array b, indexed by the 1st field of that record, and move on to the next line. (This step runs for every line of to_file, and only for to_file.)
- Set the field separator to the tilde (~) character. (By this point both from_file and to_file have been processed; the steps below run only for the 3rd file, key_file.)
- If array a has a non-null value at index $1 (the 1st field of the key_file line), replace the 1st field of the line with that value from array a.
- If array b has a non-null value at index $2 (the 2nd field of the key_file line), replace the 2nd field of the line with that value from array b.
- Print each line of key_file (as modified in steps 5 and 6), using ~ as the output field separator.
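The steps above can be sketched as a single awk invocation. This is only a minimal sketch under stated assumptions, not the code from post #3: the sample records are invented, the switch from | to ~ is done with FS/OFS assignments placed between the file operands, and the `in` membership test stands in for the "non-null value" check.

```shell
#!/bin/sh
# Invented sample data; only the file names come from the post.
dir=$(mktemp -d)
printf '%s\n' 'K1|alpha|10' 'K2|beta|20' > "$dir/from_file"
printf '%s\n' 'T1|red' 'T2|blue'         > "$dir/to_file"
printf '%s\n' 'K1~T2~x' 'K9~T1~y'        > "$dir/key_file"

out=$(cd "$dir" && awk '
    # Steps 2 and 3: remember every record of the two lookup files by its 1st field.
    FILENAME == "from_file" { a[$1] = $0; next }
    FILENAME == "to_file"   { b[$1] = $0; next }
    # Steps 5 to 7: only key_file lines reach this block.
    {
        if ($1 in a) $1 = a[$1]   # override 1st field with the from_file record
        if ($2 in b) $2 = b[$2]   # override 2nd field with the to_file record
        print
    }
' FS='|' from_file to_file FS='~' OFS='~' key_file)

printf '%s\n' "$out"
rm -rf "$dir"
```

In this sample, K9 has no entry in from_file, so the 1st field of the second key_file line is left as it is.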
Post #4 (by Scrutinizer): [When more than one file is passed to awk, FNR gives the line number within the current file only, whereas NR gives the line number counted from the start of the first file (as if all files were one file).]
- Set the field separator to the pipe character (|).
- Remove all trailing spaces from the current line. [Not done in the earlier post.]
- If the file being read is "fromfile" (NR==FNR is true only for the 1st file passed to the command), store the line in array A with the 1st field as index (as was done earlier) and also increment m (so once fromfile is finished, m holds its number of lines).
- If the file being read is "tofile", store the line in array B with the 1st field as index (as was done earlier). [NR-FNR is the number of lines read from the previous files, so NR-FNR==m is true ONLY for the 2nd file.]
- Set the field separator to the tilde (~) character. (It is set after the 1st and 2nd files have been processed.)
- While processing lines of the 3rd file, print the array A value at the index given by the 1st field and the array B value at the index given by the 2nd field, using ~ as the output field separator.
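A minimal sketch of this NR/FNR variant follows. It is an approximation built from the description above, not the exact code from post #4; the sample records (some with trailing spaces) are invented, and the FS/OFS switch again uses assignments between the file operands.

```shell
#!/bin/sh
# Invented sample data; only the file names come from the post.
dir=$(mktemp -d)
printf '%s\n' 'K1|alpha   ' 'K2|beta' > "$dir/fromfile"
printf '%s\n' 'T1|red' 'T2|blue  '    > "$dir/tofile"
printf '%s\n' 'K1~T2' 'K2~T1'         > "$dir/keyfile"

out=$(cd "$dir" && awk '
    { sub(/ +$/, "") }                        # strip trailing spaces from every line
    NR == FNR     { A[$1] = $0; m++; next }   # 1st file: m counts its lines
    NR - FNR == m { B[$1] = $0; next }        # 2nd file: exactly m lines came before it
    { print A[$1], B[$2] }                    # 3rd file: look both keys up
' FS='|' fromfile tofile FS='~' OFS='~' keyfile)

printf '%s\n' "$out"
rm -rf "$dir"
```

Unlike the FILENAME test, this form does not depend on what the files are called, but it does rely on tofile being non-empty so that NR-FNR differs between the 2nd and 3rd files.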
Both posts do pretty much the same thing, just implemented in different ways.
This User Gave Thanks to anurag.singh For This Post:
join(1) General Commands Manual join(1)
NAME
join - relational database operator
SYNOPSIS
join [options] file1 file2
DESCRIPTION
join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 or file2 is -, the standard
input is used.
file1 and file2 must be sorted in increasing collating sequence (see Environment Variables below) on the fields on which they are to be
joined; normally the first in each line.
The output contains one line for each pair of lines in file1 and file2 that have identical join fields. The output line normally consists
of the common field followed by the rest of the line from file1, then the rest of the line from file2.
The default input field separators are space, tab, or new-line. In this case, multiple separators count as one field separator, and
leading separators are ignored. The default output field separator is a space.
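As an illustration of the sorting requirement and the default separators, here is a small sketch with invented data. The file names and contents are hypothetical; LC_ALL=C is set only so that sort and join agree on the collating sequence.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
# Invented data: unsorted, with repeated and leading blanks.
printf '%s\n' 'pear    green' '  apple red' > "$dir/colors"
printf '%s\n' 'pear 5' 'apple 3'            > "$dir/counts"

# Both inputs must be sorted on the join field before join reads them.
sort -b -k 1,1 "$dir/colors" > "$dir/colors.sorted"
sort -b -k 1,1 "$dir/counts" > "$dir/counts.sorted"

# Default separators: runs of blanks count once, leading blanks are ignored,
# and the output fields are separated by a single space.
out=$(join "$dir/colors.sorted" "$dir/counts.sorted")

printf '%s\n' "$out"
rm -rf "$dir"
```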
Some of the options below use the argument n. This argument should be a 1 or a 2, referring to either file1 or file2, respectively.
Options
-a n              In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2.
-e s              Replace empty output fields by string s.
-j m              Join on field m of both files. The argument m must be delimited by space characters. This option and the following two are provided for
                  backward compatibility. Use of the -1 and -2 options (see below) is recommended for portability.
-j1 m             Join on field m of file1.
-j2 m             Join on field m of file2.
-o list           Each output line comprises the fields specified in list, each element of which has the form n.m, where n is a file number and m is a
                  field number. The common field is not printed unless specifically requested.
-t c              Use character c as a separator (tab character). Every appearance of c in a line is significant. The character c is used as the field
                  separator for both input and output.
-v file_number    Instead of the default output, produce a line only for each unpairable line in file_number, where file_number is 1 or 2.
-1 f              Join on field f of file 1. Fields are numbered starting with 1.
-2 f              Join on field f of file 2. Fields are numbered starting with 1.
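The effect of the pairing options can be seen in a short sketch. The two input files are invented and already sorted on their first field; the option spellings used are the common POSIX ones, which is an assumption about how this page's options are written.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
# Invented sample data, sorted on the first field.
printf '%s\n' '1 abc' '2 def' '4 xyz' > "$dir/f1"
printf '%s\n' '1 AAA' '3 CCC' '4 DDD' > "$dir/f2"

# Default: only lines whose join fields match in both files.
paired=$(join "$dir/f1" "$dir/f2")

# -a 1: additionally emit the unpairable lines of file 1.
with_left=$(join -a 1 "$dir/f1" "$dir/f2")

# -v 2: emit only the unpairable lines of file 2.
only_right=$(join -v 2 "$dir/f1" "$dir/f2")

# -e with -o: fill output fields that have no value with a placeholder.
filled=$(join -a 1 -e NULL -o 1.1,1.2,2.2 "$dir/f1" "$dir/f2")

printf '%s\n' "$paired" "$with_left" "$only_right" "$filled"
rm -rf "$dir"
```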
EXTERNAL INFLUENCES
Environment Variables
LC_COLLATE determines the collating sequence join expects from input files.
LC_CTYPE determines the alternative blank character as an input field separator, and the interpretation of data within files as single and/or
multi-byte characters. LC_CTYPE also determines whether the separator defined through the -t option is a single- or multi-byte character.
If LC_COLLATE or LC_CTYPE is not specified in the environment or is set to the empty string, the value of LANG is used as a default for each
unspecified or empty variable. If LANG is not specified or is set to the empty string, a default of ``C'' (see lang(5)) is used instead of LANG.
If any internationalization variable contains an invalid setting, join behaves as if all internationalization variables are set to ``C'' (see environ(5)).
International Code Set Support
Single- and multi-byte character code sets are supported with the exception that multi-byte-character file names are not supported.
EXAMPLES
The following command line joins the password file and the group file, matching on the numeric group ID, and outputting the login name, the
group name, and the login directory. It is assumed that the files have been sorted in the collating sequence defined by the LC_COLLATE or LANG
environment variable on the group ID fields.
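A command matching this description could look like the sketch below. It is a hedged reconstruction, not necessarily the command this page originally showed, and it runs against invented passwd-style and group-style files rather than the real /etc/passwd and /etc/group so that the sort order is guaranteed.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
# Invented data: passwd-like lines sorted on field 4, group-like lines sorted on field 3.
printf '%s\n' 'alice:x:1001:100:Alice:/home/alice:/bin/sh' \
              'bob:x:1002:200:Bob:/home/bob:/bin/sh' > "$dir/passwd"
printf '%s\n' 'users:x:100:' 'staff:x:200:'          > "$dir/group"

# Match field 4 of the passwd file against field 3 of the group file, and
# print login name (1.1), group name (2.1) and login directory (1.6).
out=$(join -t: -1 4 -2 3 -o 1.1,2.1,1.6 "$dir/passwd" "$dir/group")

printf '%s\n' "$out"
rm -rf "$dir"
```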
The following command produces output consisting of all possible combinations of lines that have identical first fields in the two sorted
files sf1 and sf2, with each line consisting of the first and third fields from sf1 and the second and fourth fields from sf2.
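One way to write such a command is sketched here, using the -1 and -2 options that this page recommends for portability. The contents of sf1 and sf2 are invented, and the field order given to -o is an assumption based on the wording of the description.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
dir=$(mktemp -d)
# Invented data: both files repeat the same key so that several pairings exist.
printf '%s\n' 'a 1 x' 'a 2 y'     > "$dir/sf1"
printf '%s\n' 'a p q r' 'a s t u' > "$dir/sf2"

# Join on field 1 of both files; print fields 1 and 3 of sf1, then 2 and 4 of sf2.
out=$(join -1 1 -2 1 -o 1.1,1.3,2.2,2.4 "$dir/sf1" "$dir/sf2")

printf '%s\n' "$out"
rm -rf "$dir"
```

With two matching lines in each file, four output lines result, one per combination.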
WARNINGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort.
The conventions of join, sort, comm, uniq, and awk are incongruous.
Numeric filenames may cause conflict when the -o option is used immediately before listing filenames.
AUTHOR
join was developed by OSF and HP.
SEE ALSO
awk(1), comm(1), sort(1), uniq(1).