Joining files using awk not extracting all columns from File 2
Post 302970703 by Don Cragun on Monday 11th of April 2016 01:44:56 AM
Noting that the requested output in post #16 in this thread:
Quote:
Code:
EmpID,EmpName,DeptID,EmpSal,EmpID,EmpDOB,EmpHireDate,Active
1,AAA,10,100,Jun-04-1986,2012-03-23 12:40:00 PM,Y
2,BBB,20,200,Apr-12-1991,2010-05-12 08:50:00 PM,N
3,CCC,30,300,Dec-31-1978,2010-01-08 12:00:00 AM,Y
3,DDD,40,400,Dec-31-1978,2010-01-08 12:00:00 AM,Y
5,EEE,50,500,,
6,FFF,60,600,Mar-09-1989,2010-05-08 06:45:00 PM,N
8,GGG,70,700,,

has eight fields in the header line (including two occurrences of EmpID), seven fields in the lines where the EmpID appears in both input files, and six fields in the lines where the given EmpID does not appear in File 2. Making the not-too-wild assumptions that EmpID should appear only once in the output header line and that all output lines should contain seven fields, you might want to try something like:
Code:
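awk '
BEGIN {	FS = OFS = ","	# read and write comma-separated fields
}
NR == 1 {
	# Header of File 2: build one empty field per non-key column,
	# used as padding for EmpIDs that are missing from File 2.
	for(i = 2; i <= NF; i++)
		nm = nm OFS
}
FNR == NR {
	# Still reading File 2: blank the EmpID field (a leading comma
	# remains) and save the rest of the line, keyed by EmpID.
	id = $1
	$1 = ""
	d[id] = $0
	next
}
{	# File 1: append the saved File 2 fields, or the padding if none.
	print $0 (($1 in d) ? d[$1] : nm)
}' "File 2" "File 1"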

which produces the output:
Code:
EmpID,EmpName,DeptID,EmpSal,EmpDOB,EmpHireDate,Active
1,AAA,10,100,Jun-04-1986,2012-03-23 12:40:00 PM,Y
2,BBB,20,200,Apr-12-1991,2010-05-12 08:50:00 PM,N
3,CCC,30,300,Dec-31-1978,2010-01-08 12:00:00 AM,Y
3,DDD,40,400,Dec-31-1978,2010-01-08 12:00:00 AM,Y
5,EEE,50,500,,,
6,FFF,60,600,Mar-09-1989,2010-05-08 06:45:00 PM,N
8,GGG,70,700,,,

from your two sample input files. Is this reasonably close to the output you want?

As always, if you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
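
If you want to check the script without the original files, here is a self-contained sketch. The two input files are reconstructed from the output shown above, so their exact contents are an assumption and may differ from the original samples; the temporary directory and the `joined` variable are only there for the demonstration.

```shell
# Hypothetical inputs reconstructed from the output shown above;
# the original sample files may differ.
dir=$(mktemp -d)
cat > "$dir/File 1" <<'EOF'
EmpID,EmpName,DeptID,EmpSal
1,AAA,10,100
2,BBB,20,200
3,CCC,30,300
3,DDD,40,400
5,EEE,50,500
6,FFF,60,600
8,GGG,70,700
EOF
cat > "$dir/File 2" <<'EOF'
EmpID,EmpDOB,EmpHireDate,Active
1,Jun-04-1986,2012-03-23 12:40:00 PM,Y
2,Apr-12-1991,2010-05-12 08:50:00 PM,N
3,Dec-31-1978,2010-01-08 12:00:00 AM,Y
6,Mar-09-1989,2010-05-08 06:45:00 PM,N
EOF

# Same logic as the script above, written more compactly.
joined=$(awk '
BEGIN { FS = OFS = "," }
NR == 1 { for (i = 2; i <= NF; i++) nm = nm OFS }
FNR == NR { id = $1; $1 = ""; d[id] = $0; next }
{ print $0 (($1 in d) ? d[$1] : nm) }' "$dir/File 2" "$dir/File 1")

printf '%s\n' "$joined"
rm -rf "$dir"
```

File 2 is named first on the command line so that its lines are already stored in d[] by the time File 1 is read.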
 

awk(1)							      General Commands Manual							    awk(1)

Name
       awk - pattern scanning and processing language

Syntax
       awk [-Fc] [-f prog] [-] [file...]

Description
       The awk command scans each input file for lines that match any of a set of patterns specified in prog.  With each pattern in prog there can be
       an associated action that will be performed when a line of a file matches the pattern.  The set of patterns may appear literally  as  prog,
       or in a file specified as -f prog.

       Files  are  read  in  order;  if there are no files, the standard input is read.  The file name `-' means the standard input.  Each line is
       matched against the pattern portion of every pattern-action statement; the associated action is performed for each matched pattern.

       An input line is made up of fields separated by white space.  (This default can be changed by using FS, as described  below.)   The  fields
       are denoted $1, $2, ... ; $0 refers to the entire line.

       A pattern-action statement has the form

	    pattern { action }

       A missing { action } means print the line; a missing pattern always matches.
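
       As a rough illustration of these two defaults (the sample input and the shell variable names are made up for this sketch), the first program below has a pattern and no action, and the second has an action and no pattern:

```shell
# Hypothetical three-line input, used only for this sketch.
input=$(printf '%s\n' 'keep 1' 'drop 2' 'keep 3')

# Pattern with no action: matching lines are printed unchanged.
matched=$(printf '%s\n' "$input" | awk '/keep/')

# Action with no pattern: the action runs on every line.
seconds=$(printf '%s\n' "$input" | awk '{ print $2 }')

printf '%s\n' "$matched" "$seconds"
```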

       An action is a sequence of statements.  A statement can be one of the following:

	    if ( conditional ) statement [ else statement ]
	    while ( conditional ) statement
	    for ( expression ; conditional ; expression ) statement
	    break
	    continue
	    { [ statement ] ... }
	    variable = expression
	    print [ expression-list ] [ >expression ]
	    printf format [ , expression-list ] [ >expression ]
	    next # skip remaining patterns on this input line
	    exit # skip the rest of the input

       Statements  are terminated by semicolons, new lines or right braces.  An empty expression-list stands for the whole line.  Expressions take
       on string or numeric values as appropriate, and are built using the operators +, -, *, /, %,  and concatenation	(indicated  by	a  blank).
       The  C operators ++, --, +=, -=, *=, /=, and %= are also available in expressions.  Variables may be scalars, array elements (denoted x[i])
       or fields.  Variables are initialized to the null string.  Array subscripts may be any string, not necessarily numeric; this allows  for  a
       form of associative memory.  String constants are quoted "...".
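
       A small sketch of string subscripts used as associative memory; the input lines and the names total and totals are assumptions chosen for illustration.  An element that has never been assigned starts out as the null string, which += treats as zero:

```shell
# Hypothetical input: a colour name and a count on each line.
totals=$(printf '%s\n' 'red 4' 'blue 1' 'red 6' |
awk '
{ total[$1] += $2 }	# the subscript is the string in field 1
END {
	print "red", total["red"]
	print "blue", total["blue"]
}')

printf '%s\n' "$totals"
```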

       The  print  statement prints its arguments on the standard output (or on a file if >file is present), separated by the current output field
       separator, and terminated by the output record separator.  The printf statement formats its expression list according to the format.  For  further
       information, see

       The  built-in  function	length	returns the length of its argument taken as a string, or of the whole line if no argument.  There are also
       built-in functions exp, log, sqrt, and int.  The last truncates its argument to an integer.  substr(s, m, n) returns the  n-character  sub-
       string  of  s that begins at position m.  The function sprintf(fmt, expr, expr, ...)  formats the expressions according to the format given
       by fmt and returns the resulting string.

       Patterns are arbitrary Boolean combinations (!, ||, &&, and parentheses)  of  regular  expressions  and	relational  expressions.   Regular
       expressions  must be surrounded by slashes and are as in egrep.	Isolated regular expressions in a pattern apply to the entire line.  Regu-
       lar expressions may also occur in relational expressions.

       A pattern may consist of two patterns separated by a comma; in this case, the action is performed for all lines between	an  occurrence	of
       the first pattern and the next occurrence of the second.

       A relational expression is one of the following:

	    expression matchop regular-expression
	    expression relop expression

       where a relop is any of the six relational operators in C, and a matchop is either ~ (for contains) or !~ (for does not contain).  A condi-
       tional is an arithmetic expression, a relational expression, or a Boolean combination of these.

       The special patterns BEGIN and END may be used to capture control before the first input line is read and after the last.   BEGIN  must	be
       the first pattern, END the last.

       A single character c may be used to separate the fields by starting the program with

	    BEGIN { FS = "c" }

       or by using the -Fc option.

       Other  variable	names  with special meanings include NF, the number of fields in the current record; NR, the ordinal number of the current
       record; FILENAME, the name of the current input file; OFS, the output field separator (default blank); ORS,  the  output  record  separator
       (default new line); and OFMT, the output format for numbers (default "%.6g").
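
       The following sketch, which assumes a made-up colon-separated input, shows FS, OFS, NR and NF working together; the variable name report is likewise only for illustration:

```shell
# FS splits the input on colons; OFS joins the printed values with hyphens.
report=$(printf '%s\n' 'a:b:c' 'd:e' |
awk 'BEGIN { FS = ":"; OFS = "-" }
{ print NR, NF, $1 }')

printf '%s\n' "$report"
```

       Each output line holds the record number, the field count and the first field of the corresponding input line.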

Options
       -	 Used for standard input file.

       -Fc	 Sets interfield separator to named character.

       -fprog	 Uses prog file for patterns and actions.

Examples
       Print lines longer than 72 characters:
	    length > 72

       Print first two fields in opposite order:
	    { print $2, $1 }

       Add up first column, print sum and average:
		 { s += $1 }
	    END  { print "sum is", s, " average is", s/NR }

       Print fields in reverse order:
	    { for (i = NF; i > 0; --i) print $i }

       Print all lines between start/stop pairs:
	    /start/, /stop/

       Print all lines whose first field is different from previous one:
	    $1 != prev { print; prev = $1 }

Restrictions
       There  are  no explicit conversions between numbers and strings.  To force an expression to be treated as a number add 0 to it; to force it
       to be treated as a string concatenate "" to it.
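
       A sketch of both idioms, assuming a current POSIX-style awk (the ULTRIX implementation may differ in detail); all variable names are hypothetical:

```shell
flags=$(awk 'BEGIN {
	a = "10"; b = "9"		# string values
	x = 10; y = 9			# numeric values
	as_str = (a < b)		# compared as strings: "10" sorts before "9"
	as_num = (a + 0 < b + 0)	# adding 0 forces a numeric comparison
	forced_str = ((x "") < (y ""))	# concatenating "" forces a string comparison
	plain_num = (x < y)		# compared as numbers
	print as_str, as_num, forced_str, plain_num
}')

printf '%s\n' "$flags"
```

       A comparison yields 1 when true and 0 when false, so the four values show which comparisons were made as strings and which as numbers.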

See Also
       lex(1), sed(1)
       "Awk - A Pattern Scanning and Processing Language" ULTRIX Supplementary Documents Vol. II: Programmer
