How to combine two files with awk?


 
# 1  
Old 10-26-2012

Hi, everyone!
I have two files, I want to combine them as follows:

Code:
File1
AAAA 23 45
AAAB 44 56
AAAC 34 65
AAAD 34 87

File2
AAAA 34 54
AAAE 34 56

Combined file
AAAA 23 45 34 54
AAAB 44 56
AAAC 34 65
AAAD 34 87
AAAE       34 56
I searched online, but only found examples that merge files with the same field value or join two files into different columns.

How can I combine two files on a key field, printing the records together on one line when the key matches and on separate rows otherwise?

Thank you!
# 2  
Old 10-26-2012
You'd be better off using join.

Try:

Code:
join file1 file2 -a 1 -a 2

Or with awk:

Code:
awk 'FNR==NR{a[$1]=$0;next}{if(a[$1]){print a[$1],$2,$3;delete a[$1]}else{print $1"\t"$2,$3}}END{for(i in a){if(a[i]){print a[i]}}}' file2 file1

# 3  
Old 10-26-2012
@Pamu, can you please explain the join command?
# 4  
Old 10-26-2012
Quote:
Originally Posted by bmk
@Pamu, can you please explain the join command?
Code:
-a FILENUM
          print  unpairable  lines coming from file FILENUM, where FILENUM
          is 1 or 2, corresponding to FILE1 or FILE2

Please check https://www.unix.com/man-pages.php?query=join&apropos=0&section=1&os=linux
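
To make the effect of -a easier to see, here is a small sketch that recreates the sample files from post #1 and also uses the standard -o and -e options so that unpaired lines keep their column positions. The scratch directory, the `joined` variable, and the `-` placeholder are my own choices for illustration, not something from the posts above. Note that join expects both inputs to be sorted on the join field, which the sample data already is.

```shell
# Recreate the sample files from post #1 in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'AAAA 23 45' 'AAAB 44 56' 'AAAC 34 65' 'AAAD 34 87' > "$tmpdir/file1"
printf '%s\n' 'AAAA 34 54' 'AAAE 34 56' > "$tmpdir/file2"

# -a 1 -a 2 : also print lines that have no partner in the other file
# -o LIST   : fixed output layout (0 = join field, N.M = field M of file N)
# -e '-'    : placeholder printed for fields that are missing
joined=$(join -a 1 -a 2 -e '-' -o 0,1.2,1.3,2.2,2.3 "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$joined"

rm -rf "$tmpdir"
```

With the placeholder in place, the line for AAAE should come out as `AAAE - - 34 56`, so it stays clear which file each value came from.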
# 5  
Old 10-26-2012
@Pamu, you are really great...
# 6  
Old 10-26-2012
When I ran the commands suggested by pamu, I got a syntax error from the join command (the operands need to follow the options when using a standards-conforming shell). Changing the join command to:
Code:
join -a 1 -a 2 file1 file2

produced the output:
Code:
AAAA 23 45 34 54
AAAB 44 56
AAAC 34 65
AAAD 34 87
AAAE 34 56

and the output produced by the awk command was:
Code:
AAAA 34 54 23 45
AAAB	44 56
AAAC	34 65
AAAD	34 87
AAAE 34 56

I didn't think either of these matched the desired output. The following awk script:
Code:
awk 'BEGIN{OFS = "\t"}
{       if($1 in out) out[$1] = out[$1] OFS $2 OFS $3
        else if(NR == FNR) {
                out[$1] = $1 OFS $2 OFS $3
                order[++oc] = $1
        } else {out[$1] = $1 OFS OFS OFS $2 OFS $3
                order[++oc] = $1
        }
}
END{for(i = 1; i <= oc; print out[order[i++]]);}' file1 file2

produces:
Code:
AAAA	23	45	34	54
AAAB	44	56
AAAC	34	65
AAAD	34	87
AAAE			34	56

which seems a little closer to the requested output format. It uses tabs instead of spaces as the output field separator, so it is more obvious which input file contained the output data when an entry only appears in file2. I could force each output column to take up the actual width of the longest entry in that column (as I have done in some other solutions I've presented), but I didn't think that was needed here.
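
For anyone who does want the space-padded layout, the column-width idea mentioned above could be sketched roughly as follows. This is only one way to do it, not the script from the other solutions referred to: the `cell`, `width`, `seen`, and `aligned` names are made up for this example, and it assumes every input line has exactly three fields.

```shell
# Recreate the sample files from post #1 in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'AAAA 23 45' 'AAAB 44 56' 'AAAC 34 65' 'AAAD 34 87' > "$tmpdir/file1"
printf '%s\n' 'AAAA 34 54' 'AAAE 34 56' > "$tmpdir/file2"

aligned=$(awk '
# Fields 2-3 of file1 go into columns 2-3, fields 2-3 of file2 into columns 4-5.
{
    if (!($1 in seen)) { seen[$1] = 1; order[++oc] = $1 }
    base = (NR == FNR) ? 0 : 2
    if (length($1) > width[1]) width[1] = length($1)
    for (f = 2; f <= 3; f++) {
        cell[$1, f + base] = $f
        if (length($f) > width[f + base]) width[f + base] = length($f)
    }
}
END {
    for (i = 1; i <= oc; i++) {
        k = order[i]
        # Pad every column to the width of its longest entry.
        line = sprintf("%-" width[1] "s", k)
        for (c = 2; c <= 5; c++)
            line = line " " sprintf("%-" width[c] "s", cell[k, c])
        sub(/ +$/, "", line)    # drop trailing padding
        print line
    }
}' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$aligned"

rm -rf "$tmpdir"
```

The format string is built by concatenation rather than with `%*s`, since I am not sure every awk accepts the `*` width. Keys keep the order in which they are first seen, so entries that only appear in file2 end up after all of the file1 entries.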
# 7  
Old 10-26-2012
Quote:
Originally Posted by Don Cragun
When I ran the commands suggested by pamu, I got a syntax error from the join command (the operands need to follow the options when using a standards-conforming shell). Changing the join command to:
I don't know what the problem could be here (maybe the OS version).
I am using bash.
Code:
$ join file1 file2 -a 1 -a 2
AAAA 23 45 34 54
AAAB 44 56
AAAC 34 65
AAAD 34 87
AAAE 34 56

Quote:
Originally Posted by Don Cragun
the output produced by the awk command was:
Code:
AAAA 34 54 23 45
AAAB    44 56
AAAC    34 65
AAAD    34 87
AAAE 34 56

I didn't think either of these matched the desired output.
I made a small mistake there: file2 was treated as the first file, when it should be the second. I also didn't worry too much about spacing.
Now the output looks like this (which still doesn't match the OP's desired output if you consider spacing):

Code:
AAAA 23 45 34 54
AAAE    34 56
AAAB 44 56
AAAC 34 65
AAAD 34 87

But if you ignore the spacing, the output is what the OP wants.

Yes, your script gives exactly the output the OP wants.
You always come up with the perfect solution.

Thanks