How to combine two files with awk?


 
# 8  
Old 10-26-2012

Quote:
Originally Posted by Don Cragun
The following awk script:
Code:
awk 'BEGIN{OFS = "\t"}
{       if($1 in out) out[$1] = out[$1] OFS $2 OFS $3
        else if(NR == FNR) {
                out[$1] = $1 OFS $2 OFS $3
                order[++oc] = $1
        } else {out[$1] = $1 OFS OFS OFS $2 OFS $3
                order[++oc] = $1
        }
}
END{for(i = 1; i <= oc; print out[order[i++]]);}' file1 file2

Nice code! Thank you very much!

There is a new problem with my file.
file1
HTML Code:
AAAA    23      45
AAAA    101     203
AAAB    44      56
AAAC    34      65
AAAD    34      87
When I use the awk command above, it shows results like this:

HTML Code:
AAAA	23	45		101	203		34	54	
AAAB	44	56	
AAAC	34	65	
AAAD	34	87	
AAA3				34	56	
How can I get a result like this?
HTML Code:
AAAA	23	45	34	54	
AAAA    101	203	
AAAB	44	56	
AAAC	34	65	
AAAD	34	87	
AAA3			34	56	
# 9  
Old 10-26-2012
If you don't have any problem with the spacing:

Code:
awk 'FNR==NR{if(!a[$1]){a[$1]=$0}else{a[$0]=$0};next}{if(a[$1]){print a[$1],$2,$3;delete a[$1]}else{print $1"\t"$2,$3}}END{for(i in a){if(a[i]){print a[i]}}}' file1 file2
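
For readers new to it, the one-liner above is built on the common `FNR == NR` two-file idiom. The following is only a minimal sketch of that idiom, not a replacement for the script above; the sample data and the helper name `lookup_join` are assumptions made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from the thread.
tmp=$(mktemp -d)
printf 'k1 a\nk2 b\n' > "$tmp/first"
printf 'k2 x\nk3 y\n' > "$tmp/second"

# lookup_join: hypothetical helper name used only for this illustration.
lookup_join() {
    awk '
    # NR counts records across all files; FNR restarts at 1 for each file.
    # The two are equal only while the first file is being read.
    FNR == NR { seen[$1] = $2; next }
    # From here on we are reading the second file.
    ($1 in seen) { print $1, seen[$1], $2; next }
    { print $1, "-", $2 }
    ' "$1" "$2"
}

result=$(lookup_join "$tmp/first" "$tmp/second")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The sketch tests membership with `in` rather than `if(a[$1])`, because merely referencing `a[$1]` creates an empty element. Note also that the idiom misfires when the first file is empty, since `FNR == NR` then stays true while the second file is read.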

# 10  
Old 10-26-2012
Thank you!
The spacing is not a problem for me, but I am trying to figure out which file each piece of data comes from.
# 11  
Old 10-26-2012
OK, try this. It should make it clear which file each value comes from. (Not tested; you can change the spacing however you want.)

Code:
awk 'FNR==NR{if(!a[$1]){a[$1]=$1"\t"$2"\t"$3}else{a[$0]=$1"\t"$2"\t"$3};next}{if(a[$1]){print a[$1],$2,$3;delete a[$1]}else{print $1"\t\t"$2,$3}}END{for(i in a){if(a[i]){print a[i]}}}' OFS="\t" file1 file2

# 12  
Old 10-26-2012

It works! Thank you! I will study the code.
# 13  
Old 10-27-2012
When I tried pamu's awk script with the input files:
Code:
file1:
AAAA 23 45
AAAA 101 203
AAAB 44 56
AAAC 34 65
AAAD 34 87

file2:
AAAA 34 54
AAAA 201 202
AAAA 301 302
AAAE 34 56
AAAE 234 456

it produced the following output:
Code:
AAAA	23	45	34	54
AAAA		201	202
AAAA		301	302
AAAE		34	56
AAAE		234	456
AAAB	44	56
AAAC	34	65
AAAD	34	87
AAAA	101	203

I was surprised that the lines marked in color in the original post (AAAA 101 203 from file1 and AAAA 201 202 from file2) weren't combined. So, I updated (OK, given the difference introduced by the change in requirements, rewrote) my script so that with the same input files it produces the following output:
Code:
AAAA	23	45	34	54
AAAA	101	203	201	202
AAAA			301	302
AAAB	44	56
AAAC	34	65
AAAD	34	87
AAAE			34	56
AAAE			234	456

where the second AAAA line from each file now appears combined on one line (AAAA 101 203 201 202).

The script I used to produce this is:
Code:
awk 'BEGIN{OFS = "\t"}
{       # Data dictionary:
        #       d[NR]   data from input fields 2 and 3 of input record NR
        #       d23[$1] CSL (comma-separated list) of d[] subscripts to display
        #               in output fields 2&3 for a given value of input field 1
        #       d45[$1] CSL of d[] subscripts to display in output fields 4&5
        #               for a given value of input field 1
        #       f1[$1]  array with indices of field 1 values seen
        #       i, j    loop control
        #       m       max(o23c, o45c)
        #       o1[x]   order of appearances of different input field 1 values
        #       o1c     # of entries in o1[]
        #       o23[x]  array of entries from d23[] for current field 1 output
        #               processing
        #       o23c    # of entries in o23[]
        #       o45[x]  array of entries from d45[] for current field 1 output
        #               processing
        #       o45c    # of entries in o45[]
        # Gather data from both files. Output fields 2&3 will be merged with
        # output fields 4&5 in the END processing.
        d[NR] = OFS $2 OFS $3
        if(!($1 in f1)) {
                o1[++o1c] = $1
                f1[$1]
        }
        if(FNR == NR)
                # We are working on the 1st file (output fields 2&3)
                d23[$1] = d23[$1] ? d23[$1] "," NR : NR
        else    # We are working on the 2nd file (output fields 4&5)
                d45[$1] = d45[$1] ? d45[$1] "," NR : NR

}
END {   # Print the gathered data
        for(i = 1; i <= o1c; i++) {
                # Split the lists for the field 1 value associated with o1[i].
                m = o23c = split(d23[o1[i]], o23, ",")
                o45c = split(d45[o1[i]], o45, ",")
                if(o45c > m) m = o45c
                for(j = 1; j <= m; j++) {
                        # Print the field 1, 2&3, and 4&5 data in the order it
                        # was seen in the input files.
                        printf("%s%s%s\n", o1[i],
                                j > o23c ? OFS OFS : d[o23[j]],
                                j > o45c ? "" : d[o45[j]])
                }
        }
}' file1 file2

If the comments are stripped and it is converted to a single-line awk script, it is about 50% bigger than pamu's script. The increase in size is partly due to the processing needed to present output lines in the order in which each first-field value first appeared in the input files. I hope that the comments and indentation will help you understand what the script is doing more easily than reading pamu's single-line awk script.
# 14  
Old 10-29-2012
Hi Don,

I really love your comments; they are very useful for understanding any script.
I also really like your approach of testing against the hardest input.

Thanks.

This is my script:

Code:
awk 'BEGIN { OFS = "\t" }
FNR == NR {
    a[$1, ++X[$1]] = $1 OFS $2 OFS $3
    next
}
{
    if (a[$1, ++Y[$1]]) {
        print a[$1, Y[$1]], $2, $3
        delete a[$1, Y[$1]]
    } else {
        print $1 "\t\t\t" $2, $3
    }
}
END {
    for (i in a)
        if (a[i])
            print a[i]
}' file1 file2

I am sorry, I have not added any comments (I don't think we really need comments here).
(Sorry for the late reply; I was away from my computer over the weekend.)

Thanks,
pamu
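
The script in this post pairs lines by indexing the array with both the key and a running occurrence count, so the n-th line for a key in file1 meets the n-th line for that key in file2. The following is a minimal sketch of just that pairing step, assuming made-up sample data and a hypothetical helper name `pair_by_occurrence`. Unlike the script above, it does not print leftover lines from the first file.

```shell
#!/bin/sh
# Hypothetical sample data, not taken from the thread.
tmp=$(mktemp -d)
printf 'A 1\nA 2\nB 3\n' > "$tmp/left"
printf 'A x\nA y\nA z\nC w\n' > "$tmp/right"

# pair_by_occurrence: hypothetical helper name used only for illustration.
pair_by_occurrence() {
    awk '
    # First file: store each value under (key, occurrence number).
    FNR == NR { left[$1, ++nleft[$1]] = $2; next }
    # Second file: look for the entry with the same key and occurrence number.
    {
        c = ++nright[$1]
        if (($1, c) in left)
            print $1, left[$1, c], $2
        else
            print $1, "-", $2
    }' "$1" "$2"
}

result=$(pair_by_occurrence "$tmp/left" "$tmp/right")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The `(key, count) in array` test checks for a stored entry without creating one. That is why this sketch needs no `if(a[i])` guard of the kind the END block above uses to skip empty elements.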