Common records using AWK


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Common records using AWK
# 1  
Old 01-30-2012
Common records using AWK

Hi,

To be honest, I am really impressed and amazed at the pace I find solutions for un-solved coding mysteries in this forum.

I have a file like this

input1.txt
Code:
x y z 1 2 3 
a b c 4 -3 7
k l m n 0 p
1 2 a b c 4

input2
Code:
x y z 9 0 -1
a b c 0 6 9
k l m 8 o p
1 2 a f x 9

Output
Code:
x y z 1 2 3 9 0 -1
a b c 4 -3 7 0 6 9
k l m n 0 p 8 o p
1 2 a b c 4 f x 9

The number of columns might change. To make it simple, my final output files should contain the common columns between two files and their respective varying columns side by side.

I tried using the (a[$1}=$2; next) method in awk, but it doesn't work. Can a join do this?

Any helps are appreciated.

Thanks in advance

Last edited by Franklin52; 01-31-2012 at 05:01 AM.. Reason: Please use code tags for code and data samples, thank you
# 2  
Old 01-30-2012
Try this script:
Code:
#!/bin/bash
FILE1="input1.txt"
FILE2="input2.txt"
TR1=`awk '{for (i=1;i<=NF;i++){a[i]=a[i]" "$i}}END{for (i=1;i<=NF;i++){print a[i]}}' $FILE1`
TR2=`awk '{for (i=1;i<=NF;i++){a[i]=a[i]" "$i}}END{for (i=1;i<=NF;i++){print a[i]}}' $FILE2`
TR3=`comm -12 <(echo "$TR1"|sort) <(echo "$TR2"|sort)`
TR3="$TR3\n"`comm -23 <(echo "$TR1"|sort) <(echo "$TR2"|sort)`
TR3="$TR3\n"`comm -13 <(echo "$TR1"|sort) <(echo "$TR2"|sort)`
OUT=`echo -e "$TR3" | awk '{for (i=1;i<=NF;i++){a[i]=a[i]" "$i}}END{for (i=1;i<=NF;i++){print a[i]}}'`
echo "$OUT"

# 3  
Old 01-30-2012
Code:
awk '{
  ind=sprintf("%s %s %s", $1, $2, $3)
  str[ind]=sprintf("%s %s %s %s", str[ind], $4, $5, $6)
}
END {
  for (i in str) {
    printf("%s%s\n", i, str[i])
  }
}' input1.txt input2.txt

This User Gave Thanks to chihung For This Post:
# 4  
Old 01-31-2012
Another awk solution:
Code:
awk 'NR==FNR{a[$1,$2,$3]=$0; next} {printf a[$1,$2,$3]; $1=$2=$3=""; gsub(/^ */," ",$0); print $0}' file1 file2

Alternatively:
Code:
 cut -d" " -f4- file2 | paste -d" " file1 -

The awk solution works on unsorted files also, whereas the second one assumes sorted inputs.

Last edited by mirni; 01-31-2012 at 04:12 AM..
This User Gave Thanks to mirni For This Post:
# 5  
Old 01-31-2012
Quote:
Originally Posted by chihung
Code:
awk '{
  ind=sprintf("%s %s %s", $1, $2, $3)
  str[ind]=sprintf("%s %s %s %s", str[ind], $4, $5, $6)
}
END {
  for (i in str) {
    printf("%s%s\n", i, str[i])
  }
}' input1.txt input2.txt

Works perfectly.

---------- Post updated at 10:10 AM ---------- Previous update was at 10:09 AM ----------

Quote:
Originally Posted by mirni
Another awk solution:
Code:
awk 'NR==FNR{a[$1,$2,$3]=$0; next} {printf a[$1,$2,$3]; $1=$2=$3=""; gsub(/^ */," ",$0); print $0}' file1 file2

Alternatively:
Code:
 cut -d" " -f4- file2 | paste -d" " file1 -

The awk solution works on unsorted files also, whereas the second one assumes sorted inputs.
I tried the awk, works great. The cut command does the same but it pastes matches columns. Both works fine but I prefered the awk.

Life wouldn't have been easy without unix.com.

---------- Post updated at 05:08 PM ---------- Previous update was at 10:10 AM ----------

Can someone tell me if there are more then 3 common columns and still I want to match on the first three columns, how do I do it?

If my example changes to

1.txt
Code:
x y z 1 2 3 4 5 6 7 8 9
a b c d e f 9 7 8 9 90 1

2.txt
Code:
x y z 2 4 5 6 7 8 1 0 0
a b c g h i 9 3 1 4 5 6

Output
Code:
x y z 1 2 3 4 5 6 7 8 9 2 4 5 6 7 8 1 0 0
a b c d e f 9 7 8 9 90 1 g h i 9 3 1 4 5 6

I tried the following from the previous response, but no luck

Code:
awk '{ ind=sprintf("%s %s %s", $1, $2, $3); str[ind]=sprintf("%s %s %s %s %s %s %s %s", str[ind], $4, $5, $6, $7, $8, $9, $10) } END { for (i in str) { printf("%s%s\n", i, str[i]) } }' 1.txt 2.txt


Last edited by Franklin52; 02-01-2012 at 03:57 AM.. Reason: Please use code tags for code and data samples, thank you
# 6  
Old 01-31-2012
In my previous post, you have two solutions.
# 7  
Old 02-01-2012
Quote:
Originally Posted by mirni
In my previous post, you have two solutions.
I tried both of them. But, the awk script works only for matching first three columns and prints 4th, 5th and 6th columns. But, what I need is to match the first three columns and print all the records no matter how many they are.

The cut command prints each record against each record which was not helpful.

Appreciate your time. Thanks in advance.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

awk common between files

Hello there: I want to find common among files. They all have one column. Format for data: CEU_snp_CHR21.txt 21:10758305 21:10827533 21:10913441 21:10920098 21:10952160 21:10966322 21:10985991 NAT_CHR21_variants.txt 21:10971951 (3 Replies)
Discussion started by: genome
3 Replies

2. UNIX for Beginners Questions & Answers

Comparing fastq files and outputting common records

I have two files: File_1: @M04961:22:000000000-B5VGJ:1:1101:9280:7106 1:N:0:86 GGCATGAAAACATACAAACCGTCTTTCCAGAAATTGTTCCAAGTATCGGCAACAGCTTTATCAATACCATGAAAAATATCAACCACACCAGAAGCAGCAT + GGGGGGGGGGGGGGGGGCCGGGGGF,EDFFGEDFG,@DGGCGGEGGG7DCGGGF68CGFFFGGGG@CGDGFFDFEFEFF:30CGAFFDFEFF8CAF;;8F ... (3 Replies)
Discussion started by: Xterra
3 Replies

3. Shell Programming and Scripting

Compare multiple files, identify common records and combine unique values into one file

Good morning all, I have a problem that is one step beyond a standard awk compare. I would like to compare three files which have several thousand records against a fourth file. All of them have a value in each row that is identical, and one value in each of those rows which may be duplicated... (1 Reply)
Discussion started by: nashton
1 Replies

4. UNIX for Dummies Questions & Answers

Values with common field in same line with awk

Hi all ! I almost did it but got a small problem. input: cars red cars blue cars green truck black Wanted: cars red-blue-green truck black Attempt: gawk 'BEGIN{FS="\t"}{a = a (a?"-":"")$2; $2=a; print $1 FS $2}' input But I also got the intermediate records... (2 Replies)
Discussion started by: beca123456
2 Replies

5. Shell Programming and Scripting

Two columns-Common records - 20 files

Hi Friends, I have an input file like this cat input1 x 1 y 2 z 3 a 2 b 4 c 6 d 9 cat input2 x 7 h 8 k 9 l 5 m 9 d 12 (5 Replies)
Discussion started by: jacobs.smith
5 Replies

6. Shell Programming and Scripting

Help in awk to read the common txt

Dear all, I have small script which seems to be working but seems to have some bug. It suppose to read commonTxt and then print the noOfLines in outputFile. It is working for most of the txt but unable to add some of the variables values. Can somebody please spend looking at the thread and... (3 Replies)
Discussion started by: emily
3 Replies

7. UNIX for Dummies Questions & Answers

keeping last record among group of records with common fields (awk)

input: ref.1;rack.1;1 #group1 ref.1;rack.1;2 #group1 ref.1;rack.2;1 #group2 ref.2;rack.3;1 #group3 ref.2;rack.3;2 #group3 ref.2;rack.3;3 #group3 Among records from same group (i.e. with same 1st and 2nd field - separated by ";"), I would need to keep the last record... (5 Replies)
Discussion started by: beca123456
5 Replies

8. Shell Programming and Scripting

Common records

Hi, I have the following files, A M 2 3 B E 4 5 C I 5 6 D O 4 5 A M 3 4 B E 5 2 F U 7 9 J K 2 3 OUTPUT A M 2 3 3 4 B E 4 5 5 2 thanks in advance, (7 Replies)
Discussion started by: jacobs.smith
7 Replies

9. Shell Programming and Scripting

Common records after matching on different columns

Hi, I have the following files. cat 1.txt cat 2.txt output.txt The logic is as follows.... (10 Replies)
Discussion started by: jacobs.smith
10 Replies

10. Shell Programming and Scripting

merge based on common, awk help

All, $ cat x.txt z 11 az x 12 ax y 13 ay $ cat y.txt ay TT ax NN Output required: y 13 ay TT x 12 ax NN (3 Replies)
Discussion started by: jkl_jkl
3 Replies
Login or Register to Ask a Question