Common records using AWK

01-30-2012

Hi,

To be honest, I am really impressed and amazed at the pace at which I find solutions to unsolved coding mysteries in this forum.

I have a file like this

input1.txt

Code:

x y z 1 2 3 
a b c 4 -3 7
k l m n 0 p
1 2 a b c 4

input2.txt

Code:

x y z 9 0 -1
a b c 0 6 9
k l m 8 o p
1 2 a f x 9

Output

Code:

x y z 1 2 3 9 0 -1
a b c 4 -3 7 0 6 9
k l m n 0 p 8 o p
1 2 a b c 4 f x 9

The number of columns might change. To make it simple, my final output files should contain the common columns between two files and their respective varying columns side by side.

I tried using the (a[$1]=$2; next) method in awk, but it doesn't work. Can join do this?

Any help is appreciated.

Thanks in advance

Last edited by Franklin52; 01-31-2012 at 05:01 AM.. Reason: Please use code tags for code and data samples, thank you

jacobs.smith

01-30-2012

Try this script:

Code:

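#!/bin/bash
FILE1="input1.txt"
FILE2="input2.txt"
# Transpose each file so that every column becomes one line.
TR1=$(awk '{for (i=1;i<=NF;i++){a[i]=a[i]" "$i}}END{for (i=1;i<=NF;i++){print a[i]}}' "$FILE1")
TR2=$(awk '{for (i=1;i<=NF;i++){a[i]=a[i]" "$i}}END{for (i=1;i<=NF;i++){print a[i]}}' "$FILE2")
# comm needs sorted input, so columns come out in sort order, not in file order.
# Common columns first, then columns unique to FILE1, then columns unique to FILE2.
TR3=$(comm -12 <(echo "$TR1"|sort) <(echo "$TR2"|sort))
TR3="$TR3\n"$(comm -23 <(echo "$TR1"|sort) <(echo "$TR2"|sort))
TR3="$TR3\n"$(comm -13 <(echo "$TR1"|sort) <(echo "$TR2"|sort))
# Transpose back so that every line is a record again.
OUT=$(echo -e "$TR3" | awk '{for (i=1;i<=NF;i++){a[i]=a[i]" "$i}}END{for (i=1;i<=NF;i++){print a[i]}}')
echo "$OUT"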
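
For anyone puzzled by the awk part of the script above: it is a transpose, run three times. Here is that step on its own as a minimal sketch. The temporary file, the two sample rows and the variable names (tmp, transposed, nf) are made up for illustration, and it builds each line without the leading space that the original leaves in.

```shell
# Hypothetical sample data, written to a temporary file for illustration.
tmp=$(mktemp)
printf '%s\n' 'x y z 1 2 3' 'a b c 4 -3 7' > "$tmp"

# Collect column i of every row into a[i], then print one line per column.
# nf remembers the column count instead of relying on NF inside END.
transposed=$(awk '
  { for (i = 1; i <= NF; i++) a[i] = (NR == 1 ? $i : a[i] " " $i); nf = NF }
  END { for (i = 1; i <= nf; i++) print a[i] }
' "$tmp")

echo "$transposed"
rm -f "$tmp"
```

Each output line is one column of the input, so comparing lines with comm is really comparing columns. This assumes every row has the same number of fields.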

bartus11

01-30-2012

Code:

awk '{
  ind=sprintf("%s %s %s", $1, $2, $3)
  str[ind]=sprintf("%s %s %s %s", str[ind], $4, $5, $6)
}
END {
  for (i in str) {
    printf("%s%s\n", i, str[i])
  }
}' input1.txt input2.txt
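
One thing to keep in mind with the snippet above: awk does not define the order in which "for (i in str)" visits keys, so the output lines may come out in any order. A possible variant that keeps the keys in first-seen order is sketched below. The arrays seen and order, the counter n and the temporary directory are names introduced here for illustration, and the sample rows are a shortened copy of the data from the first post.

```shell
dir=$(mktemp -d)
printf '%s\n' 'x y z 1 2 3' 'a b c 4 -3 7' > "$dir/input1.txt"
printf '%s\n' 'x y z 9 0 -1' 'a b c 0 6 9' > "$dir/input2.txt"

merged=$(awk '
{
  key = $1 " " $2 " " $3
  if (!(key in seen)) { seen[key] = 1; order[++n] = key }  # remember first-seen order
  for (i = 4; i <= NF; i++) str[key] = str[key] " " $i     # append every remaining field
}
END {
  for (k = 1; k <= n; k++) print order[k] str[order[k]]
}' "$dir/input1.txt" "$dir/input2.txt")

echo "$merged"
rm -rf "$dir"
```

Like the original, this prints every key it sees, including keys that appear in only one of the files.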

chihung

01-31-2012

Another awk solution:

Code:

awk 'NR==FNR{a[$1,$2,$3]=$0; next} {printf "%s", a[$1,$2,$3]; $1=$2=$3=""; sub(/^ */," ",$0); print $0}' file1 file2

Alternatively:

Code:

 cut -d" " -f4- file2 | paste -d" " file1 -

The awk solution also works on unsorted files, whereas the second one assumes both files list their records in the same order (for example, both sorted on the key columns).
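
To make the difference concrete, here is a small sketch with the same two keys listed in a different order in each file. The sample rows, the temporary directory and the variable names by_key and by_position are made up for illustration. The awk line passes the stored record to printf through an explicit "%s" format so that a stray % in the data cannot be read as a format specifier.

```shell
dir=$(mktemp -d)
# Same keys, different row order (hypothetical sample data).
printf '%s\n' 'x y z 1 2 3' 'a b c 4 -3 7' > "$dir/file1"
printf '%s\n' 'a b c 0 6 9' 'x y z 9 0 -1' > "$dir/file2"

# Keyed lookup: rows are matched by their first three fields.
by_key=$(awk 'NR==FNR{a[$1,$2,$3]=$0; next} {printf "%s", a[$1,$2,$3]; $1=$2=$3=""; sub(/^ */," "); print}' "$dir/file1" "$dir/file2")

# Positional paste: row N of file1 is glued to row N of file2, whatever the keys are.
by_position=$(cut -d" " -f4- "$dir/file2" | paste -d" " "$dir/file1" -)

echo "$by_key"
echo "$by_position"
rm -rf "$dir"
```

With this input the keyed version pairs "a b c" with "a b c", while the paste version attaches the "a b c" values from file2 to the "x y z" row of file1.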

Last edited by mirni; 01-31-2012 at 04:12 AM..

mirni

01-31-2012

Quote:

Originally Posted by chihung

Code:

awk '{
  ind=sprintf("%s %s %s", $1, $2, $3)
  str[ind]=sprintf("%s %s %s %s", str[ind], $4, $5, $6)
}
END {
  for (i in str) {
    printf("%s%s\n", i, str[i])
  }
}' input1.txt input2.txt

Works perfectly.

---------- Post updated at 10:10 AM ---------- Previous update was at 10:09 AM ----------

Quote:

Originally Posted by mirni

Another awk solution:

Code:

awk 'NR==FNR{a[$1,$2,$3]=$0; next} {printf "%s", a[$1,$2,$3]; $1=$2=$3=""; sub(/^ */," ",$0); print $0}' file1 file2

Alternatively:

Code:

 cut -d" " -f4- file2 | paste -d" " file1 -

The awk solution also works on unsorted files, whereas the second one assumes both files list their records in the same order (for example, both sorted on the key columns).

I tried the awk, and it works great. The cut command does the same, but it pastes the matched columns. Both work fine, but I preferred the awk.

Life wouldn't have been easy without unix.com.

---------- Post updated at 05:08 PM ---------- Previous update was at 10:10 AM ----------

Can someone tell me how to do this if there are more than three common columns and I still want to match on the first three?

If my example changes to

1.txt

Code:

x y z 1 2 3 4 5 6 7 8 9
a b c d e f 9 7 8 9 90 1

2.txt

Code:

x y z 2 4 5 6 7 8 1 0 0
a b c g h i 9 3 1 4 5 6

Output

Code:

x y z 1 2 3 4 5 6 7 8 9 2 4 5 6 7 8 1 0 0
a b c d e f 9 7 8 9 90 1 g h i 9 3 1 4 5 6

I tried the following from the previous response, but no luck

Code:

awk '{ ind=sprintf("%s %s %s", $1, $2, $3); str[ind]=sprintf("%s %s %s %s %s %s %s %s", str[ind], $4, $5, $6, $7, $8, $9, $10) } END { for (i in str) { printf("%s%s\n", i, str[i]) } }' 1.txt 2.txt

Last edited by Franklin52; 02-01-2012 at 03:57 AM.. Reason: Please use code tags for code and data samples, thank you

jacobs.smith

01-31-2012

In my previous post, you have two solutions.

mirni

02-01-2012

Quote:

Originally Posted by mirni

In my previous post, you have two solutions.

I tried both of them. However, the awk script only matches on the first three columns and prints the 4th, 5th and 6th columns. What I need is to match on the first three columns and print all the remaining columns, however many there are.

The cut command pastes each record against the record on the same line of the other file, which was not helpful.

Appreciate your time. Thanks in advance.

jacobs.smith
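
One possible way to get the output asked for above, as a sketch rather than a tested answer from the thread: key on the first three columns, keep the whole line from the first file, and append columns 4 to NF from the second file in a loop so the number of columns does not matter. The file names 1.txt and 2.txt and their contents are taken from the example above. The array name first, the variable names line and result, and the temporary directory are assumptions made for illustration.

```shell
dir=$(mktemp -d)
printf '%s\n' 'x y z 1 2 3 4 5 6 7 8 9' 'a b c d e f 9 7 8 9 90 1' > "$dir/1.txt"
printf '%s\n' 'x y z 2 4 5 6 7 8 1 0 0' 'a b c g h i 9 3 1 4 5 6' > "$dir/2.txt"

result=$(awk '
NR == FNR { first[$1,$2,$3] = $0; next }   # first file: store the whole line under its 3-column key
($1,$2,$3) in first {                      # second file: only keys also present in the first file
  line = first[$1,$2,$3]
  for (i = 4; i <= NF; i++) line = line " " $i   # append columns 4..NF, however many there are
  print line
}' "$dir/1.txt" "$dir/2.txt")

echo "$result"
rm -rf "$dir"
```

Records whose key appears in only one of the two files are dropped, which seems to match the "common records" in the thread title. It assumes each key occurs at most once in the first file, since a later duplicate would overwrite the earlier line.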
