@all - OK, I feel like I almost have this working, but it still isn't perfect. I tried the following as the command (note the inclusion of Scrutinizer's suggestion):
The input file test.txt contains the following (the delimiters are tabs):
Code:
A A 200
A B 100
A C 90
B B 203
B A 101
B C 87
C C 300
C A 91
C B 86
The output I get is:
Code:
A A 200
200
A B 100
101
A C 90
91
B B 203
203
B A 101
100
B C 87
86
C C 300
300
C A 91
90
C B 86
87
0
This is pretty close to what I want, but if I could get the returned value on the same line that would be great. I am not sure why it isn't on the same line; could this be a formatting issue with the input file?
The output I am looking for is:
Code:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B A 101 100
B C 87 86
C C 300 0
C A 91 90
C B 86 87
Actually, having a 0 in the 4th column for rows without reciprocals isn't essential.
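One thing worth ruling out before blaming the awk: if test.txt was saved on Windows, every line ends in a carriage return, and anything printed after $3 can appear to wrap onto its own line. Here is a quick way to check and fix that (a sketch; the sample file written below is a stand-in for the real test.txt, and cat -A is the GNU option, BSD cat uses -vet):
Code:
```shell
# Stand-in for test.txt, written here with Windows (CRLF) line endings
# so the symptom is reproducible:
printf 'A\tA\t200\r\nA\tB\t100\r\n' > test.txt

# Show non-printing characters; with GNU cat, a CRLF file displays
# "^M$" at the end of every line (a plain "$" means the file is clean):
cat -A test.txt

# If carriage returns are present, strip them into a clean copy:
tr -d '\r' < test.txt > test.clean.txt
```
If cat -A shows ^M before the $, running the awk on the cleaned copy should put the value back on the same line.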
OK, so now we have a clear understanding of what the output should look like. You want rows that are their own counterparts to have 0 in the last column. I noticed there was also a stray zero in your output, presumably from an empty line, but I do not know why some parts start on a new line. To counteract the empty line, and to cater for entries that might lack a reversed counterpart, I used NF and reintroduced the +0 bit. I took RudiC's and Akshay's suggestions and made a couple of changes:
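In case it helps to see that approach spelled out, here is a sketch of the kind of two-pass awk being described (the exact command line is my assumption, not the poster's verbatim code; the NF guard skips empty lines and +0 coerces a missing counterpart to 0):
Code:
```shell
# Recreate the posted tab-delimited input:
printf 'A\tA\t200\nA\tB\t100\nA\tC\t90\nB\tB\t203\nB\tA\t101\nB\tC\t87\nC\tC\t300\nC\tA\t91\nC\tB\t86\n' > test.txt

# Pass 1 (NR==FNR): remember each pair's value, keyed on columns 1 and 2.
# Pass 2: for every non-empty line (NF), print it plus the reversed
# pair's value; self-pairs (A A, B B, ...) get a literal 0, and +0
# turns a missing counterpart into 0 as well.
awk -F'\t' '
  NR == FNR { a[$1,$2] = $3 ; next }
  NF        { print $0, ($1 == $2 ? 0 : a[$2,$1] + 0) }
' test.txt test.txt
```
Run against the sample data, this prints exactly the nine desired lines, with the reciprocal value on the same line as the original row.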
Hi.
Here is a script that breaks the task down into pieces of work. Each step also writes its intermediate output so that the form can be examined.
Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate join of first 2 columns, ignoring order of tokens.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }   # redefined as a no-op to silence the debug output
C=$HOME/bin/context && [ -f $C ] && $C awk sed

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
# Pipeline: 1) canonicalize and join the pair, 2) sort, 3) merge lines
# sharing a key, 4) append 0 when needed, 5) split the key back apart.
awk '
  # Step 1: put the smaller token first and join the pair with "_".
  { if ( $1 < $2 ) {
      first = $1 ; second = $2
    } else {
      first = $2 ; second = $1
    }
    s1 = first "_" second
    print s1, $3
  }
' $FILE |
tee f1 |
sort |
tee f2 |
awk '
  # Step 3: collect contiguous lines that share the same first column.
  BEGIN { FS = OFS = " " ; previous = "" ; line = "" ; first = "true" }
  first == "true" { first = "false" ; previous = $1 ; line = $0 ; next }
  $1 == previous { line = line " " $2 ; next }
  { print line ; previous = $1 ; line = $0 }
  END { print line }
' |
tee f3 |
awk '
  # Step 4: a pair with no counterpart has only 2 fields; append a 0.
  NF <= 2 { print $0, 0; next }
  { print }
' |
tee f4 |
sed 's/_/ /'

exit 0
producing:
Code:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian 5.0.8 (lenny, workstation)
bash GNU bash 3.2.39
awk GNU Awk 3.1.5
sed GNU sed version 4.1.5
-----
Input data file data1:
A A 200
A B 100
A C 90
B B 203
B A 101
B C 87
C C 300
C A 91
C B 86
-----
Results:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B C 86 87
C C 300 0
The tasks are:
1) Put the lines into a canonical form by arranging the first two columns in a fixed order, and join those columns so that the later steps can treat the pair as a single key.
2) Sort the items.
3) Collect contiguous lines that have an identical first column, a kind of self-join.
4) Add a trailing "0" when the number of fields shows that no counterpart was found.
5) Separate the first two columns again.
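For comparison, the five steps can also be collapsed into one awk pass that canonicalizes the pair in memory instead of piping through sort. This is a sketch, not part of the script above, and it differs in one visible way: the two values stay in input order, so B C comes out as 87 86 rather than the sorted 86 87.
Code:
```shell
# Recreate the sample data file:
printf 'A A 200\nA B 100\nA C 90\nB B 203\nB A 101\nB C 87\nC C 300\nC A 91\nC B 86\n' > data1

awk '
  NF {
    # Step 1: canonical key, smaller token first, joined with "_".
    key = ($1 < $2) ? ($1 "_" $2) : ($2 "_" $1)
    # Step 3 analogue: collect both values under the same key.
    if (key in val) val[key] = val[key] " " $3
    else { val[key] = $3 ; order[++n] = key }   # remember first-seen order
  }
  END {
    for (i = 1; i <= n; i++) {
      k = order[i] ; out = val[k]
      if (index(out, " ") == 0) out = out " 0"  # step 4: no counterpart
      sub("_", " ", k)                          # step 5: split the key
      print k, out
    }
  }
' data1
```
On this particular input the keys happen to come out in sorted order anyway, since the data is already grouped by the first column.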
I agree with the implicit remark of Akshay Hegde that "reciprocal" is not the word I would use to describe this.
Best wishes ... cheers, drl
PS: This also works on OS X:
Code:
$ ./s1
Environment: LC_ALL = POSIX, LANG = POSIX
(Versions displayed with local utility "version")
OSX 10.3.9
bash GNU bash 2.05b.0
grep - ( /usr/bin/grep, 29 Aug 2008 )
awk - ( /usr/bin/awk, 29 Aug 2008 )
sed - ( /usr/bin/sed, 29 Aug 2008 )
-----
Input data file data1:
A A 200
A B 100
A C 90
B B 203
B A 101
B C 87
C C 300
C A 91
C B 86
-----
Results:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B C 86 87
C C 300 0