Finding reciprocal columns


 
# 8  
Old 02-01-2014
Quote:
Originally Posted by RawToast
@RudiC - I tried:

Code:
awk 'NR==FNR{T[$1,$2]=$3; next} {print $0, T[$2,$1]}' test.txt > test_out.txt

@Don - I am on OSX 10.8.5 - my awk command usually works for simple stuff like:

Code:
awk '{print $1}' file

Do you see any difference between what you used:
Code:
awk 'NR==FNR{T[$1,$2]=$3; next} {print $0, T[$2,$1]}' test.txt > test_out.txt

and what RudiC suggested:
Code:
awk 'NR==FNR{T[$1,$2]=$3; next} {print $0, T[$2,$1]}' test.txt test.txt > test_out.txt
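To make the difference concrete, here is a small sketch. The file demo.txt and its two lines are made up for illustration and are not from this thread. With a single file operand, NR==FNR is true for every line, so only the array-filling block runs and nothing is printed. Naming the file twice lets the first pass fill T and the second pass print.

```shell
# Hypothetical two-line sample, tab-separated like test.txt.
tmp=$(mktemp -d)
printf 'A\tB\t100\nB\tA\t101\n' > "$tmp/demo.txt"

# One operand: NR==FNR holds on every line, so "next" always fires.
one_pass=$(awk 'NR==FNR{T[$1,$2]=$3; next} {print $0, T[$2,$1]}' "$tmp/demo.txt")

# Same operand twice: pass 1 fills T, pass 2 (NR > FNR) prints.
two_pass=$(awk 'NR==FNR{T[$1,$2]=$3; next} {print $0, T[$2,$1]}' "$tmp/demo.txt" "$tmp/demo.txt")

printf 'one pass: [%s]\n' "$one_pass"
printf 'two passes:\n%s\n' "$two_pass"
rm -rf "$tmp"
```

The first command should print nothing at all, which would explain an empty test_out.txt.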

# 9  
Old 02-02-2014
To print 0 when a reciprocal value is missing, force the lookup into numeric context by adding 0: ...{print $0, T[$2,$1]+0}
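A minimal sketch of why the +0 helps: an array element that was never assigned is the empty string in string context and 0 in numeric context. The subscripts "B","A" are just sample values.

```shell
# T is empty here, so T["B","A"] is an unset element.
as_string=$(awk 'BEGIN { printf "[%s]", T["B","A"] }')
as_number=$(awk 'BEGIN { printf "[%s]", T["B","A"] + 0 }')

echo "without +0: $as_string"
echo "with +0:    $as_number"
```

The first line shows empty brackets and the second shows [0].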
# 10  
Old 02-02-2014
@Don - O I C... thanks.

@all - OK, I feel like I almost have this working, but it still isn't perfect. I tried the following command (note the inclusion of Scrutinizer's suggestion):
Code:
awk 'NR==FNR{T[$1,$2]=$3; next} {print $0, T[$2,$1]+0}' test.txt test.txt

the input file test.txt has the following (delimiters are tabs):
Code:
A	A	200
A	B	100
A	C	90
B	B	203
B	A	101
B	C	87
C	C	300
C	A	91
C	B	86

The output I get is:
Code:
A	A	200
 200
A	B	100
 101
A	C	90
 91
B	B	203
 203
B	A	101
 100
B	C	87
 86
C	C	300
 300
C	A	91
 90
C	B	86
 87

 0

This is pretty close to what I want, but it would be great if I could get the returned value on the same line. I'm not sure why it isn't on the same line; could this be a formatting issue with the input file?

The output I am looking for is:
Code:
A	A	200     0
A	B	100     101
A	C	90     91
B	B	203     0
B	A	101     100
B	C	87     86
C	C	300     0
C	A	91     90
C	B	86     87

Actually, having a 0 in the 4th column for rows without reciprocals isn't essential. So
Code:
A    A    200   200
A    B    100    101

would be fine also.

Thanks for all your help with this, guys.

Last edited by RawToast; 02-02-2014 at 07:14 AM..
# 11  
Old 02-02-2014
Try:

Code:
$ awk 'BEGIN{T["NaN"]=0}NR==FNR{T[$1 FS $2]=$3; next} {print $0, T[$2 FS $1 == $1 FS $2 ? "NaN": $2 FS $1]}' file file

OR
Code:
$ awk 'NR==FNR{T[$1 FS $2]=$3; next} {print $0, $1 FS $2 == $2 FS $1 ? 0 : T[$2 FS $1]}' file file

Code:
A	A	200 0
A	B	100 101
A	C	90 91
B	B	203 0
B	A	101 100
B	C	87 86
C	C	300 0
C	A	91 90
C	B	86 87
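A side note on the array keys: the earlier posts index with T[$1,$2], which joins the subscripts with SUBSEP, while the commands above build the key by hand with FS. Either works as long as the store and the lookup use the same form. A small sketch with invented values (a, b, T and U are names chosen only for this example):

```shell
keys=$(awk 'BEGIN {
    a = "A"; b = "B"
    T[a, b] = 1        # subscripts joined with SUBSEP
    U[a FS b] = 1      # subscripts joined with FS (a single space by default)
    subsep_in_T = ((a SUBSEP b) in T)
    fs_in_U     = ((a FS b) in U)
    fs_in_T     = ((a FS b) in T)   # mixed forms do not match
    printf "%d %d %d\n", subsep_in_T, fs_in_U, fs_in_T
}')
echo "$keys"
```

This prints 1 1 0: each key is found in the array that was filled the same way, and the FS-joined key is not found in the SUBSEP-indexed array.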


This is my understanding of "reciprocal":

For example, the reciprocal of 2/3 is 3/2 (or 1 1/2), because 2/3 x 3/2 = 1. The reciprocal of 7 is 1/7, because 7 x 1/7 = 1.
# 12  
Old 02-02-2014
OK, so now we have a clear understanding of what the output should look like. You want rows whose first two columns are identical (and so are their own counterparts) to get 0 in the last column. I noticed that your output also contained a stray 0, presumably from an empty line, but I do not know why some parts start on a new line. To skip the empty line and to cater for entries that lack a reversed counterpart, I used NF as the pattern and reintroduced the +0 bit. I took RudiC's and Akshay's suggestions and made a couple of changes:

Code:
awk 'NR==FNR{T[$1,$2]=$3; next} NF{print $0, ($1 == $2) ? 0 : T[$2,$1]+0}'  file file
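On the parts that start on a new line: one possible cause, and this is a guess that nothing in the thread confirms, is carriage returns in test.txt (CRLF or CR line endings). If that turns out to be the case, deleting them before running awk should keep the fourth column on the same line. A sketch with a made-up two-line CRLF sample (crlf.txt and clean.txt are hypothetical names):

```shell
tmp=$(mktemp -d)
# Hypothetical sample with CRLF line endings.
printf 'A\tB\t100\r\nB\tA\t101\r\n' > "$tmp/crlf.txt"

# Delete every carriage return, then run the same awk program on the clean copy.
tr -d '\r' < "$tmp/crlf.txt" > "$tmp/clean.txt"
fixed=$(awk 'NR==FNR{T[$1,$2]=$3; next} NF{print $0, ($1 == $2) ? 0 : T[$2,$1]+0}' "$tmp/clean.txt" "$tmp/clean.txt")

printf '%s\n' "$fixed"
rm -rf "$tmp"
```

Running od -c on the first few lines of the real file would show whether any \r characters are actually there.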


Last edited by Scrutinizer; 02-02-2014 at 12:28 PM..
# 13  
Old 02-02-2014
Hi.

Here is a script that breaks down the tasks into pieces of work. Each step also writes the intermediate output so that the form can be examined.
Code:
#!/usr/bin/env bash

# @(#) s1	Demonstrate join of first 2 columns, ignoring order of tokens.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f $C ] && $C awk sed

FILE=${1-data1}

pl " Input data file $FILE:"
cat "$FILE"

pl " Results:"
awk '
	{ if ( $1 < $2 ) {
	  first = $1 ; second = $2
	} else {
	  first = $2 ; second = $1
	}
	s1 = first"_"second
	print s1, $3
    }
' "$FILE" |
tee f1 |
sort |
tee f2 |
awk '
BEGIN	{ FS = OFS = " " ; previous = "" ; line = "" ; first = "true"}
first == "true" { first = "false" ; previous = $1 ; line = $0 ; next }
$1 == previous	{ line = line " " $2 ; next }
		{ print line ; previous = $1 ; line = $0 }
END	{ print line }
' |
tee f3 |
awk '
NF <= 2	{ print $0, 0; next }
	{ print }
' |
tee f4 |
sed 's/_/ /'

exit 0

producing:
Code:
$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution        : Debian 5.0.8 (lenny, workstation) 
bash GNU bash 3.2.39
awk GNU Awk 3.1.5
sed GNU sed version 4.1.5

-----
 Input data file data1:
A  A  200
A  B  100
A  C  90
B  B  203
B  A  101
B  C  87
C  C  300
C  A  91
C  B  86

-----
 Results:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B C 86 87
C C 300 0

The tasks are:
1) Make lines into a canonical form in the sense of arranging the first 2 columns in a specific order. Also join the first two columns so that future tasks can be easily done.

2) Sort the items,

3) Collect contiguous lines that have an identical first column, a kind of self-join,

4) Add a trailing "0" if the number of fields is not appropriate,

5) Separate the first 2 columns.

I agree with the implicit remark of Akshay Hegde that "reciprocal" is not the word I would use to describe this.

Best wishes ... cheers, drl

PS This also works on OS-X:
Code:
$ ./s1

Environment: LC_ALL = POSIX, LANG = POSIX
(Versions displayed with local utility "version")
OSX 10.3.9
bash GNU bash 2.05b.0
grep - ( /usr/bin/grep, 29 Aug 2008 )
awk - ( /usr/bin/awk, 29 Aug 2008 )
sed - ( /usr/bin/sed, 29 Aug 2008 )

-----
 Input data file data1:
A  A  200
A  B  100
A  C  90
B  B  203
B  A  101
B  C  87
C  C  300
C  A  91
C  B  86

-----
 Results:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B C 86 87
C C 300 0


Last edited by drl; 02-02-2014 at 11:14 AM..
# 14  
Old 02-02-2014
Hi drl...

I might be missing something here but you have created two versions of the db()
function, one being a NOP, and don't seem to call either of them.
 