@all - OK, I feel like I almost have this working, but it still isn't perfect. I tried the following as the command (note the inclusion of Scrutinizer's suggestion):
The input file test.txt contains the following (the delimiters are tabs):
Code:
A A 200
A B 100
A C 90
B B 203
B A 101
B C 87
C C 300
C A 91
C B 86
The output I get is:
Code:
A A 200
200
A B 100
101
A C 90
91
B B 203
203
B A 101
100
B C 87
86
C C 300
300
C A 91
90
C B 86
87
0
This is pretty close to what I want, but if I could get the returned value on the same line that would be great. I am not sure why it isn't on the same line; could this be a formatting issue with the input file?
The output I am looking for is:
Code:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B A 101 100
B C 87 86
C C 300 0
C A 91 90
C B 86 87
Actually, having a 0 in the 4th column for rows without reciprocals isn't essential.
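One thing worth ruling out before blaming the awk: if test.txt was saved on Windows, every line ends in a carriage return, and anything printed after $3 can appear to wrap onto its own line. Here is a quick way to check and fix that (a sketch; the sample file written below is a stand-in for the real test.txt, and cat -A is the GNU option, BSD cat uses -vet):
Code:
```shell
# Stand-in for test.txt, written here with Windows (CRLF) line endings
# so the symptom is reproducible:
printf 'A\tA\t200\r\nA\tB\t100\r\n' > test.txt

# Show non-printing characters; with GNU cat, a CRLF file displays
# "^M$" at the end of every line (a plain "$" means the file is clean):
cat -A test.txt

# If carriage returns are present, strip them into a clean copy:
tr -d '\r' < test.txt > test.clean.txt
```
If cat -A shows ^M before the $, running the awk on the cleaned copy should put the value back on the same line.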
OK, so now we have a clear understanding of what the output should look like. You want rows that are their own counterparts to have 0 in the last column. I noticed there was also a stray zero in your output, presumably from an empty line, but I do not know why some parts start on a new line. To counteract the empty line, and to cater for entries that might lack a reversed counterpart, I used NF and reintroduced the +0 bit. I took RudiC's and Akshay's suggestions and made a couple of changes:
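In case it helps to see that approach spelled out, here is a sketch of the kind of two-pass awk being described (the exact command line is my assumption, not the poster's verbatim code; the NF guard skips empty lines and +0 coerces a missing counterpart to 0):
Code:
```shell
# Recreate the posted tab-delimited input:
printf 'A\tA\t200\nA\tB\t100\nA\tC\t90\nB\tB\t203\nB\tA\t101\nB\tC\t87\nC\tC\t300\nC\tA\t91\nC\tB\t86\n' > test.txt

# Pass 1 (NR==FNR): remember each pair's value, keyed on columns 1 and 2.
# Pass 2: for every non-empty line (NF), print it plus the reversed
# pair's value; self-pairs (A A, B B, ...) get a literal 0, and +0
# turns a missing counterpart into 0 as well.
awk -F'\t' '
  NR == FNR { a[$1,$2] = $3 ; next }
  NF        { print $0, ($1 == $2 ? 0 : a[$2,$1] + 0) }
' test.txt test.txt
```
Run against the sample data, this prints exactly the nine desired lines, with the reciprocal value on the same line as the original row.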
Hi.
Here is a script that breaks the task down into pieces of work. Each step also writes its intermediate output so that the form can be examined.
Code:
#!/usr/bin/env bash
# @(#) s1 Demonstrate join of first 2 columns, ignoring order of tokens.

# Utility functions: print-as-echo, print-line-with-visual-space, debug.
# export PATH="/usr/local/bin:/usr/bin:/bin"
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }   # redefined as a no-op to silence the debug output
C=$HOME/bin/context && [ -f $C ] && $C awk sed

FILE=${1-data1}

pl " Input data file $FILE:"
cat $FILE

pl " Results:"
# Pipeline: 1) canonicalize and join the pair, 2) sort, 3) merge lines
# sharing a key, 4) append 0 when needed, 5) split the key back apart.
awk '
  # Step 1: put the smaller token first and join the pair with "_".
  { if ( $1 < $2 ) {
      first = $1 ; second = $2
    } else {
      first = $2 ; second = $1
    }
    s1 = first "_" second
    print s1, $3
  }
' $FILE |
tee f1 |
sort |
tee f2 |
awk '
  # Step 3: collect contiguous lines that share the same first column.
  BEGIN { FS = OFS = " " ; previous = "" ; line = "" ; first = "true" }
  first == "true" { first = "false" ; previous = $1 ; line = $0 ; next }
  $1 == previous { line = line " " $2 ; next }
  { print line ; previous = $1 ; line = $0 }
  END { print line }
' |
tee f3 |
awk '
  # Step 4: a pair with no counterpart has only 2 fields; append a 0.
  NF <= 2 { print $0, 0; next }
  { print }
' |
tee f4 |
sed 's/_/ /'

exit 0
producing:
Code:
$ ./s1
Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 2.6.26-2-amd64, x86_64
Distribution : Debian 5.0.8 (lenny, workstation)
bash GNU bash 3.2.39
awk GNU Awk 3.1.5
sed GNU sed version 4.1.5
-----
Input data file data1:
A A 200
A B 100
A C 90
B B 203
B A 101
B C 87
C C 300
C A 91
C B 86
-----
Results:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B C 86 87
C C 300 0
The tasks are:
1) Put the lines into a canonical form by arranging the first two columns in a fixed order, and join those columns so that the later steps can treat the pair as a single key.
2) Sort the items.
3) Collect contiguous lines that have an identical first column, a kind of self-join.
4) Add a trailing "0" when the number of fields shows that no counterpart was found.
5) Separate the first two columns again.
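For comparison, the five steps can also be collapsed into one awk pass that canonicalizes the pair in memory instead of piping through sort. This is a sketch, not part of the script above, and it differs in one visible way: the two values stay in input order, so B C comes out as 87 86 rather than the sorted 86 87.
Code:
```shell
# Recreate the sample data file:
printf 'A A 200\nA B 100\nA C 90\nB B 203\nB A 101\nB C 87\nC C 300\nC A 91\nC B 86\n' > data1

awk '
  NF {
    # Step 1: canonical key, smaller token first, joined with "_".
    key = ($1 < $2) ? ($1 "_" $2) : ($2 "_" $1)
    # Step 3 analogue: collect both values under the same key.
    if (key in val) val[key] = val[key] " " $3
    else { val[key] = $3 ; order[++n] = key }   # remember first-seen order
  }
  END {
    for (i = 1; i <= n; i++) {
      k = order[i] ; out = val[k]
      if (index(out, " ") == 0) out = out " 0"  # step 4: no counterpart
      sub("_", " ", k)                          # step 5: split the key
      print k, out
    }
  }
' data1
```
On this particular input the keys happen to come out in sorted order anyway, since the data is already grouped by the first column.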
I agree with the implicit remark of Akshay Hegde that "reciprocal" is not the word I would use to describe this.
Best wishes ... cheers, drl
PS: This also works on OS X:
Code:
$ ./s1
Environment: LC_ALL = POSIX, LANG = POSIX
(Versions displayed with local utility "version")
OSX 10.3.9
bash GNU bash 2.05b.0
grep - ( /usr/bin/grep, 29 Aug 2008 )
awk - ( /usr/bin/awk, 29 Aug 2008 )
sed - ( /usr/bin/sed, 29 Aug 2008 )
-----
Input data file data1:
A A 200
A B 100
A C 90
B B 203
B A 101
B C 87
C C 300
C A 91
C B 86
-----
Results:
A A 200 0
A B 100 101
A C 90 91
B B 203 0
B C 86 87
C C 300 0