How to select rows that have opposite values (A vs B, or B vs A) on first two columns?


 
# 1  
Old 03-31-2017

I have a dataset like this:
Code:
Gly1  Gly2  2  1  0
Gly3  Gly4  3  4  5
Gly3  Gly5  1  3  2
Gly2  Gly1  3  6  2
Gly4  Gly3  2  2  1
Gly6  Gly4  4  2  1

What I expect is:

Code:
Gly1  Gly2  2  1  0
Gly2  Gly1  3  6  2
Gly3  Gly4  3  4  5
Gly4  Gly3  2  2  1

A vs B and B vs A are the same comparison. I want to extract those lines and merge them.
How can I do this? Thanks in advance.

It's a blastn output dataset like this, with more than 300,000 lines.
Code:
Glyma.10G168000 Glyma.10G168600 99.09 220 2 0 87 306 726 945 9e-110 396
Glyma.10G170700 Glyma.09G251300 91.00 200 18 0 115 314 130 329 6e-72 270
Glyma.10G051500 Glyma.02G058400 79.63 486 99 0 101 586 86 571 4e-95 350
Glyma.10G088600 Glyma.10G085900 98.47 522 8 0 1 522 1 522 0.0 920
Glyma.10G088600 Glyma.10G086200 96.93 522 16 0 1 522 1 522 0.0 876
Glyma.10G088600 Glyma.10G086300 96.93 522 16 0 1 522 1 522 0.0 876


Last edited by Don Cragun; 04-03-2017 at 05:52 PM.. Reason: Add missing CODE tags.
# 2  
Old 03-31-2017
Code:
awk 'FNR==NR {a[$1,$2]=$0;next} (($2,$1) in a)'  myFile myFile
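
For readers new to this idiom, here is the same command spread over several lines with comments. It is only a sketch: the here-document recreating myFile from the sample in post #1 and the wrapper name pairs_two_pass are added for illustration and are not part of the original answer.

```shell
# Recreate the sample from post #1 (the file name myFile follows post #2).
cat > myFile <<'EOF'
Gly1  Gly2  2  1  0
Gly3  Gly4  3  4  5
Gly3  Gly5  1  3  2
Gly2  Gly1  3  6  2
Gly4  Gly3  2  2  1
Gly6  Gly4  4  2  1
EOF

# pairs_two_pass is a hypothetical wrapper name used only in this example.
pairs_two_pass() {
    awk '
    # Pass 1: FNR==NR is true only while the first file argument is read.
    # Store each line under its (column 1, column 2) key.
    FNR == NR { a[$1, $2] = $0; next }
    # Pass 2: print the line if the reversed key was stored in pass 1.
    (($2, $1) in a)
    ' "$1" "$1"
}

pairs_two_pass myFile
```

Note that this prints the matching lines in input order, so the two lines of a pair are not placed next to each other. A line whose first two columns are identical would also count as its own partner here.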


Last edited by vgersh99; 03-31-2017 at 09:10 PM..
# 3  
Old 03-31-2017
You don't need to read the file twice... Try also:
Code:
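awk '
# If the reversed pair was stored earlier, print both lines of the pair.
($1, $2) in x {
	print x[$1, $2]
	print
	delete x[$1, $2]
	next
}
# Otherwise remember this line under its reversed key (column 2, column 1).
{	x[$2, $1] = $0
}' dataset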

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
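
To check the one-pass script against the sample from post #1, it can be run as sketched below. The here-document that recreates dataset and the wrapper name pairs_one_pass are assumptions made for this example only.

```shell
# Recreate the sample from post #1 under the file name used in post #3.
cat > dataset <<'EOF'
Gly1  Gly2  2  1  0
Gly3  Gly4  3  4  5
Gly3  Gly5  1  3  2
Gly2  Gly1  3  6  2
Gly4  Gly3  2  2  1
Gly6  Gly4  4  2  1
EOF

# pairs_one_pass is a hypothetical wrapper name used only in this example.
pairs_one_pass() {
    awk '
    # The reversed pair was stored earlier: print the stored line, then this one.
    ($1, $2) in x {
        print x[$1, $2]
        print
        delete x[$1, $2]
        next
    }
    # No partner seen yet: store this line under its reversed key.
    { x[$2, $1] = $0 }
    ' "$1"
}

pairs_one_pass dataset
```

On this sample the two lines of each pair come out next to each other, in the order shown as expected output in post #1, whereas the two-pass command in post #2 prints matches in input order. Only lines still waiting for a partner stay in the array x, which should help with a file of 300,000 lines.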
# 4  
Old 03-31-2017
Hello nengcheng,

Could you please try the following and let me know if it helps?
If the IDs in your Input_file always contain the string Gly, the following may help:
Code:
awk '{A[$1,$2]=$0;gsub(/[a-zA-Z]/,"",$1);gsub(/[a-zA-Z]/,"",$2);MAX=$1>$2?$1:$2;VAL=VAL>MAX?VAL:MAX} END{for(i=1;i<VAL;i++){if(A["Gly"i,"Gly"i+1] && A["Gly"i+1,"Gly"i]){print A["Gly"i,"Gly"i+1] RS A["Gly"i+1,"Gly"i]}}}'  Input_file

If you don't want to hardcode any string values and your Input_file has the same format as the sample you showed us, the following may help:
Code:
awk '{LINE=$0;gsub(/[a-zA-Z]/,"",$1);gsub(/[a-zA-Z]/,"",$2);A[$1,$2]=LINE;MAX=$1>$2?$1:$2;VAL=VAL>MAX?VAL:MAX} END{for(i=1;i<VAL;i++){if(A[i,i+1] && A[i+1,i]){print A[i,i+1] RS A[i+1,i]}}}'  Input_file

Thanks,
R. Singh
# 5  
Old 04-01-2017
Code:
$ 
$ cat dataset.txt
Gly1  Gly2  2  1  0
Gly3  Gly4  3  4  5
Gly3  Gly5  1  3  2
Gly2  Gly1  3  6  2
Gly4  Gly3  2  2  1
Gly6  Gly4  4  2  1
$ 
$ perl -lane 'if (defined $x{"@F[1,0]"}){print $x{"@F[1,0]"}; print $_}; $x{"@F[0,1]"} = $_' dataset.txt
Gly1  Gly2  2  1  0
Gly2  Gly1  3  6  2
Gly3  Gly4  3  4  5
Gly4  Gly3  2  2  1
$ 
$

# 6  
Old 04-03-2017
Thanks very much! I have added more details. So far, I think your answer gives the result I expected. [1] 173372
I am a beginner, so one more question: how do I output the rest of the lines from my file?
# 7  
Old 04-03-2017
I thought the purpose of your request was to extract matched pairs of lines and that is what the code I suggested does. I don't know what:
Quote:
It's a blastn output dataset like this, with more than 300,000 lines.
Code:
Glyma.10G168000 Glyma.10G168600 99.09 220 2 0 87 306 726 945 9e-110 396
Glyma.10G170700 Glyma.09G251300 91.00 200 18 0 115 314 130 329 6e-72 270
Glyma.10G051500 Glyma.02G058400 79.63 486 99 0 101 586 86 571 4e-95 350
Glyma.10G088600 Glyma.10G085900 98.47 522 8 0 1 522 1 522 0.0 920
Glyma.10G088600 Glyma.10G086200 96.93 522 16 0 1 522 1 522 0.0 876
Glyma.10G088600 Glyma.10G086300 96.93 522 16 0 1 522 1 522 0.0 876

means, or where this data came from. (It certainly is not present in the sample dataset you provided and it is not output that would be produced by the script I suggested in post #3 in this thread.)

If your requirements have changed, please start a new thread with your new problem and clearly describe the input data you are going to be processing, describe the output(s) you want to produce from that input, provide a representative sample input dataset (in CODE tags), show us the output(s) that should be produced from that sample input (in CODE tags), and show us the code that you have written to try to solve your problem (in CODE tags).
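
For the follow-up question in post #6, one possible reading is that "the rest of the lines" means the lines that have no reversed partner. The thread does not confirm this, so the following is only a sketch under that assumption, written in the two-pass style of post #2. The sample file and the wrapper name unpaired_lines are hypothetical.

```shell
# Recreate the sample from post #1 for illustration.
cat > dataset <<'EOF'
Gly1  Gly2  2  1  0
Gly3  Gly4  3  4  5
Gly3  Gly5  1  3  2
Gly2  Gly1  3  6  2
Gly4  Gly3  2  2  1
Gly6  Gly4  4  2  1
EOF

# unpaired_lines is a hypothetical wrapper name used only in this example.
unpaired_lines() {
    awk '
    # Pass 1: record which (column 1, column 2) keys occur in the file.
    FNR == NR { seen[$1, $2]; next }
    # Pass 2: print only lines whose reversed key never occurred.
    !(($2, $1) in seen)
    ' "$1" "$1"
}

unpaired_lines dataset
```

On the sample this leaves the Gly3/Gly5 and Gly6/Gly4 lines, in input order. Redirecting the output to a file, for example with > rest.txt, would keep it separate from the paired lines.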