Matching Pairs with AWK or Shell


 
# 1  
Old 05-28-2010

Hello,
I have a file Pairs.txt composed of 3 columns. For simplicity, this is the file:

A B C
B D W
X Y Z
Z W K
B A C
Y J Q
F M P
Y X Z

I am trying to reduce the file so that I keep only the first occurrence of each "pair", where a pair is the first two columns in either order. In the example above, I want to keep one occurrence of A-B and delete B-A. The same goes for X-Y and Y-X. (The whole row must be kept, including the last column, so keeping A-B means keeping A B C.)

As such, the desired output:
A B C
B D W
X Y Z
Z W K
Y J Q
F M P

The only solution I could think of is to:
1. Read the entire file
2. Use a loop to read each row X, then reverse the first 2 columns of row X
3. Search for the occurrence/repeat of that reversal in the remainder of the file
3a. If it occurs, then delete this row X
3b. If it doesn't, keep row X

I could do this in C++ if I must, but I suspect a script would be quicker to write.
Is there an easier solution, and a way to implement it in a shell or awk script?

Thank you in advance!
DG
# 2  
Old 05-28-2010
Code:
awk '{tmp1=$1 $2;
         tmp2=$2 $1;
         if(tmp1 in arr || tmp2 in arr) {next}
         arr[$1 $2]=$0
       }
       END {for(i in arr) { print arr[i]}}'  inputfile

Start with this..
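
One caveat when starting from this: awk's "for (i in arr)" visits keys in an unspecified order, so the END loop may print the kept rows in a different order from the input. Below is a minimal sketch that keeps the same store-then-print structure but remembers arrival order. The names dedupe_pairs, order, n and pairs_demo.txt are made up for illustration, and the comma subscripts are my own substitution for the concatenated key.

```shell
# Hypothetical sample file with the same rows as Pairs.txt in the question.
printf '%s\n' 'A B C' 'B D W' 'X Y Z' 'Z W K' 'B A C' 'Y J Q' 'F M P' 'Y X Z' > pairs_demo.txt

dedupe_pairs() {
    awk '
    {
        # Skip the row if this pair was already stored in either order.
        if (($1, $2) in arr || ($2, $1) in arr) next
        arr[$1, $2] = $0             # keep the full row under the pair key
        order[++n] = $1 SUBSEP $2    # record the key in arrival order
    }
    END {
        # Walk the keys by arrival index instead of using for-in.
        for (i = 1; i <= n; i++) print arr[order[i]]
    }' "$@"
}

dedupe_pairs pairs_demo.txt
```

Printing each row as soon as it is accepted would avoid the END loop entirely; the sketch only shows how to keep the original shape while fixing the order.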
# 3  
Old 05-28-2010
Try the following AWK program:
Code:
awk '
! Pairs[$1, $2] {
   Pairs[$1, $2] = Pairs[$2, $1] = 1;
   print $0
}
' info.txt

Input file info.txt:
Code:
A B C
B D W
X Y Z
Z W K
B A C
Y J Q
F M P
Y X Z

Output:
Code:
A B C
B D W
X Y Z
Z W K
Y J Q
F M P

Jean-Pierre.

---------- Post updated at 21:44 ---------- Previous update was at 21:40 ----------

Quote:
Originally Posted by jim mcnamara
Code:
awk '{tmp1=$1 $2;
         tmp2=$2 $1;
         if(tmp1 in arr || tmp2 in arr) {next}
         arr[$1 $2]=$0
       }
       END {for(i in arr) { print arr[i]}}'  inputfile

Start with this..
One small problem with this solution: the following two records are treated as the same pair:
Code:
XY Z A
X YZ B

Jean-Pierre.
# 4  
Old 05-28-2010
Thank you both for the speedy replies!
I tried both of your suggestions and they both work fine with me.

I am just slightly confused about Jean-Pierre's comment for the solution of jim mcnamara,
if we have the case of
XY Z A
X YZ B


I guess it can be accepted, because I am only comparing the first two columns.
In the example above, are you suggesting there is a problem if we are comparing 3 pairs of variables?
Technically both solutions work if we assume (XY) is one item, ie, XY does not represent X+Y. In this case we are still reverting back to comparing 2 columns. Right?

Thanks,
DG
# 5  
Old 05-28-2010
Quote:
Originally Posted by InfoSeeker
Thank you both for the speedy replies!
I tried both of your suggestions and they both work fine with me.

I am just slightly confused about Jean-Pierre's comment for the solution of jim mcnamara,
if we have the case of
XY Z A
X YZ B


I guess it can be accepted, because I am only comparing the first two columns.
In the example above, are you suggesting there is a problem if we are comparing 3 pairs of variables?
Technically both solutions work if we assume (XY) is one item, ie, XY does not represent X+Y. In this case we are still reverting back to comparing 2 columns. Right?

Thanks,
DG
Jim's solution uses an array to remember the pairs formed by the first two columns.
The problem is that the key is the plain concatenation of the two columns, so both lines produce the same key: XY+Z = X+YZ = XYZ.
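
To make the collision concrete, here is a small sketch that only deduplicates on the key (no reversed-pair check), once with a concatenated key and once with a comma subscript. With the comma, awk joins the parts using SUBSEP (by default the character "\034"), so the two keys stay distinct. The function and file names are invented for this illustration.

```shell
# Two hypothetical rows whose first two columns differ but concatenate to the same string.
printf '%s\n' 'XY Z A' 'X YZ B' > collide_demo.txt

# Key built by plain concatenation: both rows give "XYZ", so the second row is dropped.
dedupe_concat() { awk '!seen[$1 $2]++' "$@"; }

# Key built with a comma subscript: the parts are joined with SUBSEP, so the keys differ.
dedupe_subsep() { awk '!seen[$1, $2]++' "$@"; }

dedupe_concat collide_demo.txt   # prints only the first row
dedupe_subsep collide_demo.txt   # prints both rows
```

A field that itself contained the SUBSEP character could still collide, but that cannot happen with ordinary text columns like these.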

Jean-Pierre.
# 6  
Old 05-30-2010
Perl can do this as well.

Code:
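my %seen;
while (<DATA>) {
	my @f = split ' ';                  # fields of the current line
	# Order-independent key from the first two columns only,
	# joined with a space so "XY Z" and "X YZ" stay distinct.
	my $key = join " ", sort @f[0, 1];
	next if $seen{$key}++;              # pair already seen in either order
	print;
}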
__DATA__
A B C
B D W
X Y Z
Z W K
B A C
Y J Q
F M P
Y X Z

# 7  
Old 05-30-2010
Code:
perl -lane 'print unless $h{join " ", sort @F[0,1]}++' Pairs.txt
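
For comparison, the same "sort the pair, then deduplicate" idea can be sketched in awk. This is only an illustration of the technique, not a replacement for the solutions above; dedupe_sorted, key, seen and pairs_sorted_demo.txt are hypothetical names.

```shell
# Hypothetical sample file with the same rows as Pairs.txt in the question.
printf '%s\n' 'A B C' 'B D W' 'X Y Z' 'Z W K' 'B A C' 'Y J Q' 'F M P' 'Y X Z' > pairs_sorted_demo.txt

dedupe_sorted() {
    awk '
    {
        # Put the smaller field first so A-B and B-A map to the same key.
        # Appending "" forces a string comparison even for numeric-looking fields.
        key = (($1 "") < ($2 "")) ? $1 SUBSEP $2 : $2 SUBSEP $1
        # Print the row only the first time its key is seen.
        if (!seen[key]++) print
    }' "$@"
}

dedupe_sorted pairs_sorted_demo.txt
```

Because every pair is reduced to one canonical key, a single array lookup per row is enough and rows are printed in input order.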
