Find duplicated values in two columns out of three


 
# 1  

Hi!
Could you help with the following? I have data (a long list!) that looks like this (three columns, whitespace-separated):
Code:
rs3094315 0.0665173 742429
rs12562034 0.0738998 758311
rs3934834 0.396449 995669
rs9442372 0.402693 1008567
rs3737728 0.406271 1011278
rs6687776 0.435429 1020428
rs9651273 0.435896 1021403
rs4970405 0.440268 1038818

I know that the values in the first column are unique, whereas the second and third columns contain duplicates. In other words, two different "rs" IDs may correspond to the same values in the 2nd and 3rd columns. I need to find the duplicates in columns 2 and 3 and then remove each whole line that has a unique rs but duplicated values in columns 2 and 3.
Thank you in advance! kush

Last edited by Scrutinizer; 11-01-2012 at 10:13 AM.. Reason: code tags
# 2  
This checks columns 2 and 3 together. It removes a line as a duplicate only if both column 2 and column 3 match an earlier line; the first occurrence is kept.

Code:
awk '!X[$2,$3]++' file
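
For readers unfamiliar with the idiom, here is a sketch of what the one-liner above does, written out step by step. The wrapper name keep_first_pair and the array name seen are made up for this illustration; the behaviour should match the one-liner, assuming whitespace-separated input.

```shell
# Expanded form of: awk '!X[$2,$3]++' file
# keep_first_pair is a hypothetical wrapper name, used only for this sketch.
keep_first_pair() {
    awk '{
        key = $2 SUBSEP $3        # same key that X[$2,$3] builds internally
        if (!(key in seen))       # first time this (col2, col3) pair appears
            print                 # keep the line
        seen[key] = 1             # remember the pair; later repeats are skipped
    }' "$@"
}

# Usage: keep_first_pair file
```

The first line carrying a given pair is always kept, so the number of lines removed equals the number of repeats, not the number of lines that share a pair.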

# 3  
Dear Pamu, thank you for the quick reaction! I tried the code, and it removes 80 duplicates, but I know that there are 3544 duplicated values in total in the 2nd and 3rd columns of my file. May I specify: the duplicates in my file look like the example below (note the last two lines, where the rs IDs are unique but the values in columns 2 and 3 are the same, which is what I call duplicates):
rs9442372 0.402693 1008567
rs3737728 0.406271 1011278
rs6687776 0.435429 1020428
rs9651273 0.435896 1021403
rs4970405 0.440268 1038818
rs4567890 0.440005 1041120
rs6598722 0.440005 1041120


Maybe you have more suggestions?
Sorry if I am asking in the wrong way.
# 4  
I assume you want to remove lines if both $2 and $3 are the same.

Please check:

Code:
$ cat file
rs9442372 0.402693 1008567
rs3737728 0.406271 1011278
rs6687776 0.435429 1020428
rs9651273 0.435896 1021403
rs4970405 0.440005 1038818 # This has a duplicate only in column 2
rs4567890 0.440005 1041120 # duplicates in columns 2 and 3
rs6598722 0.440005 1041120 # duplicates in columns 2 and 3

To remove duplicates where $2 and $3 both match an earlier line (the first occurrence is kept):
Code:
$ awk '!X[$2,$3]++' file
rs9442372 0.402693 1008567
rs3737728 0.406271 1011278
rs6687776 0.435429 1020428
rs9651273 0.435896 1021403
rs4970405 0.440005 1038818
rs4567890 0.440005 1041120
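
The original question can also be read as wanting to see every line that shares its ($2,$3) pair with another line, or to drop all of them rather than keep the first. A minimal two-pass sketch of that reading follows; the helper names show_dup_pairs and drop_all_dup_pairs are hypothetical, and each takes a regular file (not a pipe) because the file is read twice.

```shell
# Hypothetical helper names; both read the input file twice.
show_dup_pairs() {
    # Pass 1 counts each (col2, col3) pair; pass 2 prints lines whose pair occurs more than once.
    awk 'NR == FNR { count[$2,$3]++; next } count[$2,$3] > 1' "$1" "$1"
}

drop_all_dup_pairs() {
    # Same counting pass, but keep only lines whose pair occurs exactly once.
    awk 'NR == FNR { count[$2,$3]++; next } count[$2,$3] == 1' "$1" "$1"
}

# Usage: show_dup_pairs file
#        drop_all_dup_pairs file
```

Comparing the line count of show_dup_pairs with an expected number of duplicates may help explain a mismatch like the one reported earlier in the thread.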

To remove duplicates on columns $2 and $3 separately:
Code:
$ awk '!X[$2]++ && !Y[$3]++' file
rs9442372 0.402693 1008567
rs3737728 0.406271 1011278
rs6687776 0.435429 1020428
rs9651273 0.435896 1021403
rs4970405 0.440005 1038818
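
One detail worth knowing about the command above: && short-circuits, so when $2 is already a repeat, Y[$3]++ is never evaluated and that $3 value is not recorded. If every value in both columns should be recorded regardless, a variant along these lines might be closer to the intent. This is a sketch under that assumption, and the helper name uniq_each_column is made up here.

```shell
# Hypothetical helper name for this sketch.
uniq_each_column() {
    awk '{
        new2 = !X[$2]++      # always record column 2
        new3 = !Y[$3]++      # always record column 3, even if column 2 repeated
    }
    new2 && new3             # print only when both values are new
    ' "$@"
}

# Usage: uniq_each_column file
```

On the sample file in this post both forms print the same five lines; they differ only when a line rejected for its $2 carries a $3 value that shows up again later.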

Is there anything else you want to include here?

I hope this helps.

pamu
# 5  
Yes, thank you, Pamu. I worked out the structure of my file and now understand why I got that result after running your code! I'll use your code.
Thank you very much! (And I hope you all have patience for users like me.)
 
