awk : extracting unique lines based on columns


 
# 1  
Old 05-01-2010

Hi,

I have a file, snp.txt:
Code:
CHR_A   SNP_A           BP_A_st         BP_A_End        CHR_B   BP_B            SNP_B           R2              p-SNP_A         p-SNP_B
5       rs1988728       74904317        74904318        5       74960646        rs1427924       0.377333        0.000740085     0.013930081
5       rs1988728       74904317        74904318        5       74960918        rs9293656       0.370860        0.000740085     0.00939958
1       rs268955        30166376        30166377        1       30286312        rs12145453      0.015673        0.000740425     0.008207172
1       rs268955        30166376        30166377        1       30289115        rs12142520      0.0120884       0.000740425     0.045320982
19      rs6510185       36070251        36070252        19      36263387        rs11673246      0.0105482       0.000740565     0.034650246
19      rs6510185       36070251        36070252        19      36115734        rs17571341      0.00406461      0.000740565     0.015351578
19      rs6510185       36070251        36070252        19      36267571        rs11880163      0.00040869      0.000740565     0.016354903
5       rs5744566       74866563        74866564        5       74913022        rs3213801       0.385063        0.000740641     0.018259986
5       rs5744566       74866563        74866564        5       74955165        rs6861279       0.380825        0.000740641     0.014054183

I want to keep only the first line for each value in column 2 (SNP_A), so that column 2 is unique.
Desired output:
Code:
CHR_A   SNP_A           BP_A_st         BP_A_End        CHR_B   BP_B            SNP_B           R2              p-SNP_A         p-SNP_B
5       rs1988728       74904317        74904318        5       74960646        rs1427924       0.377333        0.000740085     0.013930081
1       rs268955        30166376        30166377        1       30286312        rs12145453      0.015673        0.000740425     0.008207172
19      rs6510185       36070251        36070252        19      36263387        rs11673246      0.0105482       0.000740565     0.034650246
5       rs5744566       74866563        74866564        5       74913022        rs3213801       0.385063        0.000740641     0.018259986

I would like a solution using awk or shell.
Thanks
# 2  
Old 05-01-2010
Check whether this works for you:
Code:
for i in $(awk '{print $2}' snp.txt | sort -u); do grep -m 1 -- "$i" snp.txt; done | sort -r

# 3  
Old 05-01-2010
Code:
$ awk 'a !~ $2; {a=$2}' snp.txt

# 4  
Old 05-01-2010
Quote:
Originally Posted by pseudocoder
Code:
$ awk 'a !~ $2; {a=$2}' snp.txt


Please explain the logic.

What does a !~ $2 mean?

Thanks,
kamaraj
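
For reference, here is one way to read that one-liner. This is a sketch based on standard awk semantics, not pseudocoder's own explanation, and the four-line sample is made up for illustration. The variable a holds column 2 of the previous line. The pattern a !~ $2 is true when a does not match $2 treated as a regular expression, and a pattern with no action prints the line. So a line is printed only when its column 2 differs from the line directly before it.

```shell
# Toy input (hypothetical, three columns): rs1 appears on lines 1, 2 and 4.
sample='x rs1 1
x rs1 2
x rs2 3
x rs1 4'

# Expanded form of: awk 'a !~ $2; {a=$2}'
adjacent_only=$(printf '%s\n' "$sample" | awk '
    # Print when the previous column 2 (a) does not match the current $2.
    a !~ $2 { print }
    # Remember the current column 2 for the next line.
    { a = $2 }
')

printf '%s\n' "$adjacent_only"
```

On this input the last line is printed again even though rs1 was already seen, because only neighbouring lines are compared. Note also that !~ is an unanchored regular-expression match rather than a string comparison, so an ID that is contained in the previous ID (for example rs1 following rs12) would be dropped.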
# 5  
Old 05-01-2010
I am sorry, but that did not work.
With the example above, both commands gave the correct output.
But on my actual data file of 289727 lines they disagree:
itkamaraj's code gave 56446 lines
pseudocoder's code gave 57747 lines.

Last edited by genehunter; 05-01-2010 at 02:49 AM..
# 6  
Old 05-01-2010
Quote:
Originally Posted by genehunter
I am sorry, but that did not work.
I still have duplicates.

Does my solution work?
# 7  
Old 05-01-2010
Please see the post above.
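
A possible alternative, offered as a sketch rather than a tested fix for the full data file: keep an awk array keyed on column 2 and print a line only the first time its key appears. This does not depend on duplicates being next to each other, which is what the one-liner in post #3 assumes. The loop in post #2 prints one line per distinct ID, but grep looks for the ID anywhere on the line, so it can pick a line where the ID sits in another column. The array name seen is arbitrary, and the sample below is a trimmed, reordered subset of the rows from post #1 used only for illustration.

```shell
# Three of the original columns, with duplicate SNP_A values made non-adjacent.
sample='CHR_A SNP_A BP_B
5 rs1988728 74960646
1 rs268955 30286312
5 rs1988728 74960918
1 rs268955 30289115'

# seen[$2]++ is 0 (false) the first time a key appears, so !seen[$2]++ is
# true only for the first line carrying each column-2 value.
first_per_snp=$(printf '%s\n' "$sample" | awk '!seen[$2]++')

printf '%s\n' "$first_per_snp"
```

On the real file this would be run as awk '!seen[$2]++' snp.txt. The header is kept because SNP_A is itself a first occurrence. The array holds one entry per distinct ID, so memory use grows with the number of unique values in column 2, not with the number of lines.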