Remove duplicate rows based on one column


# 1  
07-21-2014

Dear members, I need to filter a file based on the 8th column (the id); the other columns don't matter. I want just one line for each id, removing the duplicate lines based on this id (8th column), and it does not matter which duplicate is removed.

Example of my file:
Code:
intergenic      ENSGALG000000285(dist=73),ENSGALG000000057(dist=13)     10      1017921        1017921        -       T    Chr10_10179217
UTR3    ENSGALG00000005703      10      1018210        1018210        A       -       Chr10_10182099
intronic        ENSGALG0000000570      10      12185225        12185228        TAAA    -       Chr10_10185224
intronic        ENSGALG00000005703      10      10188875        10188877        TCC     TTCCC   Chr2_10188875
intronic        ENSGALG00000005703      10      10188875        10188875        -       TC      Chr2_10188875
intronic        ENSGALG0000002345      10      10312300        10312300        -       AAAA    Chr15_10312291
intronic        ENSGALG0000002345      10      10312300        10312300        -       AAA     Chr15_10312291
intronic        ENSGALG0000002345      10      10312300        10312300        -       AA      Chr15_10312291


I want:
Code:
intergenic      ENSGALG000000285(dist=73),ENSGALG000000057(dist=13)     10      1017921        1017921        -       T    Chr10_10179217
UTR3    ENSGALG00000005703      10      1018210        1018210        A       -       Chr10_10182099
intronic        ENSGALG0000000570      10      12185225        12185228        TAAA    -       Chr10_10185224
intronic        ENSGALG00000005703      10      10188875        10188877        TCC     TTCCC   Chr2_10188875
intronic        ENSGALG0000002345      10      10312300        10312300        -       AA      Chr15_10312291

How can I remove those duplicates?
Thanks very much.

Last edited by Don Cragun; 07-21-2014 at 05:27 PM. Reason: CODE tags added; dozens of FONT tags removed.
# 2  
07-21-2014
Code:
 awk '!a[$8]++' file
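How the one-liner works: `a[$8]++` is a post-increment, so it yields the count stored for the current 8th field *before* adding 1. The first time an id appears that count is 0 (an unset array element is treated as 0), so `!a[$8]++` is true and awk runs its default action, printing the whole line; every later line with the same id sees a count of at least 1 and is skipped. With the default field separator, runs of spaces or tabs separate fields and leading blanks are ignored, so an indented line still has its id in `$8`. The sketch below spells the same logic out longhand; the wrapper name `dedup_first` and the array name `seen` are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# dedup_first FILE
# Hypothetical helper: prints the first line seen for each distinct value of
# column 8, keeping the original line order. Same result as: awk '!a[$8]++' FILE
dedup_first() {
    awk '
        {
            if (!seen[$8])   # zero/unset: this id has not been printed yet
                print        # print the whole record ($0) unchanged
            seen[$8]++       # count the id so later lines with it are skipped
        }
    ' "$1"
}
```

Usage: `dedup_first file > deduped.txt`. The one-liner is just this program compressed: the test and the increment share one expression, and a pattern with no action block defaults to `print`.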

This user gave thanks to in2nix4life for this post.
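On which duplicate survives: `awk '!a[$8]++'` always keeps the first line for each id. The desired output in the question mixes the two (it keeps the first Chr2_10188875 line but the last Chr15_10312291 line), which fits the remark that it does not matter. If you do need the last line per id, one portable option (not from this thread, just a minimal sketch) is to read the file twice with awk: the first pass records the line number of each id's last occurrence, and the second pass prints only those lines, so the kept lines stay in file order. The function name `dedup_last` and the array name `last` are hypothetical.

```shell
#!/bin/sh
# dedup_last FILE
# Hypothetical helper: prints only the LAST line for each distinct value of
# column 8, in the order those lines appear in FILE. The file is read twice,
# so FILE must be a regular file (not a pipe or stdin).
dedup_last() {
    awk '
        NR == FNR { last[$8] = FNR; next }   # pass 1: remember the last line number of each id
        FNR == last[$8]                      # pass 2: print the line at that number
    ' "$1" "$1"
}
```

Usage: `dedup_last file > deduped.txt`. Passing the same file twice is what makes `NR == FNR` true only during the first pass.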
# 3  
07-21-2014
Thanks!
It worked!
# 4  
07-21-2014
You're welcome.