Shell Programming and Scripting: Post questions about KSH, CSH, SH, BASH, PERL, PHP, SED, AWK and OTHER shell scripts and shell scripting languages here.
How can I remove duplicate sequences in UNIX? What command line should I type?

The input is:

>HWI-EAS382_30FC7AAXX:4:1:1580:1465
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:1062:1640
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:272:629
AAAAAAAAGCTATAGTCTCGTCACACATACTCACAA
>HWI-EAS382_30FC7AAXX:4:1:1033:1135
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:1421:27
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

My desired output is:

>HWI-EAS382_30FC7AAXX:4:1:1580:1465
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>HWI-EAS382_30FC7AAXX:4:1:272:629
AAAAAAAAGCTATAGTCTCGTCACACATACTCACAA

What command line should I type to remove the duplicated sequences? Thanks for all your advice.
Hi, fajohnson...

Your command line worked, but it still leaves all the headers of the duplicated nucleotide sequences. Do you have a better idea for keeping only the first header of each group of identical nucleotide sequences?

My input:

>HWI-EAS382_30FC7AAXX:4:1:631:449
>HWI-EAS382_30FC7AAXX:4:1:93:1407
>HWI-EAS382_30FC7AAXX:4:1:154:1123
>HWI-EAS382_30FC7AAXX:4:1:912:1008
>HWI-EAS382_30FC7AAXX:4:1:57:316
>HWI-EAS382_30FC7AAXX:4:1:1287:1193
>HWI-EAS382_30FC7AAXX:4:1:1451:1559
>HWI-EAS382_30FC7AAXX:4:1:1431:1913
TTTCCGCGAACTGCAAAAGACGTTTCGTATGCCGTT

The output I want is just this:

>HWI-EAS382_30FC7AAXX:4:1:631:449
TTTCCGCGAACTGCAAAAGACGTTTCGTATGCCGTT

Thanks for your advice. Have a nice day.
Code:
awk '/^>/{if(hdr=="") hdr=$0}/^[^>]/{x[$0]++;if(x[$0]==1) {print hdr;print} hdr=""}' infile
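For anyone reading along, here is the same one-liner spread over several lines with comments, as a rough sketch rather than a definitive version. The function name dedup_fasta, the array name seen, and the short sample reads are made up for illustration and do not come from the thread. It assumes each sequence sits on a single line, as in the posted data.

```shell
#!/bin/sh
# dedup_fasta: hypothetical wrapper name (not from the thread).
# Reads the named files, or standard input if none are given.
dedup_fasta() {
    awk '
        # Header line: remember it only when no header is pending,
        # so the first header in a run of headers is the one kept.
        /^>/ { if (hdr == "") hdr = $0 }

        # Sequence line (any non-empty line not starting with ">"):
        # print the pending header and the sequence the first time
        # this exact sequence is seen, then clear the pending header.
        /^[^>]/ {
            if (!seen[$0]++) { print hdr; print }
            hdr = ""
        }
    ' "$@"
}

# Case 1: alternating header/sequence records (made-up short reads).
printf '%s\n' '>r1' 'AAAA' '>r2' 'AAAA' '>r3' 'AAGC' | dedup_fasta

# Case 2: a run of headers followed by one sequence.
printf '%s\n' '>h1' '>h2' '>h3' 'TTTC' | dedup_fasta
```

The first call should keep r1 and r3 with their sequences, and the second should keep only h1 with TTTC. Note that the whole sequence line is held in memory as an array key, so very large read files may need a different approach.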