How can I remove those duplicate sequences in UNIX? What command line should I type?
Post 302279488 by patrick chia on Thursday 22nd of January 2009 10:36:37 PM (Shell Programming and Scripting forum)
Quote:
Originally Posted by cfajohnson
Code:
awk '!x[$0]++' FILE

Hi, cfajohnson...
Your command line works, but it still leaves all the headers of the nucleotide sequence. Do you have a better idea, so that I keep only the first header of those identical nucleotide sequences?
My input:
>HWI-EAS382_30FC7AAXX:4:1:631:449
>HWI-EAS382_30FC7AAXX:4:1:93:1407
>HWI-EAS382_30FC7AAXX:4:1:154:1123
>HWI-EAS382_30FC7AAXX:4:1:912:1008
>HWI-EAS382_30FC7AAXX:4:1:57:316
>HWI-EAS382_30FC7AAXX:4:1:1287:1193
>HWI-EAS382_30FC7AAXX:4:1:1451:1559
>HWI-EAS382_30FC7AAXX:4:1:1431:1913
TTTCCGCGAACTGCAAAAGACGTTTCGTATGCCGTT

I want my output to contain only this:
>HWI-EAS382_30FC7AAXX:4:1:631:449
TTTCCGCGAACTGCAAAAGACGTTTCGTATGCCGTT

Thanks for your advice. Have a nice day.
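A minimal awk sketch of one way to do this. It assumes each sequence sits on a single line (no wrapped FASTA). It also assumes the header to keep is the first header line that precedes a sequence's first occurrence. `awk '!x[$0]++'` keeps the first copy of every line, so all the headers survive because each one is unique. The version below deduplicates on the sequence lines instead and prints only the first header in front of each new sequence. The function name `dedup_fasta` is hypothetical.

```shell
# dedup_fasta (hypothetical name): keep only the first header in front of
# each distinct sequence line. Reads the files given as arguments, or stdin.
# Assumption: every sequence is on one line (no wrapped multi-line FASTA).
dedup_fasta() {
  awk '
    /^$/ { next }                  # skip blank lines
    /^>/ {
      if (hdr == "") hdr = $0      # remember only the first header of a run
      next
    }
    {
      if (!seen[$0]++) {           # first time this sequence line appears
        if (hdr != "") print hdr
        print
      }
      hdr = ""                     # the next header starts a new record
    }
  ' "$@"
}
```

For example, `dedup_fasta reads.fa > unique.fa` (file names hypothetical). On the input above, it prints `>HWI-EAS382_30FC7AAXX:4:1:631:449` followed by the sequence line. It also handles the usual header/sequence pair layout: a header whose sequence has already been seen is dropped together with that repeated sequence.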
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Remove Duplicate line

Hi, I have a scenario where I have created a flat file with the information below. As you can see, the file is displayed in three columns: the 1st column is FileNameString, the 2nd column is Report_Name (this has spaces), and the 3rd column is Flag. The result file needed is the removal of duplicate... (1 Reply)
Discussion started by: Student37
1 Reply

2. UNIX for Dummies Questions & Answers

Remove duplicate entry in one line

Can anyone help me print only the unique entries in a line? MI_AP MI_AP MI_CM MI_MF RC_NAP MBS_AP SF_RAN MBS_AP NT_CAR should become a line with only one copy of each entry: MI_AP MI_CM MI_MF RC_NAP MBS_AP SF_RAN NT_CAR I can't find the same situation on the knowledge... (5 Replies)
Discussion started by: kharen11
5 Replies

3. Shell Programming and Scripting

How to remove those sequences with the same amino acid? What command line should I type?

My input is listed as: giNumber RefAminoAcid VarAminoAcid 10190711 P P 10190711 D D 109255248 I A 110349771 A ... (4 Replies)
Discussion started by: patrick chia
4 Replies

4. Shell Programming and Scripting

How can I calculate the total number of nucleotides in Unix? What command line should I type?

For example, if I have a file whose contents are: >HWI-EAS382_30FC7AAXX:7:1:927:1368 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA >HWI-EAS382_30FC7AAXX:7:1:924:1373 ACGAACTTTAAAGCACCTCTTGGCTCGTATGCCGTC I want my output to give the total number of nucleotides. So my output should look like this:... (2 Replies)
Discussion started by: patrick chia
2 Replies

5. Shell Programming and Scripting

remove duplicate words in a line

Hi, please help! I have a file with duplicate words in some lines, and I want to remove the duplicate words. The order of the words in the output file doesn't matter. INPUT_FILE pink_kite red_pen ball pink_kite ball yellow_flower white no white no cloud nine_pen pink cloud pink nine_pen... (6 Replies)
Discussion started by: sam_2921
6 Replies

6. Shell Programming and Scripting

Remove duplicate line from a file

I have a file a.txt with content like deepak ram sham deepram sita kumar. I want to delete the first line containing "deep". I tried using grep -i 'deep' a.txt, which gives me 2 rows; I want to delete the first one. I also need to know the command to delete the line from... (5 Replies)
Discussion started by: saluja.deepak
5 Replies

7. Shell Programming and Scripting

Remove duplicate line on condition

Hi, I've been scratching my head over this for some time with no solution. I have a file like this: 1 bla bla 1 2 bla bla 2 4 bla bla 3 5 bla bla 1 6 bla bla 1 I want to remove consecutive occurrences of lines like bla bla 1, but the first column may be different. Any ideas? (23 Replies)
Discussion started by: jamie_123
23 Replies

8. UNIX for Dummies Questions & Answers

Remove Duplicate Two Line Pairs?

So I have a bunch of files that look like this: >gi|33332323 MMKCRGVIMVVEKVMKRDGRIVPFDESRIRWAVQ--- >gi|45235353 MMKCR----VEKMRDVFFDESIRWAVQ They go on... The sequences are much longer, but all are in two-line (FASTA) format. I want to remove duplicate pairs of ID (GI) number and sequence. I tried... (12 Replies)
Discussion started by: bakere19
12 Replies

9. Shell Programming and Scripting

Remove duplicate entries from the same line

Hello, I have a file which has several duplicate entries on the same line: File ID source 1 GM GF GM 2 GM GF GM GF GM GF GM GF GM GF 3 GM GF GM SF GM GF GM SF 4 FF FF FF FF 5 FF GM FF ... (2 Replies)
Discussion started by: nans
2 Replies

10. Shell Programming and Scripting

Remove duplicate line starting with a pattern

Hi, I have the input file below: /* ----------------- cmdsDlyStartFWJ -----------------*/ UNIX_JOB CMDS065J RUN ANY CMDNAME sleep 5 AGENT CMDSHP USER proddata RUN MON,TUE,WED,THU,FRI DELAYSUB 02:00 /* "Triggers daily file watcher jobs" */ ENVAR... (5 Replies)
Discussion started by: varun22486
5 Replies
Bio::SeqEvolution::DNAPoint(3pm)			User Contributed Perl Documentation			  Bio::SeqEvolution::DNAPoint(3pm)

NAME
Bio::SeqEvolution::DNAPoint - evolve a sequence by point mutations

SYNOPSIS
  use Bio::SeqEvolution::Factory;

  # $seq is a Bio::PrimarySeqI to mutate
  $evolve = Bio::SeqEvolution::Factory->new (-rate     => 5,
                                             -seq      => $seq,
                                             -identity => 50
                                             );
  $newseq = $evolve->next_seq;

DESCRIPTION
Bio::SeqEvolution::DNAPoint implements the simplest evolution model: nucleotides change by point mutations only. The transition/transversion rate of the change, rate(), can be set.

The new sequences are named with the id of the reference sequence followed by a running number. Placing a new sequence into a factory to be evolved resets that counter; it can also be reset directly with reset_sequence_counter.

The default sequence type returned is Bio::PrimarySeq. This can be changed to any Bio::PrimarySeqI compliant sequence class.

Internally, the probability of a change at one nucleotide is mapped onto a scale from 0 to 100. The transition occupies the range from 0 to some value, and the remaining range is divided equally between the two transversion nucleotides. A random number is then generated to pick one change. Note that the default transition/transversion rate, 1:1, leads to an observed transition/transversion ratio of 1:2, simply because there is only one transition nucleotide versus two transversion nucleotides.

FEEDBACK
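The range mapping described above can be illustrated with a small shell/awk sketch. This is not the module's Perl implementation. The helper name `pick_mutation`, the transition pairs (A<->G, C<->T), and the order of the two transversion partners are assumptions made for illustration. The width of the transition range is also an assumption: 100·rate/(rate+2) is used because it reproduces the figures the documentation states. Each transversion then gets 100/(rate+2), so the observed transition:transversion ratio becomes rate:2, which gives 1:2 at rate 1 and 1:1 at rate 2 (as the rate() entry says). The caller passes the random number in [0, 100) so the mapping can be checked deterministically.

```shell
# pick_mutation (hypothetical helper): map a number in [0, 100) to a new base.
#   $1 = original base (A, C, G or T)
#   $2 = random number in [0, 100), supplied by the caller
#   $3 = transition/transversion rate
# Assumption: the transition range is [0, 100*rate/(rate+2)); the rest is
# split equally between the two transversion partners.
pick_mutation() {
  awk -v base="$1" -v r="$2" -v rate="$3" 'BEGIN {
    # transition partner of each base (purine<->purine, pyrimidine<->pyrimidine)
    ts["A"] = "G"; ts["G"] = "A"; ts["C"] = "T"; ts["T"] = "C"
    # the two transversion partners of each base (order is an arbitrary choice)
    tv1["A"] = "C"; tv2["A"] = "T"
    tv1["G"] = "C"; tv2["G"] = "T"
    tv1["C"] = "A"; tv2["C"] = "G"
    tv1["T"] = "A"; tv2["T"] = "G"
    cut = 100 * rate / (rate + 2)   # upper bound of the transition range
    width = (100 - cut) / 2         # equal share for each transversion
    if (r < cut)              print ts[base]
    else if (r < cut + width) print tv1[base]
    else                      print tv2[base]
  }'
}
```

For example, `pick_mutation A 10 1` prints G (a transition), while `pick_mutation A 90 1` prints T (a transversion). In a simulation, the second argument would come from a random source, for instance awk's rand() scaled by 100.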
Mailing Lists
User feedback is an integral part of the evolution of this and other Bioperl modules. Send your comments and suggestions preferably to the Bioperl mailing list. Your participation is much appreciated.

bioperl-l@bioperl.org - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists

Support
Please direct usage questions or support issues to the mailing list:

bioperl-l@bioperl.org

rather than to the module maintainer directly. Many experienced and responsive experts will be able to look at the problem and quickly address it. Please include a thorough description of the problem with code and data examples if at all possible.

Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track of the bugs and their resolution. Bug reports can be submitted via the web:

https://redmine.open-bio.org/projects/bioperl/

AUTHOR
Heikki Lehvaslaiho <heikki at bioperl dot org>

CONTRIBUTORS
Additional contributors' names and emails here

APPENDIX
The rest of the documentation details each of the object methods. Internal methods are usually preceded with a _

seq
 Title   : seq
 Usage   : $obj->seq($newval)
 Function: Set the sequence object for the original sequence
 Returns : The sequence object
 Args    : newvalue (optional)

 Setting this will reset mutation and generated mutation counters.

set_mutated_seq
 Title   : set_mutated_seq
 Usage   : $obj->set_mutated_seq($newval)
 Function: In case of mutating a sequence with multiple evolvers, this
 Returns : set_mutated_seq
 Args    : newvalue (optional)

rate
 Title   : rate
 Usage   : $obj->rate($newval)
 Function: Set the transition/transversion rate.
 Returns : value of rate
 Args    : newvalue (optional)

 Transition/transversion ratio is an observed attribute of a sequence comparison. We are dealing here with the transition/transversion rate that we set for our model of sequence evolution.

 Note that we are using the standard nucleotide alphabet and that there is only one possible transition versus two possible transversions. Rate 2 is needed to have an observed transition/transversion ratio of 1.

next_seq
 Title   : next_seq
 Usage   : $obj->next_seq
 Function: Evolve the reference sequence to the desired level
 Returns : A new sequence object mutated from the reference sequence
 Args    : -

mutate
 Title   : mutate
 Usage   : $obj->mutate
 Function: Mutate the sequence at the given location according to the model
 Returns : true
 Args    : integer, start location of the mutation, required argument

 Called from next_seq().

perl v5.14.2    2012-03-02    Bio::SeqEvolution::DNAPoint(3pm)
All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.