06-27-2012
How to change sequence names in a long FASTA file?
Hi
I have an alignment file (.fasta) with ~80 sequences. They look like this:
>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0
GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT
TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT
TCAGATGTTTATTATAAAAATTAGATGAAAAATATGTTAATATACAAGTA
>JV501.contig00066(+):24356-42404|sequence_index=0|block_index=4|species=JV501|JV501_4_0
AATGACGATTTAGATGAAAAATAT...
The sequence names are too long. I want to keep just JV101, JV501, etc. and delete everything after the first dot. I am new to Unix; please suggest an easy Unix command to do this.
Thanks
Baika
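A minimal sketch, assuming every header starts with '>' and the name to keep is everything before the first '.'. The file names aln.fasta and aln_short.fasta and the helper name shorten_headers are placeholders; the two sample records are copied from the post, with the sequences shortened.

```shell
# Sketch: keep only the part of each FASTA header before the first '.'.
# aln.fasta / aln_short.fasta are hypothetical file names.
shorten_headers() {
  # On lines starting with '>', capture everything up to the first '.'
  # and drop the rest; sequence lines are printed unchanged.
  sed 's/^>\([^.]*\)\..*$/>\1/' "$1"
}

# Two records taken from the post (sequences shortened).
cat > aln.fasta <<'EOF'
>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0
GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT
>JV501.contig00066(+):24356-42404|sequence_index=0|block_index=4|species=JV501|JV501_4_0
AATGACGATTTAGATGAAAAATAT
EOF

shorten_headers aln.fasta > aln_short.fasta
cat aln_short.fasta
```

Headers that contain no '.' are left untouched. An equivalent awk one-liner is `awk -F. '/^>/ {print $1; next} {print}' aln.fasta`, which splits header lines on '.' and prints only the first field.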
10 More Discussions You Might Find Interesting
1. Solaris
Hi all,
I have Solaris and XP installed.
Solaris usually occupies the first boot slot.
Is it possible to change the boot sequence so that XP comes first and then Solaris?
2. Red Hat
Hi,
I use Red Hat Linux (kernel 2.6).
I want to add my application's shutdown to the shutdown sequence.
I added a K script to every subdirectory of /etc/rc.d/, one for each run level.
But the application's automatic shutdown does not run when
I type "shutdown -r now".
There is no indication the application...
3. Shell Programming and Scripting
Hi All,
I want to change the start-up sequence of services on SLES10/11.
I have my own start-up scripts for some services, and I want them to start in a particular order (not in alphabetical order).
Can anyone help me with this?
4. Shell Programming and Scripting
Hi,
Can anyone tell me what the following two lines do?
base=${0##*/}
link=${base#*}
I found this in a start-up service script, and I think it produces the service link names, which in turn change the start-up sequence of services.
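Both lines use POSIX parameter expansion: `${var##pattern}` removes the longest prefix matching the pattern, and `${var#pattern}` removes the shortest. A small sketch on a made-up script path (the real script takes its path from `$0`; the path and the extra `short` variable are only for illustration):

```shell
# Illustrates the expansions from the post on a hypothetical path.
path=/etc/init.d/rc3.d/S20myservice   # stand-in for "$0"

base=${path##*/}   # longest prefix matching '*/' removed: S20myservice
short=${path#*/}   # shortest prefix matching '*/' removed: only the leading '/'
link=${base#*}     # '*' can match the empty string, so nothing is removed

echo "base=$base"
echo "short=$short"
echo "link=$link"
```

So `base` is the script's own file name without its directory, much like `basename "$0"`. As quoted in the post, `link=${base#*}` leaves the value unchanged, because the shortest prefix matching `*` is the empty string.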
5. Shell Programming and Scripting
Hi. I have separate chromosome sequences, and I want to extract some regions of a chromosome based on start and end positions. How can I achieve this?
For example, Chr 1 is in the following format:
I need regions: 2-10 should give me AATTCCAAA,
and in a similar way 15-25 should give...
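A minimal sketch, assuming one chromosome per FASTA file and 1-based, inclusive coordinates (so 2-10 yields nine bases, as in the example). The post's sequence did not survive, so the toy chromosome below is invented; `extract_region` is a hypothetical helper name.

```shell
# Print bases start..end (1-based, inclusive) of a single-record FASTA file.
extract_region() {
  awk -v s="$2" -v e="$3" '
    /^>/ { next }              # skip the header line
    { seq = seq $0 }           # join wrapped sequence lines
    END { print substr(seq, s, e - s + 1) }
  ' "$1"
}

# Toy chromosome (made up), wrapped over two lines.
cat > chr1_toy.fa <<'EOF'
>chr1_toy
GAATTCCAAAT
GCGTACGTAGCTA
EOF

extract_region chr1_toy.fa 2 10
```

This reads the whole chromosome into memory. If samtools is installed, `samtools faidx chr1.fa chr1:2-10` extracts the same 1-based, inclusive region using an index instead.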
6. UNIX for Dummies Questions & Answers
I have fasta files with multiple sequences in each. I need to change the sequence name headers from:
>accD:_59176-60699
ATGGAAAAGTGGAGGATTTATTCGTTTCAGAAGGAGTTCGAACGCA
>atpA_(reverse_strand):_showing_revcomp_of_10525-12048
ATGGTAACCATTCAAGCCGACGAAATTAGTAATCTTATCCGGGAAC...
7. Shell Programming and Scripting
Hi,
I want to match the sequence ID (a substring of the line starting with '>') and extract the information up to the next '>' line. Please help.
Input:
> fefrwefrwef X900
AGAGGGAATTGG
AGGGGCCTGGAG
GGTTCTCTTC
> fefrwefrwef X932
AGAGGGAATTGG
AGGAGGTGGAG
GGTTCTCTTC
> fefrwefrwef X937...
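A sketch, assuming the ID is the last whitespace-separated word of the header (X900, X932, ...) and must match exactly. `get_record` is a hypothetical helper name; the sample input is the one from the post.

```shell
# Print the header and sequence lines of the record whose ID equals $2.
get_record() {
  awk -v id="$2" '
    /^>/ { keep = ($NF == id) }   # on each header, decide whether to keep
    keep                          # print lines while inside a matching record
  ' "$1"
}

cat > reads.fa <<'EOF'
> fefrwefrwef X900
AGAGGGAATTGG
AGGGGCCTGGAG
GGTTCTCTTC
> fefrwefrwef X932
AGAGGGAATTGG
AGGAGGTGGAG
GGTTCTCTTC
EOF

get_record reads.fa X932
```

For a substring match instead of an exact one, replace `$NF == id` with `index($0, id) > 0`; note that X93 would then also match X932 and X937.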
8. Shell Programming and Scripting
Hello,
I have 10 FASTA files of sequenced reads, with read sizes from 15 to 35 bp. I combined the reads, collapsed them into unique reads, and filtered for unique reads 18-26 bp long. Now I want to count how many times each unique read appears in all the FASTA files and make a table...
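A sketch, assuming each read occupies a single sequence line (reasonable for 15-35 bp reads), that no input file is empty, and that the table should have one row per distinct read and one count column per file. The table layout, the helper name `count_reads`, and the sample files are assumptions made for illustration.

```shell
# Tab-separated table: one row per distinct read, one count column per file.
count_reads() {
  printf 'read'
  printf '\t%s' "$@"                      # header row: one column per file
  printf '\n'
  awk '
    FNR == 1 { nfile++ }                  # first line of the next input file
    /^>/     { next }                     # skip FASTA headers
    { count[$0, nfile]++; seen[$0] = 1 }  # tally this read in this file
    END {
      for (r in seen) {
        row = r
        for (i = 1; i <= nfile; i++) row = row "\t" (count[r, i] + 0)
        print row
      }
    }
  ' "$@" | sort                           # awk array order is unspecified
}

cat > a.fa <<'EOF'
>r1
ACGTACGTACGTACGTACGT
>r2
ACGTACGTACGTACGTACGT
>r3
TTTTGGGGCCCCAAAATTTT
EOF
cat > b.fa <<'EOF'
>r1
TTTTGGGGCCCCAAAATTTT
EOF

count_reads a.fa b.fa
```

To count only the 18-26 bp reads mentioned in the post, add the rule `length($0) < 18 || length($0) > 26 { next }` right after the header rule.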
9. UNIX for Dummies Questions & Answers
I have the following script:
awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }'
and the following file:
>P39PT-1224 Freq 900
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 2...
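The posted script relies on the `FNR==NR` idiom: FNR (the line number within the current file) equals NR (the overall line number) only while the first input file is read, so the first block sums column 3 there; on the second input it prints column 3 as a percentage of that total. The same file therefore has to be passed twice. Without a filter, sequence lines are also printed, with a trailing 0. A sketch that restricts both passes to header lines, with a hypothetical helper name `freq_percent` and a made-up two-record file:

```shell
# Replace each header's Freq value with its percentage of the total Freq.
freq_percent() {
  awk '
    FNR == NR { if (/^>/) total += $3; next }        # pass 1: sum Freq over headers
    /^>/      { print $1, $2, 100 * $3 / total; next } # pass 2: rewrite headers
    { print }                                        # pass 2: sequence lines as-is
  ' "$1" "$1"                                        # same file read twice
}

cat > freq_toy.fa <<'EOF'
>readA Freq 900
cccctacgacggcatt
>readB Freq 100
ggatatgagggttcgt
EOF

freq_percent freq_toy.fa
```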
10. UNIX for Beginners Questions & Answers
I have to mine the following sequence pattern from a large FASTA file, gene.fasta (which contains multiple FASTA sequences), along with 5 flanking bases at both the start and end positions:
AAGCZ-N16-AAGCZ
Z represents A, C, or G (not T).
N16 represents any of the four...
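A sketch, assuming forward-strand matching only, case-insensitive input, and that overlapping hits should all be reported. It joins each record's sequence lines, then scans with a regular expression equivalent to `[ACGT]{5}AAGC[ACG][ACGT]{16}AAGC[ACG][ACGT]{5}` (5 flanking bases, AAGCZ, N16, AAGCZ, 5 flanking bases). The pattern is built by repetition because some older awk implementations lack `{n}` intervals. `find_motif` and the toy input file are hypothetical.

```shell
# Print ID, 1-based start and text of every flank+AAGCZ-N16-AAGCZ+flank hit.
find_motif() {
  awk '
    BEGIN {
      N = "[ACGT]"
      n5 = "";  for (i = 0; i < 5; i++)  n5 = n5 N
      n16 = ""; for (i = 0; i < 16; i++) n16 = n16 N
      re = n5 "AAGC[ACG]" n16 "AAGC[ACG]" n5   # Z = A, C or G
    }
    function scan(   rest, off) {
      rest = seq; off = 0
      while (match(rest, re)) {                # leftmost hit in remaining text
        print id "\t" (off + RSTART) "\t" substr(rest, RSTART, RLENGTH)
        off += RSTART                          # resume one base later so that
        rest = substr(rest, RSTART + 1)        # overlapping hits are found too
      }
    }
    /^>/ { if (seq != "") scan(); id = substr($1, 2); seq = ""; next }
    { seq = seq toupper($0) }                  # join wrapped lines, uppercase
    END  { if (seq != "") scan() }
  ' "$1"
}

cat > gene_toy.fasta <<'EOF'
>motif1
ACGGGGGAAGCACCCCCCCC
CCCCCCCCAAGCGTTTTTAC
>nomotif
ACGTACGTACGT
EOF

find_motif gene_toy.fasta
```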
bp_mask_by_search
BP_MASK_BY_SEARCH(1p) User Contributed Perl Documentation BP_MASK_BY_SEARCH(1p)
NAME
mask_by_search - mask sequence(s) based on their alignment results
SYNOPSIS
mask_by_search.pl -f blast genomefile blastfile.bls > maskedgenome.fa
DESCRIPTION
Mask sequence based on significant alignments of another sequence. You need to provide the report file and the entire sequence data that
you want to mask. By default, this assumes you have done a TBLASTN (or TFASTY) search and tries to mask the hit sequence, assuming you've
provided the sequence file for the hit database. If you would like to do the reverse and mask the query sequence, specify the -t/--type
query flag.
This reads the whole sequence file into memory, so for large genomes it may fall over. I'm using DB_File to prevent keeping
everything in memory; one solution is to split the genome into pieces (BEFORE you run the DB search, though: you want to use the exact file
you BLASTed with as input to this program).
Below, the double-dash (--) options can be given as --format=fasta or --format fasta, or you can just say -f fasta.
By -f/--format I mean that either form is acceptable. The =s, =n, or =c suffixes indicate that the argument expects a string, a number, or a character, respectively.
Options:
-f/--format=s Search report format (fasta,blast,axt,hmmer,etc)
-sf/--sformat=s Sequence format (fasta,genbank,embl,swissprot)
--hardmask (boolean) Hard mask the sequence
with the maskchar [default is lowercase mask]
--maskchar=c Character to mask with [default is N], change
to 'X' for protein sequences
-e/--evalue=n Evalue cutoff for HSPs and Hits, only
mask sequence if alignment has specified evalue
or better
-o/--out/
--outfile=file Output file to save the masked sequence to.
-t/--type=s Alignment seq type you want to mask, the
'hit' or the 'query' sequence. [default is 'hit']
--minlen=n Minimum length of an HSP for it to be used
in masking [default 0]
-h/--help See this help information
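To illustrate the difference between the default lowercase (soft) masking and --hardmask with --maskchar, here is a sketch on a plain string with one hypothetical masked interval. It is not how bp_mask_by_search is implemented internally; `mask_region` is a made-up helper.

```shell
# mask_region SEQ START END MODE [CHAR]
# MODE "soft": lowercase bases START..END (1-based, inclusive).
# MODE "hard": replace those bases with CHAR (default N).
mask_region() {
  awk -v seq="$1" -v s="$2" -v e="$3" -v mode="$4" -v c="${5:-N}" 'BEGIN {
    mid = substr(seq, s, e - s + 1)
    if (mode == "hard") {
      m = ""
      for (i = 1; i <= length(mid); i++) m = m c
    } else {
      m = tolower(mid)
    }
    print substr(seq, 1, s - 1) m substr(seq, e + 1)
  }'
}

mask_region ACGTACGTAC 3 6 soft     # lowercase mask
mask_region ACGTACGTAC 3 6 hard     # N mask
mask_region ACGTACGTAC 3 6 hard X   # X mask, as suggested for proteins
```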
AUTHOR - Jason Stajich
Jason Stajich, jason-at-bioperl-dot-org.
perl v5.14.2 2012-03-02 BP_MASK_BY_SEARCH(1p)