06-27-2012
How to change sequence names in a long fasta file?
Hi
I have an alignment file (.fasta) with ~80 sequences. They look like this:
>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0
GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT
TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT
TCAGATGTTTATTATAAAAATTAGATGAAAAATATGTTAATATACAAGTA
>JV501.contig00066(+):24356-42404|sequence_index=0|block_index=4|species=JV501|JV501_4_0
AATGACGATTTAGATGAAAAATAT...
The sequence names are too long. I want to keep just the part before the first dot (JV101, JV501) and delete everything from the dot onward. I am new to Unix, so please suggest an easy Unix command to do this.
Thanks
Baika
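One way this could be done is with sed, restricting the edit to header lines so that the sequence data is left alone. This is only a sketch: the function name shorten_headers is made up for illustration, and it assumes the part before the first dot (JV101, JV501) contains no dot itself and is unique for each sequence.

```shell
# shorten_headers is a hypothetical helper name, not a standard tool.
# On header lines (those starting with ">") it deletes everything from the
# first dot to the end of the line; all other lines pass through unchanged.
shorten_headers() {
    sed '/^>/ s/\..*$//'
}

# Demo on two records shaped like the ones in the post (sequences shortened).
printf '%s\n' \
    '>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0' \
    'GAGGTTAATTATCGATAACG' \
    '>JV501.contig00066(+):24356-42404|sequence_index=0|block_index=4|species=JV501|JV501_4_0' \
    'AATGACGATTTAGATGAAAA' \
    | shorten_headers
```

On a real file the call would look like `shorten_headers < alignment.fasta > renamed.fasta`, where both file names are placeholders. Writing to a new file rather than editing in place keeps the original alignment available if the shortened names turn out not to be unique.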
LEARN ABOUT DEBIAN
kalign
KALIGN(1) Kalign User Manual KALIGN(1)
NAME
kalign - performs multiple alignment of biological sequences.
SYNOPSIS
kalign [infile.fasta] [outfile.fasta] [Options]
kalign [-i infile.fasta] [-o outfile.fasta] [Options]
kalign [< infile.fasta] [> outfile.fasta] [Options]
DESCRIPTION
Kalign is a command line tool to perform multiple alignment of biological sequences. It employs the Muth-Manber string-matching algorithm
to improve both the accuracy and speed of the alignment. It uses a global, progressive alignment approach, enriched by employing an
approximate string-matching algorithm to calculate sequence distances and by incorporating local matches into the otherwise global
alignment.
OPTIONS
-s -gpo -gapopen -gap_open x
Gap open penalty.
-e -gpe -gap_ext -gapextension x
Gap extension penalty.
-t -tgpe -terminal_gap_extension_penalty x
Terminal gap penalties.
-m -bonus -matrix_bonus x
A constant added to the substitution matrix.
-c -sort <input, tree, gaps.>
The order in which the sequences appear in the output alignment.
-g -feature
Selects feature mode and specifies which features are to be used: e.g. all, maxplp, STRUCT, PFAM-A...
-same_feature_score
Score for aligning same features.
-diff_feature_score
Penalty for aligning different features.
-d -distance <wu, pair>
Distance method.
-b -tree -guide-tree <nj, upgma>
Guide tree method.
-z -zcutoff
Parameter used in the wu-manber based distance calculation.
-i -in -input
Name of the input file.
-o -out -output
Name of the output file.
-a -gap_inc
Increases gap penalties depending on the number of existing gaps.
-f -format <fasta, msf, aln, clu, macsim>
The output format.
-q -quiet
Print nothing to STDERR. Read nothing from STDIN.
REFERENCES
o Timo Lassmann and Erik L.L. Sonnhammer (2005) Kalign - an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics
6:298
o Timo Lassmann, Oliver Frings and Erik L. L. Sonnhammer (2009) Kalign2: high-performance multiple alignment of protein and nucleotide
sequences allowing external features. Nucleic Acids Research 3:858-865.
AUTHORS
Timo Lassmann <timolassmann@gmail.com>
Upstream author of Kalign.
Charles Plessy <plessy@debian.org>
Wrote the manpage.
COPYRIGHT
Copyright (C) 2004, 2005, 2006, 2007, 2008 Timo Lassmann
Kalign is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the
Free Software Foundation.
This manual page was written by Charles Plessy <plessy@debian.org> for the Debian(TM) system (but may be used by others). Permission is
granted to copy, distribute and/or modify this document under the same terms as kalign itself.
On Debian systems, the complete text of the GNU General Public License version 2 can be found in /usr/share/common-licenses/GPL-2.
kalign 2.04 February 25, 2009 KALIGN(1)