I really need some help with this task. I have a bunch of FASTA files with hundreds of DNA sequences that look like this:
I need to convert them to Phylip format so that they look like this:
The first number at the very top is the number of sequences followed by the length of the sequences.
The first column is the sequence ID, which must be exactly 8 characters long, followed by 2 blank spaces and then the actual sequence. If the sequence ID is longer than 8 characters, the extra characters should be removed. If it is shorter than 8, blank spaces should be added to pad it to 8. In my example I have added underscores to keep the sequences aligned and to show accurately how the output file should look, but in the output file they should be blank spaces.
Any help will be greatly appreciated!
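One way to approach the conversion described above is a short Python script. This is only a sketch under a few assumptions that the post does not confirm: the sequence ID is taken to be the first word after ">", all sequences are assumed to have the same length (as Phylip expects an alignment), and the header line is written as the two numbers separated by a single space. The function names read_fasta and to_phylip are made up for this example.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (identifier, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the ID is the first whitespace-delimited word after '>'.
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            # Sequence data may be wrapped over several lines; join them later.
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Render records as Phylip text: a header line, then one line per sequence."""
    if not records:
        raise ValueError("no sequences found")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences differ in length")
    # Header: number of sequences, then sequence length (single space is an assumption).
    out = [f"{len(records)} {length}"]
    for name, seq in records:
        # Truncate the ID to 8 characters, pad it to 8 with spaces, add 2 blank spaces.
        out.append(f"{name[:8]:<8}  {seq}")
    return "\n".join(out) + "\n"
```

To convert a file, open it, pass the file object to read_fasta, and write the string returned by to_phylip to the output file. Repeating that for each FASTA file handles the whole batch.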
Hi,
Is there any way to change one date format to another? I have a file with dates in the format (Thu Sep 29 2005) and I would like to change these to YYYYMMDD format. Is there any command that does so? Or anything like the enum we have in C?
Thanks in advance,
... (5 Replies)
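For the format quoted in this question, a minimal Python sketch could look like the following. It assumes the day and month names are English abbreviations and that the script runs in the default C/English locale; the function name to_yyyymmdd is invented for illustration.

```python
from datetime import datetime


def to_yyyymmdd(text):
    """Convert a date such as 'Thu Sep 29 2005' to 'YYYYMMDD'."""
    # %a = abbreviated weekday, %b = abbreviated month, %d = day, %Y = 4-digit year.
    parsed = datetime.strptime(text.strip(), "%a %b %d %Y")
    return parsed.strftime("%Y%m%d")
```

Calling to_yyyymmdd("Thu Sep 29 2005") gives "20050929". A string that does not match the pattern raises ValueError, which makes malformed lines easy to spot.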
Hi,
There are lots of threads about how to manipulate the date using date +%m %.......
But how can I change the default format of the date command?
$ date
Mon Apr 10 10:57:15 BST 2006
This would be on Fedora and SunOS.
Cheers,
Neil (4 Replies)
Dear Experts,
Currently my script generates output like this:
8718,8718,0,8777
7450,7450,0,7483
5063,5063,0,5091
3840,3840,0,3855
3129,3129,0,3142
2400,2400,0,2419
2597,2597,0,2604
3055,3055,0,3078
4249,4249,0,4266
4927,4927,0,4957
8920,8920,0,8978... (4 Replies)
Hi, I have a column in a table of Timestamp datatype. For example, the column var1 holds 2008-06-26-10.10.30.2006. I queried date(var1) and time(var1) and got a file in the format below:
File1:
Col1 Col2
2008-06-02|12.36.06
2008-06-01|23.36.35
But the problem is... (7 Replies)
I have a variable mydate=2008Nov07.
I want the output in a variable mymonth=11 (i.e. Nov is month number 11).
I want a command that does this for any month without using a loop.
Please help me. (1 Reply)
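The month lookup asked for here can be done without a loop by indexing into a fixed table. The sketch below is in Python rather than shell, and it assumes the value always has the layout YYYYMonDD with a three-letter English month abbreviation; the names MONTH_ABBREVIATIONS and month_number are hypothetical.

```python
MONTH_ABBREVIATIONS = (
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
)


def month_number(stamp):
    """Return the month number (1-12) for a value such as '2008Nov07'."""
    # Characters 4-6 hold the month abbreviation; normalise the letter case.
    abbreviation = stamp[4:7].capitalize()
    # tuple.index raises ValueError if the abbreviation is not recognised.
    return MONTH_ABBREVIATIONS.index(abbreviation) + 1
```

With mydate set to "2008Nov07", month_number(mydate) returns 11.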
Hi,
I'm in need of creating a file in the fasta format:
>1A6A.A
HVIIQAEFYLNPDQSGEFMFDFDGDEIFHVDMAKKETVWRLEEFGRFASFEAQGALANIAVDKANLEIMTKRSNYTPITN
VPPEVTVLTNSPVELREPNVLICFIDKFTPPVVNVTWLRNGKPVTTGVSETVFLPREDHLFRKFHYLPFLPSTEDVYDCR
VEHWGLDEPLLKHWEF
>1A6A.B ... (5 Replies)
I have a list of dates in the following format: mm/dd/yyyy and want to change these to the MySQL standard format: yyyy-mm-dd.
The dates in the original file may or may not be zero padded, so April is sometimes "04" and other times simply "4".
This is what I use to change the format:
sed -i '' -e... (2 Replies)
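As an alternative to the truncated sed command above, here is a small Python sketch of the same rewrite. It assumes dates appear as month/day/four-digit-year with one or two digits for month and day, and it does not check that the values are valid calendar dates. The names DATE_PATTERN and to_mysql_dates are illustrative only.

```python
import re

# One or two digits for month and day, exactly four digits for the year.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def to_mysql_dates(text):
    """Rewrite every mm/dd/yyyy date in text as zero-padded yyyy-mm-dd."""
    def replace(match):
        month, day, year = match.groups()
        # int() drops any existing padding; :02d adds it back consistently.
        return f"{year}-{int(month):02d}-{int(day):02d}"

    return DATE_PATTERN.sub(replace, text)
```

Because the padding is normalised through int(), "4/7/2012" and "04/07/2012" both become "2012-04-07".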
Hi all,
I have a file with the data below:
af23b|11-FEB-12|acc7
ad23b|12-JAN-12|acc4
as23b|15-DEC-11|acc5
z123b|18-FEB-12|acc1
I need the output as below (date in yyyymmdd format):
af23b|20120211|acc7
ad23b|20120112|acc4
as23b|20111215|acc5
z123b|20120218|acc1
Please help me on this.... (7 Replies)
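The sample rows above can be converted with a few lines of Python. This sketch assumes the date is always the second pipe-separated field, that month names are English abbreviations, and that two-digit years follow the usual strptime pivot (00-68 map to 2000-2068). The function name convert_line is made up for the example.

```python
from datetime import datetime


def convert_line(line, field=1):
    """Rewrite the dd-MON-yy date in one pipe-delimited line as yyyymmdd."""
    parts = line.rstrip("\n").split("|")
    # %d = day, %b = abbreviated month (matched case-insensitively), %y = 2-digit year.
    parsed = datetime.strptime(parts[field], "%d-%b-%y")
    parts[field] = parsed.strftime("%Y%m%d")
    return "|".join(parts)
```

Applying convert_line to each line of the input file and printing the result reproduces the requested output, for example "af23b|20120211|acc7" for the first row.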
Hi,
I have a file of DNA sequences in FASTA format that look like this:
>admin_1_45
atatagcaga
>admin_1_46
atatagcagaatatatat
with many thousands of such sequences in a single file. I want to replace the accession ID "admin_1_45", and similarly those of the following sequences, to... (5 Replies)
Hello all,
I am trying to change the format of a large number of files. At present they are named like this:
SomeProcess_M-130_100_1_3BR.root
SomeProcess_M-130_101_2_3BX.root
SomeProcess_M-130_103_3_3RY.root
SomeProcess_M-130_105_1_3GH.root
SomeProcess_M-130_99_1_3LF.root... (7 Replies)
Discussion started by: emily
formatdb
FORMATDB(1) NCBI Tools User's Manual FORMATDB(1)
NAME
formatdb - format protein or nucleotide databases for BLAST
SYNOPSIS
formatdb [-] [-B filename] [-F filename] [-L filename] [-T filename] [-V] [-a] [-b] [-e] [-i filename] [-l filename] [-n str] [-o] [-p F]
[-s] [-t str] [-v N]
DESCRIPTION
formatdb must be used in order to format protein or nucleotide source databases before these databases can be searched by blastall,
blastpgp or MegaBLAST. The source database may be in either FASTA or ASN.1 format. Although the FASTA format is most often used as input
to formatdb, the use of ASN.1 is advantageous for those who are using ASN.1 as the common source for other formats such as the GenBank
report. Once a source database file has been formatted by formatdb it is not needed by BLAST. Please note that if you are going to apply
periodic updates to your BLAST databases using fmerge(1), you will need to keep the source database file.
OPTIONS
A summary of options is included below.
- Print usage message
-B filename
Binary Gifile produced from the Gifile specified by -F. This option specifies the name of a binary GI list file. This option
should be used with the -F option. A text GI list may be specified with the -F option and the -B option will produce that GI list
in binary format. The binary file is smaller and BLAST does not need to convert it, so it can be read faster.
-F filename
Gifile (file containing list of gi's) for use with -B or -L
-L filename
Create an alias file named filename, limiting the sequences searched to those specified by -F.
-T filename
Set the taxonomy IDs in ASN.1 deflines according to the table in filename.
-V Verbose: check for non-unique string ids in the database
-a Input file is database in ASN.1 format (otherwise FASTA is expected)
-b ASN.1 database is binary (as opposed to ASCII text)
-e Input is a Seq-entry. A source ASN.1 database (either text ascii or binary) may contain a Bioseq-set or just one Bioseq. In the
latter case -e should be provided.
-i filename
Input file(s) for formatting
-l filename
Log file name (default = formatdb.log)
-n str Base name for BLAST files (defaults to the name of the original FASTA file)
-o Parse SeqID and create indexes. If the source database is in FASTA format, the database identifiers in the FASTA definition line
must follow the conventions of the FASTA Defline Format.
-p F Input is a nucleotide, not a protein.
-s Index only by accession, not by locus. This is especially useful for sequence sets like the EST's where the accession and locus
names are identical. Formatdb runs faster and produces smaller temporary files if this option is used. It is strongly recommended
for EST's, STS's, GSS's, and HTGS's.
-t str Title for database file [String]
-v N Break up large FASTA files into `volumes' of size N million letters (4000 by default). As part of the creation of a volume,
formatdb writes a new type of BLAST database file, called an alias file, with the extension `nal' or `pal'.
AUTHOR
The National Center for Biotechnology Information.
SEE ALSO
blast(1), copymat(1), formatrpsdb(1), makemat(1), /usr/share/doc/blast2/formatdb.html.
NCBI 2007-10-19 FORMATDB(1)