FASTACMD(1) NCBI Tools User's Manual FASTACMD(1)
NAME
fastacmd - retrieve FASTA sequences from a BLAST database
SYNOPSIS
fastacmd [-] [-D N] [-I] [-L start,stop] [-P N] [-S N] [-T] [-a] [-c] [-d str] [-i str] [-l N] [-o filename] [-p type] [-s str] [-t]
DESCRIPTION
fastacmd retrieves FASTA-formatted sequences from a blast(1) database that was built by formatdb(1) with the `-o' option. An example fastacmd call would be:
fastacmd -d nr -s p38398
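The same call can be assembled from a script. The following is a minimal Python sketch, not part of the NCBI tools: the helper names build_fastacmd_args and run_fastacmd are hypothetical, only options documented on this page (-d, -s, -L, -o) are used, and run_fastacmd assumes that a fastacmd binary is available on the PATH.

```python
import subprocess


def build_fastacmd_args(search, database="nr", seq_range=None, output=None):
    """Assemble a fastacmd argument list (hypothetical helper, not an NCBI API)."""
    args = ["fastacmd", "-d", database, "-s", search]
    if seq_range is not None:
        start, stop = seq_range
        # -L takes "start,stop"; 0 means beginning (start) or end (stop).
        args += ["-L", f"{start},{stop}"]
    if output is not None:
        # -o names the output file; without it fastacmd writes to stdout.
        args += ["-o", output]
    return args


def run_fastacmd(args):
    """Run fastacmd and return (exit status, stdout). Assumes fastacmd is installed."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Equivalent to the shell call: fastacmd -d nr -s p38398
    print(" ".join(build_fastacmd_args("p38398")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with identifiers such as 'gnl|dbname|tag', which contain characters the shell would otherwise interpret.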
OPTIONS
A summary of options is included below.
- Print usage message
-D N Dump the entire database in some format:
1 fasta
2 GI list
3 Accession.version list
-I Print database information only (overrides all other options)
-L start,stop
Range of sequence to extract (0 in start is beginning of sequence, 0 in stop is end of sequence, default is whole sequence)
-P N Retrieve sequences with Protein Identification Group (PIG) N.
-S N Strand on subsequence (nucleotide only):
1 top (default)
2 bottom
-T Print taxonomic information for requested sequence(s)
-a Retrieve duplicate accessions
-c Use ^A (\001) as non-redundant defline separator
-d str Database (default is nr)
-i str Input file with GIs/accessions/loci for batch retrieval
-l N Line length for sequence (default = 80)
-o filename
Output file (default = stdout)
-p type
Type of file:
G guess (default): look for protein, then nucleotide
T protein
F nucleotide
-s str Comma-delimited search string(s). GIs, accessions, loci, or full Seq-id strings may be used, e.g., 555, AC147927, 'gnl|dbname|tag'
-t Definition line should contain target GI only
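To make the -L, -S and -l options more concrete, the sketch below applies their documented meaning to a plain Python string. It is an illustration only and does not reproduce fastacmd's implementation. Two points are assumptions not stated on this page: that coordinates are 1-based and inclusive, and that the "bottom" strand is the reverse complement of the top strand. The function names subsequence and wrap are hypothetical.

```python
# Illustrative only: mimics the documented meaning of -L, -S and -l on a string.
# ASSUMPTIONS: coordinates are 1-based and inclusive; the "bottom" strand is
# the reverse complement of the top strand.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def subsequence(seq, start=0, stop=0, strand=1):
    """Return the part of seq selected by -L start,stop and -S strand."""
    begin = 0 if start == 0 else start - 1  # 0 in start: beginning of sequence
    end = len(seq) if stop == 0 else stop   # 0 in stop: end of sequence
    part = seq[begin:end]
    if strand == 2:                         # -S 2: bottom strand (nucleotide only)
        part = part.translate(_COMPLEMENT)[::-1]
    return part


def wrap(seq, line_length=80):
    """Split seq into lines of at most line_length characters, as -l N does."""
    return [seq[i:i + line_length] for i in range(0, len(seq), line_length)]


if __name__ == "__main__":
    for line in wrap(subsequence("AACCGGTTAACCGGTT", 3, 12), line_length=4):
        print(line)
```

With the defaults (start 0, stop 0, strand 1) the whole sequence is returned unchanged, which matches the documented default of extracting the whole sequence from the top strand.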
EXIT STATUS
0 Completed successfully.
1 An error (other than those below) occurred.
2 The BLAST database was not found.
3 A search (accession, GI, or taxonomy info) failed.
4 No taxonomy database was found.
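A calling script can translate these statuses into messages. The following Python sketch simply restates the table above; the names FASTACMD_EXIT_STATUS and describe_exit_status are hypothetical, and the fallback message for unlisted codes is an assumption about how a caller might want to report them.

```python
# Exit statuses exactly as listed in this manual page.
FASTACMD_EXIT_STATUS = {
    0: "Completed successfully.",
    1: "An error (other than those below) occurred.",
    2: "The BLAST database was not found.",
    3: "A search (accession, GI, or taxonomy info) failed.",
    4: "No taxonomy database was found.",
}


def describe_exit_status(code):
    """Return the documented meaning of a fastacmd exit status (hypothetical helper)."""
    return FASTACMD_EXIT_STATUS.get(code, f"Undocumented exit status {code}.")


if __name__ == "__main__":
    for code in sorted(FASTACMD_EXIT_STATUS):
        print(code, describe_exit_status(code))
```

Checking for status 2 separately from status 3 lets a script distinguish a misconfigured database path from an identifier that is simply absent from the database.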
AUTHOR
The National Center for Biotechnology Information.
SEE ALSO
blast(1), /usr/share/doc/blast2/fastacmd.html.
NCBI 2005-11-04 FASTACMD(1)