Parsing and masking regions from a single fasta file with subsequence
Hi,
I have a complete genome FASTA file and a list of subsequence regions
in the following format:
I want a script that masks these regions in the single complete genome FASTA file with the letter N.
Kindly help.
Last edited by Don Cragun; 09-25-2014 at 06:51 AM..
Reason: Add CODE tags.
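A minimal sketch of the masking step asked about above. The region format from the post is not shown here, so this assumes each region is a 1-based, inclusive (start, end) pair and that the genome is a single sequence already read into a string. The names mask_regions and wrap are hypothetical helpers, not part of any library.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Mask each 1-based, inclusive (start, end) region with mask_char.

    The coordinate convention is an assumption; the region list in the
    original post is not shown.
    """
    buf = bytearray(sequence, "ascii")
    fill = mask_char.encode("ascii")
    for start, end in regions:
        lo = max(start, 1) - 1        # clip to the start of the sequence
        hi = min(end, len(buf))       # clip to the end of the sequence
        if hi > lo:                   # skip regions that fall outside entirely
            buf[lo:hi] = fill * (hi - lo)
    return buf.decode("ascii")


def wrap(sequence, width=60):
    """Split a sequence into fixed-width lines for FASTA output."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


if __name__ == "__main__":
    genome = "ACGTACGTAC"             # toy sequence, not real data
    masked = mask_regions(genome, [(2, 4), (9, 12)])
    print(">genome_masked")
    print("\n".join(wrap(masked)))
```

A bytearray is used so that a genome-sized sequence is edited in place rather than rebuilt for every region. If the regions are 0-based or half-open, the two index lines would need adjusting.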
Hi, I have separate chromosome sequences and I want to parse some regions of a chromosome based on start site and end site. How can I achieve this?
For example, Chr 1 is in the following format:
The region 2-10 should give me AATTCCAAA,
and similarly 15-25 should give... (8 Replies)
Hi,
I have 3 kinds of files that contain date data that needs to be masked. The files look like this:
File 1 (all contents in 1 line):
input:DTM+7:201103281411:203'LOC+175+SGSIN:139:6+TERMINATOR......'DTM+132:201103281413:203'LOC....
output:... (4 Replies)
Hello guys,
I guess you are fed up with sed command and parsing questions, but after a while researching the forum, I could not find an answer to my question. I know it must be easy to do with the sed command, but unfortunately I never get the syntax of this command right.
OK, this is what I have in my... (3 Replies)
Hi
I have an alignment file (.fasta) with ~80 sequences. They look like this:
>JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0
GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT
TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT... (2 Replies)
Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is a 3-column list: header name, start position, end position. I'd like to extract the sequence regions of file1 specified in file2.
Based on a post elsewhere, I found the code:
awk... (2 Replies)
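The extraction described above could be sketched as follows. This is not the awk code the poster found (that code is cut off); it assumes file2 is whitespace-delimited, positions are 1-based and inclusive, and the header name in file2 matches the first word after '>' in file1. The function names are hypothetical.

```python
def read_fasta(lines):
    """Map each record name (first word after '>') to its joined sequence."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def extract_regions(records, region_lines):
    """Return (name, start, end, subsequence) for each 3-column region line.

    Positions are assumed to be 1-based and inclusive. Lines naming an
    unknown header, or with fewer than 3 columns, are skipped.
    """
    found = []
    for line in region_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        if name in records:
            found.append((name, start, end, records[name][start - 1:end]))
    return found


if __name__ == "__main__":
    # Toy input standing in for file1 and file2.
    file1 = [">seq1 example", "ACGTAC", "GTAC", ">seq2", "TTTTGGGG"]
    file2 = ["seq1 2 5", "seq2 5 8"]
    for name, start, end, sub in extract_regions(read_fasta(file1), file2):
        print(f">{name}:{start}-{end}")
        print(sub)
```

Both functions take any iterable of lines, so open file handles can be passed directly in place of the toy lists.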
Hi,
I want to match the sequence ID (a substring of the line starting with '>') and extract the information up to the next '>' line. Please help.
input
> fefrwefrwef X900
AGAGGGAATTGG
AGGGGCCTGGAG
GGTTCTCTTC
> fefrwefrwef X932
AGAGGGAATTGG
AGGAGGTGGAG
GGTTCTCTTC
> fefrwefrwef X937... (2 Replies)
I would like to take a fasta file formatted like
>0001
agttcgaggtcagaatt
>0002
agttcgag
>0003
ggtaacctga
and use command-line Perl to move all samples greater than 8 in length to a new file. The result would be
>0001
agttcgaggtcagaatt
>0003
ggtaacctga
cat ${sample}.fasta | perl -lane... (2 Replies)
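The length filter described above could be sketched like this. It is written in Python, not as the Perl one-liner the poster started (which is cut off), and the function names are hypothetical. It keeps records whose sequence is strictly longer than the threshold, which matches the expected result shown in the post.

```python
def iter_records(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, parts = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line, []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def longer_than(lines, threshold=8):
    """Keep records whose sequence length is strictly greater than threshold."""
    return [(h, s) for h, s in iter_records(lines) if len(s) > threshold]


if __name__ == "__main__":
    sample = [
        ">0001", "agttcgaggtcagaatt",
        ">0002", "agttcgag",
        ">0003", "ggtaacctga",
    ]
    for header, seq in longer_than(sample):
        print(header)
        print(seq)
```

Sequences split over several lines are joined before their length is measured, so the filter does not depend on line wrapping.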
Hello, I am posting my query again with modified input data files.
My query is:
I have two input files, file1 and file2.
file1 is smalldata.fasta
>gi|546671471|gb|AWWX01449637.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig449636, whole genome shotgun sequence... (20 Replies)
I have the following script:
awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }'
and the following file:
>P39PT-1224 Freq 900
cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg
>P39PT-784 Freq 2... (2 Replies)
VERSION:(1) User Commands VERSION:(1)
NAME
PyNAST - alignment of short DNA sequences
SYNOPSIS
pynast [options] {-i input_fp -t template_fp}
DESCRIPTION
[] indicates optional input (order unimportant) {} indicates required input (order unimportant)
Example usage:
pynast -i my_input.fasta -t my_template.fasta
OPTIONS
--version
show program's version number and exit
-h, --help
show this help message and exit
-t TEMPLATE_FP, --template_fp=TEMPLATE_FP
path to template alignment file [REQUIRED]
-i INPUT_FP, --input_fp=INPUT_FP
path to input fasta file [REQUIRED]
-v, --verbose
Print status and other information during execution [default: False]
-p MIN_PCT_ID, --min_pct_id=MIN_PCT_ID
minimum percent sequence identity to consider a sequence a match [default: 75.0]
-l MIN_LEN, --min_len=MIN_LEN
minimum sequence length to include in NAST alignment [default: 1000]
-m PAIRWISE_ALIGNMENT_METHOD, --pairwise_alignment_method=PAIRWISE_ALIGNMENT_METHOD
method for performing pairwise alignment [default: uclust]
-a FASTA_OUT_FP, --fasta_out_fp=FASTA_OUT_FP
path to store resulting alignment file [default: derived from input filepath]
-g LOG_FP, --log_fp=LOG_FP
path to store log file [default: derived from input filepath]
-f FAILURE_FP, --failure_fp=FAILURE_FP
path to store file of seqs which fail to align [default: derived from input filepath]
-e MAX_E_VALUE, --max_e_value=MAX_E_VALUE
Deprecated. Will be removed in PyNAST 1.2
-d BLAST_DB, --blast_db=BLAST_DB
Deprecated. Will be removed in PyNAST 1.2
SEE ALSO
http://pynast.sourceforge.net
Version: pynast 1.1 August 2011 VERSION:(1)