Parsing and masking regions from a single fasta file with subsequence

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Parsing a fasta sequence with start and end coordinates

Hi. I have separate chromosome sequences and I want to parse some regions of each chromosome based on a start site and an end site. How can I achieve this? For example, Chr 1 is in the following format. I need regions 2-10, which should give me AATTCCAAA, and in a similar way 15-25 should give...

2. Shell Programming and Scripting

Masking data for different file format

Hi, I have 3 kinds of files containing date data that needs to be masked. The file is like this: File 1 (all contents in 1 line): input:DTM+7:201103281411:203'LOC+175+SGSIN:139:6+TERMINATOR......'DTM+132:201103281413:203'LOC.... output:...

3. Shell Programming and Scripting

[SED] Parsing to get a single value

Hello guys, I guess you are fed up with sed and parsing questions, but after a while researching the forum, I could not find an answer to my question. I know it must be easy to do with sed, but unfortunately I never get the syntax of this command right. OK, this is what I have in my...

4. UNIX for Dummies Questions & Answers

How to change sequence name in a long fasta file?

Hi, I have an alignment file (.fasta) with ~80 sequences. They look like this: >JV101.contig00066(+):25302-42404|sequence_index=0|block_index=4|species=JV101|JV101_4_0 GAGGTTAATTATCGATAACGTTTAATTAAAGTGTTTAGGTGTCATAATTT TAAATGACGATTTCTCATTACCATACACCTAAATTATCATCAATCTGAAT...

5. UNIX for Dummies Questions & Answers

extract regions of file based on start and end position

Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is a 3-column list: header name, start position, end position. I'd like to extract the sequence region of file1 specified in file2. Based on a post elsewhere, I found the code: awk...

6. Shell Programming and Scripting

Extract sequence from fasta file

Hi, I want to match the sequence ID (a substring of the line starting with '>') and extract the information up to the next '>' line. Please help. input > fefrwefrwef X900 AGAGGGAATTGG AGGGGCCTGGAG GGTTCTCTTC > fefrwefrwef X932 AGAGGGAATTGG AGGAGGTGGAG GGTTCTCTTC > fefrwefrwef X937...

7. Shell Programming and Scripting

Command Line Perl for parsing fasta file

I would like to take a fasta file formatted like >0001 agttcgaggtcagaatt >0002 agttcgag >0003 ggtaacctga and use command-line Perl to move all samples longer than 8 to a new file. The result would be >0001 agttcgaggtcagaatt >0003 ggtaacctga cat ${sample}.fasta | perl -lane...

8. Shell Programming and Scripting

Extraction of upstream and downstream regions from long sequence file

Hello, I am posting my query again with modified input data files. My query is: I have two input files, file1 and file2. file1 is smalldata.fasta >gi|546671471|gb|AWWX01449637.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig449636, whole genome shotgun sequence...

9. UNIX for Dummies Questions & Answers

Round up -FASTA file

I have the following script: awk 'FNR==NR{s+=$3;next;} { print $1 , $2, 100*$3/s }' and the following file: >P39PT-1224 Freq 900 cccctacgacggcattggtaatggctcagctgctccggatcccgcaagccatcttggatatgagggttcgtcggcctcttcagccaagg-cccccagcagaacatccagctgatcg >P39PT-784 Freq 2...

10. Shell Programming and Scripting

Help with reformat single-line multi-fasta into multi-line multi-fasta

Input File: >Seq1 ASDADAFASFASFADGSDGFSDFSDFSDFSDFSDFSDFSDFSDFSDFSDFSD >Seq2 SDASDAQEQWEQeqAdfaasd >Seq3 ASDSALGHIUDFJANCAGPATHLACJHPAUTYNJKG ...... Desired Output File >Seq1 ASDADAFASF ASFADGSDGF SDFSDFSDFS DFSDFSDFSD FSDFSDFSDF SD >Seq2
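Several of the threads above (coordinate extraction, extraction driven by a header/start/end table, ID matching, length filtering, and rewrapping) come down to the same few operations on FASTA records. Here is a minimal Python sketch of those operations. It assumes plain FASTA where each record is a '>' header followed by one or more sequence lines. It also assumes 1-based inclusive coordinates, which matches the first thread, where 2-10 yields 9 bases. All function names are hypothetical and do not come from any library or from the threads.

```python
import io
from typing import Iterable, Iterator, List, Tuple

# A record is (header without the leading '>', sequence).
Record = Tuple[str, str]


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records; wrapped and single-line sequences are both joined."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore any text before the first header
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_region(seq: str, start: int, end: int) -> str:
    """Return bases start..end, assuming 1-based inclusive coordinates."""
    if start < 1 or end < start:
        raise ValueError(f"bad coordinates: {start}-{end}")
    return seq[start - 1:end]


def extract_from_table(records: Iterable[Record],
                       table_lines: Iterable[str]) -> List[Record]:
    """For each 'name start end' row, cut that region from the matching record.

    Records are keyed by the first word of their header (an assumption);
    a name with no matching record raises KeyError.
    """
    seqs = {h.split()[0]: s for h, s in records if h.split()}
    result: List[Record] = []
    for row in table_lines:
        fields = row.split()
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        result.append((f"{name}:{start}-{end}",
                       extract_region(seqs[name], start, end)))
    return result


def filter_longer_than(records: Iterable[Record], min_len: int) -> List[Record]:
    """Keep records whose sequence is strictly longer than min_len."""
    return [(h, s) for h, s in records if len(s) > min_len]


def select_by_id(records: Iterable[Record], seq_id: str) -> List[Record]:
    """Keep records whose header contains seq_id as a whole word."""
    return [(h, s) for h, s in records if seq_id in h.split()]


def format_fasta(records: Iterable[Record], width: int = 10) -> str:
    """Serialize records, wrapping each sequence every `width` characters."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    example = ">0001\nagttcgaggtcagaatt\n>0002\nagttcgag\n>0003\nggtaacctga\n"
    records = list(read_fasta(io.StringIO(example)))
    # Length filter from the Perl thread: keeps 0001 and 0003.
    print(format_fasta(filter_longer_than(records, 8), width=60), end="")
    print(extract_region("ACGTACGTAC", 2, 5))  # CGTA
```

`read_fasta` joins sequence lines, so the same helpers work on wrapped input and on input with one sequence per line. `format_fasta(records, width=10)` produces the 10-characters-per-line layout requested in the last thread.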

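The awk script in the "Round up -FASTA file" thread uses the FNR==NR idiom, which expects the same file to be passed twice. On the first pass it sums the third field. On the second pass it prints every line, so each sequence line comes out followed by a trailing 0. Here is a minimal Python sketch of the same calculation. It assumes header lines of the form `>ID Freq N` with integer N, and that only those headers should be rewritten. It also reads "round up" as rounding half-up to a fixed number of decimals, which is an interpretation, since the thread text is truncated. `freq_to_percent` is a hypothetical name, and the demo lines are made up.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Iterable, List


def _is_freq_header(line: str) -> bool:
    """True for lines shaped like '>ID Freq N' (assumed header format)."""
    return line.startswith(">") and len(line.split()) >= 3


def freq_to_percent(lines: Iterable[str], places: int = 2) -> List[str]:
    """Rewrite '>ID Freq N' headers so N becomes its percentage of the total.

    Percentages are rounded half-up to `places` decimals; sequence lines and
    any other lines are passed through unchanged.
    """
    lines = [line.rstrip("\n") for line in lines]
    # Pass 1 (like awk's FNR==NR block): sum the third field of headers only.
    total = sum(int(line.split()[2]) for line in lines if _is_freq_header(line))
    quantum = Decimal(1).scaleb(-places)  # Decimal('0.01') when places == 2
    out: List[str] = []
    # Pass 2: rewrite headers, copy everything else as-is.
    for line in lines:
        if _is_freq_header(line) and total:
            fields = line.split()
            pct = (Decimal(100) * int(fields[2]) / total).quantize(
                quantum, rounding=ROUND_HALF_UP)
            out.append(f"{fields[0]} {fields[1]} {pct}")
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    demo = [">A Freq 900", "acgt", ">B Freq 100", "gg"]  # hypothetical data
    print("\n".join(freq_to_percent(demo)))
```

`Decimal` with `ROUND_HALF_UP` is used because Python's built-in `round` rounds halves to even, which may not be what "round up" means here.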
LEARN ABOUT DEBIAN

pynast

VERSION:(1)							   User Commands						       VERSION:(1)

NAME

       PyNAST - alignment of short DNA sequences

SYNOPSIS

       pynast [options] {-i input_fp -t template_fp}

DESCRIPTION

       [] indicates optional input (order unimportant); {} indicates required input (order unimportant).

   Example usage:
	      pynast -i my_input.fasta -t my_template.fasta

OPTIONS

       --version
	      show program's version number and exit

       -h, --help
	      show this help message and exit

       -t TEMPLATE_FP, --template_fp=TEMPLATE_FP
	      path to template alignment file [REQUIRED]

       -i INPUT_FP, --input_fp=INPUT_FP
	      path to input fasta file [REQUIRED]

       -v, --verbose
	      Print status and other information during execution [default: False]

       -p MIN_PCT_ID, --min_pct_id=MIN_PCT_ID
	      minimum percent sequence identity to consider a sequence a match [default: 75.0]

       -l MIN_LEN, --min_len=MIN_LEN
	      minimum sequence length to include in NAST alignment [default: 1000]

       -m PAIRWISE_ALIGNMENT_METHOD, --pairwise_alignment_method=PAIRWISE_ALIGNMENT_METHOD
	      method for performing pairwise alignment [default: uclust]

       -a FASTA_OUT_FP, --fasta_out_fp=FASTA_OUT_FP
	      path to store resulting alignment file [default: derived from input filepath]

       -g LOG_FP, --log_fp=LOG_FP
	      path to store log file [default: derived from input filepath]

       -f FAILURE_FP, --failure_fp=FAILURE_FP
	      path to store file of seqs which fail to align [default: derived from input filepath]

       -e MAX_E_VALUE, --max_e_value=MAX_E_VALUE
	      Deprecated. Will be removed in PyNAST 1.2

       -d BLAST_DB, --blast_db=BLAST_DB
	      Deprecated. Will be removed in PyNAST 1.2

SEE ALSO

       http://pynast.sourceforge.net

Version: pynast 1.1						    August 2011 						       VERSION:(1)
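For scripted runs, here is a minimal Python sketch that assembles a pynast argument list from the long options documented in the man page above. `pynast_argv` is a hypothetical helper. The file names come from the man page's example usage, and `min_len=150` is an arbitrary illustrative value, not a recommendation. Building the list needs nothing installed; actually running it assumes `pynast` is on your PATH.

```python
import shlex
from typing import List, Optional


def pynast_argv(input_fp: str, template_fp: str,
                min_pct_id: Optional[float] = None,
                min_len: Optional[int] = None,
                fasta_out_fp: Optional[str] = None,
                verbose: bool = False) -> List[str]:
    """Build a pynast command line using only options listed in the man page.

    Options left as None are omitted, so pynast applies its own defaults
    (75.0 for --min_pct_id, 1000 for --min_len).
    """
    argv = ["pynast", f"--input_fp={input_fp}", f"--template_fp={template_fp}"]
    if min_pct_id is not None:
        argv.append(f"--min_pct_id={min_pct_id}")
    if min_len is not None:
        argv.append(f"--min_len={min_len}")
    if fasta_out_fp is not None:
        argv.append(f"--fasta_out_fp={fasta_out_fp}")
    if verbose:
        argv.append("--verbose")
    return argv


if __name__ == "__main__":
    cmd = pynast_argv("my_input.fasta", "my_template.fasta", min_len=150)
    # Prints: pynast --input_fp=my_input.fasta --template_fp=my_template.fasta --min_len=150
    print(shlex.join(cmd))
    # To run it (requires pynast on PATH):
    #   import subprocess; subprocess.run(cmd, check=True)
```

The sketch passes a list to `subprocess.run` instead of a single shell string. This avoids quoting problems with paths that contain spaces or shell metacharacters.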
