Parsing a fasta sequence with start and end coordinates

04-15-2011

Hi, I have separate chromosome sequences and I want to extract regions of a chromosome based on start and end positions. How can I achieve this?

For example, chr1 is in the following format:

Quote:

>chr1
GAATTCCAAAGCCAAAGATTGCATCAGTTCTGCTGCTATTTCCTCCTATCATTCTTTCTGATGTTGAAAATGATATTAAG

Region 2-10 should give me AATTCCAAA,

and in a similar way 15-25 should give me AAGATTGCAT,

and 27-30 should give me AGTT.

How can I do this in Perl, BioPerl, awk, or any other way?

Last edited by empyrean; 04-15-2011 at 02:55 AM..

empyrean

04-15-2011

Code:

awk -v start=2 -v end=10 -v chr=chr1 '$0~chr{getline seq; print substr(seq,start,end-start+1)}' sequence
AATTCCAAA

awk -v start=15 -v end=25 -v chr=chr1 '$0~chr{getline seq; print substr(seq,start,end-start+1)}' sequence
AAGATTGCATC

yinyuemi
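
For reference, substr(seq, start, end-start+1) treats start and end as 1-based, inclusive positions, so 15-25 spans 11 bases (AAGATTGCATC, one more than the 10-base string in the question) and 27-30 comes out as GTTC. Below is a minimal sketch of the same one-liner wrapped in a shell function. It assumes the whole sequence sits on a single line right after the header; `region` and the scratch file are made-up names for illustration, and the header test is tightened to an exact match so that chr1 does not also pick up chr10:

```shell
# Sample record from the question, written to a scratch file (hypothetical path).
fasta=$(mktemp)
printf '>chr1\nGAATTCCAAAGCCAAAGATTGCATCAGTTCTGCTGCTATTTCCTCCTATCATTCTTTCTGATGTTGAAAATGATATTAAG\n' > "$fasta"

# region FILE CHR START END  (hypothetical helper, not a standard tool)
# Assumes the header line is exactly ">CHR" and the sequence is on the next line.
region() {
    awk -v chr="$2" -v start="$3" -v end="$4" '
        $0 == (">" chr) {
            getline seq                              # read the line after the header
            print substr(seq, start, end - start + 1) # 1-based, inclusive range
            exit
        }
    ' "$1"
}

region "$fasta" chr1 2 10    # AATTCCAAA
region "$fasta" chr1 15 25   # AAGATTGCATC
region "$fasta" chr1 27 30   # GTTC
```

The file name goes last on the awk command line, after the quoted program.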

04-15-2011

Thanks for the reply. I am pretty new to awk programming. I have chromosome 1 in a FASTA file; where should I give it as input?

empyrean

04-15-2011

Using the cut command:

Code:

cut -c2-10 inputfile

michaelrozar17

04-15-2011

The cut command is not working properly: it's slicing the whole file into lines of 10-character fragments.

empyrean
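
cut -c works on each input line separately, which is why every line of the file comes back trimmed. One possible workaround, assuming the file holds a single record: drop the header, join the wrapped sequence lines into one long line, and only then cut. `cut_region` and the sample file are hypothetical names used for illustration:

```shell
# Sample: the first 30 bases of the record from the question, wrapped at 10 per line.
wrapped=$(mktemp)
printf '>chr1\nGAATTCCAAA\nGCCAAAGATT\nGCATCAGTTC\n' > "$wrapped"

# cut_region FILE START END  (hypothetical helper)
# Assumes FILE contains exactly one FASTA record.
cut_region() {
    # Remove the header, strip line breaks, add one final newline, then cut.
    { grep -v '^>' "$1" | tr -d '\r\n'; echo; } | cut -c"$2-$3"
}

cut_region "$wrapped" 2 10    # AATTCCAAA
cut_region "$wrapped" 15 25   # AAGATTGCATC, spanning the second and third lines
```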

04-15-2011

Quote:

Originally Posted by empyrean

Thanks for the reply. I am pretty new to awk programming. I have chromosome 1 in a FASTA file; where should I give it as input?

I think it should be ok if you use the whole fasta file as the input file:

Code:

awk '{CODE}' fasta

yinyuemi

04-15-2011

No, it's not giving correct results. My FASTA file is 300,000 bp long, but I need the sequences for some specific sites. The awk code above only returns sequence from one line, no matter what length you give. Also, if the start site is past the first line, we get no sequence for it at all.

empyrean
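
Both symptoms come from the one-liner reading only the single line after the header, while a real chromosome file usually wraps the sequence over many lines. Here is a sketch of one way around that in awk: collect every line of the wanted record into one string, then take the substring. It assumes Unix line endings and a sequence small enough to hold in memory (300,000 bp should be no problem); `fasta_region` and the sample file are made-up names:

```shell
# Sample: two records, the first 40 bases of chr1 wrapped at 20 per line,
# plus a made-up chr10 record to show that the header match is exact.
multi=$(mktemp)
printf '>chr1\nGAATTCCAAAGCCAAAGATT\nGCATCAGTTCTGCTGCTATT\n>chr10\nACGTACGTACGT\n' > "$multi"

# fasta_region FILE CHR START END  (hypothetical helper)
fasta_region() {
    awk -v chr="$2" -v start="$3" -v end="$4" '
        /^>/ { keep = ($1 == (">" chr)); next }   # header: is this the wanted record?
        keep { seq = seq $0 }                      # append each wrapped sequence line
        END  { print substr(seq, start, end - start + 1) }  # 1-based, inclusive
    ' "$1"
}

fasta_region "$multi" chr1 15 25   # AAGATTGCATC, crosses the line break at base 20
fasta_region "$multi" chr1 27 30   # GTTC
fasta_region "$multi" chr10 2 5    # CGTA
```

Comparing the first field of the header rather than the whole line lets a header such as ">chr1 some description" still match chr1.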
