Full Discussion: Sequence extraction
Post 302951316 by harpreetmanku04 on Wednesday 5th of August 2015 05:48:24 AM
Let me clarify my query again:

I have two files. file1 is 1.fasta, a FASTA sequence file that looks like this:

Code:
>gi|547177824|gb|AWWX01000001.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig0, whole genome shotgun sequence
ACATAATCCCCGAGGCAAAATAAGTCTCTAATGAACTTGACCCTATGAGTGTCAAGGTGAGGGAGTCCTA
AGAGACTGACAAGGGCTGTGAAGGCTACAAGGGAAGAAGACAACAGGATCAGCTGGAAAAGGCTTAAAAT
TGCAGCCCTTGATTCTTCCACTGTGCCCTGGGGCCATGAATCGCTACAGCCTCACTGAAGGAATCTGAGG
TAGCATCTCAGAGCTCCCATGCCAAGACCATGGGGAACAAATTTGAGTTGGATGTGGCAGCATGGCCCAG
AGGTCATGAAATAACCAGCACAAACTTTGTTAGTGGATGACAACTGGCCCTTTTAAGTGATCACCTTAAT
AGACTATCATTCAGGGACTAAAAGAGAAAACTTAGTACCCCAAAAATTGTACCAAAGGACAGAAATTCAC
CTCAGAACTTTCCAGCAAAG
>gi|547177823|gb|AWWX01000002.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig1, whole genome shotgun sequence
CAATTTGATTCCAGCTTGTGTTCATCCATCTGGGAATTTTCATGATGTACTGTGCATATAACTTAAATAA
TCAGGGTGACAATATGCAGCCTTAATGTACTTGTGTCCCAATTTTGAACCTGTCTGTTGTTCCGTGTCCA
CTTCTAACTGTTGCTTCTTGACCTGTATACCAGTTTTTCAGAAGGCAGGTAAGGTGCTCTGGTATTCCCA
TCTCTTTAATAATTTTGCAGTTTCCTGTGATACACACAGACAAAGGCTTTTACATAGTCAAACAAGTAGA
AGTTTTTCTGGAGTTCTCTTGCTTTGCCTA
>gi|547177822|gb|AWWX01000003.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig2, whole genome shotgun sequence
GGTCCAAGAGGCTCCTTCTTTTGTCCAGAAAGAGTTCCTGTGAACAGGGCTCAGTGGCTGACTGATGGTT
AGTATCATCTGGAGAAGGGAAAGGGCTTTTGTAGGCAAATTTAGCATTGCCAAGCCAGCAGAGTCTACCT
GGCAAGTCTTTCAAAGTTCAGAATTTTTCCTAAAGGCAAGGAAAACATGGACATAAGGGACCTCGCTGGA
ATCCTAAGCGCAAATCTTCAGCAATGGACAGTCCCATCTGAAAAGGAAATGATGAATGCTCATAAGGGGC
TTCGTGTGTCACTGAATTCCCTTATGGACTTTTCCTGGCTGTGGCATCCCTCATCTGTGATCTTGCTGGG
CATAATGTGGTTCAGTTCTCGCCCCAGGGGCCGTTGTGGCTTCCCTGAGTATTTTTTTAAGAGATTATTT
ATGTGTTTAATTTTGGTTGCACTGGCTCTTCATTGCTCCATGTGTGTTCTCTAGTTTCGAGGAGCAGGGG
CTA
>gi|547177821|gb|AWWX01000004.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig3, whole genome shotgun sequence
CTATATAAATAAATCTATTTCTTGCCTAACACTTTGCCTGTTGCTGAATTCCTTCTGTGCCGAGACACAA
ACAATGTGAGCCACAGTACATCCAGGTACCAGATAAGTGATTCTAAATAAAAGACCATGGTTCAAGACTC
AAACTGGATTTTGGGGAGGGGGGGAGGTTGGAGTCCTGGCCATGTGGATTTTAGTCCAAACCTGATGTGA
TAGGTTTCAGGTAGAGATCATCTCCATGCCACCACATTGGAAATTTTATACTATGGTCTGTTATACTGGT
TCA
>gi|547177820|gb|AWWX01000005.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig4, whole genome shotgun sequence
TCTCAAGCAGACTTTACCTATAAAAAGGGCAATGTGTCCTGGAACTGCAAGCTGAGCTTAGGAACAGAGA
ACTGCAAAAACTATTGGCATGAACAACTCTGTCCAAACATCTACATTAGGAACCTTGACTGACCTCTAGC
ATGGTTTCTAGCAGCAACCTAAGGCCACGTTCTAGGACAACTCAGCTACCCCTGAGTTCCTGTCTAGAAA
ATTTCAAGGCTACCAAAGGAATCTGCTCCAGCCAACATCTGACATAAGCCCCTCATCTTCCTTTACTTAG
AGTGTCTATTTAAAACAAAGACCAAAAAAAAAAAAAACAAAAACAAAAAACCCTCACGATTACAAGAAAA
GTGTGACGGAAATAAACTAGAAATTGATAACAAAAAGACAGCAGGAAAATTAAAGATTATTTGCAGACTA
AGCAGAAAACATCTAAA
>gi|547177819|gb|AWWX01000006.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig5, whole genome shotgun sequence
ATCAGCAAGCTCAACCTCGGGCAAACGATTGTCCGAGTGATGATGGGGGCTGCCCAAGTCCGGTCTCCTC
ATGGTGCTCCTGGGGGTCATCTTCATGAATGGTAACCACGCCACCGAGGAGGAGGTCTGGGAATTCCTGA
GTATGTTGGGGATCTATGCTGGGAGGAGGCACTGGATCTTTGGGGAGCCCAGAAGGCTCATCACCAAAGA
TCTGGTGCAGAAGGAGTACCTGAACTACCGCCGGGTGCCCAATAGTGATCCTCCGCGCTACCAGTTGCTG
TGGGGCCCGAGAGCTTGTGCTGAGACCAGTAAGATGAAGGTACTGGAGGTTCTAGCCAAGTTCCACGGTA
GGGTCCCTAGTTCCTTCCCAGACCTATATGACGAGGCTCTGAGAGATCAGGCGGAGACAGCAGGGCGGAG
AGGTGTGGCCAGGGCTCCCGACCATGGCTGAGGCCAGTGCCC
>gi|547177818|gb|AWWX01000007.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig6, whole genome shotgun sequence
ATGAAGAAATGAAGAGAAAAAAAATGAGCACAAATACCTCTCATGAACATAGAAGCAAAATCCCCAACCA
AATATAAGAAAAGAAACCTAAAAATCCATAGAAGGAATTACATACTATATCTAAGTAGGATGCATTTCAG
ATGTGCAACATTTGAAAATCAGCAATCATAAATCATAATAACAATAGTCTAAGGAAGATAATCACAAGAT
GAGATCAACAGATGAAGGGAAAAATCATCTGATAAATCCAACACCCATTCATCATAAAAATTCTACAGCA
AACTAGATACAGACAGGCATTTCCTCAGCTTCATATAAAACATACTAAAAAGAAATACAGGTAAAACATA
CTTCATAGT
>gi|547177817|gb|AWWX01000008.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig7, whole genome shotgun sequence
AACTTTGGTGCCCCGTGGTCCTGGCATGGGGCCTCGGAGATGCTGCCTCAGATTCCTTCAGTGAAGCAGT
GGAGATTCATGATCCCAGGGCACAGTGCAAGAATCCAGGGCTCAAGTTTAAAGCATTTTTGCAGCTCATC
CTGTTGTCTGCTTCCCTTGCAGGCTGCACATCTATGCCAGTCTCTTAAGTACTCCCCCACCTAGGTACAC
ATAGGGTCAAGTTCAGTGGAGACTTACTGTCCATCAGGGATTATTTAATTTTATGTTTTCACTTTTGTTA
ATAGTTTTCTTCATTCCTTCAATAAGATTTCCGTGGTTACTTCTGGGTACAAAAA
>gi|547177816|gb|AWWX01000009.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig8, whole genome shotgun sequence
CCTTCTTTTTCCTGAGACTCTCAGGAAATTCTCCTTGCCTCCATGAAGACTGTATACATTTTTCTAAAGT
TTCTTTGTAGGTAAATTTTCTCTATGGTTGCTTTTGTTTGTGGTATATTTTCTGTTACTTGGTGTAAGTA
ACTGTTGATTCTTCCAGACCAACTGGATGATTTTGCTGCCATGATCTTTTTATTATTCCAGATTTTTGAA
GTTTCTTACATGTTTGAATATTTCCTGTCCTGTTTTCTAAGATTTAATTCGAGAATCAAATTGTCATTGT
GATCTTTTGCTTTTCATTACTTCTGACTTTTATTCATTTGTCATTGTTGTATATAACTCTAGTGGCTATA
TGTACTT

file2 is result.ods, which looks like this:

Code:
gi|546687122|gb|AWWX01446731.1|  13172  13194
gi|546693672|gb|AWWX01441057.1|   6859   6837
gi|546698969|gb|AWWX01436431.1|  18753  18775
gi|546703077|gb|AWWX01432778.1|   4132   4154
gi|546670495|gb|AWWX01450063.1|   4111   4133
gi|546689695|gb|AWWX01444610.1|  14602  14580
gi|546691352|gb|AWWX01443112.1|  10073  10051
gi|546880329|gb|AWWX01275531.1|   1158   1136
gi|546670216|gb|AWWX01450333.1|  10633  10655
gi|546678257|gb|AWWX01448205.1|   1112   1134
gi|546693672|gb|AWWX01441057.1|   6832   6854
gi|546704475|gb|AWWX01431394.1|  18135  18113
gi|546699347|gb|AWWX01436084.1|  26840  26862
gi|546702960|gb|AWWX01432895.1|  13515  13493
gi|546833971|gb|AWWX01313367.1|    615    593
gi|546860287|gb|AWWX01291803.1|   2188   2210
gi|546689115|gb|AWWX01445179.1|  10761  10739
gi|546701370|gb|AWWX01434384.1|   2616   2638
gi|546694075|gb|AWWX01440674.1|   9568   9546
gi|546701082|gb|AWWX01434635.1|   8423   8445
gi|547071923|gb|AWWX01098172.1|    135    157
gi|546705086|gb|AWWX01430793.1|   3181   3203
gi|546704429|gb|AWWX01431440.1|   1352   1330
gi|546709146|gb|AWWX01426952.1|  52962  52984

We want an awk script that creates an output file containing the sequences extracted from file1 at these coordinates, each preceded by its ID on a line starting with the > symbol.
For example:

Code:
>gi|546709146|gb|AWWX01426952.1|
acctgctgcatgcgtgcgtggcgtgcaaaatgcagtcaaggcaggtcagtccatgcatgacgtgcaatgcattgcatggcgtgcaaaatgcaggcgtggcgtgcaaaatgcagtcaaaaaattgccgtgcaatgggcc

The output file should look like this.

I hope my question is clear now.
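
One possible way to approach the request above is sketched here. It is not a definitive solution and rests on several assumptions that the post does not confirm: result.ods has been saved as plain whitespace-separated text (called result.txt below, a hypothetical name), the coordinates are 1-based and inclusive, a pair whose first number is larger than the second describes the same interval with no reverse complement applied, and each ID in the coordinate file equals the first word of a FASTA header without the leading >. The function name extract_regions is made up for illustration.

```shell
#!/bin/sh
# Sketch only: extract_regions COORDS FASTA
# COORDS: lines of "id  start  end" (assumed plain text, 1-based inclusive).
# FASTA:  multi-line FASTA file; the id is the first word of each header.
extract_regions() {
    awk '
        # Print one record per interval stored for the current id.
        function emit(   i) {
            if (id == "" || !(id in n)) return
            for (i = 1; i <= n[id]; i++) {
                print ">" id
                print substr(seq, from[id, i], to[id, i] - from[id, i] + 1)
            }
        }
        # First file: remember every interval listed for each id.
        NR == FNR {
            if (NF < 3) next
            a = $2 + 0; b = $3 + 0
            n[$1]++
            # Assumption: start > end is the same interval, so swap them.
            from[$1, n[$1]] = (a < b) ? a : b
            to[$1, n[$1]]   = (a < b) ? b : a
            next
        }
        # Second file: a header line closes the previous sequence.
        /^>/ {
            emit()
            id = substr($1, 2)
            seq = ""
            next
        }
        # Sequence lines are joined so coordinates can span line breaks.
        { seq = seq $0 }
        END { emit() }
    ' "$1" "$2"
}
```

It could be called as `extract_regions result.txt 1.fasta > output.fasta`. Records come out in the order the sequences appear in the FASTA file, and an ID listed twice in the coordinate file yields two records. None of the IDs in the result.ods excerpt occur in the 1.fasta excerpt shown above, so running the sketch on just those excerpts would print nothing.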

Last edited by Don Cragun; 08-05-2015 at 06:57 AM.. Reason: Add CODE and ICODE tags.
 
