Sponsored Content
Top Forums Shell Programming and Scripting Extract the part of sequences from a file Post 302881041 by rahim42 on Thursday 26th of December 2013 09:07:43 AM
Old 12-26-2013
Thank you for your answer. I corrected the code. But the first character of each sequence is not printing. I got the output like this
Code:
>P02649:3-23
LWAALLVTFLAGCQAKVEQA
>P02649:130-162
GRLVQYRGEVQAMLGQSTEELRVRLASHLRKL
>P17427:25-27
KE

My desired output is

Code:
>P02649:3-23
VLWAALLVTFLAGCQAKVEQA
>P02649:130-162
CGRLVQYRGEVQAMLGQSTEELRVRLASHLRKL
>P17427:25-27
SKE

---------- Post updated at 09:04 AM ---------- Previous update was at 09:00 AM ----------

Quote:
Originally Posted by Akshay Hegde
An Awk approch :

Code:
$ cat input.fasta
>P02649
MKVLWAALLVTFLAGCQAKVEQAVETEPEPELRQQTEWQSGQRWELALGRFWDYLRWVQT
LSEQVQEELLSSQVTQELRALMDETMKELKAYKSELEEQLTPVAEETRARLSKELQAAQA
RLGADMEDVCGRLVQYRGEVQAMLGQSTEELRVRLASHLRKLRKRLLRDADDLQKRLAVY
QAGAREGAERGLSAIRERLGPLVEQGRVRAATVGSLAGQPLQERAQAWGERLRARMEEMG
SRTRDRLDEVKEQVAEVRAKLEEQAQQIRLQAEAFQARLKSWFEPLVEDMQRQWAGLVEK
VQAAVGTSAAPVPSDNH
>P17427
MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC
KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL
ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP
DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA

Code:
$ cat ids.txt
P02649  3   23
P02649  130 162
P17427  25  27

Code:
$ awk '{print $0,length($0)}' input.fasta
>P02649 7
MKVLWAALLVTFLAGCQAKVEQAVETEPEPELRQQTEWQSGQRWELALGRFWDYLRWVQT 60
LSEQVQEELLSSQVTQELRALMDETMKELKAYKSELEEQLTPVAEETRARLSKELQAAQA 60
RLGADMEDVCGRLVQYRGEVQAMLGQSTEELRVRLASHLRKLRKRLLRDADDLQKRLAVY 60
QAGAREGAERGLSAIRERLGPLVEQGRVRAATVGSLAGQPLQERAQAWGERLRARMEEMG 60
SRTRDRLDEVKEQVAEVRAKLEEQAQQIRLQAEAFQARLKSWFEPLVEDMQRQWAGLVEK 60
VQAAVGTSAAPVPSDNH 17
>P17427 7
MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60
KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 60
ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP 60
DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA 60

Code:
$ awk 'FNR==NR{A[FNR]=$1 FS $2 FS $3;next}{x=$1;for(i in A){split(A[i],S);if(x == ">"S[1]){print x":"S[2]"-"S[3];getline;print substr($0,S[2],S[3]-S[2])}} }' ids.txt input.fasta

Code:
>P02649:3-23
VLWAALLVTFLAGCQAKVEQ
>P02649:130-162

>P17427:25-27
SK

---------- Post updated at 09:07 AM ---------- Previous update was at 09:04 AM ----------

Dear Hedge,

Thanks for your answer. But your output is different from my desired output.
 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Extract a part of file name

Hi, I want to extract a part of filename and pass it as a parameter to one of the scripts. Could someone help. File name:- NLL_NAM_XXXXX.XXXXXXX_1_1.txt. Here i have to extract only XXXXX.XXXXXXX and the position will be constant. that means that i have to extract some n characters from... (6 Replies)
Discussion started by: dnat
6 Replies

2. Shell Programming and Scripting

How to extract certain part of log file?

Hi there, I'm having some problem with UNIX scripting (ksh), perhaps somebody can help me out? For example: ------------ Sample content of my log file (text file): -------------------------------------- File1: .... info_1 ... info_2 ... info_3 ... File2: .... info_1 ... info_2 ...... (10 Replies)
Discussion started by: superHonda123
10 Replies

3. Shell Programming and Scripting

extract part of text file

I need to extract the following lines from this text and put it in different files. From xxxx@gmail.com Thu Jun 10 21:15:46 2010 Return-Path: <xxxxx@gmail.com> X-Original-To: xxx@localhost Delivered-To:xxxx@localhost Received: from ubuntu (localhost ) by ubuntu (Postfix) with ESMTP... (11 Replies)
Discussion started by: waxo
11 Replies

4. Shell Programming and Scripting

Extract sequences based on the list

Hi, I have a file with more than 28000 records and it looks like below.. >mm10_refflat_ABCD range=chr1:1234567-2345678 tgtgcacactacacatgactagtacatgactagac....so on >mm10_refflat_BCD range=chr1:3234567-4545678... tgtgcacactacacatgactagtatgtgcacactacacatgactagta . . . . . so on ... (2 Replies)
Discussion started by: Diya123
2 Replies

5. Shell Programming and Scripting

Extract length wise sequences from fastq file

I have a fastq file from small RNA sequencing with sequence lengths between 15 - 30. I wanted to filter sequence lengths between 21-25 and write to another fastq file. how can i do that? (4 Replies)
Discussion started by: empyrean
4 Replies

6. Shell Programming and Scripting

Extract part of file

Hello All, I need to extract part of a file into a new file My file is Define schema xxxxxx Insert into table ( a ,b ,c ,d ) values ( 1, 2, 3, (15 Replies)
Discussion started by: nnani
15 Replies

7. Shell Programming and Scripting

Extract sequences of bytes from binary for differents blocks

Hello to all, I would like to search sequences of bytes inside big binary file. The bin file contains blocks of information, each block begins is estructured as follow: 1- Each block begins with the hex 32 (1 byte) and ends with FF. After the FF of the last block, it follows 33. 2- Next... (59 Replies)
Discussion started by: Ophiuchus
59 Replies

8. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

I have two files. File1 is shown below. >153L:B|PDBID|CHAIN|SEQUENCE RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM DIGTTHDDYANDVVARAQYYKQHGY >16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans
7 Replies

9. Programming

Extract part of an archive to a different file

I need to save part of a file to a different one, start and end offset bytes are provided by two counters in long format. If the difference is big, how should I do it to prevent buffer overflow in java? (7 Replies)
Discussion started by: Tribe
7 Replies

10. Shell Programming and Scripting

Extract a part of variable/line content in a file

I have a variable and assigned the following values ***XYZ_201519_20150929140642_20150929140644_211_0_0_211 I need to read this variable from backward and stop read when I get first underscore (_) In this scenario I should get 211 Thanks Kris (3 Replies)
Discussion started by: mkris
3 Replies
All times are GMT -4. The time now is 07:09 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy