Sponsored Content
Top Forums Shell Programming and Scripting Extract the part of sequences from a file Post 302881040 by Akshay Hegde on Thursday 26th of December 2013 09:00:03 AM
Old 12-26-2013
An Awk approch :

Code:
$ cat input.fasta
>P02649
MKVLWAALLVTFLAGCQAKVEQAVETEPEPELRQQTEWQSGQRWELALGRFWDYLRWVQT
LSEQVQEELLSSQVTQELRALMDETMKELKAYKSELEEQLTPVAEETRARLSKELQAAQA
RLGADMEDVCGRLVQYRGEVQAMLGQSTEELRVRLASHLRKLRKRLLRDADDLQKRLAVY
QAGAREGAERGLSAIRERLGPLVEQGRVRAATVGSLAGQPLQERAQAWGERLRARMEEMG
SRTRDRLDEVKEQVAEVRAKLEEQAQQIRLQAEAFQARLKSWFEPLVEDMQRQWAGLVEK
VQAAVGTSAAPVPSDNH
>P17427
MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC
KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL
ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP
DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA

Code:
$ cat ids.txt
P02649  3   23
P02649  130 162
P17427  25  27

Code:
$ awk '{print $0,length($0)}' input.fasta
>P02649 7
MKVLWAALLVTFLAGCQAKVEQAVETEPEPELRQQTEWQSGQRWELALGRFWDYLRWVQT 60
LSEQVQEELLSSQVTQELRALMDETMKELKAYKSELEEQLTPVAEETRARLSKELQAAQA 60
RLGADMEDVCGRLVQYRGEVQAMLGQSTEELRVRLASHLRKLRKRLLRDADDLQKRLAVY 60
QAGAREGAERGLSAIRERLGPLVEQGRVRAATVGSLAGQPLQERAQAWGERLRARMEEMG 60
SRTRDRLDEVKEQVAEVRAKLEEQAQQIRLQAEAFQARLKSWFEPLVEDMQRQWAGLVEK 60
VQAAVGTSAAPVPSDNH 17
>P17427 7
MPAVSKGDGMRGLAVFISDIRNCKSKEAEIKRINKELANIRSKFKGDKALDGYSKKKYVC 60
KLLFIFLLGHDIDFGHMEAVNLLSSNRYTEKQIGYLFISVLVNSNSELIRLINNAIKNDL 60
ASRNPTFMGLALHCIANVGSREMAEAFAGEIPKILVAGDTMDSVKQSAALCLLRLYRTSP 60
DLVPMGDWTSRVVHLLNDQHLGVVTAATSLITTLAQKNPEEFKTSVSLAVSRLSRIVTSA 60

Code:
$ awk 'FNR==NR{A[FNR]=$1 FS $2 FS $3;next}{x=$1;for(i in A){split(A[i],S);if(x == ">"S[1]){print x":"S[2]"-"S[3];getline;print substr($0,S[2],S[3]-S[2])}} }' ids.txt input.fasta

Code:
>P02649:3-23
VLWAALLVTFLAGCQAKVEQ
>P02649:130-162

>P17427:25-27
SK

 

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Extract a part of file name

Hi, I want to extract a part of filename and pass it as a parameter to one of the scripts. Could someone help. File name:- NLL_NAM_XXXXX.XXXXXXX_1_1.txt. Here i have to extract only XXXXX.XXXXXXX and the position will be constant. that means that i have to extract some n characters from... (6 Replies)
Discussion started by: dnat
6 Replies

2. Shell Programming and Scripting

How to extract certain part of log file?

Hi there, I'm having some problem with UNIX scripting (ksh), perhaps somebody can help me out? For example: ------------ Sample content of my log file (text file): -------------------------------------- File1: .... info_1 ... info_2 ... info_3 ... File2: .... info_1 ... info_2 ...... (10 Replies)
Discussion started by: superHonda123
10 Replies

3. Shell Programming and Scripting

extract part of text file

I need to extract the following lines from this text and put it in different files. From xxxx@gmail.com Thu Jun 10 21:15:46 2010 Return-Path: <xxxxx@gmail.com> X-Original-To: xxx@localhost Delivered-To:xxxx@localhost Received: from ubuntu (localhost ) by ubuntu (Postfix) with ESMTP... (11 Replies)
Discussion started by: waxo
11 Replies

4. Shell Programming and Scripting

Extract sequences based on the list

Hi, I have a file with more than 28000 records and it looks like below.. >mm10_refflat_ABCD range=chr1:1234567-2345678 tgtgcacactacacatgactagtacatgactagac....so on >mm10_refflat_BCD range=chr1:3234567-4545678... tgtgcacactacacatgactagtatgtgcacactacacatgactagta . . . . . so on ... (2 Replies)
Discussion started by: Diya123
2 Replies

5. Shell Programming and Scripting

Extract length wise sequences from fastq file

I have a fastq file from small RNA sequencing with sequence lengths between 15 - 30. I wanted to filter sequence lengths between 21-25 and write to another fastq file. how can i do that? (4 Replies)
Discussion started by: empyrean
4 Replies

6. Shell Programming and Scripting

Extract part of file

Hello All, I need to extract part of a file into a new file My file is Define schema xxxxxx Insert into table ( a ,b ,c ,d ) values ( 1, 2, 3, (15 Replies)
Discussion started by: nnani
15 Replies

7. Shell Programming and Scripting

Extract sequences of bytes from binary for differents blocks

Hello to all, I would like to search sequences of bytes inside big binary file. The bin file contains blocks of information, each block begins is estructured as follow: 1- Each block begins with the hex 32 (1 byte) and ends with FF. After the FF of the last block, it follows 33. 2- Next... (59 Replies)
Discussion started by: Ophiuchus
59 Replies

8. Shell Programming and Scripting

Extract sequences from a FASTA file based on another file

I have two files. File1 is shown below. >153L:B|PDBID|CHAIN|SEQUENCE RTDCYGNVNRIDTTGASCKTAKPEGLSYCGVSASKKIAERDLQAMDRYKTIIKKVGEKLCVEPAVIAGIISRESHAGKVL KNGWGDRGNGFGLMQVDKRSHKPQGTWNGEVHITQGTTILINFIKTIQKKFPSWTKDQQLKGGISAYNAGAGNVRSYARM DIGTTHDDYANDVVARAQYYKQHGY >16VP:A|PDBID|CHAIN|SEQUENCE... (7 Replies)
Discussion started by: nelsonfrans
7 Replies

9. Programming

Extract part of an archive to a different file

I need to save part of a file to a different one, start and end offset bytes are provided by two counters in long format. If the difference is big, how should I do it to prevent buffer overflow in java? (7 Replies)
Discussion started by: Tribe
7 Replies

10. Shell Programming and Scripting

Extract a part of variable/line content in a file

I have a variable and assigned the following values ***XYZ_201519_20150929140642_20150929140644_211_0_0_211 I need to read this variable from backward and stop read when I get first underscore (_) In this scenario I should get 211 Thanks Kris (3 Replies)
Discussion started by: mkris
3 Replies
FBB::mlm(3bobcat)                                             OFoldStream manipulator                                            FBB::mlm(3bobcat)

NAME
FBB::mlm - Manipulator modifying left margins of OFoldStream objects SYNOPSIS
#include <bobcat/ofoldstream> or #include <bobcat/ofoldstreambuf> Linking option: -lbobcat DESCRIPTION
The mlm class implements a manipulator that can be inserted into OFoldStream objects to modify the stream's left margin by a requested amount. The request cannot result in a negative left margin value. If a negative left margin would be the arithmetic result of the request then left margin 0 will silently be used. Depending on the tab-setting of the OFoldStream the inserted value represents the number of blank space characters or the number of tab-characters that will be added to the left margin. The request will be processed at the next newline character or std::flush or std::endl manipulator that is inserted into the stream. If a line is still empty once an mlm object and a flush manipulator are inserted into the stream then the new left margin will be effective at the next word inserted into that line (cf., the example section below) A bad_cast exception is thrown when the manipulator is inserted into an ostream that is not using a OFoldStreambuf buffer. NAMESPACE
FBB All constructors, members, operators and manipulators, mentioned in this man-page, are defined in the namespace FBB. INHERITS FROM
- CONSTRUCTOR
o mlm(int addValue): The standard copy constructor is available. MEMBER FUNCTIONS
There are no public or protected member functions in this class. EXAMPLE
#include <iostream> #include <bobcat/ofoldstream> using namespace std; using namespace FBB; int main() { OFoldStream out(cout, 0, 80); out << "hello world (left margin is 0)" << mlm(4) << " " "this uses a 4 character wide left margin " << mlm(-10) << flush << "left margin -6 changed to 0, active on this line "; return 0; } FILES
bobcat/mlm - defines the class interface SEE ALSO
bobcat(7), manipulators(3bobcat), lm(3bobcat), ofoldstream(3bobcat) BUGS
None Reported. DISTRIBUTION FILES
o bobcat_3.01.00-x.dsc: detached signature; o bobcat_3.01.00-x.tar.gz: source archive; o bobcat_3.01.00-x_i386.changes: change log; o libbobcat1_3.01.00-x_*.deb: debian package holding the libraries; o libbobcat1-dev_3.01.00-x_*.deb: debian package holding the libraries, headers and manual pages; o http://sourceforge.net/projects/bobcat: public archive location; BOBCAT
Bobcat is an acronym of `Brokken's Own Base Classes And Templates'. COPYRIGHT
This is free software, distributed under the terms of the GNU General Public License (GPL). AUTHOR
Frank B. Brokken (f.b.brokken@rug.nl). libbobcat1-dev_3.01.00-x.tar.gz 2005-2012 FBB::mlm(3bobcat)
All times are GMT -4. The time now is 01:53 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy