Sponsored Content
Full Discussion: fast sequence extraction
Top Forums UNIX for Dummies Questions & Answers fast sequence extraction Post 302665129 by Fahmida on Monday 2nd of July 2012 04:16:58 AM
Old 07-02-2012
fast sequence extraction

Hi everyone,

I have a large text file containing DNA sequences in fasta format as follows:
Code:
>someseq
GAACTTGAGATCCGGGGAGCAGTGGATCTC
CACCAGCGGCCAGAACTGGTGCACCTCCAG
GCCAGCCTCGTCCTGCGTGTC
>another seq 
GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT
GACATTTTCATTACTACCATTTTGGAGTACA
>seq3450
TTTTCCTGTTCACTGCTGCTTTTCTATAGACAGCA
GCAGCAAGCAGTAAGAGAAAGTA
etc.

In a separate (tab/space delimited) file, I have indexes as follows:
Code:
someseq   5   10
another seq   1   12
seq3450   3   10
etc.

(above column 1 is sequence name, column 2 sequence start position and column 3 sequence end position)
I want to extract sequences from file 1 based on the indexes on file 2. For example, 'someseq 5 10' will extract characters 5-10 from 'someseq' of file 1.
Example output is:
Code:
>someseq 5 10
TTGAGA
>another seq
GGCATTTTTGTG
>seq3450   3   10
TTCCTGTT

any solution is greatly appreciated.

Moderator's Comments:
Mod Comment Please use code tags, thanks!

Last edited by zaxxon; 07-02-2012 at 05:32 AM.. Reason: code tags
 

7 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

Need help fast

I am trying to reset the IP address on a Unix HP box here in my office and I am stuck in this EM100 mode and cant issue any commands. Any help would be great. By the way I no zero about unix. Thanks (0 Replies)
Discussion started by: zx6ninja
0 Replies

2. Solaris

what is that 1 in the instruction!~ (please help fast)

Hi all, make_lofs /.cdrom/<something>/<something> 1 what does this instruction mean? Note:both the "something" are obviously different . I would like to know what that 1 means, the rest of the instruction is clear!! Thanks (6 Replies)
Discussion started by: wrapster
6 Replies

3. Solaris

How do you ufsrestore the fast way?

hi, on my sol9 box i create my backup using the below command: /usr/sbin/ufsdump 0uf /dev/rmt/0n /u1 /usr/sbin/ufsdump 0uf /dev/rmt/0n /u2 /usr/sbin/ufsdump 0uf /dev/rmt/0n /u3 /usr/sbin/ufsdump 0uf /dev/rmt/0n /u4 now on the new sol10 box, to restore i use this commands: cd /u1... (3 Replies)
Discussion started by: pinoy43v3r
3 Replies

4. Shell Programming and Scripting

find common entries and match the number with long sequence and cut that sequence in output

Hi all, I have a file like this ID 3BP5L_HUMAN Reviewed; 393 AA. AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3; DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2004, sequence version 1. DT 05-SEP-2012, entry version 71. FT COILED 59 140 ... (1 Reply)
Discussion started by: manigrover
1 Replies

5. Shell Programming and Scripting

Help me in this script fast

i have log files that represent names, times and countries, each name come once in country but may in diff times i need at end each name visited which country and its USA | Tony | 12:25:22:431 Italy | Tony | 09:33:11:212 **** Italy| John | 08:22:12:349 France | Adam | 14:22:42:981... (2 Replies)
Discussion started by: teefa
2 Replies

6. Shell Programming and Scripting

Sequence extraction

i want to extract specific region of interest from big file. i have only start position, end position and seq id, see my query is: I have file1 is this >GL3482.1 GAACTTGAGATCCGGGGA GCAGTGGATCTCCACCAG CGGCCAGAACTGGTGCAC CTCCAGGCCAGCCTCGTC CTGCGTGTC >GL3550.1... (14 Replies)
Discussion started by: harpreetmanku04
14 Replies

7. Shell Programming and Scripting

Extraction of upstream and downstream regions from long sequence file

Hello, here I am posting my query again with modified data input files. see my query is : i have two input files file1 and file2. file1 is smalldata.fasta >gi|546671471|gb|AWWX01449637.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig449636, whole genome shotgun sequence... (20 Replies)
Discussion started by: harpreetmanku04
20 Replies
TRUNCATE(1)							   User Commands						       TRUNCATE(1)

NAME
truncate - shrink or extend the size of a file to the specified size SYNOPSIS
truncate OPTION... FILE... DESCRIPTION
Shrink or extend the size of each FILE to the specified size A FILE argument that does not exist is created. If a FILE is larger than the specified size, the extra data is lost. If a FILE is shorter, it is extended and the extended part (hole) reads as zero bytes. Mandatory arguments to long options are mandatory for short options too. -c, --no-create do not create any files -o, --io-blocks treat SIZE as number of IO blocks instead of bytes -r, --reference=RFILE base size on RFILE -s, --size=SIZE set or adjust the file size by SIZE bytes --help display this help and exit --version output version information and exit The SIZE argument is an integer and optional unit (example: 10K is 10*1024). Units are K,M,G,T,P,E,Z,Y (powers of 1024) or KB,MB,... (pow- ers of 1000). SIZE may also be prefixed by one of the following modifying characters: '+' extend by, '-' reduce by, '<' at most, '>' at least, '/' round down to multiple of, '%' round up to multiple of. AUTHOR
Written by Padraig Brady. REPORTING BUGS
GNU coreutils online help: <http://www.gnu.org/software/coreutils/> Report truncate translation bugs to <http://translationproject.org/team/> COPYRIGHT
Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. SEE ALSO
dd(1), truncate(2), ftruncate(2) Full documentation at: <http://www.gnu.org/software/coreutils/truncate> or available locally via: info '(coreutils) truncate invocation' GNU coreutils 8.28 January 2018 TRUNCATE(1)
All times are GMT -4. The time now is 06:51 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy