Full Discussion: fast sequence extraction
Post 302665175 by Ygor in UNIX for Dummies Questions & Answers, Monday 2nd of July 2012 05:37:22 AM
Try...
Code:
$ head file[12]
==> file1 <==
>someseq
GAACTTGAGATCCGGGGAGCAGTGGATCTC
CACCAGCGGCCAGAACTGGTGCACCTCCAG
GCCAGCCTCGTCCTGCGTGTC
>another seq
GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT
GACATTTTCATTACTACCATTTTGGAGTACA
>seq3450
TTTTCCTGTTCACTGCTGCTTTTCTATAGACAGCA
GCAGCAAGCAGTAAGAGAAAGTA

==> file2 <==
someseq 5       10
another seq     1       12
seq3450 3       10

$ awk 'NR==FNR{if($0~/^>/){i=substr($0,2);getline};a[i]=a[i] $0;next}{print ">" $1 ORS substr(a[$1], $2, $3-$2+1)}' file1 FS=\\t file2
>someseq
TTGAGA
>another seq
GGCATTTTTGTG
>seq3450
TTCCTGTT

$
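
For anyone who finds the one-liner dense, here is a minimal sketch of the same idea spread over several lines. The function name extract_ranges is made up for illustration, and it remembers the current header in a variable instead of calling getline, so treat it as a variant rather than the code as posted. It assumes file2 is tab-separated and that start and end are 1-based and inclusive, as in the sample above.

```shell
# Hypothetical helper (name chosen for illustration only).
# $1: FASTA file, $2: tab-separated file of name<TAB>start<TAB>end
extract_ranges() {
    awk -F'\t' '
        NR == FNR {                       # first file: load the sequences
            if (/^>/) {                   # header line: remember the name
                id = substr($0, 2)
                next
            }
            seq[id] = seq[id] $0          # join wrapped sequence lines
            next
        }
        {                                 # second file: one range per line
            print ">" $1
            print substr(seq[$1], $2, $3 - $2 + 1)   # 1-based, inclusive
        }
    ' "$1" "$2"
}
```

Called as extract_ranges file1 file2, it should print the same three records as the session above. Since the wrapped lines are joined before substr is applied, a range that spans a line break in file1 is also handled.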

 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.