Extraction of upstream and downstream regions from long sequence file Post: 302951416

Post 302951416 by harpreetmanku04 on Friday 7th of August 2015 01:23:34 AM

Sir, that is an Excel file; how can I post it here?
However, it looks roughly like this:

Code:

sequence id	extracted region small	extracted region big upstream and downstream
gi|546669925|gb|AWWX01450616.1|	CACCTTGATCTTGGACTTCTAGC	"CCAGAGAAAAAAGAAGAGAAAAAAAATCACTTGGGGACATAGCAAGAAGGTGGCCATCCTCAAACCAAGG
AGAGAAGCCAGAAGAAACCAAACTTTCCAACACCTTGATCTTGGACTTCTAGCCTCCAGAACTGTGAGAA
AATAAATTTCTGTAGAGTCACCCAGTCTGTGGTATTTTGTTATGGCAGACCTAGCAGACTGATATGCTCC
TTAAGGCAAGA"

---------- Post updated at 03:30 AM ---------- Previous update was at 03:01 AM ----------

Cragun sir, regarding the last entry: yes, I want to generate 5 sets of output, one for each gi|546669842|gb|AWWX01450698.1| entry. The ID occurs multiple times, but the positions are different each time, so the result file should contain 5 lines for gi|546669842|gb|AWWX01450698.1|.

And the last entry is present in file1; see entry no. 4.
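For illustration, here is a minimal Python sketch of the behaviour described above: one output line per (ID, start, end) entry, even when the same ID occurs several times. It is not the script from this thread. The function names, the toy sequence, the 1-based inclusive coordinates, and the `flank` length are all assumptions; the thread does not state the real flank size.

```python
def extract_with_flanks(seq, start, end, flank):
    """Return seq[start..end] (1-based, inclusive) plus up to `flank`
    bases on each side, clamped to the ends of the sequence."""
    left = max(start - flank, 1)
    right = min(end + flank, len(seq))
    return seq[left - 1:right]


def extract_all(sequences, regions, flank):
    """Build one output row per region; repeated IDs are not merged."""
    rows = []
    for seq_id, start, end in regions:
        seq = sequences[seq_id]
        small = seq[start - 1:end]  # the region itself
        big = extract_with_flanks(seq, start, end, flank)  # region plus flanks
        rows.append((seq_id, small, big))
    return rows


# Hypothetical toy data: one sequence ID listed three times at different positions.
sequences = {"seqA": "AAAACCCCGGGGTTTT"}
regions = [("seqA", 5, 8), ("seqA", 9, 12), ("seqA", 1, 4)]

rows = extract_all(sequences, regions, flank=2)
for row in rows:
    print("\t".join(row))
```

The three entries for the same ID give three separate lines, and the entry at the start of the sequence simply gets a shorter upstream flank.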

---------- Post updated at 04:27 AM ---------- Previous update was at 03:30 AM ----------

Is there still any problem, sir?

---------- Post updated 08-07-15 at 12:23 AM ---------- Previous update was 08-06-15 at 04:27 AM ----------

Hello, I am waiting for your answer, sir. :)


10 More Discussions You Might Find Interesting

1. Programming

selecting rows with specific IDs for downstream analysis

Hi, I'm working hard on SQL and I came across a hurdle I'm hoping you can help me out with. I have two tables table1 headers: chrom start end name score strand 11 9720685 9720721 U0 0 + 21 9721043 9721079 U0 0 - 1 9721093 9721129 U0 0 + 20 ...

2. Shell Programming and Scripting

awk: union regions

Hi all, I have difficulty solving the following problem. mydata: StartPoint EndPoint 22 55 2222 2230 33 66 44 58 222 240 11 25 22 60 33 45 The union of above...

3. UNIX for Dummies Questions & Answers

fast sequence extraction

Hi everyone, I have a large text file containing DNA sequences in fasta format as follows: >someseq GAACTTGAGATCCGGGGAGCAGTGGATCTC CACCAGCGGCCAGAACTGGTGCACCTCCAG GCCAGCCTCGTCCTGCGTGTC >another seq GGCATTTTTGTGTAATTTTTGGCTGGATGAGGT GACATTTTCATTACTACCATTTTGGAGTACA >seq3450...

4. UNIX for Dummies Questions & Answers

extract regions of file based on start and end position

Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is 3-columns list: headers name, start position, end position. I'd like to extract the sequence region of file1 specified in file2. Based on a post elsewhere, I found the code: awk...

5. Shell Programming and Scripting

find common entries and match the number with long sequence and cut that sequence in output

Hi all, I have a file like this ID 3BP5L_HUMAN Reviewed; 393 AA. AC Q7L8J4; Q96FI5; Q9BQH8; Q9C0E3; DT 05-FEB-2008, integrated into UniProtKB/Swiss-Prot. DT 05-JUL-2004, sequence version 1. DT 05-SEP-2012, entry version 71. FT COILED 59 140 ...

6. Shell Programming and Scripting

FILE_ID extraction from file name and save it in CSV file after looping through each folders

FILE_ID extraction from file name and save it in CSV file after looping through each folders My files are located in UNIX Server, i want to extract file_id and file_name from each file .and save it in a CSV file. How do I do that? I have folders in unix environment, directory structure is...

7. Shell Programming and Scripting

Obtain the names of the flanking regions

Hi I have 2 files; usually the end position in the file1 is the start position in the file2 and the end position in file2 will be the start position in file1 (flanks) file1 Id start end aaa1 0 3000070 aaa1 3095270 3095341 aaa1 3100822 3100894 aaa1 ...

8. IP Networking

Newbie BIND DNS question: resolving upstream hosts?

Old skool UNIX and Linux geek here, but newbie to the world of DNS and bind. I've recently been tasked with replacing our DNS infrastructure, currently on Windows, with a RHEL based solution. And I assume that means using bind, which I've not used before. Here's my question: Suppose our company...

9. Shell Programming and Scripting

Parsing and masking regions from a single fasta file with subsequence

HI, I have a Complete genome fasta file and I have list of sub sequence regions in the format as : 4353..5633 6795..9354 1034..14456 I want a script which can mask these region in a single complete genome fasta file with the alphabet N kindly help

10. Shell Programming and Scripting

Sequence extraction

i want to extract specific region of interest from big file. i have only start position, end position and seq id, see my query is: I have file1 is this >GL3482.1 GAACTTGAGATCCGGGGA GCAGTGGATCTCCACCAG CGGCCAGAACTGGTGCAC CTCCAGGCCAGCCTCGTC CTGCGTGTC >GL3550.1...

LEARN ABOUT DEBIAN

combine

COMBINE(1)																COMBINE(1)

NAME

       combine - combine sets of lines from two files using boolean operations

SYNOPSIS

       combine file1 and file2

       combine file1 not file2

       combine file1 or file2

       combine file1 xor file2

       _ file1 and file2 _

       _ file1 not file2 _

       _ file1 or file2 _

       _ file1 xor file2 _

DESCRIPTION

       combine combines the lines in two files. Depending on the boolean operation specified, the contents will be combined in different ways:

       and Outputs lines that are in file1 if they are also present in file2.

       not Outputs lines that are in file1 but not in file2.

       or  Outputs lines that are in file1 or file2.

       xor Outputs lines that are in either file1 or file2, but not in both files.

       "-" can be specified for either file to read stdin for that file.

       The input files need not be sorted, and the lines are output in the order they occur in file1 (followed by the order they occur in file2
       for the two "or" operations). Bear in mind that this means that the operations are not commutative; "a and b" will not necessarily be the
       same as "b and a". To obtain commutative behavior sort and uniq the result.

       Note that this program can be installed as "_" to allow for the syntactic sugar shown in the latter half of the synopsis (similar to the
       test/[ command). It is not currently installed as "_" by default, but you can alias it to that if you like.
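The four operations and the ordering rule described above can be sketched in Python as follows. This is an illustration of the documented behaviour, not the moreutils implementation. The function name `combine` and its list-based interface are assumptions, as is the handling of duplicate lines, which the description above does not spell out.

```python
def combine(lines1, lines2, op):
    """Combine two lists of lines, keeping the order in which lines occur
    in lines1 (followed by the order in lines2 for "or" and "xor")."""
    set1, set2 = set(lines1), set(lines2)
    if op == "and":
        return [line for line in lines1 if line in set2]
    if op == "not":
        return [line for line in lines1 if line not in set2]
    if op == "or":
        # Everything from lines1, then lines seen only in lines2.
        return list(lines1) + [line for line in lines2 if line not in set1]
    if op == "xor":
        # Lines only in lines1, then lines only in lines2.
        return ([line for line in lines1 if line not in set2]
                + [line for line in lines2 if line not in set1])
    raise ValueError(f"unknown operation: {op}")


a = ["b", "a", "c"]
b = ["c", "b", "d"]

print(combine(a, b, "and"))  # order follows a
print(combine(b, a, "and"))  # same lines, order follows b
print(combine(a, b, "xor"))
```

The first two calls return the same lines in a different order, which is the non-commutativity the description mentions; sorting and de-duplicating both results makes them equal.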

SEE ALSO

       join(1)

AUTHOR

       Copyright 2006 by Joey Hess <joey@kitenet.net>

       Licensed under the GNU GPL.

moreutils							    2012-04-09								COMBINE(1)
