Obtain the names of the flanking regions


 
# 1  
05-21-2013

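Hi, I have 2 files. Usually the end position of a region in file1 is the start position of a region in file2, and the end position of a region in file2 is the start position of a region in file1, so the file2 regions flank the file1 regions. I want the names of the flanking regions for each file1 region.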
Code:
file1
Id      start      end
aaa1    0          3000070
aaa1    3095270    3095341
aaa1    3100822    3100894
aaa1    3167949    3168020
aaa1    3205652    3205723
aaa1    3684683    3684752

Code:
file2
Id      start      end        name
aaa1    3000070    3095270    bbc
aaa1    3095341    3100822    rbc
aaa1    3100894    3137949    srh
aaa1    3137949    3167949    ytf

I want to get something like this:
Code:
output
Id      start      end        name1    name2
aaa1    3095270    3095341    bbc      rbc
aaa1    3100822    3100894    rbc      srh
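A minimal awk sketch of the exact-match rule described above, assuming whitespace-separated columns, one header line in each file, and that only file1 rows with both flanks should be printed (which is what the sample output suggests). The function name `flank_names` is hypothetical. file2 is read first and its names are indexed by (Id, end) and (Id, start); each file1 row is then looked up by its start and its end.

```shell
# flank_names FILE1 FILE2  (hypothetical helper name)
# Prints "Id start end name1 name2" for every FILE1 row whose start equals the
# end of a FILE2 region (name1) and whose end equals the start of a FILE2
# region (name2). Assumes whitespace-separated columns and one header line
# in each file.
flank_names() {
  awk '
    # First input (FILE2): index region names by (Id, end) and (Id, start).
    FNR == NR {
      if (FNR > 1) {
        left[$1, $3]  = $4   # region ending here is a left flank
        right[$1, $2] = $4   # region starting here is a right flank
      }
      next
    }
    # Second input (FILE1): new header, then rows that have both flanks.
    FNR == 1 { print "Id", "start", "end", "name1", "name2"; next }
    (($1, $2) in left) && (($1, $3) in right) {
      print $1, $2, $3, left[$1, $2], right[$1, $3]
    }
  ' "$2" "$1"
}
```

Usage: `flank_names file1 file2 | column -t`. Hash lookups keep this to one pass over each file, so sorting is not required.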
----------------------------------

# 2  
05-23-2013
Read file1 into a pair of indexed arrays (start and end), then read file2 line by line into variables and walk through the arrays looking for the matching positions; when you find a hit, echo out the desired fields. Maybe I have that backwards, but that is the flavor.
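The array walk could be sketched in bash roughly as follows. This version holds file2 in four parallel indexed arrays and streams file1 past them, i.e. the roles are swapped, as the reply allows. The function name `flank_names_arrays` and the exact-equality test are assumptions; if several file2 rows match, the last one wins. It compares every file1 row with every file2 row, which is fine for small files.

```shell
# flank_names_arrays FILE1 FILE2  (hypothetical helper name; needs bash)
# Holds FILE2 in parallel indexed arrays and walks them for each FILE1 row.
flank_names_arrays() {
  local -a f2_id=() f2_start=() f2_end=() f2_name=()
  local id start end name i name1 name2
  # Load FILE2 (minus its header) into the arrays.
  while read -r id start end name; do
    f2_id+=("$id"); f2_start+=("$start"); f2_end+=("$end"); f2_name+=("$name")
  done < <(tail -n +2 "$2")
  # For each FILE1 row, look for a region ending at its start (left flank)
  # and one starting at its end (right flank), on the same Id.
  while read -r id start end; do
    name1= name2=
    for i in "${!f2_id[@]}"; do
      [[ ${f2_id[i]} == "$id" ]] || continue
      [[ ${f2_end[i]} == "$start" ]] && name1=${f2_name[i]}
      [[ ${f2_start[i]} == "$end" ]] && name2=${f2_name[i]}
    done
    # Only echo rows that have both flanks, as in the sample output.
    if [[ -n $name1 && -n $name2 ]]; then
      echo "$id $start $end $name1 $name2"
    fi
  done < <(tail -n +2 "$1")
}
```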

If size is a problem, sorting the files (if they are not already sorted) and doing some sort of merge would be faster, but that is a bit hard in shell. Also, you may need several reference lines to answer one query line, so you are back to keeping at least a local array, or seeking back before the next query line.
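One way to get sort-and-merge behaviour without hand-writing the merge is the standard `join` utility, which merges two files sorted on a key field. In this hypothetical `flank_names_join` (bash, for process substitution), each position is turned into an `Id:position` key so rows only match within the same Id; the first `join` finds the left flank and the second finds the right one. Both `sort` and `join` run under `LC_ALL=C` so they agree on the ordering `join` expects.

```shell
# flank_names_join FILE1 FILE2  (hypothetical helper name; needs bash)
# Sort-and-merge version built on sort and join. Keys are "Id:position".
flank_names_join() (
  export LC_ALL=C   # sort and join must agree on collation
  # FILE2 names keyed by "Id:end" and "Id:start" (header skipped).
  by_end()   { awk 'NR > 1 { print $1 ":" $3, $4 }' "$1" | sort -k1,1; }
  by_start() { awk 'NR > 1 { print $1 ":" $2, $4 }' "$1" | sort -k1,1; }
  # Merge 1: FILE1 start = FILE2 end -> key Id start end name1
  join <(awk 'NR > 1 { print $1 ":" $2, $1, $2, $3 }' "$1" | sort -k1,1) \
       <(by_end "$2") |
  # Re-key on "Id:end", then merge 2: FILE1 end = FILE2 start -> name2
  awk '{ print $2 ":" $4, $2, $3, $4, $5 }' | sort -k1,1 |
  join - <(by_start "$2") |
  # Drop the key and put rows back in positional order.
  awk '{ print $2, $3, $4, $5, $6 }' | sort -k2,2n
)
```

When several rows share a key, `join` emits every pairing, which covers the case of one query line needing several reference lines.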

Loading the data into an RDBMS, or using a JDBC driver or similar tool that treats text files as SQL tables, makes this a trivial SQL query:
Code:
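-- "end" is a reserved word in standard SQL and may need quoting, depending on the database
Select a.Id, a.start, a.end, b1.name, b2.name
 from
  file1 a
   join file2 b1
    on a.Id = b1.Id
    and a.start between b1.start and b1.end
   join file2 b2
    on a.Id = b2.Id
    and a.end between b2.start and b2.end
 order by 1,2,3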
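Without a database, the same BETWEEN-based join can be emulated with a nested loop in awk. In this sketch (the function name `flank_names_between` is hypothetical), a file1 row is printed once for each pair of file2 regions where the first contains its start and the second contains its end, both bounds inclusive as with BETWEEN; rows lacking either match are dropped, as with an inner join. It also requires matching Id values. Containment is looser than the exact start/end equality in the question: a position shared by two adjacent file2 regions matches both.

```shell
# flank_names_between FILE1 FILE2  (hypothetical helper name)
# Nested-loop emulation of the BETWEEN joins; assumes one header line per file.
flank_names_between() {
  awk '
    FNR == NR {                   # FILE2: remember every region
      if (FNR > 1) { n++; id[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; nm[n] = $4 }
      next
    }
    FNR > 1 {                     # FILE1: join against every region pair
      s = $2 + 0; e = $3 + 0
      for (i = 1; i <= n; i++) {
        if (id[i] != $1 || s < lo[i] || s > hi[i]) continue
        for (j = 1; j <= n; j++)
          if (id[j] == $1 && e >= lo[j] && e <= hi[j])
            print $1, $2, $3, nm[i], nm[j]
      }
    }
  ' "$2" "$1" | sort -k1,1 -k2,2n -k3,3n   # like "order by 1,2,3"
}
```

On the sample files this gives the same two rows as the desired output.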

9 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Find flanking positions

I have a positions file with markers in col1 and positions defined by chromosome and location in col2 and col3: m1 ch1 1 m2 ch1 5 m3 ch1 50 m4 ch2 567 m5 ch2 4567 m6 ch2 7766 m7 ch2 554433 m8 ch3 76 m9 ch3 456 m10 ch3 2315. Given a set of query markers, I would like to know what are the... (1 Reply)
Discussion started by: jianp83

2. Shell Programming and Scripting

Extract Big and continuous regions

Hi all, I have a file like this. I want to extract only those regions which are big and continuous: chr1 3280000 3440000 chr1 3440000 3920000 chr1 3600000 3920000 # region coming within the 3440000 3920000, so I don't want it to be printed in output chr1 3920000 4800000 chr1 ... (2 Replies)
Discussion started by: amrutha_sastry

3. Shell Programming and Scripting

Extraction of upstream and downstream regions from long sequence file

Hello, I am posting my query again with modified input files. My query is: I have two input files, file1 and file2. file1 is smalldata.fasta >gi|546671471|gb|AWWX01449637.1| Bubalus bubalis breed Mediterranean WGS:AWWX01:contig449636, whole genome shotgun sequence... (20 Replies)
Discussion started by: harpreetmanku04

4. Shell Programming and Scripting

Parsing and masking regions from a single fasta file with subsequence

Hi, I have a complete genome fasta file and a list of subsequence regions in the format: 4353..5633 6795..9354 1034..14456. I want a script that can mask these regions in the single complete genome fasta file with the letter N. Kindly help. (2 Replies)
Discussion started by: margarita

5. Shell Programming and Scripting

Assigning the names from overlapping regions

I have 2 files; file1 has smaller regions whose positions overlap with the positions in file2. file1 aaa 20 22 apple aaa 18 25 banana aaa 12 30 grapes aaa 22 25 melon file2 aaa 18 26 cdded aaa 10 35 abcde I want to get something like this: output aaa 18 26 cdded banana... (4 Replies)
Discussion started by: anurupa777

6. Forum Support Area for Unregistered Users & Account Problems

Trouble Registering? Countries or Regions Abusing Forums

The forums have been seeing a sharp increase in spam bots, forum robots, and malicious registrations from certain countries. If you have been directed to this thread due to a "No Permission Error" when trying to register please post in this thread and request permission to register, including... (1 Reply)
Discussion started by: Neo

7. UNIX for Dummies Questions & Answers

extract regions of file based on start and end position

Hi, I have a file1 of many long sequences, each preceded by a unique header line. file2 is 3-columns list: headers name, start position, end position. I'd like to extract the sequence region of file1 specified in file2. Based on a post elsewhere, I found the code: awk... (2 Replies)
Discussion started by: pathunkathunk

8. Shell Programming and Scripting

awk: union regions

Hi all, I am having difficulty solving the following problem. mydata: StartPoint EndPoint 22 55 2222 2230 33 66 44 58 222 240 11 25 22 60 33 45 The union of above... (2 Replies)
Discussion started by: phoeberunner

9. UNIX for Dummies Questions & Answers

Where to obtain FreeBSD?

Can anyone help? Where can I download a free version of FreeBSD? I am trying to teach myself this OS and have all the documentation needed, but I am short the OS itself. If anyone can send me a link, I would be most appreciative! (3 Replies)
Discussion started by: treborwallace