overlapped genomic coordinates


 
# 1  
Old 10-26-2012

Hi,

I would like to know how I can get the ID of a feature when its genomic coordinates overlap the coordinates of a feature in another file. Example:

Get the 4th column (ID) of this file1:
Code:
chr1	10	100	gene1
chr2	3000	5000	gene2
chr3	200	1500	gene3

if it overlaps with a feature in this file2:
Code:
chr2 3001 3330
chr4 10 100

Desired output file:
Code:
chr2 3001 3330 gene2


Thanks in advance
# 2  
Old 10-26-2012
So we have an equality key (the chromosome), a min key (the start), a max key (the end) and a payload field (the ID). Sorting helps. Unix tools are not very good at merging on anything other than equality. Shell solutions tend toward loading one file into an array and then using it to filter the other, which does not scale well. If the first field is reasonably selective, a 'join' of the two sorted files gives you, for each chromosome, the Cartesian product of the matching lines, which you can read in the shell from a pipe and test for a hit. See the man page for join (Linux section 1).
Code:
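# join needs both files sorted on the join field (the chromosome)
sort -k1,1 -o file1 file1
sort -k1,1 -o file2 file2
# each joined line is: chr start1 end1 id start2 end2
join file1 file2 | while read -r chr s1 e1 id s2 e2
do
 # two closed intervals overlap when each one starts no later than the other ends
 if (( s1 <= e2 && s2 <= e1 ))
 then
  echo "$chr $s2 $e2 $id"
 fi
done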
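
For comparison, here is a minimal sketch of the array approach mentioned above, written with awk. It is not a replacement for the join pipeline: it holds all of file1 in memory, so it only suits a file1 that fits comfortably in RAM. The function name overlap_ids and the array names are made up for this example, and it assumes file1 is non-empty and both files are whitespace-separated as in the samples.

```shell
# overlap_ids FILE1 FILE2   (hypothetical helper name)
# FILE1 lines: chr start end id     FILE2 lines: chr start end
# Prints "chr start2 end2 id" for each file2 interval overlapping a file1 feature.
overlap_ids() {
  awk '
    # First file: remember every feature, grouped by chromosome.
    NR == FNR {
      n[$1]++
      lo[$1, n[$1]] = $2 + 0      # + 0 forces a numeric value
      hi[$1, n[$1]] = $3 + 0
      id[$1, n[$1]] = $4
      next
    }
    # Second file: test only the features on the same chromosome.
    {
      for (i = 1; i <= n[$1]; i++)
        if (lo[$1, i] <= $3 + 0 && $2 + 0 <= hi[$1, i])
          print $1, $2, $3, id[$1, i]
    }
  ' "$1" "$2"
}
```

Call it as overlap_ids file1 file2 > output. No sorting is needed, and the overlap test is the same closed-interval check as in the loop above.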

 