I have an input file with more than 20K records. The fields I want to manipulate are columns 10, 11 and 13.
Column 13: the item name. An item name may appear more than once in the table.
Column 10: a comma-separated string of start positions.
Column 11: a comma-separated string of end positions.
I would like to find the overlapping regions for each item.
Expected output:
Code:
Item Start End
B1 90098643,90152028,90178260 90098890,90152170,90185093
B2 76540388,76779489,76877692 76540569,76779684,76878102
Written out, the overlapping regions of B1 are 90098643-90098890, 90152028-90152170 and 90178260-90185093.
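For reference, the merge step on its own can be sketched as a small awk filter. This is a minimal sketch, not the script below: it assumes the regions arrive one `start+end` per line and are already sorted by numeric start, and it treats regions that merely touch (next start equal to the current end) as overlapping. The name `merge_regions` is hypothetical.

```sh
#!/bin/sh
# merge_regions (hypothetical name): read "start+end" regions, one per
# line, already sorted by numeric start, and print their union with
# overlapping or touching regions merged.
merge_regions() {
    awk -F'[+]' '
    NR == 1 { s = $1 + 0; e = $2 + 0; next }   # first region opens a run
    {
        if ($1 + 0 <= e) {                      # overlaps or touches the run
            if ($2 + 0 > e) e = $2 + 0          # extend it if this one ends later
        } else {
            print s "+" e                       # gap: emit the finished run
            s = $1 + 0; e = $2 + 0              # and open a new one
        }
    }
    END { if (NR > 0) print s "+" e }           # flush the last run
    '
}

printf '11+34\n22+33\n33+54\n222+456\n' | merge_regions
```

With the sample regions from further down the thread, this should print `11+54` and `222+456`. It only gives correct results if the input really is sorted by start, which is exactly why the sorting question matters.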
Here is my script:
Code:
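# needs gawk: asort() is a gawk extension
{
    a[$13]++                             # remember each item name
    # assumes each list ends with a comma (e.g. "90098643,90152028,"),
    # so plain concatenation keeps the numbers separated
    start[$13] = start[$13] "" $10
    end[$13]   = end[$13] "" $11
}
END {
    for (i in a) {
        split(start[i], split_start, ",")
        split(end[i], split_end, ",")
        mylen = length(split_start)
        # k < mylen skips the last element, which is empty because of
        # the trailing comma
        for (k = 1; k < mylen; k++) {
            # skip duplicate start+end pairs
            if ((split_start[k] "+" split_end[k]) in mypair) continue
            else {
                mypair[split_start[k] "+" split_end[k]]++
                if (k == 1) mystring = split_start[k] "+" split_end[k]
                else mystring = mystring "," split_start[k] "+" split_end[k]
            }
        }
        split(mystring, mylist, ",")
        asort(mylist)
        count = length(mylist)
        #---------finding overlapping regions
        ind = 0
        for (z = 1; z <= count; z++) {
            split(mylist[z], item, "+")
            if (z == 1) {
                ind += 1
                unionlist[ind] = mylist[z]
            } else {
                split(unionlist[ind], old, "+")
                aa = old[1]
                bb = old[2]
                cc = item[1]
                dd = item[2]
                if (cc > bb) {
                    # gap: start a new region
                    ind += 1
                    unionlist[ind] = cc "+" dd
                } else if (cc >= aa && cc <= bb) {
                    # overlapping or touching: extend the current region
                    if (dd > bb) { unionlist[ind] = aa "+" dd }
                    else { unionlist[ind] = aa "+" bb }
                }
            }
        }
        for (j = 1; j <= ind; j++) mystring3 = mystring3 "," unionlist[j]
        print i, length(unionlist), mystring3
        delete mypair
        delete unionlist
        delete mylist
        mystring3 = ""
    }
}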
As you can see from the script, I store each item's regions as a string in this format: 90098643+90098890,90152028+90152170,90178260+90185093.
I want to sort them in ascending order, because that makes finding the overlapping regions easier. However, asort() does not sort correctly in a case like this:
Code:
33+54
11+34
22+33
222+456
asort() returns them in this order:
Code:
11+34
22+33
222+456
33+54
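The order is wrong: 222+456 should come last. These values are not numbers, so asort() compares them as strings, character by character, and "222+456" sorts before "33+54" because "2" comes before "3".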
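One portable way around this, sketched under the assumption that each region is a `start+end` string as above: skip asort() and use a small insertion sort whose comparison splits each string on `+` and compares the two parts as numbers. `sort_regions` and `cmp_region` are hypothetical names, and the sketch uses only POSIX awk, so it does not need gawk.

```sh
#!/bin/sh
# sort_regions (hypothetical name): read "start+end" regions, one per line,
# and print them ordered by numeric start, then numeric end. POSIX awk only.
sort_regions() {
    awk '
    # Compare regions a and b: negative, zero or positive, like strcmp().
    function cmp_region(a, b,    pa, pb) {
        split(a, pa, "[+]")
        split(b, pb, "[+]")
        if (pa[1] + 0 != pb[1] + 0) return (pa[1] + 0) - (pb[1] + 0)
        return (pa[2] + 0) - (pb[2] + 0)
    }
    { list[NR] = $0 }
    END {
        # Insertion sort on list[1..NR] using cmp_region().
        for (i = 2; i <= NR; i++) {
            v = list[i]
            for (j = i - 1; j >= 1 && cmp_region(list[j], v) > 0; j--)
                list[j + 1] = list[j]
            list[j + 1] = v
        }
        for (i = 1; i <= NR; i++) print list[i]
    }
    '
}

printf '33+54\n11+34\n22+33\n222+456\n' | sort_regions
```

On the sample above this should print `11+34`, `22+33`, `33+54`, `222+456`. Insertion sort is quadratic, which should be fine for the handful of regions per item, but it is not meant for sorting all 20K records in one go. The same comparison function could presumably replace the asort() call inside the script's END block.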
I'm confident the part that finds overlapping regions is correct; I tested the same logic in another programming language. The only remaining problem is the sorting.
Could anyone suggest how to sort a two-dimensional array?
Thanks,
phoebe
Last edited by Franklin52; 08-09-2010 at 04:06 PM..
Reason: Please use code tags
---------- Post updated at 03:04 PM ---------- Previous update was at 03:02 PM ----------
Quote:
Originally Posted by frans
a simple
Code:
sort -n file
does the job too
No, it does not. It may give you the correct result with this particular dataset and your particular sort implementation, but it is definitely not a correct solution.
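If an external sort is acceptable, a version with explicit keys might look like this (a sketch, assuming one `start+end` region per line). `-t'+'` splits each line on the plus sign, `-k1,1n` orders by the start field numerically and `-k2,2n` breaks ties on the end field, so the result depends only on the two numbers rather than on how the rest of the line compares. The wrapper name `sort_regions_cli` is hypothetical.

```sh
#!/bin/sh
# sort_regions_cli (hypothetical name): sort "start+end" lines by the
# numeric start field, breaking ties on the numeric end field.
sort_regions_cli() {
    sort -t'+' -k1,1n -k2,2n
}

printf '33+54\n11+34\n22+33\n222+456\n33+100\n' | sort_regions_cli
```

This should print `11+34`, `22+33`, `33+54`, `33+100`, `222+456`.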