Hi,
I think this is the toughest prob
I have ever come across and I thankfully owe all of u for helping me cross this.
cat 1.txt
Quote:
chr1 100 200
chr1 300 400
chr1 350 467
chr1 450 700
chr2 500 600
chr2 345 765
chr3 101 300
chr3 132 456
cat 2.txt
Quote:
chr1 156 199
chr1 165 230
chr1 201 299
chr1 525 600
chr2 800 1000
chr2 534 676
chr2 200 400
chr2 100 200
chr3 200 400
chr3 500 600
chr3 400 700
K now. This is what I am looking for.
Output.txt
Quote:
chr1 300 400 1.txt
chr1 350 467 1.txt
chr1 201 299 2.txt
chr2 800 1000 2.txt
chr2 100 200 2.txt
chr3 500 600 2.txt
Here is how my output has been generated.
First, the column one of each file has to be matched to column one of other files, like chr1 to chr1, chr2 to chr2 and chr3 to chr3 only. No different column values has to be matched.
Second, if a particular range of column 2 and 3 intersects/comes in between the range of column 2 and 3 of the other file, they have to be eliminated.
Examples from the given input:
chr1 100 200(1.txt) intersects with chr1 156 199(2.txt), chr1 165 230(2.txt). So, they are eliminated.
chr1 450 700(1.txt) intersects with chr1 525 600(2.txt). So, these two are eliminated from the output.
Similarly,
chr2 500 600(1.txt) intersects with chr2 534 676(2.txt). So, it is eliminated.
chr2 345 765(1.txt) intersects with chr2 200 400(2.txt). So, it is eliminated from the output file.
Same is the case for chr3 too. My files have different number of records in each of them which are not sorted. The last column in the output file indicates the file from which the record originates. If you have any questions or suggestions, please write in the reply and I shall reply ASAP to clarify your doubt that might give me a chance to kick this problem out. All your time, patience and attention are highly appreciated.
Thanks in advance.