Interpolation if there is no exact match for value


 
# 1  
Old 05-09-2014

Dear all, could you help me with the following question? There are two datasets (below). I need to match the BP values in data1 against the BP values in data2 and add the corresponding CM value from data2 to data1. If there is no exact match, the corresponding CM value should be calculated by interpolation. I have listed the steps in more detail below (these are the steps I would take if I could write code; unfortunately I can't).

(data1)
Code:
BP        RS 
752566    rs3094315
752721    rs3131972
753541    rs2073813
760300    rs11564776
768448    rs12562034
776546    rs12124819

(data2)
Code:
BP       CM 
55550 0.000000
82571 0.080572
88169 0.092229
254996 0.439456

1) Take a BP value from data1 and look for an exact match in the BP column of data2.
If an exact match is found, add the corresponding CM value from data2 to data1 (as a third column).

2) If there is no exact match for the BP value from data1 in data2, then:

a) find the two nearest BP values in data2, one on each side (i.e. the largest data2 BP below the data1 BP and the smallest data2 BP above it),
b) take the CM values that correspond to those two nearest BP values (data2) and use linear interpolation to calculate the CM value for the BP value from data1,

and
c) add this interpolated CM value to data1 (as a third column).
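Step 2a could be sketched roughly as follows. This is only an illustration: the function name bracket and the labels it prints (exact, between, below-range, above-range) are made up here, and it assumes data2 has one header line and is sorted by ascending BP.

```shell
# bracket BP FILE: report the data2 rows that enclose BP.
# Assumes FILE has one header line and is sorted by ascending BP.
bracket() {
  awk -v bp="$1" '
    NR == 1  { next }                                  # skip the "BP CM" header
    $1 == bp { print "exact", $1, $2; found = 1; exit }
    $1 <  bp { lo_bp = $1; lo_cm = $2; have_lo = 1; next }
    {                                                  # first row with BP above the target
      if (have_lo) print "between", lo_bp, lo_cm, $1, $2
      else         print "below-range", $1, $2
      found = 1; exit
    }
    END { if (!found) print "above-range", lo_bp, lo_cm }
  ' "$2"
}

# Small demonstration on a few rows in the data2 format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
BP CM
752566 2.012958
753269 2.013806
753541 2.014133
EOF
bracket 752721 "$sample"
bracket 753541 "$sample"
bracket 776546 "$sample"
rm -f "$sample"
```

A "between" line carries the four numbers that step 2b needs. The two range labels mark BP values that lie outside data2, where there is nothing to interpolate between.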

I will appreciate your suggestions!
Thanks a lot in advance!
# 2  
Old 05-09-2014
Would this help?:-
Code:
#!/bin/ksh

interpolate()
{
   # Some function yet to be written
   echo 0
}

while read BP RS
do
   grep "^$BP " data2 | read a CM
   if [ "$CM" = "" ]
   then
      interpolate $BP | read CM
   fi
   echo "$BP $RS $CM"
done < data1 > data1.tmp

mv data1.tmp data1

As to the interpolation, I'm a bit stuck. Are both files sorted numerically according to the first column before we start? We might be able to work on that.

Will there be a BP in data2 for every BP in data1?




Robin
# 3  
Old 05-09-2014
Hi Robin,
1) "Are both files sorted numerically according to the first column before we start?"
The BP values in both datasets are in ascending order (from smaller to larger numbers).
2) "Will there be a BP in data2 for every BP in data1?"
No. Some BP values from data1 have an exact match in data2. That case is easy (I just copy the corresponding CM into data1).
But there are BP values in data1 that have no exact match in data2. In this case the common practice (as I was told) is to calculate the CM value by interpolation, using the two nearest BP values in data2 and their CM values.

E.g. the BP value 752721 from data1 has no exact match in data2.
But data2 contains the two nearest BP values (and their corresponding CM values):

Code:
BP        CM
752566    2.012958
753269    2.013806

So I can use those two nearest BP values (and their CM values) and the linear interpolation formula to calculate the CM value that corresponds to BP 752721 from data1. I should get 2.013145, which then goes into data1 in the same row as 752721.
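The arithmetic for this example can be checked with a single awk call. This is just a sketch: the function name lerp and its argument order (BP, then the lower BP and CM, then the upper BP and CM) are made up for illustration.

```shell
# lerp BP BP1 CM1 BP2 CM2: linear interpolation of CM at BP,
# given the neighbouring points (BP1, CM1) and (BP2, CM2).
lerp() {
  # LC_ALL=C keeps "." as the decimal separator whatever the locale.
  LC_ALL=C awk -v bp="$1" -v bp1="$2" -v cm1="$3" -v bp2="$4" -v cm2="$5" \
    'BEGIN { printf "%.6f\n", cm1 + (cm2 - cm1) * (bp - bp1) / (bp2 - bp1) }'
}

lerp 752721 752566 2.012958 753269 2.013806   # prints 2.013145
```

Written out: 2.012958 + (2.013806 - 2.012958) * (752721 - 752566) / (753269 - 752566) = 2.012958 + 0.000848 * 155 / 703, which rounds to 2.013145.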


So, as I understand it, this calculation should be done for every BP value in data1 that has no exact match in data2.

I hope my details don't add to the confusion. Thank you a lot for your help!
# 4  
Old 05-09-2014
Please show the expected output for the given input, and show the calculation. With the given sample I don't think you can generate even a single CM value by linear interpolation: data2 has no BP value above any of the BP values in data1.

Code:
$ sh intp.sh 
254996 <-- lower nearest 752566 upper nearest --> No value in data2
254996 <-- lower nearest 752721 upper nearest --> No value in data2
254996 <-- lower nearest 753541 upper nearest --> No value in data2
254996 <-- lower nearest 760300 upper nearest --> No value in data2
254996 <-- lower nearest 768448 upper nearest --> No value in data2
254996 <-- lower nearest 776546 upper nearest --> No value in data2

# 5  
Old 05-09-2014
Is this homework @kush ?

And please state the relationship with your other thread:
Find match between two datasets and add value
# 6  
Old 05-09-2014
Yes, sorry, I gave only the 'heads' of the real input as an example, because both datasets are very long. E.g.
data1 has 61951 lines
data2 has 256895 lines.

If I extract some arbitrary lines, it may look like:
data1
Code:
BP         RS
752566 rs3094315
752721 rs3131972
753541 rs2073813
760300 rs11564776


data2
Code:
BP         CM
740857 1.984065
750235 2.009238
752566 2.012958
753269 2.013806
753541 2.014133
754745 2.014728
765948 2.019205
767038 2.019743
767070 2.019759
768448 2.020467
769551 2.020969
771521 2.021869

---------- Post updated at 04:52 PM ---------- Previous update was at 04:48 PM ----------

To Scrutinizer:
No, this is my job. I'm a biologist and have to create a reference genetic map to run a set of analyses with my SNP data.
My previous posts are also related to my work. The main problem is that I am a biologist (genetics) and am only learning to understand code (I can't yet say I can write it).
I appreciate your help. If I have done something wrong, please let me know; it was not intentional.
# 7  
Old 05-09-2014
Okay, let's try this:-
Code:
#!/bin/ksh

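interpolate()
{
 # Scan data2 (sorted by ascending BP) for the rows either side of $BP
 BP1= BP2=
 while read BPT CMT
 do
    case $BPT in
       ""|*[!0-9]*) continue ;;   # skip the header line and blank lines
    esac
    if [ "$BPT" -lt "$BP" ]
    then
       BP1=$BPT
       CM1=$CMT
    else
       BP2=$BPT
       CM2=$CMT
       break                      # first BP above the target, stop reading
    fi
 done < data2

 # No interpolation is possible unless $BP lies between two data2 rows
 if [ -z "$BP1" ] || [ -z "$BP2" ]
 then
    echo NA
    return
 fi

 # Linear interpolation is:-
 #
 # CM - CM1   CM2 - CM1
 # ~~~~~~~~ = ~~~~~~~~~
 # BP - BP1   BP2 - BP1

 # Quote the expression so the shell does not try to parse the parentheses,
 # and set scale because bc otherwise truncates the division to an integer
 CM=`echo "scale=10; $CM1+(($CM2-$CM1)*($BP-$BP1)/($BP2-$BP1))" | bc`
 echo $CM
}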

while read BP RS
do
   grep "^$BP " data2 | read a CM
   if [ "$CM" = "" ]
   then
      interpolate $BP | read CM
   fi
   echo "$BP $RS $CM"
done < data1 > data1.tmp

mv data1.tmp data1


How does that do? It doesn't work very well on the sample input (as already mentioned by others). How does it work with the real thing? It may well be a bit slow, but the logic is explicit and easy to follow. An awk script may be able to do things faster if you can code one.
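For what it's worth, a single-pass awk version might look something like the sketch below. It is not tested against the real files. The function name add_cm, the NA marker for BP values outside the range of data2, and the six-decimal output format are all assumptions made for this example. It also assumes that both files have one header line and are sorted by ascending BP, as stated earlier in the thread.

```shell
# add_cm DATA1 DATA2: print DATA1 with a third CM column taken from DATA2,
# interpolating linearly where DATA2 has no exact BP match.
add_cm() {
  LC_ALL=C awk '
    FNR == 1  { if (NR != FNR) print $1, $2, "CM"; next }    # header lines
    NR == FNR { n++; bp[n] = $1 + 0; cm[n] = $2 + 0; next }  # load data2
    {
      if (i == 0) i = 1
      # Both files are sorted, so the index into data2 only moves forward.
      while (i < n && bp[i + 1] <= $1) i++
      if (n == 0 || $1 < bp[1] || $1 > bp[n])
        out = "NA"                                # outside data2: nothing to interpolate
      else if (bp[i] == $1)
        out = sprintf("%.6f", cm[i])              # exact match
      else
        out = sprintf("%.6f", cm[i] + (cm[i + 1] - cm[i]) * ($1 - bp[i]) / (bp[i + 1] - bp[i]))
      print $1, $2, out
    }
  ' "$2" "$1"
}

# Demonstration on a few rows taken from the samples in this thread.
dir=$(mktemp -d)
cat > "$dir/data1" <<'EOF'
BP RS
752566 rs3094315
752721 rs3131972
753541 rs2073813
760300 rs11564776
EOF
cat > "$dir/data2" <<'EOF'
BP CM
752566 2.012958
753269 2.013806
753541 2.014133
754745 2.014728
765948 2.019205
EOF
add_cm "$dir/data1" "$dir/data2"
rm -rf "$dir"
```

This holds data2 in memory and walks it once alongside data1, instead of rereading data2 for every BP. For 752721 it gives 2.013145, the value expected earlier in the thread.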



Robin