Hi radoulov,
I will try to explain my question with two examples. Sorry if it's a lengthy read; I'm sure the answer will take you much less time than reading the question. This is a continuation of the substring code that you helped me with earlier. I have attached file1 and file2 samples for testing.
file2$6 can contain 'M' and 'S' along with other letters and numbers. There can be zero, one, or two 'S' entries. There has to be at least one 'M' and at most two 'M's. As a rule, we ignore the S and take the M.
S will only be present at the beginning and/or the end of $6, never in the middle.
For example, 23S4M9S, 1S34M1S and 34M2S are valid, but 23M1S12M is invalid.
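To make the rules above concrete, here is a rough sketch of how I would check a $6 value in Python. The function name is_valid is made up for illustration, and I am assuming that the "other letters" are always uppercase and that every letter is preceded by a number.

```python
import re

# Assumed shape: optional leading <n>S, one or more non-S operations,
# optional trailing <n>S. An S anywhere in the middle fails the match.
SHAPE = re.compile(r"(?:\d+S)?(?:\d+[A-RT-Z])+(?:\d+S)?")

def is_valid(cigar):
    """Return True if cigar follows the S/M rules described in the post."""
    if not SHAPE.fullmatch(cigar):
        return False
    # At least one and at most two M operations.
    return 1 <= len(re.findall(r"\d+M", cigar)) <= 2

print(is_valid("23S4M9S"))   # True
print(is_valid("23M1S12M"))  # False, S sits in the middle
```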
Let's take an example where file1$4=SNPSTER1_0001:7:60:876:131#0/1
If file2$6 is 20M769N15M1S for the $10 string ATAGCCAATATCCCCAACAGGTTGAGGGAACTGTTT, we divide it into 4 segments.
s1=0=first 0 characters to ignore since there is no leading S
s2=1=last 1 character to ignore = T
m1=20=first 20 characters after s1,ATAGCCAATATCCCCAACAG
m2=15=last 15 characters before s2, GTTGAGGGAACTGTT
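The four segments above could be computed roughly like this in Python. This is only a sketch of what I mean, not working code from my side: split_read is a made-up name, and it assumes $6 has already passed the validity rules (S only at the ends, one or two M).

```python
import re

def split_read(cigar, read):
    """Return (s1, s2, m1, m2) for a $6 value and its $10 string."""
    ops = re.findall(r"(\d+)([A-Z])", cigar)
    # Leading / trailing S lengths, 0 when absent.
    s1 = int(ops[0][0]) if ops[0][1] == "S" else 0
    s2 = int(ops[-1][0]) if ops[-1][1] == "S" else 0
    m_lengths = [int(n) for n, op in ops if op == "M"]
    first_len, last_len = m_lengths[0], m_lengths[-1]
    # m1: first M block right after s1; m2: last M block right before s2.
    m1 = read[s1:s1 + first_len]
    end = len(read) - s2
    m2 = read[end - last_len:end]
    return s1, s2, m1, m2

print(split_read("20M769N15M1S", "ATAGCCAATATCCCCAACAGGTTGAGGGAACTGTTT"))
# -> (0, 1, 'ATAGCCAATATCCCCAACAG', 'GTTGAGGGAACTGTT')
```

With a single M in $6, m1 and m2 come out as the same string here; I am not sure whether that is the right behaviour for that case.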
So we have 2 strings (m1 and m2), and the substring is to be extracted from one of them based on the following condition:
if (file2$4+s1+m1) > file1$9
choose string m1 for substring operation
else
choose string m2 for substring operation
Substring operation
when file2$1=file1$4,
print file1$0 and the substring of m1 or m2 with parameters start = file1$8 - file1$2 + 1, length = file1$9 - file1$8
In this case file1$2=15735490
file1$8=15735496
file1$9=15735497
file2$4=15734702
(file2$4+s1+m1)=15734722 is less than file1$9=15735497
so we choose m2=GTTGAGGGAACTGTT for substring operation.
answer = substring (GTTGAGGGAACTGTT,7,1) = G
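The same calculation written out as a small Python sketch, in case it makes the logic easier to follow. pick_and_cut and its parameter names are made up for illustration; f2_4 stands for file2$4 and f1_2, f1_8, f1_9 for file1$2, $8 and $9. The slice mimics awk's 1-based substr(string, start, length).

```python
def pick_and_cut(m1, m2, s1, f2_4, f1_2, f1_8, f1_9):
    """Choose m1 or m2 by the condition above, then cut the substring."""
    # len(m1) is the M count of the first block.
    chosen = m1 if f2_4 + s1 + len(m1) > f1_9 else m2
    start = f1_8 - f1_2 + 1   # 1-based start position
    length = f1_9 - f1_8      # number of characters to take
    return chosen[start - 1:start - 1 + length]

# First example: 15734702 + 0 + 20 = 15734722 is not > 15735497, so m2 is used.
print(pick_and_cut("ATAGCCAATATCCCCAACAG", "GTTGAGGGAACTGTT",
                   0, 15734702, 15735490, 15735496, 15735497))  # -> G
```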
##########################################################
Another example
file1$4=file2$1=SNPSTER1_0001:7:115:1082:672#0/1
file2$10 = ATCTTGGGCCGCGAGCATCTTCAACCGCAAAATTTG
file2$6=1S24M186N11M
s1=1 ignore first character 'A'
s2=0
m1=24 , TCTTGGGCCGCGAGCATCTTCAAC
m2=11, CGCAAAATTTG
In this case file1$2=4044310
file1$8=4044316
file1$9=4044317
file2$4=4044311
Here (file2$4+s1+m1)=4044336 > file1$9=4044317
So we choose m1=TCTTGGGCCGCGAGCATCTTCAAC for substring operation
answer = substring (TCTTGGGCCGCGAGCATCTTCAAC,7,1) = G
############################################################
Thanks,
Alpesh