The Perl script below runs and classifies each of the 3 lines in file.txt. Lines 2 and 3 are correct, as they fit the criteria for Rule 2.
The problem is that line 1 should be classified VUS: it does not meet the criteria for Rule 1, so Rule 3 is used.
However, Rule 2 currently changes the classification to Likely Benign; if I comment that rule out, I get the expected result. I am not sure why that rule is applied to that line at all, since its first criterion is $FuncIDPrefGene !~ "exonic" (the field is not exonic), and in line 1 that field is exonic.
I have included comments in the code; each rule is designed to follow a specific set of criteria. I have tried changing the order, but the result is the same. Thank you.
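For reference, when the right-hand side of !~ is a quoted string, Perl compiles it as a pattern, so $FuncIDPrefGene !~ "exonic" is true only when the value does not contain "exonic" anywhere; it is not an exact string comparison. The sketch below uses hypothetical helper names and sample values (ncRNA_exonic is only an illustration) to show where that differs from ne:

```perl
use strict;
use warnings;

# 1 when the value does NOT contain "exonic" anywhere (pattern match).
sub not_exonic_pattern { my ($func) = @_; return $func !~ "exonic" ? 1 : 0 }

# 1 when the value is not exactly the string "exonic" (string comparison).
sub not_exonic_exact { my ($func) = @_; return $func ne 'exonic' ? 1 : 0 }

# Hypothetical Func values, chosen only to show where the two tests differ.
for my $func ('exonic', 'intronic', 'splicing', 'ncRNA_exonic') {
    printf "%-13s pattern: %d  exact: %d\n",
        $func, not_exonic_pattern($func), not_exonic_exact($func);
}
```

For exonic, intronic and splicing both tests agree; they only disagree for a value that merely contains "exonic".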
#!/usr/bin/perl
use strict;
use warnings;
my $header = scalar <>;
while (<>)
{
my @f = split /\t/;
my ( $FuncIDPrefGene,
$AAChangeIDPrefGene,
$PopFreqMax,
$GeneDetailIDPrefGene,
$ClinSig,
$Score ) = @f[6,11,13,8,46,54];
print "\$FuncIDPrefGene = $FuncIDPrefGene and you're trying to abs($Score)\n";
}
Run against the example you posted, it outputs:
Code:
perl showme.pl file.txt
Code:
$FuncIDPrefGene = exonic and you're trying to abs(12)
$FuncIDPrefGene = splicing and you're trying to abs(2)
$FuncIDPrefGene = intronic and you're trying to abs(.)
You also have precedence issues with the _and_ operator. I suggest you use if/else.
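The original rules are not shown here, so this is only an illustration of a common trap: the low-precedence "and" binds more loosely than "=", while "&&" binds more tightly. The helper below_five is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when the value is below 5, false ('') otherwise.
sub below_five { my ($v) = @_; return $v < 5 }

# `and` binds more loosely than `=`, so this parses as
# (my $with_and = below_five(3)) and below_five(10);
# the second test runs, but its result is discarded.
my $with_and = below_five(3) and below_five(10);

# `&&` binds more tightly than `=`, so both tests feed the assignment.
my $with_amp = below_five(3) && below_five(10);

# Parentheses make the intended grouping explicit when using `and`.
my $with_parens = (below_five(3) and below_five(10));

print 'and:    ', ($with_and    ? 'true' : 'false'), "\n";
print '&&:     ', ($with_amp    ? 'true' : 'false'), "\n";
print 'parens: ', ($with_parens ? 'true' : 'false'), "\n";
```

Only the plain "and" version ends up true, because the second test never reaches the variable.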
I am not sure I follow completely. Is the logic not right? Thank you.
abs() is a function for numeric values, and a dot is not numeric. Turning on the warnings pragma would have shown you that at some point.
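A small demonstration of what use warnings reports when abs() receives ".", and one way to check first with looks_like_number from the core module Scalar::Util. Falling back to 0 for a non-numeric value is an assumption in this sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Collect warnings in an array instead of printing them to STDERR, so we
# can see what `use warnings` reports when abs() gets a non-numeric string.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my ($dot, $num) = ('.', '12');   # sample values like those in $f[54]
my $from_dot = abs($dot);        # warns: Argument "." isn't numeric; result is 0
my $from_num = abs($num);        # numeric string, no warning

print "abs('.') = $from_dot, abs('12') = $from_num\n";
print "warnings raised: ", scalar(@warnings), "\n";

# Testing first avoids the warning: only call abs() on numeric-looking values.
for my $score ('12', '2', '.') {
    my $value = looks_like_number($score) ? abs($score) : 0;   # assumed fallback
    print "$score -> $value\n";
}
```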
If the code is not producing the desired result but it runs, then the logic must not be correct.
This appears to be the flow you are following, but it is flawed because you call abs() whether or not the field holds a numeric value. I cannot work out what $f[54] means when it does not contain a numeric value.
Code:
#!/usr/bin/perl
use strict;
use warnings;
print scalar <>;
while (<>)
{
    chomp;                       # drop the newline so it is not glued to the last field
    my @f = split /\t/;
    my ( $FuncIDPrefGene,
         $AAChangeIDPrefGene,
         $PopFreqMax,
         $GeneDetailIDPrefGene,
         $ClinSig,
         $Score ) = @f[6,11,13,8,46,54];
    if (abs($Score) < 5) {
        if ($FuncIDPrefGene eq 'exonic') {
            pj(\@f, 'Likely Benign');
        }
        else {
            # parentheses give list context, so $scored gets the captured digits
            my ($scored) = $GeneDetailIDPrefGene =~ /\.\d+[\+\-](\d+)/;
            pj(\@f, 'Likely Benign') if defined $scored && $scored < 10;
        }
    }
    else {
        pj(\@f, 'VUS');
    }
}
sub pj
{
    my $fr = shift;
    $fr->[55] = shift;
    print join("\t", @{$fr}), "\n";
}
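A detail worth knowing about the GeneDetail match: a regex match in scalar context returns true or false, not the captured text, so my $scored = $GeneDetailIDPrefGene =~ /.../; leaves $scored as 1 or an empty string. Since 1 is always less than 10, the offset test then passes whenever the pattern matches. Wrapping the variable in parentheses gives list context, which returns the capture. The sample GeneDetail value is hypothetical.

```perl
use strict;
use warnings;

my $detail = 'c.123+4';    # hypothetical GeneDetail value

# Scalar context: the match returns 1 (matched) or '' (no match).
my $flag = $detail =~ /\.\d+[+-](\d+)/;

# List context: the match returns the captured group, here the offset.
my ($offset) = $detail =~ /\.\d+[+-](\d+)/;

print "scalar context: $flag\n";      # 1
print "list context:   $offset\n";    # 4

# With no match, the captured value is undef, so test it before comparing.
my ($missing) = '.' =~ /\.\d+[+-](\d+)/;
print defined $missing ? "offset $missing\n" : "no offset\n";
```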
If f[54] has a . in it, the value associated with it is zero. In order to prevent column shifting due to null values I use a . in these fields.
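Given that "." is only a placeholder meaning 0, one option is to translate it once, right after the split, so later numeric tests never see a non-numeric string. A minimal sketch; the helper name field_num is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: '.' marks an empty field, which is treated as 0.
sub field_num {
    my ($value) = @_;
    return $value eq '.' ? 0 : $value;
}

for my $raw ('12', '2', '.') {
    my $score = field_num($raw);
    print "$raw -> $score (", ($score < 5 ? 'below 5' : '5 or more'), ")\n";
}
```

With the conversion done up front, abs() is no longer needed just to cope with the dot.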
So, I think I follow, but just to make sure: abs($Score) should only be applied when f[54] is not a dot, is that right? Also, could you please comment the code so I can learn more from it, if possible? Thank you very much.
Code:
#!/usr/bin/perl
use strict;                      # require declared variables, catch common mistakes
use warnings;                    # report questionable code, e.g. abs('.')
print scalar <>;                 # read the header line and print it unchanged
while (<>)                       # read the remaining lines one at a time into $_
{
    chomp;                       # remove the trailing newline
    my @f = split /\t/;          # split the line into fields on tabs
    my ( $FuncIDPrefGene,        # $f[6]
         $AAChangeIDPrefGene,    # $f[11]
         $PopFreqMax,            # $f[13]
         $GeneDetailIDPrefGene,  # $f[8]
         $ClinSig,               # $f[46]
         $Score ) = @f[6,11,13,8,46,54];  # $f[54]; array indices are 0-based
    if (abs($Score) < 5) {                 # score below 5 (abs() runs even when $Score is '.')
        if ($FuncIDPrefGene eq 'exonic') { # Rule 1: exonic with score below 5
            pj(\@f, 'Likely Benign');      # store Likely Benign in $f[55] and print
        }
        else {                             # Rule 2: not exonic, score below 5
            # capture the digits after + or - in GeneDetail (e.g. 4 in c.123+4);
            # the parentheses give list context, so $scored holds those digits
            my ($scored) = $GeneDetailIDPrefGene =~ /\.\d+[\+\-](\d+)/;
            pj(\@f, 'Likely Benign') if defined $scored && $scored < 10;  # offset below 10
        }
    }
    else {
        pj(\@f, 'VUS');                    # score 5 or more: store VUS in $f[55] and print
    }
}                                          # end while block
sub pj                                     # add a classification column and print the record
{
    my $fr = shift;                        # first argument: reference to the field array
    $fr->[55] = shift;                     # second argument: label stored in $f[55]
    print join("\t", @{$fr}), "\n";        # print all fields tab-separated, one line per record
}
Last edited by cmccabe; 04-09-2017 at 11:18 AM.
Reason: added comment question
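Because each line read by while (<>) still ends in a newline, it is usually safer to chomp before splitting and add the newline back when printing; otherwise the newline stays attached to whichever field was last, and a column appended after it lands on the next line. A small sketch of the pj idea, using a hypothetical helper append_column and a made-up record:

```perl
use strict;
use warnings;

# Hypothetical helper: store a label at a fixed column index and
# return the record as one tab-separated line ending in a newline.
sub append_column {
    my ($fields, $index, $label) = @_;   # $fields is an array reference
    $fields->[$index] = $label;          # modifies the caller's array in place
    return join("\t", @{$fields}) . "\n";
}

my $line = "chr1\t100\texonic\t12\n";   # made-up 4-column record
chomp $line;                            # drop the trailing newline first
my @f = split /\t/, $line;
print append_column(\@f, 4, 'VUS');
```

Passing \@f rather than @f lets the subroutine change the caller's array directly, which is what $fr->[55] = shift does in pj.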
# Rule 1. Set classification to Likely benign based on score less than 5 for exons
What would you like to happen if it is an exon but its score is more than 5?
Your logic places these into Rule 3 only if PopFreqMax is less than 0.011. Would these be disregarded otherwise?
# Rule 2. Reclassify intronic variants (with c.) less than 10 based on score
What would you like to happen if it is intronic but with a score of more than 10?
Your logic places these into Rule 3 only if PopFreqMax is less than 0.011. Do you disregard them otherwise?
# Rule 3. If none of the above tests succeeded, and the PopFreqMax < 0.011
What if the PopFreqMax is more than 0.011? Where would those go?
Can $FuncIDPrefGene be anything other than exonic, splicing, or intronic?
Would $Score ever contain a value with a plus (+12) or minus (-12)?
Would $Score ever contain a value besides a dot (.) that would not have a numeric interpretation?
Quote:
# Rule 1. Set classification to Likely benign based on score less than 5 for exons
What would you like to happen if it is an exon but its score is more than 5?
Your logic places these into Rule 3 only if PopFreqMax is less than 0.011. Would these be disregarded otherwise?
Rule 3 was meant to be a catch-all rule, but maybe it is better not to have that. If a Rule 1 variant is exonic with a score of more than 5, the classification is VUS. So is it better to add an else branch to Rule 1, or just remove the PopFreqMax condition from Rule 3?
Quote:
# Rule 2. Reclassify intronic variants (with c.) less than 10 based on score
What would you like to happen if it is intronic but with a score of more than 10?
Your logic places these into Rule 3 only if PopFreqMax is less than 0.011. Do you disregard them otherwise?
I think this follows the same logic as Rule 1, in that I need an else to capture the other condition, or I need to redo Rule 3.
Quote:
# Rule 3. If none of the above tests succeeded, and the PopFreqMax < 0.011
What if the PopFreqMax is more than 0.011? Where would those go?
If PopFreqMax is greater than 0.011 classification is Likely Benign.
Quote:
Can $FuncIDPrefGene be anything other than exonic, splicing, or intronic?
Yes, these are just three of the more common ones, but there are several others. However, even though there are many possible values, they can all be grouped into exonic (for exons) or not exonic (for everything else).
Quote:
Would $Score ever contain a value with a plus (+12) or minus (-12)?
The number in $Score should always be a positive number such as 1, 2, 15, or 20. I used abs() just in case the format ever changed to include a + or some other symbol.
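On the abs() question: Perl already reads a leading plus sign as part of a number, so "+12" is 12 in numeric context without abs(); abs() only changes negative values. Any other symbol would still trigger the "isn't numeric" warning. A quick check:

```perl
use strict;
use warnings;

for my $raw ('12', '+12', '-12') {
    my $as_number = $raw + 0;      # numeric context; a leading '+' is accepted
    my $absolute  = abs($raw);     # differs from $as_number only for negatives
    print "$raw: numeric $as_number, abs $absolute\n";
}
```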
Quote:
Would $Score ever contain a value besides a dot (.) that would not have a numeric interpretation?
No, a dot is only used for a null value and is always zero.
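Pulling the answers above together, one possible if/elsif/else layout is sketched below. It is only a reading of this thread, not the original script: the 5 / 10 / 0.011 thresholds come from the posts, the Rule 2 check keeps the responder's nesting under the score test, and classify plus the sample inputs are hypothetical. Because every path ends in a return, no line can fall through unclassified, which is the gap the questions above were probing.

```perl
use strict;
use warnings;

# Hypothetical classifier. Arguments: Func value, Score, GeneDetail, PopFreqMax.
sub classify {
    my ($func, $score, $gene_detail, $pop_freq_max) = @_;

    # '.' is the placeholder for an empty field and counts as 0.
    $score        = 0 if $score eq '.';
    $pop_freq_max = 0 if $pop_freq_max eq '.';

    if ($func eq 'exonic') {
        # Rule 1: exonic variants are Likely Benign below 5, otherwise VUS.
        return $score < 5 ? 'Likely Benign' : 'VUS';
    }

    # Rule 2: non-exonic variant with score below 5 and a c. offset below 10.
    my ($offset) = $gene_detail =~ /\.\d+[+-](\d+)/;
    if ($score < 5 && defined $offset && $offset < 10) {
        return 'Likely Benign';
    }

    # Rule 3: catch-all decided by PopFreqMax.
    return $pop_freq_max > 0.011 ? 'Likely Benign' : 'VUS';
}

# Made-up inputs, one per path.
print classify('exonic',   '12', '.',       '.'),    "\n";  # VUS (Rule 1)
print classify('splicing', '2',  'c.55+3',  '.'),    "\n";  # Likely Benign (Rule 2)
print classify('intronic', '.',  'c.55+40', '0.02'), "\n";  # Likely Benign (Rule 3)
print classify('intronic', '.',  '.',       '.'),    "\n";  # VUS (Rule 3)
```

In the main loop this would replace the nested if blocks with a single pj(\@f, classify($FuncIDPrefGene, $Score, $GeneDetailIDPrefGene, $PopFreqMax)); call.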