Classify lines in file using perl


 
# 1  
Old 04-08-2017

The Perl below runs and classifies each of the 3 lines in file.txt. Lines 2 and 3 are correct, as they fit the criteria for Rule 2.
The problem is that line one should be classified VUS: it does not meet the criteria for Rule 1, so Rule 3 should apply.
However, Rule 2 currently changes the classification to Likely Benign; if I comment that rule out, I get the expected result. I am not sure why that rule is even executed on that line, as its first criterion is $FuncIDPrefGene !~ "exonic" (the field is not exonic), but in line one that field is exonic.
I have included comments in the code; each rule is designed to follow a specific set of criteria. I have tried changing the order of the rules, but the result is the same. Thank you.

Code:
#!/usr/bin/perl
use strict;

while (<>)
{
        $.<2 and print and next;
          my @f=split/\t/;
         #my @f=split/\s+/;
          my ($FuncIDPrefGene,$AAChangeIDPrefGene,$PopFreqMax,$GeneDetailIDPrefGene,$ClinSig,$Score)=@f[6,11,13,8,46,54];
# Check score for exonic set to 5
         $FuncIDPrefGene eq "exonic" && abs($Score) < 5 and &pj(\@f,"Likely Benign") and next; # Rule 1. Set classification to Likely benign based on score less than 5 for exons

# Check score for everything else set to 5 with GeneDetail following c. nomenclature
        $FuncIDPrefGene !~ "exonic" and abs($Score) < 5 and $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/; # this will capture the digits after    +/- into $1
        $1 < 10 and &pj(\@f,"Likely Benign") and next; # Rule 2. Reclassify intronic variants (with c.) less than 10 based on score

# PopFreqMax VUS
         &pj(\@f,"VUS"); # Rule 3.  If none of the above tests succeeded, and the PopFreqMax < 0.011 set the Classification field to the string VUS.
}
sub pj
{
    my $fr=shift;
       $fr->[55]=shift;
       print join("\t",@{$fr}); # add separator ,"\n"
}

Desired result in field [55] (Classification):
Code:
VUS
Likely Benign
Likely Benign


Last edited by cmccabe; 04-08-2017 at 11:29 PM..
# 2  
Old 04-08-2017
Perhaps this might help you:

Code:
#!/usr/bin/perl
use strict;
use warnings;

my $header = scalar <>;
while (<>)
{
    my @f = split /\t/;
    my ( $FuncIDPrefGene,
         $AAChangeIDPrefGene,
         $PopFreqMax,
         $GeneDetailIDPrefGene,
         $ClinSig,
         $Score ) = @f[6,11,13,8,46,54];

     print "\$FuncIDPrefGene = $FuncIDPrefGene and you're trying to abs($Score)\n";

}

Run against the example you posted, it outputs:
Code:
perl showme.pl file.txt

Code:
$FuncIDPrefGene = exonic and you're trying to abs(12)
$FuncIDPrefGene = splicing and you're trying to abs(2)
$FuncIDPrefGene = intronic and you're trying to abs(.)

You also have precedence issues with the `and` operator. I suggest you use if/else instead.
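A minimal sketch of why Rule 2 can fire on an exonic line, using made-up values rather than the real file.txt. In the posted script, the statement that ends with the regex and the statement that starts with `$1 < 10` are two separate statements, so the second one runs no matter how the first one turned out. When the first statement short-circuits, `$1` is undefined, and an undefined value compares as 0, which is less than 10. The sub names `rule2_as_posted` and `rule2_guarded` are hypothetical and exist only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rule 2 written as two separate statements, as in the original script.
sub rule2_as_posted {
    my ($func, $score, $detail) = @_;
    no warnings 'uninitialized';    # hide the warning about an undefined $1 for this demo
    # Statement 1: stops at the first false test; its result is discarded.
    $func !~ "exonic" and abs($score) < 5 and $detail =~ /\.\d+[\+\-](\d+)/;
    # Statement 2: evaluated regardless of what happened in statement 1.
    return $1 < 10 ? 'Likely Benign' : 'not matched';
}

# The same tests chained inside one condition, so $1 is only read after a successful match.
sub rule2_guarded {
    my ($func, $score, $detail) = @_;
    if (    $func !~ /exonic/
        and abs($score) < 5
        and $detail =~ /\.\d+[\+\-](\d+)/
        and $1 < 10 )
    {
        return 'Likely Benign';
    }
    return 'not matched';
}

# Hypothetical stand-in for line one: exonic, score 12, no c. nomenclature.
print rule2_as_posted('exonic', 12, '.'), "\n";    # prints "Likely Benign"
print rule2_guarded('exonic', 12, '.'), "\n";      # prints "not matched"
```

Keeping the regex and the `$1` comparison in the same condition is one way to make sure the captured value is only used when the match actually happened.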
# 3  
Old 04-09-2017
I am not sure I follow completely. Is the logic not right? Thank you.
# 4  
Old 04-09-2017
Quote:
Originally Posted by cmccabe
I am not sure I follow completely. Is the logic not right? Thank you.
abs() is a function for numeric values, and a dot is not numeric. Turning on the warnings pragma would have shown you that at some point.
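A small sketch of that point, assuming the score field can hold either a number or the "." placeholder. It captures the warning that `use warnings` raises when abs() is given ".", and shows one possible guard using `looks_like_number` from the core module Scalar::Util. The helper name `score_value` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Collect warnings in an array instead of sending them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, shift };

my $raw = '.';
my $n   = abs($raw);    # "." is treated as 0 here, and a warning is raised
print "abs('.') = $n\n";
print "warning: $_" for @warnings;

# Hypothetical helper: only call abs() on values that really are numeric,
# and treat anything else (such as the "." placeholder) as 0.
sub score_value {
    my ($value) = @_;
    return looks_like_number($value) ? abs($value) : 0;
}

print score_value('.'),   "\n";    # 0, without a warning
print score_value('-12'), "\n";    # 12
```

Whether a non-numeric score should count as 0 or be skipped altogether depends on what the field means, so the fallback value in `score_value` is an assumption.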


If the code runs but does not produce the desired result, then the logic must be incorrect.
This appears to be the flow you are following, but it is flawed because you call abs() whether or not the field holds a numeric value. I cannot tell what $f[54] is supposed to mean when it does not contain a numeric value.

Code:
#!/usr/bin/perl
use strict;
use warnings;

print scalar <>;
while (<>)
{
    my @f = split /\t/;
    my ( $FuncIDPrefGene,
         $AAChangeIDPrefGene,
         $PopFreqMax,
         $GeneDetailIDPrefGene,
         $ClinSig,
         $Score ) = @f[6,11,13,8,46,54];

    if (abs($Score) < 5) {
        if($FuncIDPrefGene eq 'exonic') {
            pj(\@f,'Likely Benign');
        }
        else {
            # list context, so $scored receives the captured digits rather than the match status
            my ($scored) = $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;
            pj(\@f, 'Likely Benign') if $scored < 10;
        }
    }
    else {
        pj(\@f, 'VUS');
    }
}
sub pj
{
    my $fr = shift;
    $fr->[55] = shift;
    print join "\t", @{$fr};
}

Test:
Code:
perl test.pl file.txt 2>/dev/null | perl -naF'\t' -le 'print $F[55]'

Code:
Classification
VUS
Likely Benign
Likely Benign
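A general Perl detail that matters when pulling the digits after the +/- out of the GeneDetail field: a regex match assigned in scalar context returns only whether it matched, while the same match assigned in list context returns the captured groups. The sketch below uses a made-up GeneDetail value, and the helper name `intron_distance` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $detail = 'NM_000000:c.123+4A>G';    # hypothetical GeneDetail value

# Scalar context: the result is the match status (1 on success), not the digits.
my $status = $detail =~ /\.\d+[\+\-](\d+)/;

# List context (note the parentheses): the result is the captured group.
my ($distance) = $detail =~ /\.\d+[\+\-](\d+)/;

print "status=$status distance=$distance\n";    # prints "status=1 distance=4"

# Hypothetical helper: returns the captured number, or undef when there is no match.
sub intron_distance {
    my ($gene_detail) = @_;
    my ($d) = $gene_detail =~ /\.\d+[\+\-](\d+)/;
    return $d;
}
```

Because `intron_distance` returns undef when the field has no c. nomenclature, the caller can test `defined` before comparing the value with 10.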


Last edited by Aia; 04-09-2017 at 01:02 AM..
# 5  
Old 04-09-2017
If f[54] has a . in it, the value associated with it is zero. To prevent columns from shifting because of null values, I put a . in those fields.
So I think I follow, but just to make sure: abs($Score) should only be used if f[54] is not a ., is that right? Also, could you please comment the code so that I can learn more from it, if possible? Thank you very much.

Code:
#!/usr/bin/perl    # run the script with perl
use strict;     # require declared variables and disallow unsafe constructs
use warnings;   # display warning messages

print scalar <>;  # read the header line and print it unchanged
while (<>)    # loop over the remaining input lines
{
    my @f = split /\t/;      # split the line on tabs
    my ( $FuncIDPrefGene,    # from $f[6]
         $AAChangeIDPrefGene, # from $f[11]
         $PopFreqMax,         # from $f[13]
         $GeneDetailIDPrefGene, # from $f[8]
         $ClinSig,              # from $f[46]
         $Score ) = @f[6,11,13,8,46,54];   # from $f[54]; the slice uses 0-based indexes

    if (abs($Score) < 5) {      # check that the score is less than 5
        if($FuncIDPrefGene eq 'exonic') {   # score condition met and the field is exonic
            pj(\@f,'Likely Benign');    # set field 55 to Likely Benign
        } # end exonic block
        else {
            my ($scored) = $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;  # capture the digits after the +/- (list context returns the capture)
            pj(\@f, 'Likely Benign') if $scored < 10;    # if the captured number is less than 10, field 55 is Likely Benign
        }  # end non-exonic block
    } 
    else {
        pj(\@f, 'VUS');  # score is 5 or more: set field 55 to VUS
    }
}  # end while block
sub pj     # define subroutine
{    # start sub block
    my $fr = shift;  # first argument: reference to the array of fields
    $fr->[55] = shift;  # second argument: the classification, stored in field 55
    print join "\t", @{$fr};   # print all fields joined by tabs
}   # end sub block


Last edited by cmccabe; 04-09-2017 at 11:18 AM.. Reason: added comment question
# 6  
Old 04-09-2017
# Rule 1. Set classification to Likely benign based on score less than 5 for exons
What would you like to happen if it is an exon but the score is more than 5?
Your logic places these under rule #3 only if PopFreqMax is less than 0.011. Would they be disregarded otherwise?

# Rule 2. Reclassify intronic variants (with c.) less than 10 based on score
What would you like to happen if it is intronic but with a score of more than 10?
Your logic places these under rule #3 only if PopFreqMax is less than 0.011. Do you disregard them otherwise?


# Rule 3. If none of the above tests succeeded, and the PopFreqMax < 0.011
What if the PopFreqMax is more than 0.011? Where would those go?

Can $FuncIDPrefGene be anything other than exonic, splicing, or intronic?

Would $Score ever contain a value with a plus (+12) or minus (-12)?
Would $Score ever contain a value besides a dot (.) that would not have a numeric interpretation?
# 7  
Old 04-09-2017
Quote:
# Rule 1. Set classification to Likely benign based on score less than 5 for exons
What would you like to happen if it is an exon but the score is more than 5?
Your logic places these under rule #3 only if PopFreqMax is less than 0.011. Would they be disregarded otherwise?
Rule 3 was meant to be a catch-all rule, but maybe it is better not to have that. If the variant is exonic and the score is more than 5, then the classification is VUS. So is it better to add an else branch to Rule 1, or just to remove the PopFreqMax condition from Rule 3?

Quote:
# Rule 2. Reclassify intronic variants (with c.) less than 10 based on score
What would you like to happen if it is intronic but with a score of more than 10?
Your logic places these under rule #3 only if PopFreqMax is less than 0.011. Do you disregard them otherwise?
I think this follows the same logic as Rule 1, in that I need an else to capture the other condition, or I need to redo Rule 3.

Quote:
# Rule 3. If none of the above tests succeeded, and the PopFreqMax < 0.011
What if the PopFreqMax is more than 0.011? Where would those go?
If PopFreqMax is greater than 0.011, the classification is Likely Benign.

Quote:
Can $FuncIDPrefGene be anything other than exonic, splicing, or intronic?
Yes, these are just three of the more common values; there are several others. However, even though there are many possible values, they can all be grouped into exonic (for exons) or not exonic (for everything else).

Quote:
Would $Score ever contain a value with a plus (+12) or minus (-12)?
The number in $Score should always be a positive number such as 1, 2, 15 or 20. I used abs() just in case the format ever changed to include a + or some other symbol.

Quote:
Would $Score ever contain a value besides a dot (.) that would not have a numeric interpretation?
No, a dot is only used for a null value and is always zero.

Thank you very much.
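Putting the answers above together, here is one possible reading of the rules as a single subroutine. This is a sketch under stated assumptions and not a confirmed solution from the thread: it assumes that an exonic variant with a score of 5 or more is always VUS, that non-exonic variants failing Rule 2 fall through to the PopFreqMax check, and that "." counts as zero in both numeric fields. The names `num` and `classify` and the sample GeneDetail strings are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# "." is the null placeholder and counts as zero (per the answers above).
sub num {
    my ($value) = @_;
    return looks_like_number($value) ? abs($value) : 0;
}

# Returns the classification for one variant.
# Arguments: FuncIDPrefGene, Score, GeneDetailIDPrefGene, PopFreqMax.
sub classify {
    my ($func, $score, $detail, $pop_freq) = @_;
    $score = num($score);

    if ($func eq 'exonic') {
        return 'Likely Benign' if $score < 5;    # Rule 1
        return 'VUS';                            # assumption: exonic with score 5 or more
    }

    # Rule 2: non-exonic, low score, and fewer than 10 bases from the exon (c. nomenclature).
    if ($score < 5 and $detail =~ /\.\d+[\+\-](\d+)/ and $1 < 10) {
        return 'Likely Benign';
    }

    # Rule 3 (catch-all, assumed): common variants are Likely Benign, the rest VUS.
    return num($pop_freq) > 0.011 ? 'Likely Benign' : 'VUS';
}

# Made-up rows shaped like the three example lines discussed in the thread.
print classify('exonic',   12,  '.',               '.'), "\n";    # VUS
print classify('splicing', 2,   'NM_1:c.100+3A>G', '.'), "\n";    # Likely Benign
print classify('intronic', '.', 'NM_1:c.200-8C>T', '.'), "\n";    # Likely Benign
```

Returning the label from `classify` and leaving the printing to the caller means every input line gets exactly one classification, which avoids the case where a line matches no branch and is silently dropped.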