Classify lines in file using perl


 
# 8  
Old 04-09-2017
Quote:
Originally Posted by cmccabe
[...]
Code:
#!/usr/bin/perl    # call perl

# Disables certain Perl expressions that could behave unexpectedly
# or are difficult to debug, turning them into errors.
use strict;     # enforce strict variable declarations

use warnings;   # display warning messages

# Read and display the first line of the file passed at command line.
print scalar <>;  # print the header line unchanged

# Read line by line the file given at the command line.
# It could be stdin if no file is given as argument.
while (<>)    # start loop over the remaining lines
{
    # Make tokens out of the line, using the tab as the separator.
    my @f = split /\t/;      # split on tabs

    # Select 6 tokens from @f for convenience.
    my ( 
         $FuncIDPrefGene,    # field 1
         # Not used; possibly unnecessary.
         $AAChangeIDPrefGene, # field 2
         # Not used.
         $PopFreqMax,         # field 3
         $GeneDetailIDPrefGene, # field 4
         # Not used; possibly unnecessary.
         $ClinSig,              # field 5
         $Score ) = @f[6,11,13,8,46,54];   # field 6; the slice indices are 0-based

    if (abs($Score) < 5) {      # check field 6 and ensure its absolute value is less than 5
        if($FuncIDPrefGene eq 'exonic') {   # check field 1: exonic, and condition above met
            pj(\@f,'Likely Benign');    # set field 55 to Likely Benign
        } # end condition 1 block
        else {
            # List context, so $scored receives the captured digits rather than the match flag.
            my ($scored) = $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;  # use field 4: capture the number after the . and +/-
            pj(\@f, 'Likely Benign') if defined $scored && $scored > 10;    # if variable greater than 10 then field 55 is Likely Benign
        }  # end condition 2 block
    } 
    else {
        pj(\@f, 'VUS');  # if neither condition is met set field 55 to VUS
    }
}  # end while block
sub pj     # define subroutine
{    # start sub block
    my $fr = shift;  # first argument: reference to the token array
    $fr->[55] = shift;  # second argument: classification, stored in field 55
    print join "\t", @{$fr};   # print the whole line, tab-separated
}   # end sub block

This is a rearrangement of the code you posted in post #1, with the "and" operators and "next" commands removed, so you can see your logic a bit more clearly.
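
When a number is pulled out of the GeneDetail field with a capture group, as in the else branch above, the context of the assignment matters in Perl: a match assigned to a plain scalar yields the match status, while the same match in list context yields the captured groups. A minimal sketch, assuming a made-up field value (the string below is hypothetical, not taken from the thread's data):

```perl
use strict;
use warnings;

# Hypothetical field value, for illustration only.
my $detail = 'NM_000000.1:c.123+45A>G';
my $re     = qr/\.\d+[+-](\d+)/;

# Scalar context: the result is the match status (1 on success).
my $flag = $detail =~ $re;

# List context: the result is the list of captured groups.
my ($offset) = $detail =~ $re;

print "flag=$flag offset=$offset\n";
```

With the scalar form, a later comparison such as "greater than 10" is made against the match status instead of the captured number, so it can never be true.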

---------- Post updated at 09:29 AM ---------- Previous update was at 09:22 AM ----------

Do these statements capture the logic accurately?

Everything is VUS by default.
Conditions that could change it to Likely Benign:
- Score less than 5
- PopFreqMax more than 0.011

Even if this is not accurate, I think you should build on the idea that everything is VUS by default and that you are looking for reasons to change it to Likely Benign.

Last edited by Aia; 04-09-2017 at 12:53 PM..
# 9  
Old 04-09-2017
Quote:
Do these statements capture the logic accurately?

Everything is VUS by default.
Conditions that could change it to Likely Benign:
- Score less than 5
- PopFreqMax more than 0.011
Yes, these statements are accurate and true. Thank you very much :)
# 10  
Old 04-09-2017
Quote:
Originally Posted by cmccabe
Yes, these statements are accurate and true. Thank you very much :)
In that case, this might do it.

Code:
#!/usr/bin/perl
use strict;
use warnings;

# display header
print scalar <>;
while (<>)
{
    # tokenization of line, splitting by tab.
    my @f = split /\t/;
    # tokens to check on.
    my ($PopFreqMax, $Score) = @f[13,54];

    # Default classification.
    my $classification = 'VUS';

    # map to 0 if it doesn't have numeric meaning.
    $Score = 0 if $Score eq '.';

    # Change to Likely Benign if either of these two
    # conditions occurs.
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }
    # token 55 is classification.
    $f[55] = $classification;
    # display results.
    print join "\t", @f;
}
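
One thing to watch with this approach: `<>` returns each line with its trailing newline, and that newline stays attached to the last token after `split`. If a row ends at token 54, `$Score` would be `".\n"` rather than `"."`, and the appended classification would land after the line break. A hedged sketch of one way to guard against that, assuming every row has at least 55 tab-separated fields (the helper name `classify_line` is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: takes one raw input line and returns the line
# with the classification stored in token 55.
sub classify_line {
    my ($line) = @_;
    chomp $line;                        # drop the trailing newline first
    my @f = split /\t/, $line, -1;      # -1 keeps trailing empty fields
    my ($PopFreqMax, $Score) = @f[13,54];

    # map to 0 if it doesn't have numeric meaning.
    $Score = 0 if $Score eq '.';

    my $classification = 'VUS';
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }
    $f[55] = $classification;
    return join("\t", @f) . "\n";       # put the newline back at the very end
}

# Usage on a real file would keep the same shape as the loop above:
#   print scalar <>;
#   print classify_line($_) while <>;
```

Keeping the per-line work in a sub that returns a string also makes it easy to try a single row without a file.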

# 11  
Old 04-09-2017
Very close, but I am getting a syntax error:

Code:
#!/usr/bin/perl
use strict;
use warnings;

print scalar <>;
while (<>)
{
    my @f = split /\t/;
    my ( $FuncIDPrefGene,
         $AAChangeIDPrefGene,
         $PopFreqMax,
         $GeneDetailIDPrefGene,
         $ClinSig,
         $Score ) = @f[6,11,13,8,46,54];
# map to 0 if it doesn't have numeric meaning.
         $Score = 0 if $Score eq '.';

    if (abs($Score) < 5) {
        if($FuncIDPrefGene eq 'exonic') {
            pj(\@f,'Likely Benign');
        }
        else {
            my $scored = $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;
            pj(\@f, 'Likely Benign') if $scored < 10;
        }
        else {
            my $scored = $GeneDetailIDPrefGene=~/^\D(\d+)$/;
            pj(\@f, 'Likely Benign') if $scored < 10;
        }
    else {
        pj(\@f, 'VUS');
    }
}
sub pj
{
    my $fr = shift;
    $fr->[55] = shift;
    print join "\t", @{$fr};
}
syntax error at /home/cmccabe/classify3.pl line 26, near "else"
Illegal declaration of subroutine main::pj at /home/cmccabe/classify3.pl

Thank you very much :)

Also, does adding:
Code:
# Default classification.
    my $classification = 'VUS';

mean that sub pj should look like this:
Code:
sub pj
{
    my $fr=shift;
       $fr->[55]=shift;
       print join("\t",@{$fr}); # add separator ,"\n
  {
# token 55 is classification.
    $f[55] = $classification;
    # display results.
    print join "\t", @f;
   }
}

in order to define my $classification?

Last edited by cmccabe; 04-09-2017 at 04:59 PM.. Reason: added default question
# 12  
Old 04-09-2017
Quote:
Originally Posted by cmccabe
Very close, but I am getting a syntax error:

Code:
#!/usr/bin/perl
use strict;
use warnings;

print scalar <>;
while (<>)
{
    my @f = split /\t/;
    my ( $FuncIDPrefGene,
         $AAChangeIDPrefGene,
         $PopFreqMax,
         $GeneDetailIDPrefGene,
         $ClinSig,
         $Score ) = @f[6,11,13,8,46,54];
# map to 0 if it doesn't have numeric meaning.
         $Score = 0 if $Score eq '.';

    if (abs($Score) < 5) {
        if($FuncIDPrefGene eq 'exonic') {
            pj(\@f,'Likely Benign');
        }
        else {
            my $scored = $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;
            pj(\@f, 'Likely Benign') if $scored < 10;
        }
#### You cannot have an else by itself ####
        else {
            my $scored = $GeneDetailIDPrefGene=~/^\D(\d+)$/;
            pj(\@f, 'Likely Benign') if $scored < 10;
        }
    else {
        pj(\@f, 'VUS');
    }
}
sub pj
{
    my $fr = shift;
    $fr->[55] = shift;
    print join "\t", @{$fr};
}
syntax error at /home/cmccabe/classify3.pl line 26, near "else"
Illegal declaration of subroutine main::pj at /home/cmccabe/classify3.pl

Thank you very much :)

Also, does adding:
Code:
# Default classification.
    my $classification = 'VUS';

mean that sub pj should look like this:
Code:
sub pj
{
    my $fr=shift;
       $fr->[55]=shift;
       print join("\t",@{$fr}); # add separator ,"\n

### Not proper ####
  {
# token 55 is classification.
    $f[55] = $classification;
    # display results.
    print join "\t", @f;
   }
### End of not proper ####
}

in order to define my $classification?
Hi, cmccabe

What you are trying to do now contradicts what you confirmed as true in post #9.
Where does the code I posted in #10 fall short of your expectations?
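
On the "else by itself" remark: an `if` may be followed by any number of `elsif` branches but by at most one `else`, and each block must be closed before the next branch starts. A minimal sketch of how the three branches from post #11 could be chained, assuming the second regex was meant as an alternative test on the same field and that anything unmatched stays VUS (both readings of the intent, and the helper name `classify_detail`, are assumptions):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the classification for one row, given
# the function, gene detail and score tokens.
sub classify_detail {
    my ($func, $detail, $score) = @_;
    my $classification = 'VUS';                    # default

    # List context, so each variable receives the captured digits (or undef).
    my ($offset) = $detail =~ /\.\d+[+-](\d+)/;
    my ($plain)  = $detail =~ /^\D(\d+)$/;

    if (abs($score) < 5) {
        if ($func eq 'exonic') {
            $classification = 'Likely Benign';
        }
        elsif (defined $offset && $offset < 10) {
            $classification = 'Likely Benign';
        }
        elsif (defined $plain && $plain < 10) {
            $classification = 'Likely Benign';
        }
        # No trailing else is needed: the default covers everything else.
    }
    return $classification;
}

print classify_detail('exonic', '.', 2), "\n";
```

Because the default is set up front, every row gets a classification and no branch has to remember to print.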
# 13  
Old 04-09-2017
Hi Aia,

The code works great; I just have several additional rules/conditions to apply. I did not post them all, to keep the post shorter. I was trying to add them following your code, as that works much better. Thank you :)
# 14  
Old 04-10-2017
Quote:
Originally Posted by cmccabe
Hi Aia,

The code works great; I just have several additional rules/conditions to apply. I did not post them all, to keep the post shorter. I was trying to add them following your code, as that works much better. Thank you :)
I understand, then. However, going back to the flawed workflow does not guarantee that every line is kept: in that flow a line is printed only when one of the branches calls pj, so a line that matches none of them is silently dropped.

If my suggestion works and you need to add more conditions, just follow the pattern. Ditch the subroutine pj; you really do not need it.
Code:
#!/usr/bin/perl
use strict;
use warnings;

print scalar <>;
while (<>)
{
    my @f = split /\t/;
    # Change this to hold more variables for checking.
    my ($PopFreqMax, $Score) = @f[13,54];

    # Default classification.
    my $classification = 'VUS';

    # map to 0 if it doesn't have numeric meaning.
    $Score = 0 if $Score eq '.';
    # if you must.
    $Score = abs($Score);

    # Change to Likely Benign if either of these two
    # conditions occurs.
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }

    # Create here any other conditions that might change $classification.
    # Template only; uncomment and fill in a real condition and labels:
    # if (...) {
    #    $classification = '...';
    # }
    # else {
    #    $classification = '...';
    # }

   # When you get to this point you are ready to change $f[55] token
   # and to display the result.

    # token 55 is classification.
    $f[55] = $classification;
    # display results.
    print join "\t", @f;
}
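
As the number of conditions grows, one variation on this pattern (not what was posted above, just a sketch) is to keep the rules in a list and let the loop body stay the same. Only the first rule below comes from the thread; the second is a made-up placeholder, since the additional rules were never posted, and the name `classify_fields` and the `'Benign'` value are assumptions:

```perl
use strict;
use warnings;

# Each rule is a label plus a test that receives a reference to the tokens.
my @rules = (
    # From the thread: small absolute score, or high population frequency.
    [ 'Likely Benign', sub {
          my ($f) = @_;
          my $score = $f->[54] eq '.' ? 0 : abs($f->[54]);
          return $score < 5 || $f->[13] > 0.011;
      } ],
    # Hypothetical placeholder showing where another condition would go.
    [ 'Likely Benign', sub {
          my ($f) = @_;
          return $f->[6] eq 'exonic' && $f->[46] eq 'Benign';
      } ],
);

sub classify_fields {
    my ($f) = @_;
    my $classification = 'VUS';            # default classification
    for my $rule (@rules) {
        my ($label, $test) = @$rule;
        # As with consecutive if blocks, a later rule can override an earlier one.
        $classification = $label if $test->($f);
    }
    return $classification;
}
```

Inside the loop over the file, `$f[55] = classify_fields(\@f);` would then replace the chain of `if` blocks.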
