Classify lines in file using perl

04-10-2017

Registered User

1,393, 20

Join Date: Nov 2013

Last Activity: 1 May 2020, 2:35 PM EDT

Location: Chicago

Posts: 1,393

Thanks Given: 901

Thanked 20 Times in 19 Posts

I am adding the condition below to change the classification of line 3 to Likely Benign. With a $Score of 20 and a PopFreqMax of 0.003, the line would otherwise follow the default rule.
However, the GeneDetailIDPrefGene field holds >50, and the 50 that is stripped from it is greater than 10, so the classification should be Likely Benign. I know the code that strips off the 50 works, but am I doing something else wrong? Thank you.

Code:

if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/;) {   # capture the digits after any non-digit into $1
        $1 > 10   # Reclassify intronic variants (with distance only) based on score less than 5 to Likely Benign
        $classification = 'Likely Benign';
    }
else {
             my $scored = $FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;   # capture the digits after . and (+/-) into $1
                if $scored < 5;    # Reclassify intronic variants (with c.) less than 5 based on score
       $classification =  'Likely Benign';
}
syntax error at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 45, near "}"

Execution of /home/cmccabe/Desktop/NGS/scripts/classifier.pl aborted due to compilation errors.

Last edited by cmccabe; 04-10-2017 at 09:58 AM.. Reason: fixed format

cmccabe


04-10-2017

Registered User

1,781, 705

Join Date: May 2008

Last Activity: 10 November 2021, 5:38 PM EST

Posts: 1,781

Thanks Given: 62

Thanked 705 Times in 653 Posts

Let me remove all the extra around what you posted and highlight the syntax issues.

Quote:

Originally Posted by cmccabe

[...]

Code:

# Syntax fixes only: the stray ; before the closing ) is removed,
# and each bare comparison is wrapped in its own if (...) { ... } block.
if ($FuncIDPrefGene !~ /exonic/i && $Score < 5 && $GeneDetailIDPrefGene =~ /^\D(\d+)$/) {
    if ($1 > 10) {
        $classification = 'Likely Benign';
    }
}
else {
    my $scored = $FuncIDPrefGene !~ /exonic/i && $Score < 5 && $GeneDetailIDPrefGene =~ /\.\d+[\+\-](\d+)/;
    if ($scored < 5) {
        $classification = 'Likely Benign';
    }
}
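For reference, Perl accepts a condition either as a block, if (...) { ... }, or as a statement modifier written after the statement it guards. A modifier placed on the line before the statement, as in the original post, is a syntax error. The following is only a minimal sketch: the helper names block_form and modifier_form and the sample distances are hypothetical and are not part of classifier.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: guarded assignment written as a block.
sub block_form {
    my ($distance) = @_;
    my $classification = 'VUS';
    if ($distance > 10) {
        $classification = 'Likely Benign';
    }
    return $classification;
}

# Hypothetical helper: the same guard written as a statement modifier.
# The modifier comes after the statement and needs no parentheses.
sub modifier_form {
    my ($distance) = @_;
    my $classification = 'VUS';
    $classification = 'Likely Benign' if $distance > 10;
    return $classification;
}

# Both spellings give the same result for each sample distance.
for my $distance (50, 5) {
    print "$distance: ", block_form($distance), " / ", modifier_form($distance), "\n";
}
```

Either spelling works; the block form is easier to extend when more than one statement depends on the condition.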

Aia


04-10-2017

Below is the updated code along with my attempt to fix the error. The $1 > 10 lines were updated as the message suggested; the change produces a different message but allows the script to run. I am a little confused, as this line seems to be important, yet the script ignores or skips it? Thank you.

Code:

# Change to Likely Benign if either of these two conditions occurs.
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }
    # GeneDetail condition
    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {
        $1 > 10
        $classification = 'Likely Benign';
    }
    else {
           if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/)
              $1 > 10
              $classification =  'Likely Benign';
    }

# token 55 is classification.
    $f[55] = $classification;

    # display results and update @f.
    print join "\t", @f;
}   # end conditional block
Scalar found where operator expected at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 34, near "$classification"
	(Missing semicolon on previous line?)
Scalar found where operator expected at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 38, near ")
              $1"
	(Missing operator before $1?)
Scalar found where operator expected at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 39, near "$classification"
	(Missing semicolon on previous line?)
syntax error at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 34, near "$classification "
syntax error at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 38, near ")
              $1 "
Execution of /home/cmccabe/Desktop/NGS/scripts/classifier.pl aborted due to compilation errors.

Adding the ; indicated by the message produces the warnings shown after the code, but the script does execute.

Code:

# Change to Likely Benign if either of these two conditions occurs.
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }
    # GeneDetail condition
    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {
        $1 > 10;
        $classification = 'Likely Benign';
    }
    else {
           if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/)
              $1 > 10;
              $classification =  'Likely Benign';
    }

# token 55 is classification.
    $f[55] = $classification;

    # display results and update @f.
    print join "\t", @f;
}   # end conditional block
Useless use of numeric gt (>) in void context at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 33.
Useless use of numeric gt (>) in void context at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 38.

Last edited by cmccabe; 04-10-2017 at 02:03 PM.. Reason: adding bold and highlighting to make it easier to read

cmccabe


04-10-2017

Hi cmccabe,
Please take another look at post #16. I highlighted there how it needs to be written if you mean it as a condition.
$1 > 10; is useless, as the message says.
It is the equivalent of saying _the sky is blue_. So what? There is no flow control there.
If the code runs, it will always set $classification = 'Likely Benign' as soon as the enclosing if is met.
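The difference can be seen in a small, self-contained sketch. The helper names classify_unguarded and classify_guarded and the sample value >5 are hypothetical; only the pattern /^\D(\d+)$/ and the value >50 come from the thread. The 'void' warning category is switched off locally so the sketch runs quietly; in a real script that warning is worth keeping.

```perl
use strict;
use warnings;

# Hypothetical helper: the comparison is a bare statement, so it guards nothing.
sub classify_unguarded {
    my ($detail) = @_;
    my $classification = 'VUS';
    if ($detail =~ /^\D(\d+)$/) {
        no warnings 'void';   # silences "Useless use of numeric gt (>) in void context"
        $1 > 10;              # evaluated, then the result is thrown away
        $classification = 'Likely Benign';
    }
    return $classification;
}

# Hypothetical helper: the comparison controls the assignment.
sub classify_guarded {
    my ($detail) = @_;
    my $classification = 'VUS';
    if ($detail =~ /^\D(\d+)$/) {
        if ($1 > 10) {
            $classification = 'Likely Benign';
        }
    }
    return $classification;
}

for my $detail ('>50', '>5') {
    printf "%-4s unguarded=%s guarded=%s\n",
        $detail, classify_unguarded($detail), classify_guarded($detail);
}
```

For >5 the unguarded version still reports Likely Benign, because the pattern matched and nothing tests the captured number, while the guarded version leaves the default in place.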

Aia


04-10-2017

I apologize, I read the post incorrectly. I am not sure why the following happens. Line 1 in the attached file.txt should be VUS, set by the default classification, and that is correct. However, when the two conditions below are added, the first behaves as expected, but the second (after the else) changes the first line to Likely Benign. It should not be applied, as $FuncIDPrefGene does not equal exonic. Is there something wrong with my logic? Thank you for all your help.

Code:

# GeneDetail condition
    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {  
        if ($1 > 10) {
            $classification = 'Likely Benign';
        }
    }
        else {
             my $transcript = $FuncIDPrefGene !~/exonic/i && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;   
             if ($transcript > 10) {  
                 $classification =  'Likely Benign';
            }
    }

desired classification

Code:

VUS     ----- default classification
Likely Benign   -----  portion before the else $Score < 5
Likely Benign    ---- portion after the else >50 is used to be Likely Benign

file.txt (1.3 KB)

cmccabe


04-10-2017

Quote:

Originally Posted by cmccabe

Is there something wrong with my logic?

You decide.

This is not necessary,

Code:

    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {  
        if ($1 > 10) {
            $classification = 'Likely Benign';
        }
    }

Its mission is to set $classification = 'Likely Benign'. However, the earlier condition, shown next, already does that job whenever $Score is less than 5, regardless of whether the variant is exonic or whether the number in $GeneDetailIDPrefGene is more than 10.

Code:

    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }

Code:

        else {
             my $transcript = $FuncIDPrefGene !~/exonic/i &&$GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;   
             if ($transcript > 10) {  
                 $classification =  'Likely Benign';
            }
    }

The pattern /\.\d+[\+\-](\d+)/ in this block requires a literal dot, so it does not match >50, which is what the last line has.

Perhaps this might help, instead

Code:


    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }

    if ($FuncIDPrefGene !~ /exonic/i) {
        # Get a numeric value if exist.
        my ($transcript) = ($GeneDetailIDPrefGene) =~ /(?:\.\d+[+-]|\D)(\d+)/;
        # Give it a value of zero if no numeric value was found.
        $transcript //= 0;
        $classification = 'Likely Benign' if $transcript > 10;
    }
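The parentheses around $transcript matter. Assigning a match to a plain scalar yields only a true or false value, while assigning it to a list yields the captured digits. A minimal sketch, assuming hypothetical helper names match_flag and distance_from and a hypothetical field value c.123+45 (only >50 appears in the sample data):

```perl
use strict;
use warnings;

# Hypothetical helper: scalar context returns 1 on success and an empty
# string on failure, never the captured number.
sub match_flag {
    my ($detail) = @_;
    my $flag = $detail =~ /(?:\.\d+[+-]|\D)(\d+)/;
    return $flag;
}

# Hypothetical helper: list context returns the capture itself.
# A failed match leaves $distance undefined, so it falls back to 0.
sub distance_from {
    my ($detail) = @_;
    my ($distance) = $detail =~ /(?:\.\d+[+-]|\D)(\d+)/;
    return $distance // 0;
}

for my $detail ('>50', 'c.123+45', '.') {
    printf "%-9s flag=%s distance=%d\n",
        $detail, (match_flag($detail) ? 'true' : 'false'), distance_from($detail);
}
```

Comparing the scalar-context result with 10 can never succeed, since it is at most 1, which is why the earlier else branch never reclassified anything. The // and //= operators require Perl 5.10 or later.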

Last edited by Aia; 04-11-2017 at 12:49 AM..

Aia


04-11-2017

Using the lines below, I am trying to update $classification by applying the rules in the description, but I cannot get the desired output. Thank you for all your help, I really appreciate it.

Code:

35	chr1	154562623	154562625	CCG	-	intronic	ADAR	>50	.	.	.	rs779843196	0.0003	.	.	.	.	.	.	0.0001	0.0003	0.0001	.	.	0.0001	.	0.0003	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	Pathogenic|Likely Pathogenic	.	.	.	.	.	.	.	20	VUS	.	.
35	chr1	154562623	154562625	CCG	-	intronic	ADAR	>50	.	.	.	rs779843196	0.0003	.	.	.	.	.	.	0.0001	0.0003	0.0001	.	.	0.0001	.	0.0003	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	Benign|other|unknown	.	.	.	.	.	.	.	20	VUS	.	.
35	chr1	154562623	154562625	CCG	-	intronic	ADAR	>50	.	.	.	rs779843196	0.0003	.	.	.	.	.	.	0.0001	0.0003	0.0001	.	.	0.0001	.	0.0003	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	Uncertain signiffigance	.	.	.	.	.	.	.	20	VUS	.	.

Description:
In the first line, the classification is updated to Pathogenic because it follows the rules of the second else statement.
In the second line, the first else statement is used to update the classification.
In the third line, the first if statement is used to update the classification because it is a single entry with no |.

ClinSig ---- only allow single entries in classification ----

Code:

Benign is single entry
Benign|Likely benign|Unknown is a multiple entry

Since a | (pipe) character is always present for multiple entries, maybe the following (it seems to execute, but nothing changes in classification):

Code:

if ($ClinSig !~/untested|unknown|not provided|other/i && $ClinSig ne "." && $ClinSig ne "|") {
           $classification = $ClinSig;
         }
     }
   else {
        if ($ClinSig !~/Pathogenic|Likely pathogenic|Uncertain significance/i || $ClinSig eq ".") {
             $classification = 'Likely Benign';
   }
        }

   else {
        if ($ClinSig eq "Pathogenic|Likely pathogenic|Uncertain significance" && $ClinSig ne ".") {
                 #$classification = 'Pathogenic';
   }
        }

desired classification

Code:

Pathogenic
Likely Benign
Uncertain signiffigance
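One point worth noting about the attempt above: $ClinSig ne "|" only tests whether the whole field is exactly a single pipe, not whether it contains one, so it cannot tell single entries from multiple entries. The sketch below is one possible reading of the rules, inferred from the desired output and not a confirmed specification. The helper name classify_clinsig is hypothetical, and treating any multiple entry that mentions pathogenic or uncertain significance as Pathogenic is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper; the rules are inferred from the desired output above.
sub classify_clinsig {
    my ($clin_sig, $current) = @_;

    # No ClinSig entry: keep whatever classification is already set.
    return $current if $clin_sig eq '.';

    # Single entry: index() returns -1 when the field contains no pipe.
    if (index($clin_sig, '|') < 0) {
        return $current if $clin_sig =~ /untested|unknown|not provided|other/i;
        return $clin_sig;
    }

    # Multiple entries (assumption): a pathogenic or uncertain term wins,
    # anything else becomes Likely Benign.
    return 'Pathogenic' if $clin_sig =~ /pathogenic|uncertain significance/i;
    return 'Likely Benign';
}

for my $clin_sig ('Pathogenic|Likely Pathogenic', 'Benign|other|unknown', 'Uncertain signiffigance') {
    print classify_clinsig($clin_sig, 'VUS'), "\n";
}
```

With the three sample ClinSig values this prints the desired classifications in order. A single-entry value is copied through unchanged, including its spelling.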

Last edited by cmccabe; 04-11-2017 at 11:11 AM.. Reason: fixed format

cmccabe
