Classify lines in file using perl


 
# 15  
Old 04-10-2017
I am adding the condition below to change the classification of line 3 to Likely Benign. With $Score at 20 and PopFreqMax at 0.003, the line would otherwise follow the default rule.
However, in the GeneDetailIDPrefGene field the digits 50, stripped from the >50, are greater than 10, so the classification should be Likely Benign. I know the code that strips off the 50 works, but am I doing something else wrong? Thank you.

Code:
if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/;) {   # capture the digits after any non-digit into $1
        $1 > 10   # Reclassify intronic variants (with distance only) based on score less than 5 to Likely Benign
        $classification = 'Likely Benign';
    }
else {
             my $scored = $FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;   # capture the digits after . and (+/-) into $1
                if $scored < 5;    # Reclassify intronic variants (with c.) less than 5 based on score
       $classification =  'Likely Benign';
}
syntax error at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 45, near "}"

Execution of /home/cmccabe/Desktop/NGS/scripts/classifier.pl aborted due to compilation errors.


Last edited by cmccabe; 04-10-2017 at 09:58 AM.. Reason: fixed format
# 16  
Old 04-10-2017
Let me remove all the extra around what you posted and highlight the syntax issues.
Quote:
Originally Posted by cmccabe
[...]

Code:
# The stray ; inside the condition is removed.
if ($FuncIDPrefGene !~ /exonic/i && $Score < 5 && $GeneDetailIDPrefGene =~ /^\D(\d+)$/) {
    # The comparison needs its own if block.
    if ($1 > 10) {
        $classification = 'Likely Benign';
    }
}
else {
    my $scored = $FuncIDPrefGene !~ /exonic/i && $Score < 5 && $GeneDetailIDPrefGene =~ /\.\d+[\+\-](\d+)/;
    # The trailing "if $scored < 5;" becomes a proper if block.
    if ($scored < 5) {
        $classification = 'Likely Benign';
    }
}

# 17  
Old 04-10-2017
Below is the updated code along with my attempt to fix the error. I updated the GeneDetail sections; the first version fails to compile, while the second version, with the ; added, produces a different message but lets the script run. I am a little confused, as the $1 > 10 line seems important but the script appears to ignore or skip it. Thank you.


Code:
# Change to Likely Benign if either of these two conditions occurs.
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }
    # GeneDetail condition
    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {
        $1 > 10
        $classification = 'Likely Benign';
    }
    else {
           if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/)
              $1 > 10
              $classification =  'Likely Benign';
    }

# token 55 is classification.
    $f[55] = $classification;

    # display results and update @f.
    print join "\t", @f;
}   # end conditional block
Scalar found where operator expected at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 34, near "$classification"
	(Missing semicolon on previous line?)
Scalar found where operator expected at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 38, near ")
              $1"
	(Missing operator before $1?)
Scalar found where operator expected at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 39, near "$classification"
	(Missing semicolon on previous line?)
syntax error at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 34, near "$classification "
syntax error at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 38, near ")
              $1 "
Execution of /home/cmccabe/Desktop/NGS/scripts/classifier.pl aborted due to compilation errors.


Adding the ; indicated by the message produces the warnings below, but the script does execute:
Code:
# Change to Likely Benign if either of these two conditions occurs.
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }
    # GeneDetail condition
    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {
        $1 > 10;
        $classification = 'Likely Benign';
    }
    else {
           if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/)
              $1 > 10;
              $classification =  'Likely Benign';
    }

# token 55 is classification.
    $f[55] = $classification;

    # display results and update @f.
    print join "\t", @f;
}   # end conditional block
Useless use of numeric gt (>) in void context at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 33.
Useless use of numeric gt (>) in void context at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 38.


Last edited by cmccabe; 04-10-2017 at 02:03 PM.. Reason: adding bold and highlighting to make it easier to read
# 18  
Old 04-10-2017
Hi cmccabe,
Please take another look at post #16, where I showed how it needs to be written if that is what you mean.
$1 > 10; is useless, as the message says.
It is the equivalent of saying _the sky is blue_. So what? There is no flow control there.
If the code runs, it always sets $classification = 'Likely Benign' as soon as the if condition is met.
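
A minimal sketch of the difference, assuming a hypothetical helper classify_distance (the name and its arguments are not from the original script). A bare comparison is evaluated and discarded, whereas wrapping it in if makes the assignment depend on it:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of classifier.pl.
sub classify_distance {
    my ($detail, $classification) = @_;
    if ($detail =~ /^\D(\d+)$/) {    # e.g. ">50" captures 50 into $1
        # A bare `$1 > 10;` here would be computed and thrown away.
        # The inner if is what makes the assignment conditional.
        if ($1 > 10) {
            $classification = 'Likely Benign';
        }
    }
    return $classification;
}

print classify_distance('>50', 'VUS'), "\n";    # Likely Benign
print classify_distance('>5',  'VUS'), "\n";    # VUS
```

With the inner if in place, a value such as >5 keeps whatever classification it already had.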
# 19  
Old 04-10-2017
I apologize, I read the post incorrectly. I am not sure what is going wrong now. Line 1 in the attached file.txt should be VUS, as set by the default classification, and that is correct. However, when the two conditions below are added, the first behaves as expected, but the second (after the else) changes the first line to Likely Benign. It should not be applied, as $FuncIDPrefGene does not equal exonic. Is there something wrong with my logic? Thank you for all your help.

Code:
# GeneDetail condition
    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {  
        if ($1 > 10) {
            $classification = 'Likely Benign';
        }
    }
        else {
             my $transcript = $FuncIDPrefGene !~/exonic/i && $GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;   
             if ($transcript > 10) {  
                 $classification =  'Likely Benign';
            }
    }

desired classification
Code:
VUS     ----- default classification
Likely Benign   -----  portion before the else $Score < 5
Likely Benign    ---- portion after the else >50 is used to be Likely Benign

# 20  
Old 04-10-2017
Quote:
Originally Posted by cmccabe
Is there something wrong with my logic?
You decide.

This is not necessary,
Code:
    if ($FuncIDPrefGene !~/exonic/i && $Score < 5 && $GeneDetailIDPrefGene=~/^\D(\d+)$/) {  
        if ($1 > 10) {
            $classification = 'Likely Benign';
        }
    }

Its only job is to set $classification = 'Likely Benign'. However, the earlier condition, shown next, already does that whenever $Score is less than 5, regardless of whether the variant is exonic or whether the $GeneDetailIDPrefGene value is more than 10.
Code:
    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }



Code:
        else {
             my $transcript = $FuncIDPrefGene !~/exonic/i &&$GeneDetailIDPrefGene=~/\.\d+[\+\-](\d+)/;   
             if ($transcript > 10) {  
                 $classification =  'Likely Benign';
            }
    }

The regex in this else block, /\.\d+[\+\-](\d+)/, does not work for >50, which is what the last line has.
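
A short sketch of two things happening in that else block. The sample value c.123+45 is an assumed example of the "c." format, not taken from the thread's data. Assigning a match to a plain scalar stores only true or false, so a test like $transcript > 10 can never succeed; list context is needed to receive the captured digits:

```perl
use strict;
use warnings;

my $re = qr/\.\d+[+-](\d+)/;

# Scalar context: the result is only true/false, never the captured digits.
my $as_scalar = 'c.123+45' =~ $re;    # 1 (matched)

# List context: the parentheses on the left receive the capture groups.
my ($as_list) = 'c.123+45' =~ $re;    # 45

# ">50" has no ".digits" followed by + or -, so the pattern does not match.
my ($no_match) = '>50' =~ $re;        # undef

printf "scalar=%s list=%s no_match=%s\n",
    $as_scalar, $as_list, defined $no_match ? $no_match : 'undef';
```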

Perhaps this might help, instead

Code:

    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }

    if ($FuncIDPrefGene !~ /exonic/i) {
        # Get the numeric value, if one exists.
        my ($transcript) = ($GeneDetailIDPrefGene) =~ /(?:\.\d+[+-]|\D)(\d+)/;
        # Give it a value of zero if no numeric value was found.
        $transcript //= 0;
        $classification = 'Likely Benign' if $transcript > 10;
    }
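
One way to check the suggestion is to wrap it in a small function and feed it a few values. This is only a sketch: the wrapper name classify, its argument order, and the c.123+45 sample are assumptions made for the test, while the two conditions are the ones shown above (the // operator needs Perl 5.10 or later).

```perl
use strict;
use warnings;

# Hypothetical wrapper around the suggested logic, for testing only.
sub classify {
    my ($FuncIDPrefGene, $GeneDetailIDPrefGene, $Score, $PopFreqMax) = @_;
    my $classification = 'VUS';    # default classification

    if ($Score < 5 || $PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }

    if ($FuncIDPrefGene !~ /exonic/i) {
        # Digits after ".NNN+" / ".NNN-", or after a single non-digit such as ">".
        my ($transcript) = $GeneDetailIDPrefGene =~ /(?:\.\d+[+-]|\D)(\d+)/;
        $transcript //= 0;         # no number found
        $classification = 'Likely Benign' if $transcript > 10;
    }
    return $classification;
}

print classify('intronic', '>50',      20, 0.0003), "\n";    # Likely Benign
print classify('intronic', 'c.123+45', 20, 0.0003), "\n";    # Likely Benign
print classify('intronic', '.',        20, 0.0003), "\n";    # VUS
print classify('exonic',   '>50',      20, 0.0003), "\n";    # VUS
```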


Last edited by Aia; 04-11-2017 at 12:49 AM..
# 21  
Old 04-11-2017
Using the lines below, I am trying to update $classification according to the rules in the description, but cannot get the desired output. Thank you for all your help, I really appreciate it.

Code:
35	chr1	154562623	154562625	CCG	-	intronic	ADAR	>50	.	.	.	rs779843196	0.0003	.	.	.	.	.	.	0.0001	0.0003	0.0001	.	.	0.0001	.	0.0003	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	Pathogenic|Likely Pathogenic	.	.	.	.	.	.	.	20	VUS	.	.
35	chr1	154562623	154562625	CCG	-	intronic	ADAR	>50	.	.	.	rs779843196	0.0003	.	.	.	.	.	.	0.0001	0.0003	0.0001	.	.	0.0001	.	0.0003	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	Benign|other|unknown	.	.	.	.	.	.	.	20	VUS	.	.
35	chr1	154562623	154562625	CCG	-	intronic	ADAR	>50	.	.	.	rs779843196	0.0003	.	.	.	.	.	.	0.0001	0.0003	0.0001	.	.	0.0001	.	0.0003	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	Uncertain signiffigance	.	.	.	.	.	.	.	20	VUS	.	.

Description:
In the first line, the classification is updated to Pathogenic because it follows the rules of the second else statement.
In the second line, the first else statement is used to update the classification.
In the third line, the first if statement is used to update the classification because it is a single entry with no |.

ClinSig ---- only allow single entries in classification ----
Code:
Benign is single entry
Benign|Likely benign|Unknown is a multiple entry

Since a | (pipe) character is always present for multiple entries, maybe something like the code below (it seems to execute, but nothing changes in the classification):

Code:
if ($ClinSig !~/untested|unknown|not provided|other/i && $ClinSig ne "." && $ClinSig ne "|") {
           $classification = $ClinSig;
         }
     }
   else {
        if ($ClinSig !~/Pathogenic|Likely pathogenic|Uncertain significance/i || $ClinSig eq ".") {
             $classification = 'Likely Benign';
   }
        }

   else {
        if ($ClinSig eq "Pathogenic|Likely pathogenic|Uncertain significance" && $ClinSig ne ".") {
                 #$classification = 'Pathogenic';
   }
        }

desired classification
Code:
Pathogenic
Likely Benign
Uncertain signiffigance
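
One thing worth noting about the attempt above: $ClinSig ne "|" compares the whole field against a single pipe character, so it is true for almost every value and does not detect multiple entries. Testing whether the field contains a pipe needs index or a regex such as /\|/. The sketch below is one possible reading of the rules in the description, not a confirmed solution; the helper name classify_clinsig, the rule order, and leaving "." untouched are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper; the rule order is one interpretation of the description.
sub classify_clinsig {
    my ($ClinSig, $classification) = @_;

    # Assumption: a "." means no ClinSig value, so keep the current classification.
    return $classification if $ClinSig eq '.';

    # Single entry: the field contains no pipe at all.
    if (index($ClinSig, '|') < 0) {
        return $ClinSig =~ /untested|unknown|not provided|other/i
            ? $classification
            : $ClinSig;
    }

    # Multiple entries from here on.
    return 'Pathogenic'
        if $ClinSig =~ /Pathogenic|Likely pathogenic|Uncertain significance/i;
    return 'Likely Benign';
}

for my $clinsig ('Pathogenic|Likely Pathogenic',
                 'Benign|other|unknown',
                 'Uncertain signiffigance') {
    print classify_clinsig($clinsig, 'VUS'), "\n";
}
```

For the three sample lines this prints the desired classifications in order, with the single entry passed through exactly as it is spelled in the data.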


Last edited by cmccabe; 04-11-2017 at 11:11 AM.. Reason: fixed format