Classify lines in file using perl


 
# 22  
04-12-2017
*** Please disregard this note. I didn't notice the context of this thread; I just read post #21, and I had seen lots of previous threads from this poster concerning awk. ***

It looks like you need to reread your awk man page. The concatenation of the three strings $ClinSig (i.e., the contents of the field whose number is held in the awk variable ClinSig), the contents of the variable named ne (an empty string if no variable of that name is defined in your script), and the string "." will yield a true result in an expression as long as the concatenation of those three strings is not an empty string and does not look like a numeric string that evaluates to zero.

If you were trying to create a logical expression instead of a string concatenation, the Not Equal operator in awk is != (not ne) and the Equality operator in awk is == (not eq).

Last edited by Don Cragun; 04-12-2017 at 04:46 PM.
# 23  
04-12-2017
Hi Don,

It appears you are under the impression that this thread is about AWK, but all 21 posts so far have been about Perl, so your comment does not apply.
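For readers coming from awk: in Perl, eq and ne are the string comparison operators, while == and != are numeric, so $ClinSig ne "." is valid Perl. A small sketch with made-up values showing the difference:

```perl
use strict;
use warnings;

# String comparison: eq / ne compare the characters of the two strings.
my $clin_sig = '.';
print "missing\n"   if $clin_sig eq '.';   # printed
print "has value\n" if $clin_sig ne '.';   # not printed

# Numeric comparison: == / != convert both sides to numbers first.
# "1.0" and "1" differ as strings but are equal as numbers.
print "string-different\n"  if '1.0' ne '1';   # printed
print "numerically-equal\n" if '1.0' == 1;     # printed
```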

Hi cmccabe,

The information you gave in your last post (#21) is a bit confusing. You might need to explain some more.
For example, in $ClinSig !~/untested|unknown|not provided|other/ it is not clear whether you intend untested|unknown|not provided|other as a regex alternation or as a literal string value contained in $ClinSig. The lines in the file suggest it might be the latter.
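To make the distinction concrete, here is a sketch (the sample values are made up) that checks the same pattern both ways: as a regex alternation, where any one keyword is enough, and as a literal string, using \Q...\E so that the pipes are treated as plain characters:

```perl
use strict;
use warnings;

my @values = ('Unknown', 'Benign|other', 'untested|unknown|not provided|other');

for my $v (@values) {
    # Regex alternation: true if ANY one of the keywords appears (case-insensitive).
    my $any_keyword = $v =~ /untested|unknown|not provided|other/i ? 1 : 0;

    # Literal match: \Q...\E quotes the metacharacters, so "|" is a plain pipe
    # and the whole phrase must appear verbatim.
    my $literal = $v =~ /\Quntested|unknown|not provided|other\E/ ? 1 : 0;

    print "$v: alternation=$any_keyword literal=$literal\n";
}
```

Only the last value matches literally; all three match the alternation.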
# 24  
04-12-2017
Hi Aia,

It is possible for the text in $ClinSig to be either a single string like Pathogenic, or multiple strings separated by a |, like Benign|Other|Unknown.

The regex is meant to test for the presence or absence of those keywords and to update $classification accordingly.
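Because a value like Benign|Other|Unknown is really a list of entries, it may be simpler to split it on the pipe first and then test each entry. A minimal sketch; the keyword list comes from the code below, but the helper name entries is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: split a ClinSig value into its individual entries.
# The pipe is escaped because split takes a regex; empty entries are dropped.
sub entries {
    my ($clin_sig) = @_;
    return grep { length } split /\|/, $clin_sig;
}

my @parts = entries('Benign|Other|Unknown');
print scalar(@parts), " entries: @parts\n";

# grep in scalar context counts the entries that are exactly one of the keywords.
my $uninformative = grep { /^(?:untested|unknown|not provided|other)$/i } @parts;
print "uninformative entries: $uninformative\n";
```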

Code:
    # Keep a single keyword as-is: none of the "uninformative" keywords,
    # not a ".", and no "|" anywhere (a pipe means multiple entries)
    if ($ClinSig !~ /untested|unknown|not provided|other/i && $ClinSig ne "." && $ClinSig !~ /\|/) {
        $classification = $ClinSig;
    }
    elsif ($ClinSig !~ /Pathogenic|Likely pathogenic|Uncertain significance/i || $ClinSig eq ".") {
        $classification = 'Likely Benign';
    }
    else {
        # elsif must follow the inner if, not the outer else
        if ($ClinSig =~ /Pathogenic|Likely pathogenic/i && $ClinSig ne ".") {
            $classification = 'Pathogenic';
        }
        elsif ($ClinSig =~ /Uncertain significance/i && $ClinSig ne ".") {
            $classification = 'VUS';
        }
    }

Description:
Only allow single-keyword entries, like Pathogenic, that don't match any of the keywords in the regex, are not a ., and don't contain a | (a pipe character denotes multiple entries, like Benign|Likely benign). In that case $classification is set to $ClinSig.
If not, the default value of VUS remains and the conditions in the else statements are used.

If $ClinSig contains none of Pathogenic, Likely pathogenic, or Uncertain significance, or is a ., $classification is Likely Benign.

If $ClinSig contains Pathogenic or Likely pathogenic and is not a ., $classification is Pathogenic.

If $ClinSig contains Uncertain significance and is not a ., $classification is VUS.

I hope this helps, and thank you very much for all of your help.

Last edited by cmccabe; 04-12-2017 at 06:48 PM. Reason: fixed format
# 25  
04-12-2017
Code:
if ($ClinSig !~ /something/) {
   # This is not something
}
else {
    # Can ONLY be something
}

Code:
if ($ClinSig !~/untested|unknown|not provided|other|[.|]/i) {
   # Anything that contains none of: untested, unknown, not provided, other, a dot, or a pipe
}
else {
   # Here $ClinSig contains at least one of: untested, unknown, not provided, other, a dot, or a pipe.
   # A single-keyword value such as Pathogenic can never reach this branch, even inside a nested if.
}
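To see this concretely, the sketch below runs a few made-up values through the same negated test and reports which branch each one would take (the sub name branch is hypothetical):

```perl
use strict;
use warnings;

# Returns which branch of the negated test above a value would take.
sub branch {
    my ($clin_sig) = @_;
    return $clin_sig !~ /untested|unknown|not provided|other|[.|]/i ? 'if' : 'else';
}

for my $clin_sig ('Pathogenic', 'Benign', 'Unknown', '.', 'Benign|Likely benign') {
    print "$clin_sig -> ", branch($clin_sig), "\n";
}
```

Pathogenic and Benign go to the if branch; Unknown, the dot, and the pipe-separated value go to the else branch.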

I suggest you start by testing for what it is:

Code:
if( $ClinSig =~ /something/ ) {
    # assignment for something
}
elsif( $ClinSig =~ /somethingElse/ ){
    # assignment for somethingElse
}
elsif($ClinSig =~ /notquiteSomething/ ){
    # assignment for notquiteSomething
}
elsif( $ClinSig =~ /likeSomething/ ) {
   # assignment for likeSomething
}
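Filling in that template with the keywords used in this thread might look like the sketch below. The order of the tests, the mapping of "." to Likely Benign, and the default of VUS are assumptions based on post #24, not a validated classification scheme; classify is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical classifier: test for what the value IS, in a fixed order.
sub classify {
    my ($clin_sig) = @_;
    my $classification = 'VUS';    # assumed default, as described in post #24

    if ($clin_sig eq '.') {
        $classification = 'Likely Benign';
    }
    elsif ($clin_sig =~ /pathogenic/i) {           # also matches "Likely pathogenic"
        $classification = 'Pathogenic';
    }
    elsif ($clin_sig =~ /uncertain significance/i) {
        $classification = 'VUS';                   # explicit, same as the default
    }
    elsif ($clin_sig =~ /benign/i) {
        $classification = 'Likely Benign';
    }
    return $classification;
}

print "$_ -> ", classify($_), "\n"
    for ('Pathogenic', 'Likely pathogenic', 'Uncertain significance',
         'Benign|Likely benign', '.', 'untested');
```

Because the positive tests run in order, a value containing both pathogenic and benign entries is classified by whichever test comes first.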


Last edited by Aia; 04-12-2017 at 11:54 PM.
# 26  
04-13-2017
The three languages used most in science are awk, bash, and Perl (though not exclusively), so I try to understand and learn as much about them as possible. Books, this forum, and practice have all been a great help. There are still things I do not understand, but I am much more knowledgeable now. Thank you all.

---------- Post updated at 07:46 AM ---------- Previous update was at 05:29 AM ----------

Code:
Argument "." isn't numeric in numeric gt (>) at /home/cmccabe/Desktop/NGS/scripts/classifier.pl line 35, <> line 16665.

Line 16665:
chrX	57313357	57313357	T	C	exonic	FAAH2	.	.	synonymous SNV	FAAH2:NM_174912.3:exon1:c.99T>C:p.G33G	rs2516023	0.9	0.53	0.73	0.35	0.29	0.22	0.29	0.41	0.9	0.51	0.37	0.22	0.3	0.38	0.42	0.49	0.88	0.27	0.5	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	.	GOOD	82	hom	24	Likely Benign	.	.

It seems the message comes from the $GeneDetail.IDP.refGene field, shown in red.
So maybe
$GeneDetail.IDP.refGene = 0 if $Score eq '.';
would fix the error? Thank you.

I believe it is the field in bold.
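The warning means a field holding "." reached a numeric operator such as >. One way to avoid it is to check the value with looks_like_number from the core module Scalar::Util before comparing, which is the same idea as setting it to 0. A sketch; the variable $score, the sample values, and the threshold of 10 are hypothetical:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

for my $score ('0.9', '.', '24') {
    # Treat anything that is not a number (such as ".") as 0 before comparing.
    my $value = looks_like_number($score) ? $score : 0;
    print "$score -> ", ($value > 10 ? 'above' : 'not above'), " 10\n";
}
```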

Last edited by cmccabe; 04-13-2017 at 05:45 PM. Reason: added details
# 27  
04-15-2017
Back in post number 20 I showed you how to handle that.
Quote:
Originally Posted by Aia

Code:
    if ($FuncIDPrefGene !~ /exonic/i) {
        # Get a numeric value if exist.
        my ($transcript) = ($GeneDetailIDPrefGene) =~ /(?:\.\d+[+-]|\D)(\d+)/;
        # Give it a value of zero if no numeric value was found.
        $transcript //= 0;
        $classification = 'Likely Benign' if $transcript > 10;
    }
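Here is what the two key lines of that snippet do, wrapped in a hypothetical helper (transcript_offset) and applied to a few made-up GeneDetail-style strings. When the match fails, as it does for ".", the list assignment leaves $transcript undefined, and //= replaces it with 0 so the later > 10 comparison never sees a non-number:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two lines from Aia's snippet.
sub transcript_offset {
    my ($gene_detail) = @_;
    # In list context the match returns the capture, or an empty list on failure.
    my ($transcript) = $gene_detail =~ /(?:\.\d+[+-]|\D)(\d+)/;
    # Defined-or assignment: use 0 when nothing was captured.
    $transcript //= 0;
    return $transcript;
}

print "$_ -> ", transcript_offset($_), "\n" for ('c.1909+12delT', 'c.100-3C>T', '.');
```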


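Also, $GeneDetail.IDP.refGene is not a valid variable name: Perl variable names cannot contain periods. Perl parses it as $GeneDetail concatenated (.) with the barewords IDP and refGene.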
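If the column names in the file header contain periods, one option is to keep each row in a hash keyed by those names, because quoted hash keys may contain periods. A sketch assuming a tab-separated file whose first line is the header; the column names and values below are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical header and data line; the real file has many more columns.
my $header = join "\t", 'Func.IDP.refGene', 'Gene.IDP.refGene', 'GeneDetail.IDP.refGene';
my $line   = join "\t", 'exonic', 'FAAH2', '.';

my @names = split /\t/, $header;
my %row;
@row{@names} = split /\t/, $line, -1;   # hash slice; -1 keeps trailing empty fields

# Quoted hash keys may contain periods; variable names may not.
print "GeneDetail: $row{'GeneDetail.IDP.refGene'}\n";
```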
# 28  
04-16-2017
Thank you very much for all of your help in getting this to work and for the explanations; I really appreciate them.
