Perl to update field based on a specific set of rules
Post 303000875 by durden_tyler on Friday 21st of July 2017 11:56:44 AM
Quote:
Originally Posted by cmccabe
...
...
Code:
if ($FuncrefGene !~ /exonic/i) {
                    my ($transcript) = ($GeneDetailrefGene) =~ /(?:[+*-]d=)(\d+)/;   # Get a numeric value if exists using (.) and (+/*/-) and capture digits into $transcript.
                             $transcript //= 0;  # Give it a value of zero if no numeric value was found.
                                $classification = 'Likely Benign' if $transcript > 10; # Reclassify intronic variants (following c. nomenclature) to Likely Benign if distance greater than 10

should capture the 43 in NM_001134408:exon3:c.415-43A>G and that will be the value of $transcript?
...
...
No, it will not match because the pattern "[+*-]d=" is not present in $GeneDetailrefGene. The characters "d" and "=" do not follow any one of ("+", "*", "-").
Note that in a regex, "d" matches the character "d", but "\d" matches a single digit in the range [0-9].

Also, the run of one or more digits (the 415 in c.415-43) comes before the [+*-] character, so the pattern has to match those digits first.
So, you may want to use this regex:

Code:
/(?:\.\d+[+*-])(\d+)/
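
To see the difference between the two patterns, you could run both against the sample value from your post. This is only a sketch: the sub name intron_distance is my own and is not part of your script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return the number that
# follows ".<digits><sign>" in a c. nomenclature string, or 0 if there is none.
sub intron_distance {
    my ($detail) = @_;
    # Skip ".<digits>" plus one of + * - and capture the digits that follow.
    my ($distance) = $detail =~ /(?:\.\d+[+*-])(\d+)/;
    return $distance // 0;
}

my $sample = 'NM_001134408:exon3:c.415-43A>G';

# Original pattern: needs a literal "d=" after the sign, so it never matches here.
my ($old) = $sample =~ /(?:[+*-]d=)(\d+)/;
print "old pattern: ", (defined $old ? $old : "no match"), "\n";  # old pattern: no match

# Corrected pattern.
print "new pattern: ", intron_distance($sample), "\n";            # new pattern: 43
```

A value with no sign after the position, such as c.415A>G, gives 0 here, which mirrors the "$transcript //= 0" default in your code.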

Code:
 ...
 ...
 I am not sure how to also use f[13] in this rule.
 ...
 ...

Just use it the way you laid down the rules in your first post.
Here's the relevant excerpt from your first post:


Quote:
Originally Posted by cmccabe
...
...
but the same logic applies, that is if the value is greater than 10 and f[13] is greater than 0.01 f[55] is Likely Benign, if the value is less than 10
and f[13] is less than 0.01 f[55] is VUS in. It is possible for f[13] to be . (dot) but that is the same as zero.
...
...
...
So, since the "value" you talk about is $transcript and f[13] is $PopFreqMax, your logic would be something like:

Code:
 if ($transcript > 10 and $PopFreqMax > 0.01) {
     $classification = "Likely Benign";
 } elsif ($transcript < 10 and $PopFreqMax < 0.01) {
     $classification = "VUS";
 }
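
If it helps, the same logic can be wrapped in a small sub so the "." case and the combinations that neither rule covers are explicit. This is a sketch under assumptions: the sub name classify_intronic and the idea of passing in the current classification are mine, not something from your script.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the two combined rules and otherwise keep
# whatever classification was already set.
sub classify_intronic {
    my ($transcript, $pop_freq_max, $current) = @_;

    # A "." (or missing) frequency is treated the same as zero.
    $pop_freq_max = 0 if !defined $pop_freq_max || $pop_freq_max eq '.';

    if ($transcript > 10 and $pop_freq_max > 0.01) {
        return 'Likely Benign';
    }
    elsif ($transcript < 10 and $pop_freq_max < 0.01) {
        return 'VUS';
    }
    # Neither rule applies (mixed case, or exactly 10 / exactly 0.01).
    return $current;
}

print classify_intronic(43, 0.05,   'VUS'), "\n";  # Likely Benign
print classify_intronic(5,  '.',    'VUS'), "\n";  # VUS
print classify_intronic(43, 0.0004, 'VUS'), "\n";  # VUS (unchanged, neither rule applies)
```

Note that with a distance of 43 and a frequency of 0.0004, neither branch fires, so the result is whatever the classification was before the check. Values of exactly 10 or exactly 0.01 also fall through; whether that is what you want depends on your rules.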

Quote:
Originally Posted by cmccabe
...
...
It is possible for f[13] to be . (dot) but that is the same as zero.
...
...
Which you have already taken care of in line # 23 of your Perl code in your post # 1, so no worries.

I don't know why you run "Rule 2" (regarding f[13] or PopFreqMax) on its own in lines # 25 through 28 of your Perl code in post # 1.

Since you have to use it in conjunction with $transcript as per your logic above, use it after you have determined the value of $transcript.


Quote:
Originally Posted by cmccabe
...
...
In the cases that have multiple f[8] values, like in line 1, the first can be used.

In line 1 f[8] is

NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G
and the ; (semi-colon) indicates the start of a new value. NM_001134408:exon3:c.415-43A>G would be the first value, so 43 is read into the $transcript variable and since f[13] is 0.0004, f[55] is VUS. ...
And the first one will be used, as the regex reads from the left of the string and tries to match as early as possible.

See below:
Code:
$ echo "NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-44A>G;NM_000833:exon4:c.415-45A>G" | perl -lne '/(?:\.\d+[+*-])(\d+)/ and print $1'
43

However, if your first "value" within $GeneDetailrefGene (where "values" are delimited by ";") does not match the pattern, then that pattern will be attempted in the second "value" within $GeneDetailrefGene.

In the example below, I have replaced the first "-" character by "#", so the pattern will not match anything in the first "value".
Code:
$ echo "NM_001134408:exon3:c.415#43A>G;NM_001134407:exon3:c.415-44A>G;NM_000833:exon4:c.415-45A>G" | perl -lne '/(?:\.\d+[+*-])(\d+)/ and print $1'
44

Perl keeps looking forward and extracts the first substring it encounters that matches the pattern.
This substring happens to be in the second "value" inside $GeneDetailrefGene.
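
If you would rather have only the first value considered, so that a non-matching first value gives 0 instead of falling through to the second one, one option is to cut the string at the first ";" before matching. This is a sketch only: first_value_distance is a name I made up, and whether you want this stricter behaviour is your call.

```perl
use strict;
use warnings;

# Hypothetical helper: look only at the first ";"-delimited value.
sub first_value_distance {
    my ($detail) = @_;
    # Limit of 2: everything after the first ";" lands in a second, ignored piece.
    my ($first) = split /;/, $detail, 2;
    return 0 unless defined $first;
    my ($distance) = $first =~ /(?:\.\d+[+*-])(\d+)/;
    return $distance // 0;
}

# First value matches: its distance is used.
print first_value_distance('NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-44A>G'), "\n";  # 43

# First value does not match: 0, not the 44 from the second value.
print first_value_distance('NM_001134408:exon3:c.415#43A>G;NM_001134407:exon3:c.415-44A>G'), "\n";  # 0
```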