Perl to update field based on a specific set of rules Post: 303000875


Post 303000875 by durden_tyler on Friday 21st of July 2017 11:56:44 AM


Quote:

Originally Posted by cmccabe

...
...

Code:

if ($FuncrefGene !~ /exonic/i) {
    my ($transcript) = ($GeneDetailrefGene) =~ /(?:[+*-]d=)(\d+)/;   # Get a numeric value if exists using (.) and (+/*/-) and capture digits into $transcript.
    $transcript //= 0;  # Give it a value of zero if no numeric value was found.
    $classification = 'Likely Benign' if $transcript > 10; # Reclassify intronic variants (following c. nomenclature) to Likely Benign if distance greater than 10

should capture the 43 in NM_001134408:exon3:c.415-43A>G and that will be the value of $transcript?
...
...

No, it will not match. The pattern "[+*-]d=" requires one of "+", "*" or "-" followed by a literal "d" and then "=", and no such sequence occurs in $GeneDetailrefGene.
Note that in a regex, "d" matches the character "d", but "\d" matches a single digit in the range [0-9].
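A quick way to see the difference is to try both patterns on the value from your post. This is just a minimal sketch; $detail is a stand-in for $GeneDetailrefGene.

Code:

```perl
use strict;
use warnings;

# Stand-in for $GeneDetailrefGene, using the value from the question.
my $detail = 'NM_001134408:exon3:c.415-43A>G';

# Literal "d" then "=" after +, * or -: never occurs here, so no match.
my $literal_d = ($detail =~ /[+*-]d=/) ? 1 : 0;

# \d+ after the sign: captures the digits "43".
my ($digits) = $detail =~ /[+*-](\d+)/;

print "literal d= matched: $literal_d\n";   # prints 0
print "digits after sign: $digits\n";       # prints 43
```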

Also, the run of one or more digits that follows "c." (the "415" in "c.415-43") has to be matched before the [+*-] character.
So, you may want to use this regex:

Code:

/(?:\.\d+[+*-])(\d+)/
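Wrapped in a small sub together with your "//= 0" default, it might look like this. The name intronic_distance is hypothetical; it is only a sketch of the same capture-then-default idea.

Code:

```perl
use strict;
use warnings;

# Hypothetical helper: return the intronic distance from a c. notation
# such as "c.415-43A>G" (the 43), or 0 when no such number is present.
sub intronic_distance {
    my ($gene_detail) = @_;
    my ($distance) = $gene_detail =~ /(?:\.\d+[+*-])(\d+)/;
    return $distance // 0;    # same effect as $transcript //= 0
}

print intronic_distance('NM_001134408:exon3:c.415-43A>G'), "\n";  # 43
print intronic_distance('NM_000833:exon4:c.415A>G'), "\n";        # 0
```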

Code:

 ...
 ...
 I am not sure how to also use f[13] in this rule.
 ...
 ...

Just use it the way you laid down the rules in your first post.
Here's the relevant excerpt from your first post:

Quote:

Originally Posted by cmccabe

...
...
but the same logic applies, that is, if the value is greater than 10 and f[13] is greater than 0.01, f[55] is Likely Benign; if the value is less than 10
and f[13] is less than 0.01, f[55] is VUS. It is possible for f[13] to be . (dot), but that is the same as zero.
...
...
...

So, since the "value" you talk about is $transcript and f[13] is $PopFreqMax, your logic would be something like:

Code:

 if ($transcript > 10 and $PopFreqMax > 0.01) {
     $classification = "Likely Benign";
 } elsif ($transcript < 10 and $PopFreqMax < 0.01) {
     $classification = "VUS";
 }
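One thing to watch: because both tests use strict ">" and "<", a value of exactly 10 (or a PopFreqMax of exactly 0.01) satisfies neither branch, and neither does a combination such as 43 with 0.0004. In those cases $classification simply keeps whatever it already held. The sketch below makes that explicit; classify and its $default argument are hypothetical names, and the default is assumed to be "VUS". If the boundary values should be caught, switch to ">=" or "<=".

Code:

```perl
use strict;
use warnings;

# Sketch of the two rules as a function. $default stands for whatever
# $classification already held (assumed here to be "VUS").
sub classify {
    my ($transcript, $pop_freq_max, $default) = @_;
    if ($transcript > 10 and $pop_freq_max > 0.01) {
        return 'Likely Benign';
    } elsif ($transcript < 10 and $pop_freq_max < 0.01) {
        return 'VUS';
    }
    return $default;    # neither rule applied
}

print classify(43, 0.05,   'VUS'), "\n";        # Likely Benign
print classify(43, 0.0004, 'VUS'), "\n";        # VUS (the default)
print classify(10, 0.0004, 'unchanged'), "\n";  # unchanged
```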

Quote:

Originally Posted by cmccabe

...
...
It is possible for f[13] to be . (dot), but that is the same as zero.
...
...

You have already taken care of that in line 23 of the Perl code in your post #1, so no worries.

I don't know why you run "Rule 2" (regarding f[13] or PopFreqMax) on its own in lines 25 through 28 of the Perl code in post #1.

Since you have to use it in conjunction with $transcript as per your logic above, use it after you have determined the value of $transcript.
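Put together, the order would be: extract $transcript from f[8], normalize f[13], and only then apply the combined rule. The sketch below is not your script; it assumes each line has already been split on tabs into @f with zero-based indices matching your f[8], f[13] and f[55], and that f[55] already holds its default. apply_intronic_rule is a hypothetical name.

Code:

```perl
use strict;
use warnings;

# Hypothetical per-record routine. Assumes @$f holds one tab-split line,
# with f[8] = GeneDetail.refGene, f[13] = PopFreqMax and f[55] = the
# classification (already set to its default, e.g. "VUS").
sub apply_intronic_rule {
    my ($f) = @_;

    # Step 1: determine $transcript first (0 if nothing matched).
    my ($transcript) = $f->[8] =~ /(?:\.\d+[+*-])(\d+)/;
    $transcript //= 0;

    # Treat a "." in PopFreqMax as zero, as stated in post #1.
    my $pop_freq_max = $f->[13] eq '.' ? 0 : $f->[13];

    # Step 2: only now apply the combined rule.
    if ($transcript > 10 and $pop_freq_max > 0.01) {
        $f->[55] = 'Likely Benign';
    } elsif ($transcript < 10 and $pop_freq_max < 0.01) {
        $f->[55] = 'VUS';
    }
    return $f;
}

# Example record: only fields 8, 13 and 55 matter for this sketch.
my @f = ('') x 56;
$f[8]  = 'NM_001134408:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G';
$f[13] = '0.02';
$f[55] = 'VUS';
apply_intronic_rule(\@f);
print "$f[55]\n";    # Likely Benign
```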

Quote:

Originally Posted by cmccabe

...
...
In the cases that have multiple f[8] values, like in line 1, the first can be used.

In line 1 f[8] is

NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G
and the ; (semi-colon) indicates the start of a new value. NM_001134408:exon3:c.415-43A>G would be the first value, so 43 is read into the $transcript variable and since f[13] is 0.0004, f[55] is VUS. ...

The first one will be used, because the regex engine scans the string from left to right and returns the leftmost match.

See below:

Code:

$
$ echo "NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-44A>G;NM_000833:exon4:c.415-45A>G" | perl -lne '/(?:\.\d+[+*-])(\d+)/ and print $1'
43
$
$

However, the regex knows nothing about ";". If the first "value" within $GeneDetailrefGene (where "values" are delimited by ";") does not match the pattern, the engine simply keeps scanning and will match in the second (or a later) "value" instead.

In the example below, I have replaced the first "-" character by "#", so the pattern will not match anything in the first "value".

Code:

$
$ echo "NM_001134408:exon3:c.415#43A>G;NM_001134407:exon3:c.415-44A>G;NM_000833:exon4:c.415-45A>G" | perl -lne '/(?:\.\d+[+*-])(\d+)/ and print $1'
44
$
$

Perl keeps looking forward and extracts the first substring it encounters that matches the pattern.
This substring happens to be in the second "value" inside $GeneDetailrefGene.
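If you want only the first "value" to be considered, one option (not what the one-liners above do) is to split on ";" first and run the regex on that piece alone. A malformed first value then yields 0 instead of picking up the number from the second one. This is a sketch; first_value_distance is a hypothetical name.

Code:

```perl
use strict;
use warnings;

# Sketch: restrict the search to the first ";"-delimited value, so a
# malformed first value yields 0 instead of the second value's number.
sub first_value_distance {
    my ($gene_detail) = @_;
    my ($first) = split /;/, $gene_detail, 2;
    $first //= '';    # an empty field produces no elements from split
    my ($distance) = $first =~ /(?:\.\d+[+*-])(\d+)/;
    return $distance // 0;
}

print first_value_distance('NM_001134408:exon3:c.415#43A>G;NM_001134407:exon3:c.415-44A>G'), "\n";  # 0
```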


