Sponsored Content
Top Forums Shell Programming and Scripting Perl to update field based on a specific set of rules Post 303000868 by cmccabe on Friday 21st of July 2017 08:39:19 AM
Old 07-21-2017
Perl to update field based on a specific set of rules

In the perl below, which does execute, I am having trouble with the else in Rule 3. The digit in f{8} is extracted and used to update f[55] accordinly along with the value in f[13].
There can be either - * or + before the number that is extracted but the same logic applies, that is if the value is greater than 10 and f[13] is greater than 0.01 f[55] is Likely Benign, if the value is less than 10
and f[13] is less than 0.01 f[55] is VUS in. It is possible for f[13] to be . (dot) but that is the same as zero. However, currently the value does not seem to be extracted by the perl below and I
get different output then the desired. If I comment this rule out it fixes some of the lines but causes other lines to be incorrect. I am not sure what is causing the issue or how to fix it. Thank you Smilie.

I added a description to explain each line of the output and what should be happening, which currently is not Smilie.

file
Code:
R_Index    Chr    Start    End    Ref    Alt    Func.refGene    Gene.refGene    GeneDetail.refGene    Inheritence    ExonicFunc.refGene    AAChange.refGene    avsnp147    PopFreqMax    1000G_ALL    1000G_AFR    1000G_AMR    1000G_EAS    1000G_EUR    1000G_SAS    ExAC_ALL    ExAC_AFR    ExAC_AMR    ExAC_EAS    ExAC_FIN    ExAC_NFE    ExAC_OTH    ExAC_SAS    ESP6500siv2_ALL    ESP6500siv2_AA    ESP6500siv2_EA    CG46    SIFT_score    SIFT_pred    Polyphen2_HDIV_score    Polyphen2_HDIV_pred    Polyphen2_HVAR_score    Polyphen2_HVAR_pred    LRT_score    LRT_pred    MutationTaster_score    MutationTaster_pred    MutationAssessor_score    MutationAssessor_pred    dpsi_max_tissue    dpsi_zscore    CLINSIG    CLNDBN    CLNACC    CLNDSDB    CLNDSDBID    Quality    Reads    Zygosity    Score    Classification    HGMD    Sanger
47    chr16    10032451    10032451    T    C    splicing    GRIN2A    NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G    .    .    .    rs760077182    0.0004    .    .    .    .    .    .    0.0001    .    .    .    .    .    .    0.0004    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0.7699    1.365    .    .    .    .    .    GOOD    93    het    9    .    .    .
77    chr14    76427420    76427420    C    G    splicing    TGFB3    NM_003239:exon6:c.927-1G>C    .    .    .    rs767548724    0.0001    .    .    .    .    .    .    .    .    0.0001    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    1    D    .    .    -3.9123    -2.334    .    .    .    .    .    GOOD    190    het    13    .    .    .
160    chr3    186944313    186944313    G    C    splicing    MASP1    NM_001879:exon12:c.1442-5C>G    .    .    .    rs138989954    0.0055    0.0018    .    .    .    0.005    0.0041    0.0036    0.0008    0.0032    .    0.0007    0.0043    0.0037    0.0055    0.0025    0.0005    0.0036    .    .    .    .    .    .    .    .    .    .    .    .    .    -0.8646    -1.388    .    .    .    .    .    GOOD    265    het    13    .    .    .

desired output tab-delimeted

Code:
R_Index    Chr    Start    End    Ref    Alt    Func.refGene    Gene.refGene    GeneDetail.refGene    Inheritence    ExonicFunc.refGene    AAChange.refGene    avsnp147    PopFreqMax    1000G_ALL    1000G_AFR    1000G_AMR    1000G_EAS    1000G_EUR    1000G_SAS    ExAC_ALL    ExAC_AFR    ExAC_AMR    ExAC_EAS    ExAC_FIN    ExAC_NFE    ExAC_OTH    ExAC_SAS    ESP6500siv2_ALL    ESP6500siv2_AA    ESP6500siv2_EA    CG46    SIFT_score    SIFT_pred    Polyphen2_HDIV_score    Polyphen2_HDIV_pred    Polyphen2_HVAR_score    Polyphen2_HVAR_pred    LRT_score    LRT_pred    MutationTaster_score    MutationTaster_pred    MutationAssessor_score    MutationAssessor_pred    dpsi_max_tissue    dpsi_zscore    CLINSIG    CLNDBN    CLNACC    CLNDSDB    CLNDSDBID    Quality    Reads    Zygosity    Score    Classification    HGMD    Sanger
47    chr16    10032451    10032451    T    C    splicing    GRIN2A    NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G    .    .    .    rs760077182    0.0004    .    .    .    .    .    .    0.0001    .    .    .    .    .    .    0.0004    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0.7699    1.365    .    .    .    .    .    GOOD    93    het    9    Likely Benign    .    .
77    chr14    76427420    76427420    C    G    splicing    TGFB3    NM_003239:exon6:c.927-1G>C    .    .    .    rs767548724    0.0001    .    .    .    .    .    .    .    .    0.0001    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    1    D    .    .    -3.9123    -2.334    .    .    .    .    .    GOOD    190    het    13    VUS    .    .
160    chr3    186944313    186944313    G    C    splicing    MASP1    NM_001879:exon12:c.1442-5C>G    .    .    .    rs138989954    0.0055    0.0018    .    .    .    0.005    0.0041    0.0036    0.0008    0.0032    .    0.0007    0.0043    0.0037    0.0055    0.0025    0.0005    0.0036    .    .    .    .    .    .    .    .    .    .    .    .    .    -0.8646    -1.388    .    .    .    .    .    GOOD    265    het    13    VUS    .    .

description
Code:
line 1 is Likely Benign in f[55] because in NM_001134408:exon3:c.415-43A>G in f[8] the 43 after the minus is extracted and since that value is greater than 10 f[55] is Likely Benign
line 2 is VUS in f[55] because in NM_003239:exon6:c.927-1G>C in f[8] the 1 after the minus is extracted and since that value is less than 10 and f[13] is less than 0.01 f[55] is VUS
line 3 is VUS in f[55] because in NM_001879:exon12:c.1442-5C>G in f[8] the 1 after the minus is extracted and since that value is less than 10 and f[13] is less than 0.01 f[55] is VUS

perl
Code:
# disables certain Perl expressions that could behave unexpectedly or are difficult to debug, turning them into errors
use strict;
use warnings;

# Read and display the first line of the file passed at command line.
print scalar <>;

# Read line by line the file given at the command line, could be the stdin if no file is give as argument.
while (<>)
{  # start conditional block

# Make tokens out of the line, using the tab as separator.
    my @f = split /\t/;

    # Select 4 tokens and define @f.
    my ($FuncrefGene,$PopFreqMax,$GeneDetailrefGene,$ClinSig) = @f[6,13,8,46];

    # Rule1: Default classification VUS.
    my $classification = 'VUS';

    # map to 0 if it doesn't have numeric meaning.
    $GeneDetailrefGene = 0 if $GeneDetailrefGene eq '.';
    $PopFreqMax = 0 if $PopFreqMax eq '.';

    # Rule2: Change to Likely Benign if conditions occurs.
    if ($PopFreqMax > 0.011) {
        $classification = 'Likely Benign';
    }

     #Rule3: GeneDetail condition
    if ($FuncrefGene !~ /exonic/i && $GeneDetailrefGene=~/^\D(\d+)$/) {   # capture the digits after any non-digit into $1
                if ($1 > 10) {   # 
                      $1 //= 0;  # Give it a value of zero if no numeric value was found.
                        $classification = 'Likely Benign';  # Reclassify intronic variants (with distance only) based on distance to exon > 10 to Likely Benign
         }
     }
   else {
                 if ($FuncrefGene !~ /exonic/i) {
                    my ($transcript) = ($GeneDetailrefGene) =~ /(?:\.\d+[+*-]|\D)(\d+)/;   # Get a numeric value if exists using (.) and (+/*/-) and capture digits into $transcript.
                             $transcript //= 0;  # Give it a value of zero if no numeric value was found.
                                $classification = 'Likely Benign' if $transcript > 10; # Reclassify intronic variants (following c. nomenclature) to Likely Benign if distance greater than 10
                 }
           }

    # Rule4: ClinVar condition for ACMG
           if ($ClinSig =~ /Pathogenic|Likely pathogenic/i) {
               $classification = 'VUS';
           }

# token 55 is classification.
    $f[55] = $classification;

    # display results and update @f.
    print join "\t", @f;
}   # end conditional block

current output
Code:
R_Index    Chr    Start    End    Ref    Alt    Func.refGene    Gene.refGene    GeneDetail.refGene    Inheritence    ExonicFunc.refGene    AAChange.refGene    avsnp147    PopFreqMax    1000G_ALL    1000G_AFR    1000G_AMR    1000G_EAS    1000G_EUR    1000G_SAS    ExAC_ALL    ExAC_AFR    ExAC_AMR    ExAC_EAS    ExAC_FIN    ExAC_NFE    ExAC_OTH    ExAC_SAS    ESP6500siv2_ALL    ESP6500siv2_AA    ESP6500siv2_EA    CG46    SIFT_score    SIFT_pred    Polyphen2_HDIV_score    Polyphen2_HDIV_pred    Polyphen2_HVAR_score    Polyphen2_HVAR_pred    LRT_score    LRT_pred    MutationTaster_score    MutationTaster_pred    MutationAssessor_score    MutationAssessor_pred    dpsi_max_tissue    dpsi_zscore    CLINSIG    CLNDBN    CLNACC    CLNDSDB    CLNDSDBID    Quality    Reads    Zygosity    Score    Classification    HGMD    Sanger
47    chr16    10032451    10032451    T    C    splicing    GRIN2A    NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G    .    .    .    rs760077182    0.0004    .    .    .    .    .    .    0.0001    .    .    .    .    .    .    0.0004    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    0.7699    1.365    .    .    .    .    .    GOOD    93    het    9    Likely Benign    n    .
77    chr14    76427420    76427420    C    G    splicing    TGFB3    NM_003239:exon6:c.927-1G>C    .    .    .    rs767548724    0.0001    .    .    .    .    .    .    .    .    0.0001    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    .    1    D    .    .    -3.9123    -2.334    .    .    .    .    .    GOOD    190    het    13    Likely Benign    .    .
160    chr3    186944313    186944313    G    C    splicing    MASP1    NM_001879:exon12:c.1442-5C>G    .    .    .    rs138989954    0.0055    0.0018    .    .    .    0.005    0.0041    0.0036    0.0008    0.0032    .    0.0007    0.0043    0.0037    0.0055    0.0025    0.0005    0.0036    .    .    .    .    .    .    .    .    .    .    .    .    .    -0.8646    -1.388    .    .    .    .    .    GOOD    265    het    13    Likely Benign    .    .


Last edited by cmccabe; 07-21-2017 at 09:50 AM.. Reason: fixed format, added current output
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Update a field in a file based on condition

Hi i am new to scripting. i have a file file.dat with content as : CONTENT_STORAGE PERCENTAGE FLAG: /storage_01 64% 0 /storage_02 17% 1 I need to update the value of FLAG for a particular CONTENT_STORAGE value I have written the following code #!/bin/sh threshold=20... (1 Reply)
Discussion started by: kichu
1 Replies

2. Shell Programming and Scripting

Help with allocated text content based on specific rules...

Input file format: /tag="ABL" /note="abl homolog 2 /tag="ABLIM1" /note="actin binding LIM 1 /tag="ABP1" /note="amiloride binding protein 1 (amine oxidase (copper- containing)) /tag="ABR" /note="active BCR-related /tag="AC003042.1" /note="SDR family member 11 precursor . . . (4 Replies)
Discussion started by: perl_beginner
4 Replies

3. Shell Programming and Scripting

Update specific field in a line of text file

I have a text file like this: subject1:LecturerA:10 subject2:LecturerA:40 if I was given string in column 1 and 2 (which are subject 1 and LecturerA) , i need to update 3rd field of that line containing that given string , which is, number 10 need to be updated to 100 ,for example. The... (6 Replies)
Discussion started by: bmtoan
6 Replies

4. Shell Programming and Scripting

Add specific string to last field of each line in perl based on value

I am trying to add a condition to the below perl that will capture the GTtag and place a specific string in the last field of each line. The problem is that the GT value used is not right after the tag rather it is a few fields away. The values should always be 0/1 or 1/2 and are in bold in the... (12 Replies)
Discussion started by: cmccabe
12 Replies

5. Shell Programming and Scripting

awk to update value in field based on another field

In the tab-delimeted input file below I am trying to use awk to update the value in $2 if TYPE=ins in bold, by adding the value of HRUN= in italics. In the below since in line 1 TYPE=ins the 117282541 value in $2 has 6 added because that is the value of HRUN=. Hopefully the awk is a start but I... (2 Replies)
Discussion started by: cmccabe
2 Replies

6. Shell Programming and Scripting

Perl to update field in file based of match to another file

In the perl below I am trying to set/update the value of $14 (last field) in file2, using the matching NM_ in $12 or $9 in file2 with the NM_ in $2 of file1. The lengths of $9 and $12 can be variable but what is consistent is the start pattern will always be NM_ and the end pattern is always ;... (4 Replies)
Discussion started by: cmccabe
4 Replies

7. Shell Programming and Scripting

Perl to change value based on set of rules

In the perl there is a default rule that sets f to VUS, and then a seris of rules that will change f based on the result that is obtained from the rule. The code below is a rule that is supposed to be applicable to lines 2-4 because this rule just looks at the digit in f. So in line 2 f is 27... (4 Replies)
Discussion started by: cmccabe
4 Replies

8. Shell Programming and Scripting

awk to assign points to variables based on conditions and update specific field

I have been reading old posts and trying to come up with a solution for the below: Use a tab-delimited input file to assign point to variables that are used to update a specific field, Rank. I really couldn't find too much in the way of assigning points to variable, but made an attempt at an awk... (4 Replies)
Discussion started by: cmccabe
4 Replies

9. Shell Programming and Scripting

Update a specific field in file with Variable value based on other Key Word

I have an input file with A=xyz B=pqr I would want the value in Second Field (xyz or pqr) updated with a value present in Shell Variable based on the value passed in the first field. (A or B ) while read line do NEW_VALUE = `some functionality done on $line` If $line=First Field-... (1 Reply)
Discussion started by: infernalhell
1 Replies

10. UNIX for Beginners Questions & Answers

Problem with getting awk to multiply a field by a value set based on condition of another field

Hi, So awk is driving me crazy on this one. I have searched everywhere and read man, docs and every related post Google can find and still no luck. The actual files I need to run this on are sensitive in nature, but it is the same thing as if I needed to calculate weighted grades for multiple... (15 Replies)
Discussion started by: cotilloe
15 Replies
All times are GMT -4. The time now is 10:31 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy