Post 302990088 by cmccabe on Saturday 21st of January 2017 02:26:02 PM
awk to update file based on 5 conditions

I am trying to use awk to update the tab-delimited file below based on 5 different rules/conditions. The final output is also tab-delimited, and each line in the file will meet one of the conditions. My attempt is below as well, though I am not very confident in it.

Condition 1: The Classification field has a default value of "VUS" for every data line in the file.

Condition 2: If CLINSIG has a value (is not "."), that value is copied into Classification when it is short enough to be a single value; otherwise Classification is "Conflicting".
- This field can hold several values separated by the | symbol, so I used the longest single value, "Likely Benign" (13 characters), as the cutoff: if the value in the field is longer than that, "Conflicting" is the result.

Condition 3: If Func.IDP.refGene is a UTR value (e.g. UTR5), Classification is "Likely Benign", unless CLINSIG already has a value.

Condition 4: If PopFreqMax > .01, Classification is "Likely Benign"; otherwise it stays "VUS", unless CLINSIG already has a value.

Condition 5: If Func.IDP.refGene is "splicing" AND the +/- offset in GeneDetail.IDP.refGene is greater than 10 (e.g. +28 in c.1233+28G>A), Classification is "Likely Benign", unless CLINSIG already has a value.
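Conditions 2 and 5 hinge on two small string tests that are easy to try on their own before wiring them into the full script. The sketch below wraps them in two hypothetical shell functions, clinsig_class and splice_offset (names made up here for illustration). It assumes that "." marks an empty CLINSIG, that 13 characters (the length of "Likely Benign") is the cutoff for a single value, and that the first run of + or - followed by digits in GeneDetail.IDP.refGene is the offset.

```shell
# clinsig_class VALUE: condition 2 on its own (hypothetical helper).
# Prints nothing for "." (no CLINSIG value), the value itself if it is
# at most 13 characters long, otherwise "Conflicting".
clinsig_class() {
  awk -v v="$1" 'BEGIN {
    if (v == ".") exit                          # empty: conditions 3-5 decide
    r = (length(v) <= 13) ? v : "Conflicting"   # 13 = length("Likely Benign")
    print r
  }'
}

# splice_offset DETAIL: the number after the first + or - in a
# GeneDetail.IDP.refGene value, e.g. 28 for c.1233+28G>A (hypothetical helper).
# Prints nothing when there is no such number.
splice_offset() {
  printf '%s\n' "$1" | awk 'match($0, /[-+][0-9]+/) {
    print substr($0, RSTART + 1, RLENGTH - 1)   # drop the sign, keep the digits
  }'
}
```

For example, clinsig_class 'not provided|other|Benign' prints Conflicting, and splice_offset 'NM_000548.4:exon6:c.482-3C>T' prints 3, which is not greater than 10, so condition 5 would not apply to that line.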

Thank you :).

file
Code:
R_Index    Chr    Start    End    Ref    Alt    Func.IDP.refGene    GeneDetail.IDP.refGene    AAChange.IDP.refGene    PopFreqMax    CLINSIG    CLNDBN    Classification    Quality
1    chr1    40562993    40562993    T    C    UTR5    NM_000310.3:c.-83A>G    .    0.9    .    .    .    15
2    chr5    125887685    125887685    C    T    splicing    NM_001201377.1:exon14:c.1233+28G>A    .    0.82    .    .    .    10
3    chr16    2105400    2105400    C    T    splicing    NM_000548.4:exon6:c.482-3C>T    .    0.21    not provided|not provided|not provided|not provided|other|Benign    TSC    .    25
4    chr16    2110805    2110805    G    A    exonic    .    TSC2:NM_000548.4:exon11:c.1110G>A:p.Q370Q    .004    Pathogenic    TSC    .    40

Description of fields
Code:
awk 'NR==1{for(i=1;i<=NF;i++){print "Number of field in terms of NF is--> NF-" NF-i", value is-->" $i}}' file
Number of field in terms of NF is--> NF-13, value is-->R_Index
Number of field in terms of NF is--> NF-12, value is-->Chr
Number of field in terms of NF is--> NF-11, value is-->Start
Number of field in terms of NF is--> NF-10, value is-->End
Number of field in terms of NF is--> NF-9, value is-->Ref
Number of field in terms of NF is--> NF-8, value is-->Alt
Number of field in terms of NF is--> NF-7, value is-->Func.IDP.refGene
Number of field in terms of NF is--> NF-6, value is-->GeneDetail.IDP.refGene
Number of field in terms of NF is--> NF-5, value is-->AAChange.IDP.refGene
Number of field in terms of NF is--> NF-4, value is-->PopFreqMax
Number of field in terms of NF is--> NF-3, value is-->CLINSIG
Number of field in terms of NF is--> NF-2, value is-->CLNDBN
Number of field in terms of NF is--> NF-1, value is-->Classification
Number of field in terms of NF is--> NF-0, value is-->Quality
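Counting back from NF works, but the offsets shift if a column is ever added or removed. A minimal alternative, sketched here as a hypothetical helper named colnum, reads the header line and prints the column number for a given header name. It assumes the header is the first line and is tab-delimited like the data.

```shell
# colnum NAME: print the column number of header NAME, read from the
# first line of a tab-delimited table on stdin (hypothetical helper).
# Exits with status 1 if NAME is not in the header.
colnum() {
  awk -F'\t' -v name="$1" '
    NR == 1 {
      for (i = 1; i <= NF; i++)
        if ($i == name) { print i; exit }       # found: print and stop
      exit 1                                    # header name not present
    }'
}
```

For this file, colnum CLINSIG < file should print 11 (NF-3 with NF=14) and colnum Classification < file should print 13. Inside an awk program the same idea is usually written as NR==1{for(i=1;i<=NF;i++) col[$i]=i}, after which $(col["CLINSIG"]) refers to the CLINSIG field on every later line.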

Code:
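# Condition 1: default Classification ($(NF-1)) to "VUS", skipping the header
awk -F'\t' -v OFS='\t' 'NR>1{$(NF-1)="VUS"} 1' file > vus

# Condition 2: check CLINSIG ($(NF-3)); 13 = length of "Likely Benign"
awk -F'\t' -v OFS='\t' 'NR>1 && $(NF-3)!="." {$(NF-1)=(length($(NF-3))<=13) ? $(NF-3) : "Conflicting"} 1' vus > clinvar

# Condition 3: UTR check on Func.IDP.refGene ($(NF-7)), only if CLINSIG is empty
awk -F'\t' -v OFS='\t' 'NR>1 && $(NF-3)=="." && $(NF-7)~/^UTR/ {$(NF-1)="Likely Benign"} 1' clinvar > utr

# Condition 4: PopFreqMax ($(NF-4)) > .01, only if CLINSIG is empty
awk -F'\t' -v OFS='\t' 'NR>1 && $(NF-3)=="." && $(NF-4)+0 > .01 {$(NF-1)="Likely Benign"} 1' utr > popfreq

# Condition 5: splicing with a +/- offset > 10 in GeneDetail.IDP.refGene ($(NF-6))
awk -F'\t' -v OFS='\t' 'NR>1 && $(NF-3)=="." && $(NF-7)=="splicing" && match($(NF-6), /[-+][0-9]+/) && substr($(NF-6), RSTART+1, RLENGTH-1)+0 > 10 {$(NF-1)="Likely Benign"} 1' popfreq > final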

desired output
Code:
R_Index    Chr    Start    End    Ref    Alt    Func.IDP.refGene    GeneDetail.IDP.refGene    AAChange.IDP.refGene    PopFreqMax    CLINSIG    CLNDBN    Classification    Quality
1    chr1    40562993    40562993    T    C    UTR5    NM_000310.3:c.-83A>G    .    0.9    .    .    Likely Benign    15
2    chr5    125887685    125887685    C    T    splicing    NM_001201377.1:exon14:c.1233+28G>A    .    0.82    .    .    Likely Benign    10
3    chr16    2105400    2105400    C    T    splicing    NM_000548.4:exon6:c.482-3C>T    .    0.21    not provided|not provided|not provided|not provided|other|Benign    TSC    Conflicting    25
4    chr16    2110805    2110805    G    A    exonic    .    TSC2:NM_000548.4:exon11:c.1110G>A:p.Q370Q    .004    Pathogenic    TSC    Pathogenic    40
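The five passes can also be folded into a single awk program, because conditions 3 to 5 only matter when CLINSIG is empty. This is a sketch, not a tested production script: classify is a hypothetical function name, and it assumes "." marks an empty field, a 13-character cutoff for a single CLINSIG value, and the first +/- number in GeneDetail.IDP.refGene as the splice offset. The checks run in the order of the conditions, so the first matching rule decides the Classification.

```shell
# classify: read the tab-delimited table on stdin and write it to stdout
# with Classification ($(NF-1)) set by conditions 1-5 (hypothetical helper).
# Assumes "." marks an empty field and the column layout shown in the post.
classify() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print; next }                        # header passes through unchanged
    {
      clinsig = $(NF-3)
      result = "VUS"                               # condition 1: default
      if (clinsig != ".") {                        # condition 2: CLINSIG wins
        result = (length(clinsig) <= 13) ? clinsig : "Conflicting"
      } else if ($(NF-7) ~ /^UTR/) {               # condition 3: UTR5, UTR3, ...
        result = "Likely Benign"
      } else if ($(NF-4) + 0 > 0.01) {             # condition 4: PopFreqMax
        result = "Likely Benign"
      } else if ($(NF-7) == "splicing" && match($(NF-6), /[-+][0-9]+/) &&
                 substr($(NF-6), RSTART + 1, RLENGTH - 1) + 0 > 10) {
        result = "Likely Benign"                   # condition 5: splice offset > 10
      }
      $(NF-1) = result                             # rebuilds the line with OFS (tab)
      print
    }'
}
```

Run as classify < file > final, it should fill the Classification column of the sample with Likely Benign, Likely Benign, Conflicting and Pathogenic, which matches the desired output.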


Last edited by cmccabe; 01-21-2017 at 03:28 PM. Reason: fixed format and added details

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.