awk to update file based on 5 conditions
Post 302990136 by Don Cragun on Sunday 22nd of January 2017 11:44:54 PM
Quote:
Originally Posted by cmccabe
If CLINSIG has a value in it of Benign, then for the particular line the Classification is Benign. So I guess if CLINSIG has a value in it then that is the Classification and none of the conditions are applied.

My actual dataset does have some fields with null values in it, but I filled those in with a period (.); that is what I mean by there are no empty fields.

I am not sure what you mean by already, you are correct that only Classification is assigned a value.

The first condition does unconditionally set Classification to VUS. That seems to be working.

Condition 2 sets the Classification to the value of CLINSIG unless CLINSIG has multiple values in it like in line 3 of file (if there are multiple values in it they are separated by a |. If this is the case then Classification is Conflicting.

Condition 3 is only run on the lines where Func.IDP.refGene = UTR then the value of Classification is Likely Benign,
if CLINSIG is set to VUS, if it is any other value it is not changed.

Condition 4 is if the PopFreqMax > .01 then the Classification is Likely Benign unless there is a value other
than VUS there. If there is that means CLINSIG had a value already.

Condition 5 is only run if Func.IDP.refGene = spicing AND GeneDetail.IDP.refGene has +/- symbol in it AND the number
after it is > 10 then the value of Classification is Likely Benign, unless there is a value other
than VUS there. If there is that means CLINSIG had a value already.

Thank you :).
There are so many conflicting requirements here that I am completely confused. From two of your paragraphs above:
Quote:
If CLINSIG has a value in it of Benign, then for the particular line the Classification is Benign. So I guess if CLINSIG has a value in it then that is the Classification and none of the conditions are applied.

Condition 2 sets the Classification to the value of CLINSIG unless CLINSIG has multiple values in it like in line 3 of file (if there are multiple values in it they are separated by a |. If this is the case then Classification is Conflicting.
The 2nd paragraph quoted above says that if the CLINSIG field in a line of the input file contains a | character, the Classification field in the output for that line is to be set to Conflicting. Otherwise, the Classification field in the output for that line is to be set to the string that was in the CLINSIG field in that input line (even if that string was an empty string or just contains a period character (.)). And, according to the last sentence in the 1st paragraph above, once this has been done Conditions 1, 3, 4, and 5 are always to be ignored. I'm sure that isn't what you mean, but it is what you have repeatedly stated.

You talked about empty fields above, but not in earlier posts. Whether or not a field is empty, it has a value. And, you have stated that if a field's value contains fewer than 12 characters (which certainly includes 0 if that field's value is an empty string or 1 if that field's value is a period), then that value becomes the Classification field's value.

Since I can't make any sense out of your stated requirements, let me see if I can restate your requirements in a way that makes sense to me, hoping that I capture what you intended to say.

Requirements:
Perform the following tests in sequence until the stated condition for a test evaluates to TRUE. For the 1st test whose condition evaluates to TRUE, set the value of the Classification field in that output line to the corresponding stated value:
  1. If the value of the CLINSIG field contains a | character, set the Classification field to the string Conflicting.
  2. If the value of the CLINSIG field is not an empty string, is not the string ., and is not the string VUS, set the Classification field to the value of the CLINSIG field.
  3. If the value of the CLINSIG field is the string VUS and the value of the Func.IDP.refGene field is the string UTR, set the Classification field to the string Likely Benign.
  4. If the value of the CLINSIG field is the string VUS and the value of the PopFreqMax field is greater than .01, set the Classification field to the string Likely Benign.
  5. If the value of the CLINSIG field is the string VUS, the value of the Func.IDP.refGene field is the string splicing or the string spicing, and the absolute value of the GeneDetail.IDP.refGene field is greater than 10, set the Classification field to the string Likely Benign.
  6. If none of the above tests succeeded, set the Classification field to the string VUS.
Would code written to meet the above restatement of your requirements do what you want?
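
For illustration only, here is a minimal sketch of what code written to the restated requirements above might look like. It is not a tested solution for your data, and it makes several assumptions that your posts do not confirm: the file is tab-delimited; the first line is a header containing columns named exactly CLINSIG, Func.IDP.refGene, GeneDetail.IDP.refGene, PopFreqMax, and Classification; and the number to test in GeneDetail.IDP.refGene is the run of digits following the first + or - in that field. The function name classify is made up for this example, and test 1 uses the string Conflicting from your original description.

```sh
#!/bin/sh
# classify (hypothetical name): read a tab-delimited table with a header row
# on stdin and write it to stdout with the Classification column filled in.
classify() {
    awk '
    BEGIN { FS = OFS = "\t" }

    # Header row: map column names to positions (assumed names, see above).
    NR == 1 {
        for (i = 1; i <= NF; i++) col[$i] = i
        n = split("CLINSIG|Func.IDP.refGene|GeneDetail.IDP.refGene|PopFreqMax|Classification", need, "|")
        for (i = 1; i <= n; i++)
            if (!(need[i] in col)) {
                print "missing column: " need[i] > "/dev/stderr"
                exit 2
            }
        print
        next
    }

    {
        clinsig = $col["CLINSIG"]
        region  = $col["Func.IDP.refGene"]
        detail  = $col["GeneDetail.IDP.refGene"]
        freq    = $col["PopFreqMax"]

        # Assumption: the distance is the digits after the first + or - sign.
        dist = 0
        if (match(detail, /[+-][0-9]+/))
            dist = substr(detail, RSTART + 1, RLENGTH - 1) + 0

        # Tests 1 to 6 in order; the first one that succeeds wins.
        if (index(clinsig, "|"))
            class = "Conflicting"
        else if (clinsig != "" && clinsig != "." && clinsig != "VUS")
            class = clinsig
        else if (clinsig == "VUS" && region == "UTR")
            class = "Likely Benign"
        else if (clinsig == "VUS" && (freq + 0) > 0.01)
            class = "Likely Benign"
        else if (clinsig == "VUS" && (region == "splicing" || region == "spicing") && dist > 10)
            class = "Likely Benign"
        else
            class = "VUS"

        $col["Classification"] = class
        print
    }'
}
```

With the function defined in the current shell, it could be run as classify < input.txt > output.txt, where both file names are placeholders. Looking columns up by header name rather than by position is a deliberate choice here, since the field numbers in your file have not been stated.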