Using awk to parse multiple conditions


 
# 1  
Old 03-14-2015

There are 4 ways the user can input data, and unfortunately the parse rules for each are slightly different. The first condition works, and the input file for the second condition is attached. Conditions 3 and 4 will follow; I'm sure I will have trouble with them and need help as well. The code below parses condition 1 perfectly:

I apologize for the long post, but I wanted to provide all the details. Thank you :).

Code:
 awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" ${id}_position.txt > ${id}_parse.txt

Code:
 
1. c.79G>A
parse rules:
1 four zeros after the NC_  (not always the case) and the digits before the .
2 g. ###   g.###
3 letter before the >
4 letter after the >
Desired Output:  13     20763642     20763642     C     T

2. c.35delG
1 four zeros after the NC_  (not always the case) and the digits before the .
2 g. ###   g.###
3 letter before the del
4 "-" after the del
Desired Output:  13     20763686     20763686     C     -

3. c.575_576delCA 
4. .34_35delGGinsT

# 2  
Old 03-14-2015
Where does the C T or C - come from?
# 3  
Old 03-14-2015
The C and T come from ${id}_position.txt, which is parsed by the awk in the first post.

The C and - also come from ${id}_position.txt. In the field being parsed, NC_000013.10:g.20763686delC, the letter after the del goes in the first position and a "-" is used in the second position. The attached file has this in it as well. Thank you :).
# 4  
Old 03-16-2015
I don't know if something like the below would work. Also, how does awk know which parser to use?

Code:
 awk 'NR==2 {split($2,a,"[_.del]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" ${id}_position.txt > ${id}_parse.txt

There are two conditions: the first has a > in it and the second has a del in it:

Code:
 awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" ${id}_position.txt > ${id}_parse.txt

will parse the >, but not the del. So do I need some identifier for the correct parser to be used? Thank you :).

Maybe:

Code:
 echo '>' | awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" ${id}_position.txt > ${id}_parse.txt

echo 'del' | awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" ${id}_position.txt > ${id}_parse.txt
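A note on the two commands above: when awk is given a file operand, it reads that file and never looks at standard input, so the text piped in by echo cannot select anything. The following is a minimal sketch of that behaviour only; `stdin_demo` is a hypothetical helper name, and the temporary file merely stands in for ${id}_position.txt.

```shell
# Hypothetical demo: a temporary file stands in for ${id}_position.txt.
stdin_demo() {
    tmp=$(mktemp) || return 1
    printf 'from file\n' > "$tmp"
    # awk has a file operand, so it reads the file; the piped "del" is ignored.
    echo 'del' | awk '{ print }' "$tmp"
    rm -f "$tmp"
}

stdin_demo
```

The only line printed comes from the file. To choose between parsers, the test has to be made on the field itself inside the awk program.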


Last edited by cmccabe; 03-16-2015 at 03:25 PM..
# 5  
Old 03-16-2015
I tried the below code on the file attached.

Code:
 echo 'del' | awk 'NR==2 {split($2,a,"[_.del]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[-]}' OFS="\t" del_position.txt >del_parse.txt

awk: cmd. line:1: NR==2 {split($2,a,"[_.del]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[-]}
awk: cmd. line:1:                                                                                                            ^ syntax error
awk: cmd. line:1: error: invalid subscript expression

Desired Output:
13 20763686 20763686 C -



Thank you :).
# 6  
Old 03-16-2015
A couple of questions:
  1. what do you think this will do? split($2,a,"[_.del]")
  2. what do you think this will do? length(a[4]-1)
  3. what's the meaning of this? a[-]
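One way to check the first two questions is to run each piece in isolation. This is a minimal sketch and not part of any poster's code: `show_split` and `show_length` are hypothetical helper names, the input string is the deletion example from this thread, and the value "10000000C" is made up to expose the difference between the two length expressions.

```shell
# Hypothetical demo helpers; the input string is the example from this thread.
show_split() {
    echo 'NC_000013.10:g.20763686delC' | awk '{
        # "[_.del]" is a bracket expression: it matches any ONE of _ . d e l
        n = split($0, a, "[_.del]")
        for (i = 1; i <= n; i++) printf "a[%d]=<%s>\n", i, a[i]
    }'
}

show_length() {
    # length(s - 1) converts s to a number, subtracts 1, then measures the result;
    # length(s) - 1 measures the string and subtracts 1 from that count.
    awk 'BEGIN { s = "10000000C"; print length(s - 1), length(s) - 1 }'
}

show_split
show_length
```

The split yields seven elements, with a[4] holding only the digits, two empty elements between the letters d, e and l, and the C ending up in a[7]. The two length expressions print 7 and 8 for the made-up value; they agree only when subtracting 1 does not change the number of digits. As for a[-], awk cannot parse it as a subscript; a literal dash is written as the string "-".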

Last edited by vgersh99; 03-16-2015 at 05:08 PM..
# 7  
Old 03-16-2015
Here is what I am trying to do and my attempt. Thank you :).

If the field contains ">", use the first code; if it contains "del", use the second code.
Example:
NC_000013.10:g.20763642C>T - uses code 1
NC_000013.10:g.20763686delC - uses code 2

1. split($2,a,"[_.del]") - split on the _ . del
2. length(a[4]-1) - capture all field 3 digits
3. a[-] - typo -[5] - put a "-" in field

Code:
 echo '>' | awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),a[5]}' OFS="\t" ${id}_position.txt > ${id}_parse.txt   # SNP

echo 'del' | awk 'NR==2 {split($2,a,"[_.>]");b=substr(a[4],1,length(a[4]-1));print a[2]+0,b,b,substr(a[4],length(a[4])),-[5]}' OFS="\t" ${id}_position.txt >${id}_parse.txt   # Deletion

Thank you :).
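The rule stated in this post (use one parser when the field contains ">", another when it contains "del") can be expressed inside a single awk program by testing the field with a regular expression. The following is a sketch under stated assumptions, not a confirmed solution from the thread: `parse_variant` is a hypothetical helper name, it assumes one string such as NC_000013.10:g.20763642C>T per input line in the first field, and it covers only conditions 1 and 2 (single-base substitution and single-base deletion). Conditions 3 and 4 are reported as unhandled.

```shell
# Hypothetical helper: reads one genomic string per line on stdin.
parse_variant() {
    awk -v OFS='\t' '
    {
        # "NC_000013.10:g.20763642C>T" -> NC | 000013 | 10 | g | 20763642C>T
        n = split($1, p, "[_.:]")
        chr  = p[2] + 0                  # numeric context drops leading zeros
        tail = p[n]                      # position digits plus the change
        pos  = tail
        sub(/[^0-9].*$/, "", pos)        # keep only the leading digits
        change = substr(tail, length(pos) + 1)

        if (change ~ /^[ACGT]>[ACGT]$/)          # condition 1: substitution
            print chr, pos, pos, substr(change, 1, 1), substr(change, 3, 1)
        else if (change ~ /^del[ACGT]$/)         # condition 2: one-base deletion
            print chr, pos, pos, substr(change, 4, 1), "-"
        else
            print "unhandled: " $1 > "/dev/stderr"
    }'
}

printf '%s\n' 'NC_000013.10:g.20763642C>T' 'NC_000013.10:g.20763686delC' | parse_variant
```

For the two example strings this prints the tab-separated lines 13 20763642 20763642 C T and 13 20763686 20763686 C -, matching the desired output in the first post. To mirror the one-liners in the thread, the main block could be guarded with NR==2 and $1 replaced by $2, reading ${id}_position.txt instead of stdin.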