awk to update value in field of out file using contents of another

# 1  
04-04-2017

In the out.txt below I am trying to use awk to update the contents of $9. If $9 contains a + or -, then $8 of out.txt is used as a key to look up in $2 of file. When a match is found (there will always be one), the $3 value of file is appended to $9 of out.txt, separated by a :. So the original +6 value in out.txt would become +6:NM_005101.3. The awk below is close but has syntax errors that I cannot seem to fix. Thank you.

out tab-delimited
Code:
R_Index	Chr	Start	End	Ref	Alt	Func.IDP.refGene	Gene.IDP.refGene	GeneDetail.IDP.refGene	Inheritence	ExonicFunc.IDP.refGene	AAChange.IDP.refGene
1	chr1	948846	948846	-	A	upstream	ISG15	-0	.	.	.
2	chr1	948870	948870	C	G	UTR5	ISG15	NM_005101.3:c.-84C>G	.	.
3	chr1	949608	949608	G	A	exonic	ISG15	.	.	nonsynonymous SNV	ISG15:NM_005101.3:exon2:c.248G>A:p.S83N
4	chr1	949925	949925	C	T	downstream	ISG15	+6	.	.	.
5	chr1	207646923	207646923	G	A	intronic	CR2	>50	.	.	.
6	chr2	3653844	3653844	T	C	intronic	COLEC11	>50	.	.	.
7	chr1	154562623	154562625	CCG	-	intronic	ADAR	>50	.	.	.
8	chr1	948840	948840	-	C	upstream	ISG15	-6	.	.	.

file space-delimited
Code:
2 ISG15 NM_005101.3 948846-948956 949363-949919

awk
Code:
awk 'if($9 == "-" || $9 == "+" {printf ":"} FNR==NR{a[$2]=$3; next} a[$9]{$3=a[$8]}1' OFS'\t' out file > result

awk: cmd. line:1: if($9 == "-" || $9 == "+" {printf ":"} FNR==NR{a[$2]=$3; next} a[$8]{$3=a[$8]}1
awk: cmd. line:1: ^ syntax error

Description:
Code:
1. if $9 in out has a + or - in it

2. using $2 of file as the key, store the value of $3 in array a

3. match each $8 value in out to a key in a and update $9 in out with $3 of file, separated by a :

4. if $9 of out does not have a + or - in it, the line is skipped

desired out tab-delimited
Code:
R_Index Chr Start   End Ref Alt Func.IDP.refGene    Gene.IDP.refGene    GeneDetail.IDP.refGene  Inheritence ExonicFunc.IDP.refGene  AAChange.IDP.refGene
1   chr1    948846  948846  -   A   upstream    ISG15   -0:NM_005101.3  .   .   .
2   chr1    948870  948870  C   G   UTR5    ISG15   NM_005101.3:c.-84C>G    .   .
4   chr1    949925  949925  C   T   downstream  ISG15   +6:NM_005101.3  .   .   .
5   chr1    207646923   207646923   G   A   intronic    CR2 >50 .   .   .
8   chr1    948840  948840  -   C   upstream    ISG15   -6:NM_005101.3  .   .   .

lines 1, 3, 5 $9 updated with : and value of $3 in file
line 2 and 4 are skipped as these do not have a + or - in them
# 2  
04-04-2017
Code:
if($9 == "-" || $9 == "+" {printf ":"}
# should be:
($9 == "-" || $9 == "+") {printf ":"}

This User Gave Thanks to jim mcnamara For This Post:
# 3  
04-04-2017
Looks like I did something wrong in the fields to update, but that fixed the syntax error. Thank you.
# 4  
04-04-2017
Hello cmccabe,

Apart from what Jim has stated, you could also keep the if statement, as long as it sits inside an action block: {if (...) ...}.
Code:
{if($9 == "-" || $9 == "+"){printf ":"}}

Also set OFS="\t" in your code.
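As a rough sketch of both points (sample.txt and its contents are made up for illustration, not taken from out): the pattern form and the if form select the same lines, and OFS needs an = to be treated as an assignment. Written as OFS'\t', it reaches awk as a file name operand instead.

```shell
# Hypothetical sample: 9 tab-separated fields; field 9 is "+" on the first line only
printf 'a\tb\tc\td\te\tf\tg\tISG15\t+\n' > sample.txt
printf 'a\tb\tc\td\te\tf\tg\tCR2\t.\n' >> sample.txt

# Form 1: the condition is a pattern, the action follows in braces
awk -F'\t' '($9 == "-" || $9 == "+") {print $8 ":"}' sample.txt > form1.txt

# Form 2: the same test written as an if statement inside an action block
awk -F'\t' '{if ($9 == "-" || $9 == "+") {print $8 ":"}}' sample.txt > form2.txt

# OFS assigned with -v; assigning to $9 rebuilds the line using OFS
awk -F'\t' -v OFS='\t' '{$9 = $9 ":x"} 1' sample.txt > form3.txt
```

form1.txt and form2.txt should be identical, and form3.txt should still be tab-delimited with :x appended to field 9.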

Thanks,
R. Singh
# 5  
04-04-2017
Seems closer and executes but the output is empty:

awk
Code:
awk -F'\t' '$9 ~ /-/ || $9 ~ /+/ {print $9":"}' out | awk 'FNR==NR {a[$2]=$3; next} a[$8]{$9=a[$8]}1' OFS="\t" file

If I run each command separately I seem to get the output I need. Thank you.
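One likely reason for the empty output: the second awk is given file as its only operand, so it never reads the pipe, and FNR==NR is true for every line it does read, so next skips them all. A minimal sketch of the difference, assuming a stand-in lookup.txt and a made-up one-line input on the pipe:

```shell
# Hypothetical stand-in for file (first three fields only)
printf '2 ISG15 NM_005101.3\n' > lookup.txt

# One file operand: stdin is ignored and FNR==NR holds for every line, so nothing prints
printf 'x\tISG15\n' | awk 'FNR==NR {a[$2]=$3; next} {print $2, a[$2]}' lookup.txt > no_stdin.txt

# Adding "-" makes awk read the pipe as a second input after lookup.txt
printf 'x\tISG15\n' | awk 'FNR==NR {a[$2]=$3; next} {print $2, a[$2]}' lookup.txt - > with_stdin.txt
```

Here no_stdin.txt comes out empty, while with_stdin.txt holds the looked-up value for the piped line.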
# 6  
04-04-2017
Quote:
Originally Posted by cmccabe
Seems closer and executes but the output is empty:
awk
Code:
awk -F'\t' '$9 ~ /-/ || $9 ~ /+/ {print $9":"}' out | awk 'FNR==NR {a[$2]=$3; next} a[$8]{$9=a[$8]}1' OFS="\t" file

If I run each command separately I seem to get the output I need. Thank you.
Hello cmccabe,

Your question and expected output still do not agree: how come lines 3 and 5 are updated when they do not have a negative or positive number in $9? If that was a typo, could you please try the following.
Code:
 awk 'FNR==NR{A[$2]=$3;next} ($9 ~ /-[0-9]+$|+[0-9]+$/){Q=$8;for(Q in A){$9=$9":"A[Q]};print}'   Input_file out_file

Thanks,
R. Singh
# 7  
04-05-2017
The only values to update are those with a - or +; I apologize for any typo.

Here is how I read the awk, which is much closer than mine. Thank you very much.

Code:
awk '                             # Invoke awk
FNR==NR {                         # For each line in the 1st input file (file)...
    A[$2]=$3                      # ...store field 3 in array A under the name in field 2
    next                          # Skip to the next line
}
($9 ~ /-[0-9]+$|+[0-9]+$/) {      # Check if field 9 in out ends in a - or + followed by digits
    Q=$8                          # If it does, copy field 8 into the variable Q
    for(Q in A){$9=$9":"A[Q]}     # Loop over every key in A (overwriting Q), appending each value to field 9 after a :
    print                         # Print the updated line
}
' OFS="\t" file out > new         # OFS="\t" makes tab the output separator; file and out are the inputs, new is the output

Code:
awk 'FNR==NR{A[$2]=$3;next} ($9 ~ /-[0-9]+$|+[0-9]+$/){Q=$8;for(Q in A){$9=$9":"A[Q]};print}' OFS="\t" file out > new

current new (just the 3 updated lines with multiple NM_)
Code:
1	chr1	948846	948846	-	A	upstream	ISG15	-0:NM_005101.3:NM_024027.4:NM_001111.4:NM_001006658.2	.	.	.
4	chr1	949925	949925	C	T	downstream	ISG15	+6:NM_005101.3:NM_024027.4:NM_001111.4:NM_001006658.2	.	.	.
8	chr1	948840	948840	-	C	upstream	ISG15	-6:NM_005101.3:NM_024027.4:NM_001111.4:NM_001006658.2	.	.	.

desired new (all lines printed, but only the 3 from above are updated with a :NM_ value)
Code:
R_Index Chr Start   End Ref Alt Func.IDP.refGene    Gene.IDP.refGene    GeneDetail.IDP.refGene  Inheritence ExonicFunc.IDP.refGene  AAChange.IDP.refGene
1   chr1    948846  948846  -   A   upstream    ISG15   -0:NM_005101.3  .   .   .
2   chr1    948870  948870  C   G   UTR5    ISG15   NM_005101.3:c.-84C>G    .   .
4   chr1    949925  949925  C   T   downstream  ISG15   +6:NM_005101.3  .   .   .
5   chr1    207646923   207646923   G   A   intronic    CR2 >50 .   .   .
8   chr1    948840  948840  -   C   upstream    ISG15   -6:NM_005101.3  .   .   .
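The multiple NM_ values in the current output appear to come from for(Q in A), which visits every key in A rather than only the gene in $8, and the unmatched lines are missing because print sits inside the conditional block. A minimal sketch of one way to get the desired result, assuming that "a + or -" means a signed number such as -0 or +6, and using a made-up CR2 line in file to show that other keys are no longer appended:

```shell
# Lookup table: the ISG15 line is from the thread; the CR2 line is made up
printf '%s\n' \
  '2 ISG15 NM_005101.3 948846-948956 949363-949919' \
  '3 CR2 NM_000000.0 1-2 3-4' > file

# Three rows of out from the thread, tab-delimited
printf '1\tchr1\t948846\t948846\t-\tA\tupstream\tISG15\t-0\t.\t.\t.\n' > out
printf '4\tchr1\t949925\t949925\tC\tT\tdownstream\tISG15\t+6\t.\t.\t.\n' >> out
printf '5\tchr1\t207646923\t207646923\tG\tA\tintronic\tCR2\t>50\t.\t.\t.\n' >> out

awk '
FNR==NR { A[$2] = $3; next }          # file: map gene name to its NM_ value
$9 ~ /^[+-][0-9]+$/ && ($8 in A) {    # out: $9 is a signed number and $8 is a known gene
    $9 = $9 ":" A[$8]                 # append only the value stored for this gene
}
1                                     # print every line of out, changed or not
' file FS='\t' OFS='\t' out > new
```

The FS='\t' and OFS='\t' operands are placed between the two file names so that file is still split on spaces while out is read and written with tabs, which keeps values like "nonsynonymous SNV" in a single field.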
