Post 303011792 by cmccabe on Tuesday 23rd of January 2018 06:34:34 PM
awk to add text to matching pattern in field

In the awk below I am trying to add :p=? to the end of each entry in $9 that matches the pattern NM_. The script executes and is close, but I cannot figure out why the :p=? repeats after the split, as shown in the current output. I have added comments as well. Thank you.

file
Code:
R_Index	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	Inheritance	ExonicFunc.refGene	AAChange.refGene
1	chr1	155870416	155870416	G	A	splicing	RIT1	NM_001256821:exon6:c.481-7C>T;NM_001256820:exon5:c.322-7C>T;NM_006912:exon6:c.430-7C>T
9	chr10	112760138	112760138	A	-	splicing	SHOC2	NM_007373:exon4:c.842-35A>-;NM_001269039:exon2:c.704-35A>-
11	chr18	53070914	53070914	G	A	exonic	TCF4	.	AD	nonsynonymous SNV	TCF4:NM_001243232:exon1:c.32C>T:p.A11V;TCF4:NM_001306208:exon1:c.32C>T:p.A11V


awk
Code:
awk '
  BEGIN { FS=OFS="\t" }  # use tab as both the input and output field separator
  $9 ~ /NM/ {            # only process lines whose $9 contains NM
      out=""             # string that accumulates the rebuilt $9
      i=split($9,NM,/;/) # split $9 on ";" (i is the number of pieces)
      for (n=1; n<=i; n++) {
          sub(/$/, ":p=?", NM[i])              # append :p=? to the end of the entry
          out = (out=="" ? "" : out";") NM[i]  # rejoin the entries with ";"
      }
      $9 = out           # replace $9 with the rebuilt string
}1' file

desired output
Code:
R_Index	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	Inheritance	ExonicFunc.refGene	AAChange.refGene
1	chr1	155870416	155870416	G	A	splicing	RIT1	NM_001256821:exon6:c.481-7C>T:p=?;NM_001256820:exon5:c.322-7C>T:p=?;NM_006912:exon6:c.430-7C>T:p=?
9	chr10	112760138	112760138	A	-	splicing	SHOC2	NM_007373:exon4:c.842-35A>-:p=?;NM_001269039:exon2:c.704-35A>-:p=?
11	chr18	53070914	53070914	G	A	exonic	TCF4	.	AD	nonsynonymous SNV	TCF4:NM_001243232:exon1:c.32C>T:p.A11V;TCF4:NM_001306208:exon1:c.32C>T:p.A11V

current output
Code:
R_Index	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	Inheritance	ExonicFunc.refGene	AAChange.refGene
1	chr1	155870416	155870416	G	A	splicing	RIT1	NM_006912:exon6:c.430-7C>T:p=?;NM_006912:exon6:c.430-7C>T:p=?:p=?;NM_006912:exon6:c.430-7C>T:p=?:p=?:p=?
9	chr10	112760138	112760138	A	-	splicing	SHOC2	NM_001269039:exon2:c.704-35A>-:p=?;NM_001269039:exon2:c.704-35A>-:p=?:p=?
11	chr18	53070914	53070914	G	A	exonic	TCF4	.	AD	nonsynonymous SNV	TCF4:NM_001243232:exon1:c.32C>T:p.A11V;TCF4:NM_001306208:exon1:c.32C>T:p.A11V
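
A likely cause of the repetition: inside the loop the script indexes the array with NM[i], and i holds the piece count returned by split(), so every pass appends to the last entry and then re-emits it. Indexing with the loop counter instead should produce the desired output. The following is only a sketch under a few assumptions: the suffix is :p=? (as in the desired output), it is added only to entries that begin with NM_, and the wrapper name add_p is hypothetical, used here just to make the snippet easy to call.

```shell
# Hypothetical helper: append ":p=?" to every ";"-separated entry of $9
# that starts with NM_. Reads the named file(s), or stdin if none are given.
add_p() {
  awk '
    BEGIN { FS = OFS = "\t" }          # tab-separated in and out
    $9 ~ /NM_/ {                       # only lines whose $9 contains NM_
      n = split($9, NM, ";")           # n = number of entries in $9
      out = ""
      for (i = 1; i <= n; i++) {       # i is the loop counter here
        if (NM[i] ~ /^NM_/)            # assumption: tag only NM_ entries
          NM[i] = NM[i] ":p=?"
        out = (i == 1 ? "" : out ";") NM[i]   # rejoin with ";"
      }
      $9 = out                         # rebuild $9 from the joined entries
    }
    1                                  # print every line, changed or not
  ' "$@"
}
```

It can be run as add_p file. Lines whose $9 has no NM_ (the header and the row with "." in $9) pass through unchanged.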

 
