Post 302987823 by cmccabe on Friday 16th of December 2016 11:54:23 AM
awk to update specific value in file with match and add +1 to specific digit

I am trying to use awk to match the NM_ accession in file against $1 of id, which is tab-delimited. The NM_ accession is always on a line of file that starts with > and always follows the second _. When an NM_ matches $1 of id, the value of $2 in id replaces the NM_. An NM_ may appear more than once in file, as in the example below, but it will always have a match in id.

After the third _ there is a digit (0, 1, 2, etc.). I am trying to add 1 to that digit and put the word exon in front of it, so that _0 becomes _exon1. I am not sure whether my awk attempt helps at all with the first part. Thank you :).
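To make the layout described above concrete, here is a minimal sketch (not part of the original attempt) that splits one header token on _ and shows which pieces carry the accession and the digit. The variable names header, parsed, p, acc and exon are only illustrative.

```shell
#!/bin/sh
# One header token copied from the example data in this post.
header='>hg19_refGene_NM_001195684_0'

parsed=$(printf '%s\n' "$header" | awk '{
    n = split($0, p, "_")      # pieces: >hg19, refGene, NM, 001195684, 0
    acc = p[3] "_" p[4]        # the accession follows the second underscore
    exon = p[5] + 1            # the digit follows the third underscore
    print n, acc, "exon" exon
}')
printf '%s\n' "$parsed"
```

Because the NM_ accession itself contains an underscore, splitting on _ yields five pieces rather than four, so the accession has to be reassembled from the third and fourth pieces.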


file
Code:
>hg19_refGene_NM_001195684_0 range=chr1:92327018-92327098 5'pad=10 3'pad=10 strand=- repeatMasking=none
agaaataaaaATGACTTCCCATTATGTGATTGCCATCTTTGCCCTGATGA
GCTCCTGTTTAGCCACTGCAGgtaagttgca
>hg19_refGene_NM_001195684_1 range=chr1:92262834-92263038 5'pad=10 3'pad=10 strand=- repeatMasking=none
cccttggcagGTCCAGAGCCTGGTGCACTGTGTGAACTGTCACCTGTCAG
TGCCTCCCATCCTGTCCAGGCCTTGATGGAGAGCTTCACTGTTTTGTCAG
GCTGTGCCAGCAGAGGCACAACTGGGCTGCCACAGGAGGTGCATGTCCTG
AATCTCCGCACTGCAGGCCAGGGGCCTGGCCAGCTACAGAGAGAGgtagg
tgcag
>hg19_refGene_NM_001195684_2 range=chr1:92224160-92224317 5'pad=10 3'pad=10 strand=- repeatMasking=none
tgcttcctagGTCACACTTCACCTGAATCCCATCTCCTCAGTCCACATCC
ACCACAAGTCTGTTGTGTTCCTGCTCAACTCCCCACACCCCCTGGTGTGG
CATCTGAAGACAGAGAGACTTGCCACTGGGGTCTCCAGACTGTTTTTGgt
aagtgctt
>hg19_refGene_NM_001195683_2 range=chr1:92224160-92224317 5'pad=10 3'pad=10 strand=- repeatMasking=none
tgcttcctagGTCACACTTCACCTGAATCCCATCTCCTCAGTCCACATCC
ACCACAAGTCTGTTGTGTTCCTGCTCAACTCCCCACACCCCCTGGTGTGG
CATCTGAAGACAGAGAGACTTGCCACTGGGGTCTCCAGACTGTTTTTGgt
aagtgctt
>hg19_refGene_NM_001195683_3 range=chr1:92200323-92200526 5'pad=10 3'pad=10 strand=- repeatMasking=none
tttcctctagGTGTCTGAGGGTTCTGTGGTCCAGTTTTCATCAGCAAACT
TCTCCTTGACAGCAGAAACAGAAGAAAGGAACTTCCCCCATGGAAATGAA
CATCTGTTAAATTGGGCCCGAAAAGAGTATGGAGCAGTTACTTCATTCAC
CGAACTCAAGATAGCAAGAAACATTTATATTAAAGTGGGGGAAGgtaaat
ttta

id
Code:
NM_001195684    TGFBR3
NM_001206389    FGF8
NM_001197220    PDE4D
NM_001195683    TGFBR3

desired output: the NM_ in each header is replaced with $2 of id because it matched $1 of id,
and the trailing digit is increased by one and prefixed with the word exon (for example, _0 becomes _exon1)
Code:
>hg19_refGene_TGFBR3_exon1 range=chr1:92327018-92327098 5'pad=10 3'pad=10 strand=- repeatMasking=none
agaaataaaaATGACTTCCCATTATGTGATTGCCATCTTTGCCCTGATGA
GCTCCTGTTTAGCCACTGCAGgtaagttgca
>hg19_refGene_TGFBR3_exon2 range=chr1:92262834-92263038 5'pad=10 3'pad=10 strand=- repeatMasking=none
cccttggcagGTCCAGAGCCTGGTGCACTGTGTGAACTGTCACCTGTCAG
TGCCTCCCATCCTGTCCAGGCCTTGATGGAGAGCTTCACTGTTTTGTCAG
GCTGTGCCAGCAGAGGCACAACTGGGCTGCCACAGGAGGTGCATGTCCTG
AATCTCCGCACTGCAGGCCAGGGGCCTGGCCAGCTACAGAGAGAGgtagg
tgcag
>hg19_refGene_TGFBR3_exon3 range=chr1:92224160-92224317 5'pad=10 3'pad=10 strand=- repeatMasking=none
tgcttcctagGTCACACTTCACCTGAATCCCATCTCCTCAGTCCACATCC
ACCACAAGTCTGTTGTGTTCCTGCTCAACTCCCCACACCCCCTGGTGTGG
CATCTGAAGACAGAGAGACTTGCCACTGGGGTCTCCAGACTGTTTTTGgt
aagtgctt
>hg19_refGene_TGFBR3_exon3 range=chr1:92224160-92224317 5'pad=10 3'pad=10 strand=- repeatMasking=none
tgcttcctagGTCACACTTCACCTGAATCCCATCTCCTCAGTCCACATCC
ACCACAAGTCTGTTGTGTTCCTGCTCAACTCCCCACACCCCCTGGTGTGG
CATCTGAAGACAGAGAGACTTGCCACTGGGGTCTCCAGACTGTTTTTGgt
aagtgctt
>hg19_refGene_TGFBR3_exon4 range=chr1:92200323-92200526 5'pad=10 3'pad=10 strand=- repeatMasking=none
tttcctctagGTGTCTGAGGGTTCTGTGGTCCAGTTTTCATCAGCAAACT
TCTCCTTGACAGCAGAAACAGAAGAAAGGAACTTCCCCCATGGAAATGAA
CATCTGTTAAATTGGGCCCGAAAAGAGTATGGAGCAGTTACTTCATTCAC
CGAACTCAAGATAGCAAGAAACATTTATATTAAAGTGGGGGAAGgtaaat
ttta

awk
Code:
awk 'NR==FNR{a[$1];next} {k=$2; sub(/_.*/,"",k)} k in a' file id
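As written, the attempt above reads file first, so the array a is keyed by header tokens and sequence lines, and the gene symbol taken from $2 of id never matches any of them, so nothing is printed. One possible way to approach the task, offered as a sketch and not as a definitive answer, is to read id first, store $2 keyed by $1, and then rewrite only the first token of each > line. The sample data below is abbreviated from the post (the pad and repeatMasking fields and most of the sequence are left out), and the names tmp, result, gene, p, acc, exon and rest are assumptions made for illustration.

```shell
#!/bin/sh
# Abbreviated stand-ins for the id and file inputs shown in the post.
tmp=$(mktemp -d)
printf 'NM_001195684\tTGFBR3\nNM_001195683\tTGFBR3\n' > "$tmp/id"
cat > "$tmp/file" <<'EOF'
>hg19_refGene_NM_001195684_0 range=chr1:92327018-92327098 strand=-
agaaataaaaATGACTTCC
>hg19_refGene_NM_001195683_2 range=chr1:92224160-92224317 strand=-
tgcttcctagGTCACACTT
EOF

result=$(awk '
    # First input (id): remember the gene symbol for each accession.
    NR == FNR { gene[$1] = $2; next }

    # Header lines of the second input: rebuild only the first token.
    /^>/ {
        n = split($1, p, "_")                 # >hg19, refGene, NM, 001195684, 0
        acc = p[3] "_" p[4]
        if (n == 5 && (acc in gene)) {
            exon = p[5] + 1                   # 0-based digit becomes 1-based
            rest = substr($0, length($1) + 1) # everything after the first token
            $0 = p[1] "_" p[2] "_" gene[acc] "_exon" exon rest
        }
    }

    # Print every line, changed or not.
    { print }
' "$tmp/id" "$tmp/file")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the real data the call would be the same awk program followed by id file, in that order. Headers whose accession is not found in id, or whose first token does not split into exactly five pieces, are printed unchanged, and the rest of each header line is copied as is so its spacing is preserved.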


Last edited by cmccabe; 12-17-2016 at 11:33 PM.. Reason: fixed format, added details, fixed typo
 
