Sponsored Content
Top Forums Shell Programming and Scripting Perl to update field based on a specific set of rules Post 303000870 by durden_tyler on Friday 21st of July 2017 10:02:17 AM
Old 07-21-2017
The problem lies in the regular expression you are using with "$GeneDetailrefGene".

See below:
Quote:
Originally Posted by cmccabe
...
...
Code:
 ...
 ...
my ($transcript) = ($GeneDetailrefGene) =~ /(?:\.\d+[+*-]|\D)(\d+)/;   # Get a numeric value if exists using (.) and (+/*/-) and capture digits into $transcript.
...
 ...

...
...
What you are essentially saying is that in the string $GeneDetailrefGene, look for one of the following:

a) a dot character (".") followed by 1 or more digits [0-9] followed by any one of the characters "+", "*" or "-"
Code:
\.\d+[+*-]

== OR ==

b) a non-digit character
Code:
\D

whichever comes first.

If either a) or b) is followed by 1 or more digits [0-9], then capture those digits and set them to $transcript.

The "whichever comes first" part is crucial here.
If you have a regex like this: "A|B", Perl will match either A or B, whichever comes first - reading the string from left to right.

So, when your program sees this value for the $GeneDetailrefGene:

Code:
NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G

it matches the "_001134408" to "\D(\d+)" and therefore sets $transcript to "001134408".

For a clearer explanation, the red part of the regex below matches the red part of the string $GeneDetailrefGene. Ditto with the blue part, which is eventually assigned to $transcript.

Line 1:
Code:
NM_001134408:exon3:c.415-43A>G;NM_001134407:exon3:c.415-43A>G;NM_000833:exon4:c.415-43A>G
/(?:\.\d+[+*-]|\D)(\d+)/

Line 2:
Code:
NM_003239:exon6:c.927-1G>C
/(?:\.\d+[+*-]|\D)(\d+)/

Line 3:
Code:
NM_001879:exon12:c.1442-5C>G
/(?:\.\d+[+*-]|\D)(\d+)/

This User Gave Thanks to durden_tyler For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Update a field in a file based on condition

Hi i am new to scripting. i have a file file.dat with content as : CONTENT_STORAGE PERCENTAGE FLAG: /storage_01 64% 0 /storage_02 17% 1 I need to update the value of FLAG for a particular CONTENT_STORAGE value I have written the following code #!/bin/sh threshold=20... (1 Reply)
Discussion started by: kichu
1 Replies

2. Shell Programming and Scripting

Help with allocated text content based on specific rules...

Input file format: /tag="ABL" /note="abl homolog 2 /tag="ABLIM1" /note="actin binding LIM 1 /tag="ABP1" /note="amiloride binding protein 1 (amine oxidase (copper- containing)) /tag="ABR" /note="active BCR-related /tag="AC003042.1" /note="SDR family member 11 precursor . . . (4 Replies)
Discussion started by: perl_beginner
4 Replies

3. Shell Programming and Scripting

Update specific field in a line of text file

I have a text file like this: subject1:LecturerA:10 subject2:LecturerA:40 if I was given string in column 1 and 2 (which are subject 1 and LecturerA) , i need to update 3rd field of that line containing that given string , which is, number 10 need to be updated to 100 ,for example. The... (6 Replies)
Discussion started by: bmtoan
6 Replies

4. Shell Programming and Scripting

Add specific string to last field of each line in perl based on value

I am trying to add a condition to the below perl that will capture the GTtag and place a specific string in the last field of each line. The problem is that the GT value used is not right after the tag rather it is a few fields away. The values should always be 0/1 or 1/2 and are in bold in the... (12 Replies)
Discussion started by: cmccabe
12 Replies

5. Shell Programming and Scripting

awk to update value in field based on another field

In the tab-delimeted input file below I am trying to use awk to update the value in $2 if TYPE=ins in bold, by adding the value of HRUN= in italics. In the below since in line 1 TYPE=ins the 117282541 value in $2 has 6 added because that is the value of HRUN=. Hopefully the awk is a start but I... (2 Replies)
Discussion started by: cmccabe
2 Replies

6. Shell Programming and Scripting

Perl to update field in file based of match to another file

In the perl below I am trying to set/update the value of $14 (last field) in file2, using the matching NM_ in $12 or $9 in file2 with the NM_ in $2 of file1. The lengths of $9 and $12 can be variable but what is consistent is the start pattern will always be NM_ and the end pattern is always ;... (4 Replies)
Discussion started by: cmccabe
4 Replies

7. Shell Programming and Scripting

Perl to change value based on set of rules

In the perl there is a default rule that sets f to VUS, and then a seris of rules that will change f based on the result that is obtained from the rule. The code below is a rule that is supposed to be applicable to lines 2-4 because this rule just looks at the digit in f. So in line 2 f is 27... (4 Replies)
Discussion started by: cmccabe
4 Replies

8. Shell Programming and Scripting

awk to assign points to variables based on conditions and update specific field

I have been reading old posts and trying to come up with a solution for the below: Use a tab-delimited input file to assign point to variables that are used to update a specific field, Rank. I really couldn't find too much in the way of assigning points to variable, but made an attempt at an awk... (4 Replies)
Discussion started by: cmccabe
4 Replies

9. Shell Programming and Scripting

Update a specific field in file with Variable value based on other Key Word

I have an input file with A=xyz B=pqr I would want the value in Second Field (xyz or pqr) updated with a value present in Shell Variable based on the value passed in the first field. (A or B ) while read line do NEW_VALUE = `some functionality done on $line` If $line=First Field-... (1 Reply)
Discussion started by: infernalhell
1 Replies

10. UNIX for Beginners Questions & Answers

Problem with getting awk to multiply a field by a value set based on condition of another field

Hi, So awk is driving me crazy on this one. I have searched everywhere and read man, docs and every related post Google can find and still no luck. The actual files I need to run this on are sensitive in nature, but it is the same thing as if I needed to calculate weighted grades for multiple... (15 Replies)
Discussion started by: cotilloe
15 Replies
All times are GMT -4. The time now is 02:48 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy