In the Perl script there is a default rule that sets f[55] to VUS, and then a series of rules that will change f[55] based on the result that is
obtained from each rule. The code below is a rule that is supposed to be applicable to lines 2-4 because this rule just looks at the digit in f[8]. So in line 2 f[8] is 27
and that value is greater than 10, so f[55] would be Likely Benign. Since the symbol before the digit could be either a > or + or - in the regex I use \D to look for any non-digit before the number.
The else portion of the rule is supposed to be applicable to lines 1 and 5 as it uses the regex to parse out the digit after the - or + or * in the string
that begins with NM_ in f[8]. I am currently only getting the second line's f[55] value to be correct and I am not sure what I am doing incorrectly. I have tried
changing the regex, but not to the correct one (maybe there is something else I am missing). Thank you.
file
perl
desired output in f[55]
current output in f[55]
...
...
The code below is a rule that is supposed to be applicable to lines 2-4 because this rule just looks at the digit in f[8].
...
...
That's wrong!
The regex looks for "a non-digit followed by one or more digits" ("\D\d+").
Line numbers 2 and 3 do not have that, so it will not match.
Line number 4 has that, so it will match.
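A quick way to see this for yourself is sketched below. It assumes "27" (the f[8] value quoted for line 2) plus two hypothetical stand-ins, "50" and ">50", since the actual rows of the data file are not reproduced here.

```perl
use strict;
use warnings;

# "27" is the value quoted for line 2; "50" and ">50" are hypothetical
# stand-ins for a bare number and a number with a leading symbol.
my @samples = ('27', '50', '>50');

# Record whether each value contains "a non-digit followed by digits".
my %matched;
for my $value (@samples) {
    $matched{$value} = ($value =~ /\D\d+/) ? 1 : 0;
    printf "%-4s %s\n", $value, $matched{$value} ? 'match' : 'no match';
}
```

Only the value with a leading symbol matches; the bare numbers never reach the comparison against 10.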
Quote:
Originally Posted by cmccabe
...
...
So in line 2 f[8] is 27
and that value is greater than 10, so f[55] would be Likely Benign.
...
...
That's wrong again.
Line # 2 will never match "\D\d+" so 27 will never be extracted and hence never compared to anything.
Quote:
Originally Posted by cmccabe
...
...
Since the symbol before the digit could be either a > or + or - in the regex I use \D to look for any non-digit before the number.
...
...
Wrong again.
There are cases where a symbol does not exist in the first place.
For example, lines 2 and 3 do not have the symbol at all.
You did not do anything for those cases hence those lines fail to match your regex.
Line 5 does have the symbol, so it matches your regex.
Quote:
Originally Posted by cmccabe
...
...
The else portion of the rule is supposed to be applicable to lines 1 and 5 as it uses the regex to parse out the digit after the - or + or * in the string
that begins with NM_ in f[8].
...
...
Yes, but before the control goes to the "else" portion, it will go to the "if" portion.
And the "if" portion will match your lines 1 and 5 because both of them have "a non-digit followed by one or more digits" ("\D\d+") in their f[8] values.
So the "else" portion will not even get a chance to execute for lines 1 and 5.
Here's some diagnostic output for your data file:
Thank you for the diagnostics, they help. Is it more or less that I am trying to capture too many conditions with the regex? What would you recommend? Thank you.
...
...
Is it more or less that I am trying to capture too many conditions with the regex?
What would you recommend?
...
...
Yes, from your other Perl related posts, I do get the impression that you are trying to use the regexes for too many things. That should be avoided.
However, for this particular piece of code, I think, you may want to deepen your understanding of regexes.
You have two types of data in F[8] column.
Type 1:
and
Type 2:
So use regular expressions that work specifically with each type of data.
Your regex "\D\d+" is meant for Type 1, but it will actually match Type 2 as well.
Why?
Because "\D" means "non-digit character" and so it matches the "_" after "NM".
And then that is followed by "\d+" - "one or more digits". That's why the regex doesn't work the way you want.
Here's a demonstration:
And for line # 5:
As you can see, the regex meant for Type 1 data is matching Type 2 data as well.
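The overlap can be reproduced with a small sketch like the one here. The two NM_ strings are hypothetical values shaped like the description of lines 1 and 5, not the real rows from the data file.

```perl
use strict;
use warnings;

# Hypothetical Type 2 values: one with digits after the last dot,
# one with "*" directly after the last dot.
my @type2 = ('NM_000000.1:c.100+27A>G', 'NM_000000.1:c.*27A>G');

# Capture what "\D\d+" grabs first; the leftmost match wins.
my @captured;
for my $value (@type2) {
    if ($value =~ /(\D)(\d+)/) {
        push @captured, "$1$2";
        print "$value => non-digit '$1', digits '$2'\n";
    }
}
```

In both cases the match is the "_" after "NM" plus the digits that follow it, which is not the number you were after.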
So, determine what exactly differentiates Type 1 data from Type 2 data. Here are a few observations:
(1) Type 1 has "\d+" - "one or more digits"
(2) Type 1 may or may not have a non-digit at the front. This non-digit could be ">", "+" or "-". But nothing else.
(3) If there is a non-digit at the front, there is only one such non-digit. There cannot be more than one. So you need: "zero or one non-digit". For that, you could use "\D{0,1}" or "\D?".
Let's test this on the one-liner above.
First, notice that "\D\d+" matches ">50" but does not match "50".
That's because there is nothing before "50" in the second case, but the regex "\D\d+" demands exactly one non-digit at the beginning.
Since there was no non-digit, the match failed.
Now notice how "\D?\d+" works for both cases:
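One way to check this, as a small sketch on the same two sample values (the hash keys "required" and "optional" are just labels I picked):

```perl
use strict;
use warnings;

# Compare the two patterns on the sample values ">50" and "50".
my %result;
for my $value ('>50', '50') {
    $result{$value} = {
        required => ($value =~ /\D\d+/)  ? 1 : 0,   # exactly one non-digit required
        optional => ($value =~ /\D?\d+/) ? 1 : 0,   # zero or one non-digit allowed
    };
    printf "%-4s required=%d optional=%d\n",
        $value, $result{$value}{required}, $result{$value}{optional};
}
```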
Now, we make the regex more robust. We know that the "non-digit" character at the beginning is one of ">", "+" or "-".
So we use the bracket notation: "[>+-]"
This will match exactly one of the characters inside the brackets.
And since there can be 0 or 1 of such characters, we use "?" after the brackets: "[>+-]?"
In other words, we simply replaced "\D" by "[>+-]"
"\D" matches any non-digit character; it could match "#" or "A" or ">" etc.
"[>+-]" matches only one of the characters inside the brackets.
Testing again:
Finally, we only want the sequence of digits at the end.
So we can remove the parentheses around the non-digits at the beginning.
We can also put the "beginning of string anchor", which is "^" to specify that the non-digits are at the beginning of the string.
The updated regex is "^[>+-]?(\d+)"
Testing again:
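A possible test, as a sketch only: the helper name type1_number is my own, and every sample other than ">50" and "50" is made up for illustration.

```perl
use strict;
use warnings;

# Return the digits of a Type 1 value, or undef if the value
# does not start with an optional >, + or - followed by digits.
sub type1_number {
    my ($value) = @_;
    return $value =~ /^[>+-]?(\d+)/ ? $1 : undef;
}

for my $value ('>50', '+50', '-50', '50', '#50') {
    my $number = type1_number($value);
    printf "%-4s => %s\n", $value, defined $number ? $number : 'no match';
}
```

The "#50" case is rejected because "#" is not one of the three allowed symbols, which is exactly what "\D" could not guarantee.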
So that takes care of Type 1 data.
Now for Type 2 data.
Your regex "/(?:\.\d+[+*-])(\d+)/" looks for the following:
(1) A single dot character "." followed by
(2) One or more digits "\d+" followed by
(3) Exactly one of the characters "+", "*", "-" followed by
(4) One or more digits "\d+"
It matches (1), (2), (3) together but does not "group" them into $1 (due to "?:" at the beginning).
It matches (4) and groups the sequence of digits into $1.
Now, if you look at your Line # 5:
the data has:
(1) Single dot character "."
(2) But no sequence of digits after the dot!! There is a "*" after the dot "."
Hence your regex fails.
Here's the demonstration:
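Since the real rows are not shown here, the sketch below uses two hypothetical NM_ strings that follow the description: one with digits after the last dot (like line 1) and one with "*" directly after it (like line 5).

```perl
use strict;
use warnings;

# Hypothetical values shaped like lines 1 and 5.
my %data = (
    line1 => 'NM_000000.1:c.100+27A>G',   # digits follow the last dot
    line5 => 'NM_000000.1:c.*27A>G',      # "*" follows the last dot directly
);

# Apply the original Type 2 regex and keep what it captures, if anything.
my %captured;
for my $line (sort keys %data) {
    $captured{$line} = $data{$line} =~ /(?:\.\d+[+*-])(\d+)/ ? $1 : undef;
    printf "%s: %s\n", $line,
        defined $captured{$line} ? $captured{$line} : 'no match';
}
```

The line 1 style value yields a number, while the line 5 style value fails because "\d+" insists on at least one digit between the dot and the symbol.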
So what are the special characteristics of Type 2 data that distinguish it from Type 1 data? And how do we create the regex to match Type 2 data?
Firstly, if all Type 2 data start with "NM_", you could use that in your regex. So we have "NM_"
Now, it has a dot "." at some point further on. So we get the regex "NM_.*\."
Here ".*" passes through "maximum number of characters till it reaches the right-most dot (.) character". It's a greedy search.
The dot character may or may not have a sequence of digits after it. (Line 1 has, Line 5 does not have.) "\d*" matches "zero or more digits" - "more" means "1 or more", so "zero or 1 or more than 1 digits".
So, we get: "NM_.*\.\d*"
After that, we definitely have one of the following characters "+", "*", "-".
So we use "[+*-]" for that. The regex now becomes "NM_.*\.\d*[+*-]"
Finally, that is followed by a sequence of digits that we want to capture.
Sequence of digits is "\d+". So the final regex is: "NM_.*\.\d*[+*-](\d+)"
Let's test this on Line 1 and Line 5 data:
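A sketch of such a test follows. The helper name type2_number and both NM_ strings are assumptions of mine, built to resemble the two shapes described for lines 1 and 5.

```perl
use strict;
use warnings;

# Return the digits after the +, * or - that follows the last dot
# of an NM_ value, or undef if the value has a different shape.
sub type2_number {
    my ($value) = @_;
    return $value =~ /NM_.*\.\d*[+*-](\d+)/ ? $1 : undef;
}

# Hypothetical values: digits after the last dot, and none after it.
for my $value ('NM_000000.1:c.100+27A>G', 'NM_000000.1:c.*27A>G') {
    my $number = type2_number($value);
    printf "%s => %s\n", $value, defined $number ? $number : 'no match';
}
```

Both shapes now yield the trailing number, because "\d*" tolerates the missing digits after the dot.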
Because of the "NM_" at the beginning of the regex, we are guaranteed that it will not match Type 1 data.
But let's confirm that that is really the case:
Let's also confirm that the regex for Type 1 data does not match Type 2 data!
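Both checks can be combined in one sketch. The sub name extract_number, its return labels, and the NM_ string are hypothetical; how you use the extracted number to set f[55] is left to your existing rules.

```perl
use strict;
use warnings;

# Classify an f[8] value and return (type, number).
# The two patterns are mutually exclusive, so the order of the checks
# does not change the result.
sub extract_number {
    my ($value) = @_;
    return ('type1', $1) if $value =~ /^[>+-]?(\d+)/;
    return ('type2', $1) if $value =~ /NM_.*\.\d*[+*-](\d+)/;
    return ('none', undef);
}

# "27", ">50" and "50" are values from the discussion; the NM_ string is made up.
for my $value ('27', '>50', '50', 'NM_000000.1:c.*27A>G') {
    my ($type, $number) = extract_number($value);
    printf "%-22s %-5s %s\n", $value, $type, defined $number ? $number : '-';
}
```

Dispatching on the data type first, and only then extracting the number, avoids the situation where the "if" branch swallows values meant for the "else" branch.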
Hope that helps.
If you are unable to incorporate the regexes in your script, do post the problem here.
I think the below will capture lines 2-6, but not line 1 (it looks like 018328 is being captured by the regex). Is the syntax correct or is there a better way? Thank you.