Post 302946690 by alex_5161 on Thursday 11th of June 2015 11:36:59 AM
Perl, RegEx - Help me to understand the regex!

I am no regex expert and have only a little understanding of that language.
Could you help me understand this Perl regular expression:
Code:
^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{

------
This regex selects function definitions from C/C++ source; it is defined in UltraEdit (in case you are interested in where it comes from).
It works.
I am able to use it in my Perl script by applying the 'm' (multiline) modifier and reading the whole file as one string (via 'undef $/').
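
For readers who want to try this, here is a minimal sketch of that usage. The sample C source and the in-memory filehandle are assumptions made up for illustration; a real script would open an actual source file instead.

```perl
use strict;
use warnings;

# Hypothetical sample input; a real script would open a C/C++ file instead.
my $sample = <<'END_C';
static int add(int a, int b)
{
    return a + b;
}

void Foo::bar(const char *s) {
    if (s) {
        puts(s);
    }
}
END_C

# The UltraEdit pattern from the post, compiled with /m so that '^' can
# match at the start of every line, not only at the start of the string.
my $func_re = qr/^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{/m;

# Read the whole "file" as one string: with $/ undefined, <$fh> returns
# everything in a single read instead of one line at a time.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

# In list context with /g, the match returns every $1 (the function name).
my @names = $text =~ /$func_re/g;
s/\s+\z// for @names;    # $1 may carry trailing whitespace because of \s*

print "$_\n" for @names;    # prints "add" and "Foo::bar"
```

Using 'local $/' inside a 'do' block has the same effect as 'undef $/', but restores the default line-by-line reading afterwards.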
------
Below I show how much I understand so far, and I ask about the parts where I do not know what is happening.
------
Please correct me if any of my assumptions is wrong, and give me an idea about the parts I cannot explain!
Thanks!
================
So, my understanding of this regex so far is:
Code:
#initial:
"^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{"

#by pieces:
# 1.
"^(?!if\b|else\b|while\b|[\s\*]) # - at the beginning there must NOT be the words 'if', 'else', 'while', or whitespace or '*'
                                 # I guess the '(?!' part means: do not select this part (defined by the stuff in '(...)')
# 2.
(?:[\w\*~_&]+?\s+){1,6}          # - allowed beginning: any alphanumeric or '*', '~', '_', '&', one or more (shortest, because of +?),
                                 #   followed by one or more whitespace characters (so it should be a word), repeated 1 to 6 times
                                 # Again: '(?:' - do not save it in the final selection
# 3.
([\w:\*~_&]+\s*)\([^\);]*\)      # word chars (plus ':', '*', '~', '_', '&') one or more times, then zero or more spaces (saved as $1), followed by '(...)' without ')' or ';' inside
# 4.
[^\{;]*?                         # - anything except ';' and '{', repeated any number of times, but shortest (because of *?)
                                 # So, IS IT anything between <func_nm>(..) and {...}? So, comments only?
# 5.
(?:^[^\r\n\{]*;?[\s]+){0,10}     # ????  - This part I do not understand:
                                 # What is '^[^'? Beginning, then a NOT-block? How could it be the beginning? In a multiline match, does it
                                 # mean anything at the start of a new line?
                                 # After that comes \r\n, so a line break. No line break after the 'beginning'?! So one new line is fine, but
                                 # two are not??? That seems like nonsense. How should I understand it?
                                 #  - Followed by ';'?! A statement between <fnc_nm>(...) and {...}?! Nonsense?!
                                 # After that '?', so shortest?
                                 # - Followed by whitespace, at least one character; and it can repeat up to 10 times (but is not saved, because of the '(?:' at the beginning)
                                 # This understanding seems unreasonable to me.
                                 # Help me to get it!
# 6.
\{"                              # Finally, followed by '{'
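
For reference, the constructs questioned above can be tried in isolation. This is only a sketch: the test strings and variable names are made up for illustration, and the last check shows just one kind of input that piece 5 lets through (old K&R-style parameter declarations). Whether that was the pattern author's intent is a guess.

```perl
use strict;
use warnings;

my @results;

# (?!...) is a negative lookahead: it consumes nothing and only forbids
# the current position from being followed by the given text.
push @results, ("while (x) {" =~ /^(?!if\b|else\b|while\b)\w+/ ? 1 : 0);   # 0
push @results, ("whiled(x) {" =~ /^(?!if\b|else\b|while\b)\w+/ ? 1 : 0);   # 1, no \b after "while"

# (?:...) groups without capturing, so the first plain (...) is still $1.
my ($name) = "static int add(" =~ /^(?:\w+\s+){1,6}(\w+)\(/;   # "add"

# Outside brackets '^' is an anchor; with /m it also matches right after a newline.
my $text   = "first\nsecond\n";
my @no_m   = $text =~ /^(\w+)/g;    # ("first")
my @with_m = $text =~ /^(\w+)/mg;   # ("first", "second")

# Inside brackets a leading '^' negates the class: [^\r\n\{] is ONE
# character that is not CR, LF or '{'. It does not match a line break.
my ($run) = "int a; {" =~ /^([^\r\n\{]*)/;   # "int a; "

# ';?' is an optional ';' (zero or one). '?' means "shortest" only when it
# follows another quantifier, as in '+?' or '*?'.
my ($greedy) = "aaa" =~ /^(a+)/;    # "aaa"
my ($lazy)   = "aaa" =~ /^(a+?)/;   # "a"

# Piece 5 therefore reads: up to ten whole lines that contain no '{', each
# ending in whitespace. K&R-style declarations are one input it accepts.
my $func_re = qr/^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{/m;
my $knr = "int add(a, b)\nint a;\nint b;\n{\n    return a + b;\n}\n";
my ($knr_name) = $knr =~ $func_re;   # "add"

print "lookahead: @results\n";
print "name=$name no_m=@no_m with_m=@with_m\n";
print "greedy=$greedy lazy=$lazy knr=$knr_name\n";
```

Without piece 5 the K&R input would not match, because piece 4 ('[^\{;]*?') cannot step over the ';' at the end of 'int a;'.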

Thanks!

Last edited by alex_5161; 06-12-2015 at 01:38 PM..
 
