

Perl, RegEx - Help me to understand the regex!


 
# 1  
Old 06-11-2015

I am no regex expert and have only a little understanding of that language.
Could you help me understand this Perl regular expression:
Code:
^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_& ;]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;] *?(?:^[^\r\n\{]*;?[\s]+){0,10}\{

------
This regex selects functions from C/C++ source and is defined in UltraEdit (in case you wonder where it comes from).
It works.
I am able to use it in my Perl script by applying the 'm' (multiline) modifier and reading the whole file as one string (via 'undef $/').
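A minimal sketch of that setup, for illustration only. The C snippet and the names $c_source, $func_re and @names are made up here, and an in-memory filehandle stands in for a real file. It uses 'local $/' inside a do block, which slurps the same way as 'undef $/' but restores the separator afterwards.

```perl
use strict;
use warnings;

# Hypothetical C source; in a real script this would come from a file on disk.
my $c_source = <<'END_C';
static int add(int a, int b)
{
    return a + b;
}

int sub(int a, int b);

void log_msg(const char *msg) {
    if (msg) {
        puts(msg);
    }
}
END_C

# The regex as quoted under "#initial:", with the m modifier so that ^ matches
# at the start of every line, not only at the start of the string.
my $func_re = qr/^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{/m;

# Slurp everything into one string: with $/ undefined, <$fh> returns the whole input.
open(my $fh, '<', \$c_source) or die "cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

# Collect $1 (the function name) for every match.
my @names;
while ($text =~ /$func_re/g) {
    push @names, $1;
}
print join(', ', @names), "\n";   # add, log_msg
```

The prototype 'int sub(int a, int b);' is skipped because nothing in the regex may cross the ';' before reaching a '{', and the indented 'if (msg) {' is skipped by the first piece of the regex.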
------
Below I show how much I understand so far, and I ask about the parts where I do not know what is happening.
------
Please correct me if any of my assumptions is wrong, and give me an idea where I have none!
Thanks!
================
So, my understanding of this regex so far is:
Code:
#initial:
"^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{"

#by pieces:
# 1.
"^(?!if\b|else\b|while\b|[\s\*]) # - the beginning does NOT have the words 'if', 'else', 'while', or a whitespace or '*'
                                 # I guess the '(?!' part means: do not select this part (defined by the stuff in '(...)')
# 2.
(?:[\w\*~_&]+?\s+){1,6}          # - allowed beginning: any alphanumeric or '*', '~', '_', '&', one or more (shortest, by +?),
                                 #   followed by whitespace (1 or more) (so, it should be a word), repeated 1 to 6 times
                                 # Again: '(?:' - do not save it in the final selection
# 3.
([\w:\*~_&]+\s*)\([^\);]*\)      # word chars (plus ':', '*', '~', '&') 1+ times; whitespace {0,} (saved as $1), followed by '(...)' without ')' or ';' inside
# 4.
[^\{;]*?                         # - no ';' and no '{' - repeated any number of times, but shortest (by *?)
                                 # so, IS IT anything between <func_nm>(..) and {...}?  So, comments only?
# 5.
(?:^[^\r\n\{]*;?[\s]+){0,10}     # ????  - This I do not understand:
                                 # what is '^[^'? - a line beginning and a NOT-block?  How could it be the beginning? Does it mean, in a
                                 # multiline selection, anything on a new line?
                                 # After that the \r\n - so, a line change.  After the 'beginning' no line change?! So, one new line is fine, but
                                 # two are not???  Seems nonsense.  How to understand it?
                                 #  - Followed by ';'?!  A statement between <fnc_nm>(...) and {...}?!  - Nonsense?!
                                 # after that '?' - so, shortest?
                                 # - followed by whitespace, at least one; and all of it up to 10 times (but not saved, because of the '(?:' at the beginning)
                                 # This understanding seems unreasonable to me.
                                 # Help me to get it!
# 6.
\{"                              # Finally, followed by '{'

Thanks!

Last edited by alex_5161; 06-12-2015 at 01:38 PM..
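On the guess about '(?!' in piece #1: in Perl, '(?!...)' is a negative lookahead. It asserts that the enclosed pattern does not match at the current position and consumes no characters, so it acts as a filter rather than as an unselected part. A small sketch with made-up sample lines (the names $guard, @lines and @accepted are only for illustration):

```perl
use strict;
use warnings;

# Only the first piece of the regex: a zero-width negative lookahead.
my $guard = qr/^(?!if\b|else\b|while\b|[\s\*])/;

# Hypothetical sample lines.
my @lines = (
    'if (x) {',           # rejected: starts with the word 'if'
    'ifdef_helper(x) {',  # accepted: \b fails after 'if', so it is not the word 'if'
    '  int x;',           # rejected: starts with whitespace
    '* comment text',     # rejected: starts with '*'
    'int main(void) {',   # accepted
);
my @accepted = grep { $_ =~ $guard } @lines;
print "$_\n" for @accepted;

# The lookahead consumes nothing: the next token still starts at the line beginning.
my ($first_word) = 'int main(void) {' =~ /^(?!if\b|else\b|while\b|[\s\*])(\w+)/;
print "$first_word\n";   # int
```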
# 2  
Old 06-11-2015
Code:
(?:^[^\r\n\{]*;?[\s]+){0,10}

Code:
(){0,10} # a group that may repeat from 0 up to 10 times
?: # do not capture this group (do not save it in memory for later use)
^ # match the start of a line
[^\r\n\{]* # match any character that is not \r, \n or {, 0 or more times
;? # match a ; if one exists
[\s]+ # match any whitespace, one or more times
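The same breakdown can be written as a commented regex with the x modifier and tried on a short string. This is only a sketch: the sample text and the names $group, $text and $consumed are invented for the example.

```perl
use strict;
use warnings;

my $group = qr/
    (?:               # group without capturing
        ^             # start of a line (needs the m modifier)
        [^\r\n\{]*    # any characters except CR, LF and an opening brace
        ;?            # an optional semicolon
        [\s]+         # one or more whitespace characters, newlines included
    ){0,10}           # repeat the whole group 0 to 10 times
/xm;

# Hypothetical text: two lines ending in ';', then an opening brace.
my $text = "int a;\nint b;\n{";

# Anchor at the string start and capture what the group consumes.
my ($consumed) = $text =~ /\A($group)/;
my $line_count = () = $consumed =~ /\n/g;
print "consumed $line_count lines\n";   # consumed 2 lines
```

Each repetition takes one line: '[^\r\n\{]*' stays within the line, and the newline itself is consumed by '[\s]+'.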

# 3  
Old 06-12-2015
Thanks, Aia!
So it seems I understood it pretty well.
That means what is unclear to me is actually the logic.

What is the reason to exclude newlines (the \r, \n) after a line beginning in C/C++?!
I mean the '^[^\r\n\{]*' in part #5: C/C++ puts no restriction on the number of newlines, as long as they do not break a word!

How could there be '<anything>;' between <func_nm>(..<params>..) and the {...} (the function body)?! That is what I see in #4 and the beginning of #5:
- [^\{;] *?(?:^[^\r\n\{]*;?
- especially one ending with ';'?! And up to 10 times?!

This regex searches for a function declaration in C/C++ source.
How could those rules be useful in that task?
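One possible reading, offered as an assumption since nothing in the thread confirms it: pieces #4 and #5 may be there to accept old-style (K&R) C definitions, where the parameter types are declared on their own lines, each ending in ';', between the parameter list and the '{'. The sketch below only checks that the regex accepts such a definition; the sample code and the names $knr and $prototype are invented for the example.

```perl
use strict;
use warnings;

# The regex as quoted under "#initial:", with the m modifier.
my $func_re = qr/^(?!if\b|else\b|while\b|[\s\*])(?:[\w\*~_&]+?\s+){1,6}([\w:\*~_&]+\s*)\([^\);]*\)[^\{;]*?(?:^[^\r\n\{]*;?[\s]+){0,10}\{/m;

# Hypothetical old-style (K&R) definition: parameter types are declared
# between the parameter list and the body, each line ending in ';'.
my $knr = <<'END_C';
int add(a, b)
int a;
int b;
{
    return a + b;
}
END_C

# The outer parentheses capture the whole match; the regex's own group becomes $2.
my ($whole, $name) = $knr =~ /($func_re)/;
print "name: $name\n";
print "matched text:\n$whole\n";

# A prototype has ';' right after ')' and no '{', so it is not selected.
my $prototype = "int add(int a, int b);\n";
my $is_match = $prototype =~ $func_re ? 1 : 0;
print "prototype matched: $is_match\n";   # prototype matched: 0
```

Here piece #4 takes the newline after ')', and piece #5 then takes the two declaration lines one at a time. Note that '[^\r\n\{]*' does not forbid newlines in the source; it only keeps each repetition on a single line.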
