CentOS 7.0 - man page for perl::critic::policy::regularexpressions::prohibitcomplexregexes (centos section 3)


Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes(3)

NAME
       Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes - Split long regexps into
       smaller "qr//" chunks.

AFFILIATION
       This Policy is part of the core Perl::Critic distribution.

DESCRIPTION
       Big regexps are hard to read, perhaps even the hardest part of Perl.  A good practice is
       to write digestible chunks of regexp and put them together.  This policy flags any regexp
       that is longer than "N" characters, where "N" is a configurable value that defaults to 60.
       If the regexp uses the "x" flag, then the length is computed after stripping out any
       comments and whitespace.

       Unfortunately, descriptive (and therefore longish) variable names can push a regexp over
       the limit, so each interpolated variable is counted as 4 characters no matter how long
       its name actually is.
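       The two counting rules above can be illustrated with a small sketch.  The
       approx_regex_length() function below is hypothetical: it is not part of Perl::Critic,
       and it only approximates the documented behavior, assuming a naive treatment of
       "x"-flag comments and whitespace.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper: a rough approximation of the length rules described
# above, not Perl::Critic's own implementation.  It ignores edge cases such as
# escaped "\#" or "\ ", and "#" or spaces inside character classes.
sub approx_regex_length {
    my ($pattern, $has_x_flag) = @_;
    if ($has_x_flag) {
        $pattern =~ s/\#[^\n]*//g;   # drop "# comment" text up to end of line
        $pattern =~ s/\s+//g;        # drop all remaining whitespace
    }
    # Count each interpolated scalar ($name or ${name}) as exactly 4 characters.
    $pattern =~ s/\$\{\w+\}|\$\w+/XXXX/g;
    return length $pattern;
}

# A pattern written for the "x" flag: layout and comments are not counted.
my $with_x = join "\n",
    '(\d{4})   # year',
    '-',
    '(\d{2})   # month';
print approx_regex_length($with_x, 1), "\n";                   # prints 15

# Long variable names still count as 4 characters each.
print approx_regex_length('$local_part\@$domain', 0), "\n";    # prints 10
```

       Without the "x" flag the same helper counts whitespace literally, which is why the flag
       matters when laying a pattern out over several lines.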

CASE STUDY
       As an example, look at the regexp used to match email addresses in Email::Valid::Loose
       (lightly tweaked to wrap for POD):

	   (?x-ism:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]
	   \000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015
	   "]*)*")(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[
	   \]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n
	   \015"]*)*")|\.)*\@(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,
	   ;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]
	   )(?:\.(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000
	   -\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*)

       which is constructed from the following code:

	   my $esc         = '\\\\';
	   my $period      = '\.';
	   my $space       = '\040';
	   my $open_br     = '\[';
	   my $close_br    = '\]';
	   my $nonASCII    = '\x80-\xff';
	   my $ctrl        = '\000-\037';
	   my $cr_list     = '\n\015';
	   my $qtext       = qq/[^$esc$nonASCII$cr_list\"]/; # "
	   my $dtext       = qq/[^$esc$nonASCII$cr_list$open_br$close_br]/;
	   my $quoted_pair = qq<$esc>.qq<[^$nonASCII]>;
	   my $atom_char   = qq/[^($space)<>\@,;:\".$esc$open_br$close_br$ctrl$nonASCII]/;# "
	   my $atom        = qq<$atom_char+(?!$atom_char)>;
	   my $quoted_str  = qq<\"$qtext*(?:$quoted_pair$qtext*)*\">; # "
	   my $word        = qq<(?:$atom|$quoted_str)>;
	   my $domain_ref  = $atom;
	   my $domain_lit  = qq<$open_br(?:$dtext|$quoted_pair)*$close_br>;
	   my $sub_domain  = qq<(?:$domain_ref|$domain_lit)>;
	   my $domain      = qq<$sub_domain(?:$period$sub_domain)*>;
	   my $local_part  = qq<$word(?:$word|$period)*>; # This part is modified
	   $Addr_spec_re   = qr<$local_part\@$domain>;

       Read from bottom to top, the code is quite readable.  You can even see the one deliberate
       violation of RFC 822 that Tatsuhiko Miyagawa put into Email::Valid::Loose to allow extra
       periods in the local part (the line marked "This part is modified").  Look for the "|\."
       in the upper regexp to see the same deviation.

       One could certainly argue that the top regexp could be rewritten more legibly with "m//x"
       and comments.  But the bottom version is self-documenting and, for example, doesn't repeat
       "\x80-\xff" 18 times.  Furthermore, it is much easier to compare the bottom version against
       the source BNF grammar in RFC 822 to judge whether the implementation is sound, even before
       running tests.
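       The same bottom-up style also helps with much smaller patterns.  The following sketch
       is illustrative only: the timestamp format and every variable name in it are
       assumptions, not taken from this policy.  It builds one pattern of more than 80
       characters, which the policy would be expected to flag at the default limit of 60,
       and then an equivalent pattern assembled from short "qr//" chunks.

```perl
use strict;
use warnings;

# One long pattern (83 characters), over the default limit of 60.
my $long_re = qr/^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])T([01]\d|2[0-3]):([0-5]\d):([0-5]\d)$/;

# The same pattern assembled bottom-up from small, named qr// chunks.
my $year   = qr/(\d{4})/;
my $month  = qr/(0[1-9]|1[0-2])/;
my $day    = qr/(0[1-9]|[12]\d|3[01])/;
my $hour   = qr/([01]\d|2[0-3])/;
my $minute = qr/([0-5]\d)/;
my $second = qr/([0-5]\d)/;
my $date   = qr/$year-$month-$day/;
my $clock  = qr/$hour:$minute:$second/;
my $timestamp_re = qr/^${date}T$clock$/;   # ${date} keeps "T" out of the name

for my $input ('2024-02-29T23:59:07', '2024-13-01T00:00:00') {
    if (my @parts = $input =~ $timestamp_re) {
        print "$input => @parts\n";        # 2024-02-29T23:59:07 => 2024 02 29 23 59 07
    }
    else {
        print "$input => no match\n";      # month 13 is rejected
    }
}
```

       Because each chunk is compiled with "qr//", its capture groups survive interpolation,
       so both patterns return the same six fields.  Neither pattern checks calendar validity
       (for example, both accept a 30th of February).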

CONFIGURATION
       This policy allows regexps up to "N" characters long, where "N" defaults to 60.  You can
       change this limit with the "max_characters" setting by putting an entry like this in
       your .perlcriticrc file:

	   [RegularExpressions::ProhibitComplexRegexes]
	   max_characters = 40

CREDITS
       Initial development of this policy was supported by a grant from the Perl Foundation.

AUTHOR
       Chris Dolan <cdolan@cpan.org>

COPYRIGHT
       Copyright (c) 2007-2011 Chris Dolan.  Many rights reserved.

       This program is free software; you can redistribute it and/or modify it under the same
       terms as Perl itself.  The full text of this license can be found in the LICENSE file
       included with this module.

perl v5.16.3		      Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes(3)