Linux and UNIX Man Pages

Linux & Unix Commands - Search Man Pages

parse::eyapp::datagenerationtut5.18(3) [mojave man page]

Parse::Eyapp::datagenerationtut(3)			User Contributed Perl Documentation			Parse::Eyapp::datagenerationtut(3)

NAME
Parse::Eyapp::datagenerationtut - Tutorial on Using Parse::Eyapp as a Data Generator for Testing INTRODUCTION
The examples for this tutorial can be found in the directory "examples/generator" in the distribution of "Parse::Eyapp". To understand the code you will need some familiarity with Test::LectroTest::Generator, however, we will make an attempt to introduce the basics of Test::LectroTest::Generator needed. While parsing is the process of determining the membership of a string to a language, generation is the reverse problem. Using a context free grammar it is possible to generate strings belonging to the language described by that grammar. Context free grammars can be used to generate tests. The programmer designs a grammar that defines a set of inputs that will be able to find some set of bugs. This tutorial shows how to use Parse::Eyapp to generate phrases belonging to the language defined by a given grammar. We will generate inputs to test a simple calculator. Compiling and Running the Example The grammar describing the language is in the file "Generator.eyp". Calling eyapp with option "-c" will show the contents of the file without the semantic actions: Parse-Eyapp/examples/generator$ eyapp -c Generator.eyp # file: Generator.eyp # compile with: eyapp -b '' Generator.eyp # then run: ./Generator.pm %strict %token NUM VARDEF VAR %right '=' %left '-' '+' %left '*' '/' %left NEG %right '^' %% stmts: stmt | stmts ';' stmt ; stmt: VARDEF '=' exp ; exp: NUM | VAR | exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp | '-' exp %prec NEG | exp '^' exp | '(' exp ')' ; %% This grammar defines a language of sequences of semicolon separated assignments. The right hand side of an assignment can be any valid arithmetic expression including numbers and variables. First we compile the grammar with option "-b ''" to produce a modulino: Parse-Eyapp/examples/generator$ eyapp -b '' Generator.eyp Now, the module has execution permits and its first line contains the "#!" header: Parse-Eyapp/examples/generator$ ls -ltr | tail -1 -rwxr-xr-x 1 lusasoft lusasoft 7844 2009-01-12 08:30 Generator.pm Parse-Eyapp/examples/generator$ head -1 Generator.pm #!/usr/bin/perl The use of option "-b ''" combined with the fact that we have added these lines 67 unless (caller) { 68 __PACKAGE__->main(@ARGV); 69 } at the end of the grammar file "Generator.eyp" provide the generated file with a dual nature: it is a module and an executable at the same time. This is what is know as a modulino (term coined by Brian d Foy). Here follows the results of several executions. Each run produces a set of assignments. The first output line reports the result of the randomly generated program. Parse-Eyapp/examples/generator$ ./Generator.pm # result: -3 SC=-3 # result: error. Division by zero. M=(-4/6+4)+9+2*8*9+4; XQ=8/3; EI=XQ*2/0/0; BL=5+EI+4/5/XQ As you can see in the former run, only variables that were defined in previous assignments are used in later assignments. However, the generated source may produce run-time errors and exceptions (which is good thing when testing a calculator). Parse-Eyapp/examples/generator$ ./Generator.pm # result: 6 CF=(6) Parse-Eyapp/examples/generator$ ./Generator.pm # result: -710.2 I=(3*-8+7/5); R=2+8*I*4+5*2+I/I Parse-Eyapp/examples/generator$ ./Generator.pm # result: Calculator syntax differs from Perl... RY=2--2+(3+6)+(7*7*4^1+2*0/8*5/3) GENERATING PHRASES FROM A CONTEXT FREE GRAMMAR
Using "YYExpect" and Test::LectroTest::Generator to generate tokens The basic idea of using Parse::Eyapp to generate phrases for the language defined by a given context free grammar is simple: change the lexer by a token generator. Instead of reading from some input, randomly generate one of the valid tokens. We can use the method "YYExpect" to know what tokens are valid. For versions 1.137 and later of Parse::Eyapp, the method "YYExpect" returns the set of valid tokens at the time it is called. For previous versions (and this is also true for Parse::Yapp), "YYExpect" only returns a subset of the whole set of valid tokens. In this example, the token generator has been isolated in the sub "gen_lexer" in the file "GenSupport.pm": 47 sub gen_lexer { 48 my $parser = shift; 49 50 my $token = $parser->generate_token; 51 my $attr = $parser->generate_attribute($token); 52 #$attr = $WHITESPACES->generate.$attr; 53 54 return ($token, $attr); 55 } The token and its attribute are generated in lines 50 and 51. The methods "generate_token" and "generate_attribute" are also in the module "GenSupport.pm". They are methods of the parser object since the grammar "Generator.eyp" not only uses but inherits this module. See line 3 of "Generator.eyp": Parse-Eyapp/examples/generator$ sed -ne '19,24p' Generator.eyp | cat -n 1 %{ 2 use base q{Parse::Eyapp::TokenGen}; 3 use base q{GenSupport}; 4 %} 5 6 %% The method "generate_token" obtains the set of valid tokens using "YYExpect" (line 29). Then uses the "Frequency" function in Test::LectroTest::Generator to produce a "Test::LectroTest::Generator" object (line 31). The method "generate" of such object is used to generate the actual token (line 33). 26 sub generate_token { 27 my $parser = shift; 28 29 my @token = $parser->YYExpect; 30 31 my $tokengen = Frequency( map { [$parser->token_weight($_), Unit($_)] } @token); 32 33 return $tokengen->generate; 34 } The Parse::Eyapp::TokenGen method "token_weight" returns the weight associated with a token, assuming it was previously set using one of the Parse::Eyapp::TokenGen methods like "set_tokenweightsandgenerators" or "set_tokenweights". See the code of method "main" in "GenSupport.pm": examples/generator$ sed -ne '98,/^ *)/p' GenSupport.pm | cat -n 1 my $parser = $package->new(); 2 3 $parser->set_tokenweightsandgenerators( 4 NUM => [ 2, Int(range=>[0, 9], sized=>0)], 5 VAR => [ 6 0, # At the beginning, no variables are defined 7 Gen { 8 return Elements(keys %st)->generate if keys %st; 9 return Int(range=>[0, 9], sized=>0)->generate; 10 }, 11 ], 12 VARDEF => [ 13 2, 14 String( length=>[1,2], charset=>"A-NP-Z", size => 100 ) 15 ], 16 '=' => 2, '-' => 1, '+' => 2, 17 '*' => 4, '/' => 2, '^' => 0.5, 18 ';' => 1, '(' => 1, ')' => 2, 19 '' => 2, 'error' => 0, 20 ); A Brief Introduction to Test::LectroTest::Generator The module "GenSupport.pm" uses Test::LectroTest::Generator to build generators for the required tokens. Thus the call to Int(range=>[0, 9], sized=>0) builds a Test::LectroTest::Generator object that produces integers in the range [0,9]. Such objects have a method "generate" that produces the actual item. The following debugger session illustrates the way to use Test::LectroTest::Generator: pl@europa:~/LEyapp$ perl -wde 0 main::(-e:1): 0 DB<1> use Test::LectroTest::Generator qw{:all} DB<2> $i = Int(range=>[0, 9], sized=>0) DB<3> p $i->generate 6 DB<4> p $i->generate 9 The "String" method builds a Test::LectroTest::Generator object that produces strings: DB<5> $v = String( length=>[1,2], charset=>"A-NP-Z", size => 100 ) DB<6> p $v->generate HM DB<7> p $v->generate Y DB<8> p $v->generate KE The "Elements" method builds a Test::LectroTest::Generator object that produces one of a given list of elements: DB<9> @a = map { $v->generate } 1..10 DB<10> x @a 0 'UC' 1 'P' 2 'IF' 3 'EJ' 4 'H' 5 'VC' 6 'CF' 7 'K' 8 'T' 9 'IG' DB<11> $x = Elements(@a) DB<12> p $x->generate P DB<13> p $x->generate P DB<14> p $x->generate EJ DB<15> p $x->generate VC Even more interesting for our purpose is the "Frequency" method, which produces one of a given list of elements with a given probability distribution. The following example illustrates its use. First we build a weight list where the odd elements have weight 2 and the even elements have weight 1: DB<16> @w = map { $_ % 2 ? 2 : 1 } 0..9 DB<21> @w{@a} = @w DB<24> x \%w 0 HASH(0xd3cc80) 'CF' => 1 'EJ' => 2 'H' => 1 'IF' => 1 'IG' => 2 'K' => 2 'P' => 2 'T' => 1 'UC' => 1 'VC' => 2 We now use "Frequency" to build a Test::LectroTest::Generator object that produces one of the given list of elements @a according to the specified probability: DB<29> $f = Frequency( map { [$w{$_}, Unit($_)] } @a) Let us generate 10 items. We see that odd elements appear more frequently than even elements: DB<30> @r = map { $f->generate } 1..10 DB<31> p "@r" VC UC K UC VC VC K EJ P P Generating Token Attributes Once the token was generated through the call to "generate_token" at line 50: 45 #my $WHITESPACES = String( length=>[0,1], charset=>" ", size => 100 ); 46 47 sub gen_lexer { 48 my $parser = shift; 49 50 my $token = $parser->generate_token; 51 my $attr = $parser->generate_attribute($token); 52 #$attr = $WHITESPACES->generate.$attr; 53 54 return ($token, $attr); 55 } the associated attributed is generated via the "generate_attribute" method in "GenSupport.pm". If needed, random combination of white spaces can be added to the generated attribute via an appropriate generator (line 52). The "generate_attribute" method uses the method "generate" of the generator associated with such token. If no generator object was set, the attribute returned is the token itself (line 42): 36 sub generate_attribute { 37 my $parser = shift; 38 my $token = shift; 39 40 my $gen = $parser->token_generator($token); 41 return $gen->generate if defined($gen); 42 return $token; 43 } Holding Semantic Constraints The attribute generator associated with the token "VAR" is more complex than the others. It was defined in the call to "set_tokenweightsandgenerators": examples/generator$ sed -ne '98,/^ *)/p' GenSupport.pm | cat -n 1 my $parser = $package->new(); 2 3 $parser->set_tokenweightsandgenerators( 4 NUM => [ 2, Int(range=>[0, 9], sized=>0)], 5 VAR => [ 6 0, # At the beginning, no variables are defined 7 Gen { 8 return Elements(keys %st)->generate if keys %st; 9 return Int(range=>[0, 9], sized=>0)->generate; 10 }, 11 ], 12 VARDEF => [ 13 2, 14 String( length=>[1,2], charset=>"A-NP-Z", size => 100 ) 15 ], 16 '=' => 2, '-' => 1, '+' => 2, 17 '*' => 4, '/' => 2, '^' => 0.5, 18 ';' => 1, '(' => 1, ')' => 2, 19 '' => 2, 'error' => 0, 20 ); The "Gen" function of Test::LectroTest::Generator creates a new generator from a given code. Since a variable can't be used unless it is defined, we use a symbol table %st to keep record of the variables that were defined in previous assignments. If no defined variables exists, the defined generator returns a digit between 0 and 9. Each time a new assignment to a variable occurs, such variable is added to the symbol table. This is achieved through the semantic action associated with the assignment production rule: examples/generator$ sed -ne '35,41p' Generator.eyp | cat -n 1 stmt: 2 VARDEF '=' exp 3 { 4 my $parser = shift; 5 $parser->defined_variable($_[0]); 6 "$_[0]=$_[2]"; 7 } The "defined_variable" method in "GenSupport.pm" simply sets the corresponding entry in the symbol table: examples/generator$ sed -ne '19,24p' GenSupport.pm | cat -n 1 my %st; # Symbol Table 2 sub defined_variable { 3 my ($parser, $var) = @_; 4 5 $st{$var} = 1; 6 } The semantic action associated with "VARDEF '=' exp" returns the string "$_[0]=$_[2]" containing the actual phrase. Since this is the semantic action required for most productions we make it our default action: examples/generator$ sed -ne '13,17p' Generator.eyp | cat -n 1 %defaultaction { 2 my $parser = shift; 3 4 return join '', @_; 5 } The syntactic variable "stmts" generates sequences of "stmt" separated by semicolons: examples/generator$ sed -ne '26,33p' Generator.eyp | cat -n 1 stmts: 2 stmt 3 { 4 $_[0]->deltaweight(VAR => +1); # At least one variable is defined now 5 $_[1]; 6 } 7 | stmts ';' { " " } stmt 8 ; The second production is left recursive. As a consequence, the "stmt" in the first production (line 2) is the first statement of the sequence. A small derivation can convince you of this property: stmts-> stmt stmts => stmts';' stmt => stmts';' stmt ';' stmt => stmt ';' stmt ';' stmt ---- Thus, when the reduction by the production "stmts -> stmt" occurs, we are sure that the first statement has been processed. In such case we increase the weight of token "VAR" one unit (which was initially zero, see the call to "set_tokenweightsandgenerators"), $_[0]->deltaweight(VAR => +1); The weight of "VAR" is now 1, giving chances for variables to appear in the right hand side of an assignment. The Parse::Eyapp::Tokengen method "deltaweight" increases (decreases if negative) the weight of the given tokens using the associated values. Dynamically Changing the Probability Distribution The semantic actions for the productions exp -> '(' exp ')' and exp -> '-' exp show a way to modify the weights associated with some tokens: 43 exp: 44 NUM 45 | VAR 46 | exp '+' exp 47 | exp '-' exp 48 | exp '*' exp 49 | exp '/' exp 50 | '-' { $_[0]->pushdeltaweight('-' => -1) } exp %prec NEG 51 { 52 $_[0]->popweight(); 53 "-$_[3]" 54 } 55 | exp '^' exp 56 | '(' { $_[0]->pushdeltaweight('(' => -1, ')' => +1, '+' => +1, ); } 57 exp 58 ')' 59 { 60 $_[0]->popweight; 61 "($_[3])" 62 } 63 ; After seeing a '(' we decrease by one the weight of '(' to avoid expressions with nested parenthesis. We also increase the weight of token '+', since parenthesis are often used to give more priority to a sum over a multiplication or division. This is achieved via the "pushdeltaweight" method. The old weight is recovered after the closing parenthesis is seen using the "popweight" method. Computing the Expected Result Function "evaluate_using_perl" in "GenSupport.pm" finds the expected value for the generated expression. The calculator expression is roughly translated to a Perl expression and evaluated using the Perl interpreter: 57 sub evaluate_using_perl { # if possible 58 my $perlexp = shift; 59 60 $perlexp =~ s/([a-zA-Z])/$$1/g; # substitute A by $A everywhere 61 $perlexp =~ s/^/**/g; # substitute power operator: ^ by ** 62 63 my $res = eval "no warnings; no strict;$perlexp"; 64 if ($@ =~ /Illegal division/) { 65 $res = "error. Division by zero."; 66 } 67 elsif ($@) { # Our calc notation is incompatible with perl in a few gotchas 68 # Perl interprets -- in a different way 69 $@ =~ m{(.*)}; # Show only the first line of error message 70 $res = "Calculator syntax differs from Perl. Can't compute the result: $1"; 71 } 72 73 $res; 74 } The calculator language differs from Perl. In the calculator, two consecutive minus like in "2--3" are interpreted as "2+3" while for Perl the former expression is an error. This limitation is here to illustrate a limitation of the approach: it gives a way to generate complex structured inputs but the programmer must find a way to compute what the expected value is. APPENDIX
: FILES File "GenSupport.pm" Parse-Eyapp/examples/generator$ cat -n GenSupport.pm 1 package GenSupport; 2 use strict; 3 use warnings; 4 5 use Getopt::Long; 6 use Test::LectroTest::Generator qw(:all); 7 use Parse::Eyapp::TokenGen; 8 9 sub _Error { 10 my $parser = shift; 11 12 my $t = $parser->YYCurval; 13 my @e = $parser->YYExpect(); 14 my $attr = $parser->YYSemval(0); 15 local $" = " "; 16 warn "Error: Current attribute: <$attr> Current token: <$t> Expected: <@e> "; 17 } 18 19 my %st; # Symbol Table 20 sub defined_variable { 21 my ($parser, $var) = @_; 22 23 $st{$var} = 1; 24 } 25 26 sub generate_token { 27 my $parser = shift; 28 29 my @token = $parser->YYExpect; 30 31 my $tokengen = Frequency( map { [$parser->token_weight($_), Unit($_)] } @token); 32 33 return $tokengen->generate; 34 } 35 36 sub generate_attribute { 37 my $parser = shift; 38 my $token = shift; 39 40 my $gen = $parser->token_generator($token); 41 return $gen->generate if defined($gen); 42 return $token; 43 } 44 45 #my $WHITESPACES = String( length=>[0,1], charset=>" ", size => 100 ); 46 47 sub gen_lexer { 48 my $parser = shift; 49 50 my $token = $parser->generate_token; 51 my $attr = $parser->generate_attribute($token); 52 #$attr = $WHITESPACES->generate.$attr; 53 54 return ($token, $attr); 55 } 56 57 sub evaluate_using_perl { # if possible 58 my $perlexp = shift; 59 60 $perlexp =~ s/([a-zA-Z])/$$1/g; # substitute A by $A everywhere 61 $perlexp =~ s/^/**/g; # substitute power operator: ^ by ** 62 63 my $res = eval "no warnings; no strict;$perlexp"; 64 if ($@ =~ /Illegal division/) { 65 $res = "error. Division by zero."; 66 } 67 elsif ($@) { # Our calc notation is incompatible with perl in a few gotchas 68 # Perl interprets -- in a different way 69 $@ =~ m{(.*)}; # Show only the first line of error message 70 $res = "Calculator syntax differs from Perl. Can't compute the result: $1"; 71 } 72 73 $res; 74 } 75 76 77 sub Run { 78 my($self)=shift; 79 my $yydebug = shift || 0; 80 81 return $self->YYParse( 82 yylex => &gen_lexer, 83 yyerror => &_Error, 84 yydebug => $yydebug, # 0x1F 85 ); 86 } 87 88 sub main { 89 my $package = shift; 90 91 my $debug = shift || 0; 92 my $result = GetOptions ( 93 "debug!" => $debug, 94 ); 95 96 $debug = 0x1F if $debug; 97 98 my $parser = $package->new(); 99 100 $parser->set_tokenweightsandgenerators( 101 NUM => [ 2, Int(range=>[0, 9], sized=>0)], 102 VAR => [ 103 0, # At the beginning, no variables are defined 104 Gen { 105 return Elements(keys %st)->generate if keys %st; 106 return Int(range=>[0, 9], sized=>0)->generate; 107 }, 108 ], 109 VARDEF => [ 110 2, 111 String( length=>[1,2], charset=>"A-NP-Z", size => 100 ) 112 ], 113 '=' => 2, '-' => 1, '+' => 2, 114 '*' => 4, '/' => 2, '^' => 0.5, 115 ';' => 1, '(' => 1, ')' => 2, 116 '' => 2, 'error' => 0, 117 ); 118 119 my $exp = $parser->Run( $debug ); 120 121 my $res = evaluate_using_perl($exp); 122 123 print " # result: $res $exp "; 124 } 125 126 1; File "Generator.eyp" Parse-Eyapp/examples/generator$ cat -n Generator.eyp 1 # file: Generator.eyp 2 # compile with: eyapp -b '' Generator.eyp 3 # then run: ./Generator.pm 4 %strict 5 %token NUM VARDEF VAR 6 7 %right '=' 8 %left '-' '+' 9 %left '*' '/' 10 %left NEG 11 %right '^' 12 13 %defaultaction { 14 my $parser = shift; 15 16 return join '', @_; 17 } 18 19 %{ 20 use base q{Parse::Eyapp::TokenGen}; 21 use base q{GenSupport}; 22 %} 23 24 %% 25 26 stmts: 27 stmt 28 { 29 $_[0]->deltaweight(VAR => +1); # At least one variable is defined now 30 $_[1]; 31 } 32 | stmts ';' { " " } stmt 33 ; 34 35 stmt: 36 VARDEF '=' exp 37 { 38 my $parser = shift; 39 $parser->defined_variable($_[0]); 40 "$_[0]=$_[2]"; 41 } 42 ; 43 exp: 44 NUM 45 | VAR 46 | exp '+' exp 47 | exp '-' exp 48 | exp '*' exp 49 | exp '/' exp 50 | '-' { $_[0]->pushdeltaweight('-' => -1) } exp %prec NEG 51 { 52 $_[0]->popweight(); 53 "-$_[3]" 54 } 55 | exp '^' exp 56 | '(' { $_[0]->pushdeltaweight('(' => -1, ')' => +1, '+' => +1, ); } 57 exp 58 ')' 59 { 60 $_[0]->popweight; 61 "($_[3])" 62 } 63 ; 64 65 %% 66 67 unless (caller) { 68 __PACKAGE__->main(@ARGV); 69 } SEE ALSO
o Parse::Eyapp::TokenGen o Test::LectroTest::Generator by Tom Moertel o The Design and Implementation of a Grammar-based Data Generator(1992) by Peter M. Maurer, Software Practice and Experience <http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/./vol22/issue3/spe756pm.pdf> o yagg: an easy-to-use generator for structured test inputs by David Coppit and Jiexin Lian. ASE '05: Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering. 2005, pages 356-359. o <http://search.cpan.org/perldoc?yagg::Tutorial>, o <http://www.cs.wm.edu/~coppit/wiki/images/4/4e/yagg_short.pdf>, o <http://www.cs.wm.edu/~coppit/wiki/images/6/69/wm-cs-2005-13.pdf> o Writing Randomly by Randall Schwartz. Linux Magazine Column 04 (Sep 1999). <http://www.stonehenge.com/merlyn/LinuxMag/col04.html> o Generating Test Data with Enhanced Context Free Grammars by Peter M. Maurer <http://cs.baylor.edu/~maurer/aida/dgl-source/documentation/gen_tst.pdf> o Modules as Programs by Brian d Foy <http://www252.pair.com/comdog/mastering_perl/Chapters/18.modulinos.html> o How a Script Becomes a Module by Brian d Foy. On Perlmonks: <http://www.perlmonks.org/index.pl?node_id=396759>. CONTRIBUTORS
o Hal Finkel <http://www.halssoftware.com/> o G. Williams <http://kasei.us/> o Thomas L. Shinnick <http://search.cpan.org/~tshinnic/> o Frank Leray AUTHOR
Casiano Rodriguez-Leon (casiano@ull.es) ACKNOWLEDGMENTS
This work has been supported by CEE (FEDER) and the Spanish Ministry of Educacion y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project <http://www.oplink.ull.es/>). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years. A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien. I wish to thank Francois Desarmenien for his Parse::Yapp module, to my students at La Laguna and to the Perl Community. Thanks to the people who have contributed to improve the module (see "CONTRIBUTORS" in Parse::Eyapp). Thanks to Larry Wall for giving us Perl. Special thanks to Juana. LICENCE AND COPYRIGHT
Copyright (c) 2006-2008 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved. Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001 These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. perl v5.18.2 2012-03-23 Parse::Eyapp::datagenerationtut(3)
Man Page