PERLHACK(1)            Perl Programmers Reference Guide            PERLHACK(1)

NAME

perlhack - How to hack at the Perl internals

DESCRIPTION

This document attempts to explain how Perl development takes place, and ends with some suggestions for people wanting to become bona fide porters.

The perl5-porters mailing list is where the Perl standard distribution is maintained and developed. The list can get anywhere from 10 to 150 messages a day, depending on the heatedness of the debate. Most days there are two or three patches, extensions, features, or bugs being discussed at a time.

A searchable archive of the list is at either:

    http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/

or

    http://archive.develooper.com/perl5-porters@perl.org/

List subscribers (the porters themselves) come in several flavours. Some are quiet curious lurkers, who rarely pitch in and instead watch the ongoing development to ensure they're forewarned of new changes or features in Perl. Some are representatives of vendors, who are there to make sure that Perl continues to compile and work on their platforms. Some patch any reported bug that they know how to fix, some are actively patching their pet area (threads, Win32, the regexp engine), while others seem to do nothing but complain. In other words, it's your usual mix of technical people.

Over this group of porters presides Larry Wall. He has the final word in what does and does not change in the Perl language. Various releases of Perl are shepherded by a "pumpking", a porter responsible for gathering patches, deciding on a patch-by-patch, feature-by-feature basis what will and will not go into the release. For instance, Gurusamy Sarathy was the pumpking for the 5.6 release of Perl, and Jarkko Hietaniemi was the pumpking for the 5.8 release, and Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.

In addition, various people are pumpkings for different things.
For instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the Configure pumpkin up till the 5.8 release. For the 5.10 release H.Merijn Brand took over.

Larry sees Perl development along the lines of the US government: there's the Legislature (the porters), the Executive branch (the pumpkings), and the Supreme Court (Larry). The legislature can discuss and submit patches to the executive branch all they like, but the executive branch is free to veto them. Rarely, the Supreme Court will side with the executive branch over the legislature, or the legislature over the executive branch. Mostly, however, the legislature and the executive branch are supposed to get along and work out their differences without impeachment or court cases.

You might sometimes see reference to Rule 1 and Rule 2. Larry's power as Supreme Court is expressed in The Rules:

1.  Larry is always by definition right about how Perl should behave. This means he has final veto power on the core functionality.

2.  Larry is allowed to change his mind about any matter at a later date, regardless of whether he previously invoked Rule 1.

Got that? Larry is always right, even when he was wrong. It's rare to see either Rule exercised, but they are often alluded to.

New features and extensions to the language are contentious, because the criteria used by the pumpkings, Larry, and other porters to decide which features should be implemented and incorporated are not codified in a few small design goals as with some other languages. Instead, the heuristics are flexible and often difficult to fathom. Here is one person's list, roughly in decreasing order of importance, of heuristics that new features have to be weighed against:

Does concept match the general goals of Perl?
These haven't been written anywhere in stone, but one approximation is:

    1. Keep it fast, simple, and useful.
    2. Keep features/concepts as orthogonal as possible.
    3. No arbitrary limits (platforms, data sizes, cultures).
    4. Keep it open and exciting to use/patch/advocate Perl everywhere.
    5. Either assimilate new technologies, or build bridges to them.

Where is the implementation?
All the talk in the world is useless without an implementation. In almost every case, the person or people who argue for a new feature will be expected to be the ones who implement it. Porters capable of coding new features have their own agendas, and are not available to implement your (possibly good) idea.

Backwards compatibility
It's a cardinal sin to break existing Perl programs. New warnings are contentious--some say that a program that emits warnings is not broken, while others say it is. Adding keywords has the potential to break programs, changing the meaning of existing token sequences or functions might break programs.

Could it be a module instead?
Perl 5 has extension mechanisms, modules and XS, specifically to avoid the need to keep changing the Perl interpreter. You can write modules that export functions, you can give those functions prototypes so they can be called like built-in functions, you can even write XS code to mess with the runtime data structures of the Perl interpreter if you want to implement really complicated things. If it can be done in a module instead of in the core, it's highly unlikely to be added.

Is the feature generic enough?
Is this something that only the submitter wants added to the language, or would it be broadly useful? Sometimes, instead of adding a feature with a tight focus, the porters might decide to wait until someone implements the more generalized feature. For instance, instead of implementing a "delayed evaluation" feature, the porters are waiting for a macro system that would permit delayed evaluation and much more.

Does it potentially introduce new bugs?
Radical rewrites of large chunks of the Perl interpreter have the potential to introduce new bugs. The smaller and more localized the change, the better.

Does it preclude other desirable features?
A patch is likely to be rejected if it closes off future avenues of development. For instance, a patch that placed a true and final interpretation on prototypes is likely to be rejected because there are still options for the future of prototypes that haven't been addressed.

Is the implementation robust?
Good patches (tight code, complete, correct) stand more chance of going in. Sloppy or incorrect patches might be placed on the back burner until the pumpking has time to fix them, or might be discarded altogether without further notice.

Is the implementation generic enough to be portable?
The worst patches make use of system-specific features. It's highly unlikely that non-portable additions to the Perl language will be accepted.

Is the implementation tested?
Patches which change behaviour (fixing bugs or introducing new features) must include regression tests to verify that everything works as expected. Without tests provided by the original author, how can anyone else changing perl in the future be sure that they haven't unwittingly broken the behaviour the patch implements? And without tests, how can the patch's author be confident that his/her hard work put into the patch won't be accidentally thrown away by someone in the future?

Is there enough documentation?
Patches without documentation are probably ill-thought-out or incomplete. Nothing can be added without documentation, so submitting a patch for the appropriate manpages as well as the source code is always a good idea.

Is there another way to do it?
Larry said "Although the Perl Slogan is There's More Than One Way to Do It, I hesitate to make 10 ways to do something". This is a tricky heuristic to navigate, though--one man's essential addition is another man's pointless cruft.

Does it create too much work?
Work for the pumpking, work for Perl programmers, work for module authors, ... Perl is supposed to be easy.

Patches speak louder than words
Working code is always preferred to pie-in-the-sky ideas.
A patch to add a feature stands a much higher chance of making it to the language than does a random feature request, no matter how fervently argued the request might be. This ties into "Will it be useful?", as the fact that someone took the time to make the patch demonstrates a strong desire for the feature.

If you're on the list, you might hear the word "core" bandied around. It refers to the standard distribution. "Hacking on the core" means you're changing the C source code to the Perl interpreter. "A core module" is one that ships with Perl.

Keeping in sync

The source code to the Perl interpreter, in its different versions, is kept in a repository managed by the git revision control system. The pumpkings and a few others have write access to the repository to check in changes. How to clone and use the git perl repository is described in perlrepository.

You can also choose to use rsync to get a copy of the current source tree for the bleadperl branch and all maintenance branches:

    $ rsync -avz rsync://perl5.git.perl.org/APC/perl-current .
    $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.10.x .
    $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.8.x .
    $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.6.x .
    $ rsync -avz rsync://perl5.git.perl.org/APC/perl-5.005xx .

(Add the "--delete" option to remove leftover files.)

You may also want to subscribe to the perl5-changes mailing list to receive a copy of each patch that gets submitted to the maintenance and development "branches" of the perl repository. See http://lists.perl.org/ for subscription information.

If you are a member of the perl5-porters mailing list, it is a good thing to keep in touch with the most recent changes, if only to verify that what you would have posted as a bug report isn't already solved in the most recent available perl development branch, also known as perl-current, bleading edge perl, bleedperl or bleadperl.
Needless to say, the source code in perl-current is usually in a perpetual state of evolution. You should expect it to be very buggy. Do not use it for any purpose other than testing and development.

Perlbug administration

There is a single remote administrative interface for modifying bug status, category, open issues etc. using the RT bugtracker system, maintained by Robert Spier. Become an administrator, and close any bugs you can get your sticky mitts on:

    http://bugs.perl.org/

To email the bug system administrators:

    "perlbug-admin" <perlbug-admin@perl.org>

Submitting patches

Always submit patches to perl5-porters@perl.org. If you're patching a core module and there's an author listed, send the author a copy (see "Patching a core module"). This lets other porters review your patch, which catches a surprising number of errors in patches. Please patch against the latest development version. (e.g., even if you're fixing a bug in the 5.8 track, patch against the "blead" branch in the git repository.)

If changes are accepted, they are applied to the development branch. Then the maintenance pumpking decides which of those patches is to be backported to the maint branch. Only patches that survive the heat of the development branch get applied to maintenance versions.

Your patch should update the documentation and test suite. See "Writing a test". If you have added or removed files in the distribution, edit the MANIFEST file accordingly, sort the MANIFEST file using "make manisort", and include those changes as part of your patch.

Patching documentation also follows the same order: if accepted, a patch is first applied to development, and if relevant then it's backported to maintenance. (With an exception for some patches that document behaviour that only appears in the maintenance branch, but which has changed in the development version.)
To report a bug in Perl, use the program perlbug which comes with Perl (if you can't get Perl to work, send mail to the address perlbug@perl.org or perlbug@perl.com). Reporting bugs through perlbug feeds into the automated bug-tracking system, access to which is provided through the web at http://rt.perl.org/rt3/ . It often pays to check the archives of the perl5-porters mailing list to see whether the bug you're reporting has been reported before, and if so whether it was considered a bug. See above for the location of the searchable archives.

The CPAN testers ( http://testers.cpan.org/ ) are a group of volunteers who test CPAN modules on a variety of platforms. Perl Smokers ( http://www.nntp.perl.org/group/perl.daily-build and http://www.nntp.perl.org/group/perl.daily-build.reports/ ) automatically test Perl source releases on platforms with various configurations. Both efforts welcome volunteers. In order to get involved in smoke testing of perl itself visit <http://search.cpan.org/dist/Test-Smoke>. In order to start smoke testing CPAN modules visit <http://search.cpan.org/dist/CPANPLUS-YACSmoke/> or <http://search.cpan.org/dist/minismokebox/> or <http://search.cpan.org/dist/CPAN-Reporter/>.

It's a good idea to read and lurk for a while before chipping in. That way you'll get to see the dynamic of the conversations, learn the personalities of the players, and hopefully be better prepared to make a useful contribution when you do speak up.

If after all this you still think you want to join the perl5-porters mailing list, send mail to perl5-porters-subscribe@perl.org. To unsubscribe, send mail to perl5-porters-unsubscribe@perl.org.

To hack on the Perl guts, you'll need to read the following things:

perlguts
This is of paramount importance, since it's the documentation of what goes where in the Perl source.
Read it over a couple of times and it might start to make sense - don't worry if it doesn't yet, because the best way to study it is to read it in conjunction with poking at Perl source, and we'll do that later on.

Gisle Aas's "illustrated perlguts", also known as illguts, has very helpful pictures: <http://search.cpan.org/dist/illguts/>

perlxstut and perlxs
A working knowledge of XSUB programming is incredibly useful for core hacking; XSUBs use techniques drawn from the PP code, the portion of the guts that actually executes a Perl program. It's a lot gentler to learn those techniques from simple examples and explanation than from the core itself.

perlapi
The documentation for the Perl API explains what some of the internal functions do, as well as the many macros used in the source.

Porting/pumpkin.pod
This is a collection of words of wisdom for a Perl porter; some of it is only useful to the pumpkin holder, but most of it applies to anyone wanting to go about Perl development.

The perl5-porters FAQ
This should be available from http://dev.perl.org/perl5/docs/p5p-faq.html . It contains hints on reading perl5-porters, information on how perl5-porters works and how Perl development in general works.

Finding Your Way Around

Perl maintenance can be split into a number of areas, and certain people (pumpkins) will have responsibility for each area. These areas sometimes correspond to files or directories in the source kit. Among the areas are:

Core modules
Modules shipped as part of the Perl core live in various subdirectories, where two are dedicated to core-only modules, and two are for the dual-life modules which live on CPAN and may be maintained separately with respect to the Perl core:

    lib/ is for pure-Perl modules, which exist in the core only.
    ext/ is for XS extensions, and modules with special Makefile.PL requirements, which exist in the core only.
    cpan/ is for dual-life modules, where the CPAN module is canonical (should be patched first).
    dist/ is for dual-life modules, where the blead source is canonical.

Tests
There are tests for nearly all the modules, built-ins and major bits of functionality. Test files all have a .t suffix. Module tests live in the lib/ and ext/ directories next to the module being tested. Others live in t/. See "Writing a test".

Documentation
Documentation maintenance includes looking after everything in the pod/ directory (as well as contributing new documentation) and the documentation to the modules in core.

Configure
The Configure process is the way we make Perl portable across the myriad of operating systems it supports. Responsibility for the Configure, build and installation process, as well as the overall portability of the core code rests with the Configure pumpkin - others help out with individual operating systems.

The three files that fall under his/her responsibility are Configure, config_h.SH, and Porting/Glossary (and a whole bunch of small related files that are less important here). The Configure pumpkin decides how patches to these are dealt with. Currently, the Configure pumpkin will accept patches in most common formats, even directly to these files. Other committers are allowed to commit to these files under the strict condition that they will inform the Configure pumpkin, either on IRC (if he/she happens to be around) or through (personal) e-mail.

The files involved are the operating system directories (win32/, os2/, vms/ and so on), the shell scripts which generate config.h and Makefile, as well as the metaconfig files which generate Configure. (metaconfig isn't included in the core distribution.)

See http://perl5.git.perl.org/metaconfig.git/blob/HEAD:/README for a description of the full process involved.

Interpreter
And of course, there's the core of the Perl interpreter itself. Let's have a look at that in a little more detail.
Before we leave looking at the layout, though, don't forget that MANIFEST contains not only the file names in the Perl distribution, but short descriptions of what's in them, too. For an overview of the important files, try this:

    perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST

Elements of the interpreter

The work of the interpreter has two main stages: compiling the code into the internal representation, or bytecode, and then executing it. "Compiled code" in perlguts explains exactly how the compilation stage happens.

Here is a short breakdown of perl's operation:

Startup
The action begins in perlmain.c (or miniperlmain.c for miniperl). This is very high-level code, enough to fit on a single screen, and it resembles the code found in perlembed; most of the real action takes place in perl.c.

perlmain.c is generated by writemain from miniperlmain.c at make time, so you should make perl to follow this along.

First, perlmain.c allocates some memory and constructs a Perl interpreter, along these lines:

    1 PERL_SYS_INIT3(&argc,&argv,&env);
    2
    3 if (!PL_do_undump) {
    4     my_perl = perl_alloc();
    5     if (!my_perl)
    6         exit(1);
    7     perl_construct(my_perl);
    8     PL_perl_destruct_level = 0;
    9 }

Line 1 is a macro, and its definition is dependent on your operating system. Line 3 references "PL_do_undump", a global variable - all global variables in Perl start with "PL_". This tells you whether the current running program was created with the "-u" flag to perl and then undump, which means it's going to be false in any sane context.

Line 4 calls a function in perl.c to allocate memory for a Perl interpreter. It's quite a simple function, and the guts of it looks like this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see later: "PerlMem_malloc" is either your system's "malloc", or Perl's own "malloc" as defined in malloc.c if you selected that option at configure time.
Next, in line 7, we construct the interpreter using perl_construct, also in perl.c; this sets up all the special variables that Perl needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus)
        perl_run(my_perl);

    exitstatus = perl_destruct(my_perl);

    perl_free(my_perl);

"perl_parse" is actually a wrapper around "S_parse_body", as defined in perl.c, which processes the command line options, sets up any statically linked XS modules, opens the program and calls "yyparse" to parse it.

Parsing
The aim of this stage is to take the Perl source, and turn it into an op tree. We'll see what one of those looks like later. Strictly speaking, there are three things going on here.

"yyparse", the parser, lives in perly.c, although you're better off reading the original YACC input in perly.y. (Yes, Virginia, there is a YACC grammar for Perl!) The job of the parser is to take your code and "understand" it, splitting it into sentences, deciding which operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input into tokens, and decides what type of thing each token is: a variable name, an operator, a bareword, a subroutine, a core function, and so on. The main point of entry to the lexer is "yylex", and that and its associated routines can be found in toke.c. Perl isn't much like other computer languages; it's highly context sensitive at times, and it can be tricky to work out what sort of token something is, or where a token ends. As such, there's a lot of interplay between the tokeniser and the parser, which can get pretty frightening if you're not used to it.

As the parser understands a Perl program, it builds up a tree of operations for the interpreter to perform during execution. The routines which construct and link together the various operations are to be found in op.c, and will be examined later.
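To make "construct and link together" a little more concrete, here is a minimal sketch in C. It is not perl's code: the "toy_op" struct and the "toy_const", "toy_add", "toy_link" and "toy_free_linked" functions are hypothetical names invented for this illustration, and the real routines in op.c do a great deal more. The sketch assumes only that a tree of operations is built first and that its nodes are then chained in the order they must run, children before their parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy node; real perl ops carry far more state. */
typedef struct toy_op {
    char           kind;   /* 'c' = constant, '+' = addition */
    long           value;  /* payload, used by constants only */
    struct toy_op *first;  /* left child, NULL for leaves */
    struct toy_op *last;   /* right child, NULL for leaves */
    struct toy_op *next;   /* execution order, set by toy_link() */
} toy_op;

static toy_op *toy_new(char kind, long value, toy_op *first, toy_op *last)
{
    toy_op *o = malloc(sizeof *o);
    assert(o != NULL);
    o->kind  = kind;
    o->value = value;
    o->first = first;
    o->last  = last;
    o->next  = NULL;
    return o;
}

/* Construction: what a parser action might call for "3" or "a + b". */
toy_op *toy_const(long v)               { return toy_new('c', v, NULL, NULL); }
toy_op *toy_add(toy_op *l, toy_op *r)   { return toy_new('+', 0, l, r); }

/* Linking: walk the tree in post-order and chain the nodes through
   their next pointers. *tail is the most recently chained node.
   Returns the first node to execute within this subtree. */
static toy_op *toy_link_into(toy_op *o, toy_op **tail)
{
    toy_op *start = NULL;

    if (o->first)
        start = toy_link_into(o->first, tail);
    if (o->last) {
        toy_op *s = toy_link_into(o->last, tail);
        if (!start)
            start = s;
    }
    if (*tail)
        (*tail)->next = o;
    *tail = o;
    return start ? start : o;
}

/* Returns the first op to execute for the whole tree. */
toy_op *toy_link(toy_op *root)
{
    toy_op *tail = NULL;
    return toy_link_into(root, &tail);
}

/* After linking, every node is reachable through the next chain. */
void toy_free_linked(toy_op *start)
{
    while (start) {
        toy_op *n = start->next;
        free(start);
        start = n;
    }
}
```

For instance, linking toy_add(toy_add(toy_const(1), toy_const(2)), toy_const(3)) chains the nodes as: constant 1, constant 2, addition, constant 3, addition.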
Optimization
Now the parsing stage is complete, and the finished tree represents the operations that the Perl interpreter needs to perform to execute our program. Next, Perl does a dry run over the tree looking for optimisations: constant expressions such as "3 + 4" will be computed now, and the optimizer will also see if any multiple operations can be replaced with a single one. For instance, to fetch the variable $foo, instead of grabbing the glob *foo and looking at the scalar component, the optimizer fiddles the op tree to use a function which directly looks up the scalar in question. The main optimizer is "peep" in op.c, and many ops have their own optimizing functions.

Running
Now we're finally ready to go: we have compiled Perl byte code, and all that's left to do is run it. The actual execution is done by the "runops_standard" function in run.c; more specifically, it's done by these three innocent looking lines:

    while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which stipulates the function which will actually carry out the operation. This function will return the next op in the sequence - this allows for things like "if" which choose the next op dynamically at run time. The "PERL_ASYNC_CHECK" makes sure that things like signals interrupt execution if required.

The actual functions called are known as PP code, and they're spread between four files: pp_hot.c contains the "hot" code, which is most often used and highly optimized, pp_sys.c contains all the system-specific functions, pp_ctl.c contains the functions which implement control structures ("if", "while" and the like) and pp.c contains everything else. These are, if you like, the C code for Perl's built-in functions and operators.
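The shape of that loop can be imitated with a handful of toy ops. What follows is only a sketch, not perl's implementation: "mini_op", "mini_interp", "mini_runops", "mini_demo" and the three "pp_" functions are hypothetical stand-ins, there is no "aTHX", and nothing is done where "PERL_ASYNC_CHECK" would go. It does show the two points made above: every op function returns the op to run next, and a conditional op picks that successor at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not perl's real types. */
typedef struct mini_op     mini_op;
typedef struct mini_interp mini_interp;

struct mini_op {
    mini_op *(*ppaddr)(mini_interp *);  /* plays the role of op_ppaddr */
    mini_op *next;                      /* default successor */
    mini_op *other;                     /* alternative successor for branches */
    long     value;                     /* payload for constants */
};

struct mini_interp {
    mini_op *op;         /* plays the role of PL_op */
    long     stack[16];  /* tiny data stack */
    int      sp;         /* number of values on the stack */
};

static void push(mini_interp *in, long v)
{
    assert(in->sp < 16);
    in->stack[in->sp++] = v;
}

static long pop(mini_interp *in)
{
    assert(in->sp > 0);
    return in->stack[--in->sp];
}

/* Each op function does its work, then returns the next op to run. */
static mini_op *pp_const(mini_interp *in)
{
    push(in, in->op->value);
    return in->op->next;
}

static mini_op *pp_add(mini_interp *in)
{
    long right = pop(in);
    long left  = pop(in);
    push(in, left + right);
    return in->op->next;
}

/* Chooses its successor dynamically, as an "if" would. */
static mini_op *pp_cond(mini_interp *in)
{
    return pop(in) ? in->op->other : in->op->next;
}

/* The dispatch loop: keep calling op functions until one returns NULL. */
void mini_runops(mini_interp *in)
{
    while ((in->op = in->op->ppaddr(in)) != NULL) {
        /* real perl would run PERL_ASYNC_CHECK() here */
    }
}

/* Runs the equivalent of: (flag ? 10 : 20) + 1 */
long mini_demo(long flag)
{
    mini_op ops[6] = {
        { pp_const, &ops[1], NULL,    flag },  /* push flag */
        { pp_cond,  &ops[3], &ops[2], 0    },  /* true: ops[2], false: ops[3] */
        { pp_const, &ops[4], NULL,    10   },
        { pp_const, &ops[4], NULL,    20   },
        { pp_const, &ops[5], NULL,    1    },
        { pp_add,   NULL,    NULL,    0    },  /* last op: returns NULL */
    };
    mini_interp in = { &ops[0], {0}, 0 };

    mini_runops(&in);
    return pop(&in);
}
```

Calling mini_demo with a true flag takes the branch through the constant 10, and with a false flag through the constant 20; in both cases the loop itself never needs to know which ops exist.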
Note that each "pp_" function is expected to return a pointer to the next op. Calls to perl subs (and eval blocks) are handled within the same runops loop, and do not consume extra space on the C stack. For example, "pp_entersub" and "pp_entertry" just push a "CxSUB" or "CxEVAL" block struct onto the context stack which contains the address of the op following the sub call or eval. They then return the first op of that sub or eval block, and so execution continues with that sub or block. Later, a "pp_leavesub" or "pp_leavetry" op pops the "CxSUB" or "CxEVAL", retrieves the return op from it, and returns it.

Exception handling
Perl's exception handling (i.e. "die" etc.) is built on top of the low-level "setjmp()"/"longjmp()" C-library functions. These basically provide a way to capture the current PC and SP registers and later restore them; i.e. a "longjmp()" continues at the point in code where a previous "setjmp()" was done, with anything further up on the C stack being lost. This is why code should always save values using "SAVE_FOO" rather than in auto variables.

The perl core wraps "setjmp()" etc in the macros "JMPENV_PUSH" and "JMPENV_JUMP". The basic rule of perl exceptions is that "exit", and "die" (in the absence of "eval") perform a JMPENV_JUMP(2), while "die" within "eval" does a JMPENV_JUMP(3).

Entry points to perl, such as "perl_parse()", "perl_run()" and "call_sv(cv, G_EVAL)", each do a "JMPENV_PUSH", then enter a runops loop or whatever, and handle possible exception returns. For a 2 return, final cleanup is performed, such as popping stacks and calling "CHECK" or "END" blocks. Amongst other things, this is how scope cleanup still occurs during an "exit".

If a "die" can find a "CxEVAL" block on the context stack, then the stack is popped to that level and the return op in that block is assigned to "PL_restartop"; then a JMPENV_JUMP(3) is performed. This normally passes control back to the guard.
In the case of "perl_run" and "call_sv", a non-null "PL_restartop" triggers re-entry to the runops loop. This is the normal way that "die" or "croak" is handled within an "eval".

Sometimes ops are executed within an inner runops loop, such as tie, sort or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a longjmp right back to the guard in "perl_run", popping both runops loops, which is clearly incorrect. One way to avoid this is for the tie code to do a "JMPENV_PUSH" before executing "FETCH" in the inner runops loop, but for efficiency reasons, perl in fact just sets a flag, using "CATCH_SET(TRUE)". The "pp_require", "pp_entereval" and "pp_entertry" ops check this flag, and if true, they call "docatch", which does a "JMPENV_PUSH" and starts a new runops level to execute the code, rather than doing it on the current loop.

As a further optimisation, on exit from the eval block in the "FETCH", execution of the code following the block is still carried on in the inner loop. When an exception is raised, "docatch" compares the "JMPENV" level of the "CxEVAL" with "PL_top_env" and if they differ, just re-throws the exception. In this way any inner loops get popped.

Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, "perl_run" is called, which does a "JMPENV_PUSH" then enters a runops loop. This loop executes the eval and tie ops on line 1, with the eval pushing a "CxEVAL" onto the context stack.

The "pp_tie" does a "CATCH_SET(TRUE)", then starts a second runops loop to execute the body of "TIEARRAY". When it executes the entertry op on line 3, "CATCH_GET" is true, so "pp_entertry" calls "docatch" which does a "JMPENV_PUSH" and starts a third runops loop, which then executes the die op.
At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by "-Dstv", look like:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"