PERLHACK(1) Perl Programmers Reference Guide PERLHACK(1)
NAME
perlhack - How to hack at the Perl internals
DESCRIPTION
This document attempts to explain how Perl development takes place, and ends with some suggestions for people wanting to become bona fide
porters.
The perl5-porters mailing list is where the Perl standard distribution is maintained and developed. The list can get anywhere from 10 to
150 messages a day, depending on the heatedness of the debate. Most days there are two or three patches, extensions, features, or bugs
being discussed at a time.
A searchable archive of the list is at either:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
or
http://archive.develooper.com/perl5-porters@perl.org/
List subscribers (the porters themselves) come in several flavours. Some are quiet curious lurkers, who rarely pitch in and instead watch
the ongoing development to ensure they're forewarned of new changes or features in Perl. Some are representatives of vendors, who are
there to make sure that Perl continues to compile and work on their platforms. Some patch any reported bug that they know how to fix, some
are actively patching their pet area (threads, Win32, the regexp engine), while others seem to do nothing but complain. In other words,
it's your usual mix of technical people.
Over this group of porters presides Larry Wall. He has the final word in what does and does not change in the Perl language. Various
releases of Perl are shepherded by a "pumpking", a porter responsible for gathering patches and deciding on a patch-by-patch,
feature-by-feature basis what will and will not go into the release. For instance, Gurusamy Sarathy was the pumpking for the 5.6 release
of Perl, Jarkko Hietaniemi was the pumpking for the 5.8 release, and Rafael Garcia-Suarez holds the pumpking crown for the 5.10 release.
In addition, various people are pumpkings for different things. For instance, Andy Dougherty and Jarkko Hietaniemi did a grand job as the
Configure pumpkin up till the 5.8 release. For the 5.10 release H.Merijn Brand took over.
Larry sees Perl development along the lines of the US government: there's the Legislature (the porters), the Executive branch (the
pumpkings), and the Supreme Court (Larry). The legislature can discuss and submit patches to the executive branch all they like, but the
executive branch is free to veto them. Rarely, the Supreme Court will side with the executive branch over the legislature, or the
legislature over the executive branch. Mostly, however, the legislature and the executive branch are supposed to get along and work out
their differences without impeachment or court cases.
You might sometimes see reference to Rule 1 and Rule 2. Larry's power as Supreme Court is expressed in The Rules:
1 Larry is always by definition right about how Perl should behave. This means he has final veto power on the core functionality.
2 Larry is allowed to change his mind about any matter at a later date, regardless of whether he previously invoked Rule 1.
Got that? Larry is always right, even when he was wrong. It's rare to see either Rule exercised, but they are often alluded to.
New features and extensions to the language are contentious, because the criteria used by the pumpkings, Larry, and other porters to decide
which features should be implemented and incorporated are not codified in a few small design goals as with some other languages. Instead,
the heuristics are flexible and often difficult to fathom. Here is one person's list, roughly in decreasing order of importance, of
heuristics that new features have to be weighed against:
Does the concept match the general goals of Perl?
These haven't been written anywhere in stone, but one approximation is:
1. Keep it fast, simple, and useful.
2. Keep features/concepts as orthogonal as possible.
3. No arbitrary limits (platforms, data sizes, cultures).
4. Keep it open and exciting to use/patch/advocate Perl everywhere.
5. Either assimilate new technologies, or build bridges to them.
Where is the implementation?
All the talk in the world is useless without an implementation. In almost every case, the person or people who argue for a new feature
will be expected to be the ones who implement it. Porters capable of coding new features have their own agendas, and are not available
to implement your (possibly good) idea.
Backwards compatibility
It's a cardinal sin to break existing Perl programs. New warnings are contentious--some say that a program that emits warnings is not
broken, while others say it is. Adding keywords has the potential to break programs, and changing the meaning of existing token
sequences or functions might break programs too.
Could it be a module instead?
Perl 5 has extension mechanisms, modules and XS, specifically to avoid the need to keep changing the Perl interpreter. You can write
modules that export functions, you can give those functions prototypes so they can be called like built-in functions, you can even
write XS code to mess with the runtime data structures of the Perl interpreter if you want to implement really complicated things. If
it can be done in a module instead of in the core, it's highly unlikely to be added.
Is the feature generic enough?
Is this something that only the submitter wants added to the language, or would it be broadly useful? Sometimes, instead of adding a
feature with a tight focus, the porters might decide to wait until someone implements the more generalized feature. For instance,
instead of implementing a "delayed evaluation" feature, the porters are waiting for a macro system that would permit delayed evaluation
and much more.
Does it potentially introduce new bugs?
Radical rewrites of large chunks of the Perl interpreter have the potential to introduce new bugs. The smaller and more localized the
change, the better.
Does it preclude other desirable features?
A patch is likely to be rejected if it closes off future avenues of development. For instance, a patch that placed a true and final
interpretation on prototypes is likely to be rejected because there are still options for the future of prototypes that haven't been
addressed.
Is the implementation robust?
Good patches (tight code, complete, correct) stand more chance of going in. Sloppy or incorrect patches might be placed on the back
burner until the pumpking has time to fix them, or might be discarded altogether without further notice.
Is the implementation generic enough to be portable?
The worst patches make use of system-specific features. It's highly unlikely that non-portable additions to the Perl language will
be accepted.
Is the implementation tested?
Patches which change behaviour (fixing bugs or introducing new features) must include regression tests to verify that everything works
as expected. Without tests provided by the original author, how can anyone else changing perl in the future be sure that they haven't
unwittingly broken the behaviour the patch implements? And without tests, how can the patch's author be confident that his/her hard
work put into the patch won't be accidentally thrown away by someone in the future?
Is there enough documentation?
Patches without documentation are probably ill thought out or incomplete. Nothing can be added without documentation, so submitting a
patch for the appropriate manpages as well as the source code is always a good idea.
Is there another way to do it?
Larry said "Although the Perl Slogan is There's More Than One Way to Do It, I hesitate to make 10 ways to do something". This is a
tricky heuristic to navigate, though--one man's essential addition is another man's pointless cruft.
Does it create too much work?
Work for the pumpking, work for Perl programmers, work for module authors, ... Perl is supposed to be easy.
Patches speak louder than words
Working code is always preferred to pie-in-the-sky ideas. A patch to add a feature stands a much higher chance of making it to the
language than does a random feature request, no matter how fervently argued the request might be. This ties into "Will it be useful?",
as the fact that someone took the time to make the patch demonstrates a strong desire for the feature.
If you're on the list, you might hear the word "core" bandied around. It refers to the standard distribution. "Hacking on the core" means
you're changing the C source code to the Perl interpreter. "A core module" is one that ships with Perl.
Keeping in sync
The source code to the Perl interpreter, in its different versions, is kept in a repository managed by a revision control system (which is
currently the Perforce program, see http://perforce.com/). The pumpkings and a few others have access to the repository to check in
changes. Periodically the pumpking for the development version of Perl will release a new version, so the rest of the porters can see
what's changed. The current state of the main trunk of the repository, and patches that describe the individual changes that have happened
since the last public release, are available at this location:
http://public.activestate.com/pub/apc/
ftp://public.activestate.com/pub/apc/
If you're looking for a particular change, or a change that affected a particular set of files, you may find the Perl Repository Browser
useful:
http://public.activestate.com/cgi-bin/perlbrowse
You may also want to subscribe to the perl5-changes mailing list to receive a copy of each patch that gets submitted to the maintenance and
development "branches" of the perl repository. See http://lists.perl.org/ for subscription information.
If you are a member of the perl5-porters mailing list, it is a good idea to keep in touch with the most recent changes, if only to
verify that what you would have posted as a bug report isn't already solved in the most recent available perl development branch, also
known as perl-current, bleading edge perl, bleedperl or bleadperl.
Needless to say, the source code in perl-current is usually in a perpetual state of evolution. You should expect it to be very buggy. Do
not use it for any purpose other than testing and development.
Keeping in sync with the most recent branch can be done in several ways, but the most convenient and reliable way is using rsync, available
at ftp://rsync.samba.org/pub/rsync/ . (You can also get the most recent branch by FTP.)
If you choose to keep in sync using rsync, there are two approaches to doing so:
rsync'ing the source tree
Presuming you are in the directory where your perl source resides and you have rsync installed and available, you can "upgrade" to the
bleadperl using:
# rsync -avz rsync://public.activestate.com/perl-current/ .
This takes care of updating every single item in the source tree to the latest applied patch level, creating files that are new (to
your distribution) and setting date/time stamps of existing files to reflect the bleadperl status.
Note that this will not delete any files that were in '.' before the rsync. Once you are sure that the rsync is running correctly, run
it with the --delete and the --dry-run options like this:
# rsync -avz --delete --dry-run rsync://public.activestate.com/perl-current/ .
This will simulate an rsync run that also deletes files not present in the bleadperl master copy. Observe the results from this run
closely. If you are sure that the actual run would delete no files precious to you, you could remove the '--dry-run' option.
You can then check which patch was the latest to be applied by looking in the file .patch, which will show the number of the latest
patch.
If you have more than one machine to keep in sync, and not all of them have access to the WAN (so you are not able to rsync all the
source trees to the real source), there are some ways to get around this problem.
Using rsync over the LAN
Set up a local rsync server which makes the rsynced source tree available to the LAN and sync the other machines against this
directory.
From http://rsync.samba.org/README.html :
"Rsync uses rsh or ssh for communication. It does not need to be
setuid and requires no special privileges for installation. It
does not require an inetd entry or a daemon. You must, however,
have a working rsh or ssh system. Using ssh is recommended for
its security features."
Using pushing over the NFS
Having the other systems mounted over the NFS, you can take an active pushing approach by checking the just updated tree against
the other not-yet synced trees. An example would be
#!/usr/bin/perl -w
use strict;
use File::Copy;
my %MF = map {
    m/(\S+)/;
    $1 => [ (stat $1)[2, 7, 9] ]; # mode, size, mtime
} `cat MANIFEST`;
my %remote = map { $_ => "/$_/pro/3gl/CPAN/perl-5.7.1" } qw(host1 host2);
foreach my $host (keys %remote) {
    unless (-d $remote{$host}) {
        print STDERR "Cannot Xsync for host $host\n";
        next;
    }
    foreach my $file (keys %MF) {
        my $rfile = "$remote{$host}/$file";
        my ($mode, $size, $mtime) = (stat $rfile)[2, 7, 9];
        defined $size or ($mode, $size, $mtime) = (0, 0, 0);
        $size == $MF{$file}[1] && $mtime == $MF{$file}[2] and next;
        printf "%4s %-34s %8d %9d %8d %9d\n",
            $host, $file, $MF{$file}[1], $MF{$file}[2], $size, $mtime;
        unlink $rfile;
        copy ($file, $rfile);
        utime time, $MF{$file}[2], $rfile;
        chmod $MF{$file}[0], $rfile;
    }
}
though this is not perfect. It could be improved by checking file checksums before updating. Not all NFS systems support utime
reliably (when used over the NFS).
rsync'ing the patches
The source tree is maintained by the pumpking, who applies patches to the files in the tree. These patches are either created by the
pumpking himself using "diff -c" after updating a file manually, or are patches sent in by posters on the perl5-porters list.
These patches are also saved and rsync'able, so you can apply them yourself to the source files.
Presuming you are in a directory where your patches reside, you can get them in sync with
# rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
This makes sure the latest available patch is downloaded to your patch directory.
It's then up to you to apply these patches, using something like
# last="`cat ../perl-current/.patch`.gz"
# rsync -avz rsync://public.activestate.com/perl-current-diffs/ .
# find . -name '*.gz' -newer $last -exec gzcat {} \; >blead.patch
# cd ../perl-current
# patch -p1 -N <../perl-current-diffs/blead.patch
or, since this is only a hint towards how it works, use CPAN-patchaperl from Andreas Konig to have better control over the patching
process.
Why rsync the source tree
It's easier to rsync the source tree
Since you don't have to apply the patches yourself, you are sure all files in the source tree are in the right state.
It's more reliable
While both the rsync-able source and patch areas are automatically updated every few minutes, keep in mind that applying patches may
sometimes mean careful hand-holding, especially if your version of the "patch" program does not understand how to deal with new files,
files with 8-bit characters, or files without trailing newlines.
Why rsync the patches
It's easier to rsync the patches
If you have more than one machine that you want to keep in step with bleadperl, it's easier to rsync the patches only once and then
apply them to all the source trees on the different machines.
If you try to keep pace on 5 different machines, of which only one has access to the WAN, rsync'ing all the source trees would then
have to be done 5 times over the NFS. Having rsync'ed the patches only once, you can apply them to all the source trees automatically.
Need I say more ;-)
It's a good reference
If you do not only like to have the most recent development branch, but also like to fix bugs, or extend features, you want to dive
into the sources. If you are a seasoned perl core diver, you don't need no manuals, tips, roadmaps, perlguts.pod or other aids to find
your way around. But if you are a starter, the patches may help you in finding where you should start and how to change the bits that
bug you.
The file Changes is updated on occasions the pumpking sees as his own little sync points. On those occasions, he releases a tar-ball of
the current source tree (e.g. perl@7582.tar.gz), which will be an excellent point to start with when choosing to use the 'rsync the
patches' scheme. Starting with perl@7582, which means a set of source files on which the latest applied patch is number 7582, you apply
all succeeding patches available from then on (7583, 7584, ...).
You can use the patches later as a kind of search archive.
Finding a start point
If you want to fix/change the behaviour of function/feature Foo, just scan the patches for patches that mention Foo either in the
subject, the comments, or the body of the fix. There is a good chance that such a patch shows you the files it affects, which
are very likely to be the starting point of your journey into the guts of perl.
Finding how to fix a bug
If you've found where the function/feature Foo misbehaves, but you don't know how to fix it (but you do know the change you want to
make), you can, again, peruse the patches for similar changes and see how others applied the fix.
Finding the source of misbehaviour
When you keep in sync with bleadperl, the pumpking would love to see that the community efforts really work. So after each of his
sync points, you should run 'make test' to check if everything is still in working order. If it is, you do 'make ok', which will send
an OK report to perlbug@perl.org. (If you do not have access to a mailer from the system on which you just successfully finished 'make
test', you can do 'make okfile', which creates the file "perl.ok", which you can then take to your favourite mailer and mail
yourself.)
But of course, things will not always lead to a success path, and one or more tests may not pass the 'make test'. Before
sending in a bug report (using 'make nok' or 'make nokfile'), check the mailing list to see if someone else has reported the bug
already and if so, confirm it by replying to that message. If not, you might want to trace the source of that misbehaviour before
sending in the bug, which will help all the other porters in finding the solution.
Here the saved patches come in very handy. You can check the list of patches to see which patch changed what file and what change
caused the misbehaviour. If you note that in the bug report, it saves whoever tries to solve it from having to look for that point.
If searching the patches is too bothersome, you might consider using perl's bugtron to find more information about discussions and
ramblings on posted bugs.
If you want to get the best of both worlds, rsync both the source tree for convenience, reliability and ease and rsync the patches for
reference.
Working with the source
Because you cannot use the Perforce client, you cannot easily generate diffs against the repository, nor will merges occur when you update
via rsync. If you edit a file locally and then rsync against the latest source, changes made in the remote copy will overwrite your local
versions!
The best way to deal with this is to maintain a tree of symlinks to the rsync'd source. Then, when you want to edit a file, you remove the
symlink, copy the real file into the other tree, and edit it. You can then diff your edited file against the original to generate a patch,
and you can safely update the original tree.
Perl's Configure script can generate this tree of symlinks for you. The following example assumes that you have used rsync to pull a copy
of the Perl source into the perl-rsync directory. In the directory above that one, you can execute the following commands:
mkdir perl-dev
cd perl-dev
../perl-rsync/Configure -Dmksymlinks -Dusedevel -D"optimize=-g"
This will start the Perl configuration process. After a few prompts, you should see something like this:
Symbolic links are supported.
Checking how to test for symbolic links...
Your builtin 'test -h' may be broken.
Trying external '/usr/bin/test -h'.
You can test for symbolic links with '/usr/bin/test -h'.
Creating the symbolic links...
(First creating the subdirectories...)
(Then creating the symlinks...)
The specifics may vary based on your operating system, of course. After you see this, you can abort the Configure script, and you will see
that the directory you are in has a tree of symlinks to the perl-rsync directories and files.
If you plan to do a lot of work with the Perl source, here are some Bourne shell script functions that can make your life easier:
edit() {
    if [ -L "$1" ]; then
        mv "$1" "$1.orig"
        cp "$1.orig" "$1"
        vi "$1"
    else
        vi "$1"
    fi
}
unedit() {
    if [ -L "$1.orig" ]; then
        rm "$1"
        mv "$1.orig" "$1"
    fi
}
Replace "vi" with your favorite flavor of editor.
Here is another function which will quickly generate a patch for the files which have been edited in your symlink tree:
mkpatchorig() {
    local diffopts
    for f in `find . -name '*.orig' | sed 's,^\./,,'`
    do
        case `echo $f | sed 's,\.orig$,,;s,.*\.,,'` in
            c)   diffopts=-p ;;
            pod) diffopts='-F^=' ;;
            *)   diffopts= ;;
        esac
        diff -du $diffopts $f `echo $f | sed 's,\.orig$,,'`
    done
}
This function produces patches which include enough context to make your changes obvious. This makes it easier for the Perl pumpking(s) to
review them when you send them to the perl5-porters list, and that means they're more likely to get applied.
This function assumes a GNU diff, and may require some tweaking for other diff variants.
Perlbug administration
There is a single remote administrative interface for modifying bug status, category, open issues etc. using the RT bugtracker system,
maintained by Robert Spier. Become an administrator, and close any bugs you can get your sticky mitts on:
http://bugs.perl.org/
To email the bug system administrators:
"perlbug-admin" <perlbug-admin@perl.org>
Submitting patches
Always submit patches to perl5-porters@perl.org. This lets other porters review your patch, which catches a surprising number of errors
in patches. If you're patching a core module and there's an author listed, send the author a copy (see "Patching a core module"). Either
use the diff program (available in source code form from ftp://ftp.gnu.org/pub/gnu/), or use Johan Vromans' makepatch (available from
CPAN/authors/id/JV/). Unified diffs are preferred, but context diffs are accepted. Do not send RCS-style diffs or diffs without context
lines. More information is given in the Porting/patching.pod file in the Perl source distribution. Please patch against the latest
development version (e.g., even if you're fixing a bug in the 5.8 track, patch against the latest development version rsynced from
rsync://public.activestate.com/perl-current/).
If changes are accepted, they are applied to the development branch. Then the 5.8 pumpking decides which of those patches are to be
backported to the maint branch. Only patches that survive the heat of the development branch get applied to maintenance versions.
Your patch should update the documentation and test suite. See "Writing a test". If you have added or removed files in the distribution,
edit the MANIFEST file accordingly, sort the MANIFEST file using "make manisort", and include those changes as part of your patch.
Patching documentation also follows the same order: if accepted, a patch is first applied to development, and if relevant then it's
backported to maintenance. (With an exception for some patches that document behaviour that only appears in the maintenance branch, but
which has changed in the development version.)
To report a bug in Perl, use the program perlbug which comes with Perl (if you can't get Perl to work, send mail to the address
perlbug@perl.org or perlbug@perl.com). Reporting bugs through perlbug feeds into the automated bug-tracking system, access to which is
provided through the web at http://rt.perl.org/rt3/ . It often pays to check the archives of the perl5-porters mailing list to see whether
the bug you're reporting has been reported before, and if so whether it was considered a bug. See above for the location of the searchable
archives.
The CPAN testers (http://testers.cpan.org/) are a group of volunteers who test CPAN modules on a variety of platforms. Perl Smokers
(http://www.nntp.perl.org/group/perl.daily-build and http://www.nntp.perl.org/group/perl.daily-build.reports/) automatically test Perl
source releases on platforms with various configurations. Both efforts welcome volunteers. To get involved in smoke testing of
perl itself, visit <http://search.cpan.org/dist/Test-Smoke>. To start smoke testing CPAN modules, visit
<http://search.cpan.org/dist/CPAN-YACSmoke/> or <http://search.cpan.org/dist/POE-Component-CPAN-YACSmoke/> or
<http://search.cpan.org/dist/CPAN-Reporter/>.
It's a good idea to read and lurk for a while before chipping in. That way you'll get to see the dynamic of the conversations, learn the
personalities of the players, and hopefully be better prepared to make a useful contribution when you do speak up.
If after all this you still think you want to join the perl5-porters mailing list, send mail to perl5-porters-subscribe@perl.org. To
unsubscribe, send mail to perl5-porters-unsubscribe@perl.org.
To hack on the Perl guts, you'll need to read the following things:
perlguts
This is of paramount importance, since it's the documentation of what goes where in the Perl source. Read it over a couple of times and
it might start to make sense - don't worry if it doesn't yet, because the best way to study it is to read it in conjunction with poking
at Perl source, and we'll do that later on.
You might also want to look at Gisle Aas's illustrated perlguts - there's no guarantee that this will be absolutely up-to-date with the
latest documentation in the Perl core, but the fundamentals will be right. ( http://gisle.aas.no/perl/illguts/ )
perlxstut and perlxs
A working knowledge of XSUB programming is incredibly useful for core hacking; XSUBs use techniques drawn from the PP code, the portion
of the guts that actually executes a Perl program. It's a lot gentler to learn those techniques from simple examples and explanation
than from the core itself.
perlapi
The documentation for the Perl API explains what some of the internal functions do, as well as the many macros used in the source.
Porting/pumpkin.pod
This is a collection of words of wisdom for a Perl porter; some of it is only useful to the pumpkin holder, but most of it applies to
anyone wanting to go about Perl development.
The perl5-porters FAQ
This should be available from http://dev.perl.org/perl5/docs/p5p-faq.html . It contains hints on reading perl5-porters, information on
how perl5-porters works and how Perl development in general works.
Finding Your Way Around
Perl maintenance can be split into a number of areas, and certain people (pumpkins) will have responsibility for each area. These areas
sometimes correspond to files or directories in the source kit. Among the areas are:
Core modules
Modules shipped as part of the Perl core live in the lib/ and ext/ subdirectories: lib/ is for the pure-Perl modules, and ext/ contains
the core XS modules.
Tests
There are tests for nearly all the modules, built-ins and major bits of functionality. Test files all have a .t suffix. Module tests
live in the lib/ and ext/ directories next to the module being tested. Others live in t/. See "Writing a test".
Documentation
Documentation maintenance includes looking after everything in the pod/ directory (as well as contributing new documentation) and the
documentation to the modules in core.
Configure
The configure process is the way we make Perl portable across the myriad of operating systems it supports. Responsibility for the
configure, build and installation process, as well as the overall portability of the core code rests with the configure pumpkin - others
help out with individual operating systems.
The files involved are the operating system directories (win32/, os2/, vms/ and so on), the shell scripts which generate config.h and
Makefile, as well as the metaconfig files which generate Configure. (metaconfig isn't included in the core distribution.)
Interpreter
And of course, there's the core of the Perl interpreter itself. Let's have a look at that in a little more detail.
Before we leave looking at the layout, though, don't forget that MANIFEST contains not only the file names in the Perl distribution, but
short descriptions of what's in them, too. For an overview of the important files, try this:
perl -lne 'print if /^[^\/]+\.[ch]\s+/' MANIFEST
Elements of the interpreter
The work of the interpreter has two main stages: compiling the code into the internal representation, or bytecode, and then executing it.
"Compiled code" in perlguts explains exactly how the compilation stage happens.
Here is a short breakdown of perl's operation:
Startup
The action begins in perlmain.c (or miniperlmain.c for miniperl). This is very high-level code, enough to fit on a single screen, and it
resembles the code found in perlembed; most of the real action takes place in perl.c.
perlmain.c is generated by writemain from miniperlmain.c at make time, so you should build perl (run make) to follow along.
First, perlmain.c allocates some memory and constructs a Perl interpreter, along these lines:
1 PERL_SYS_INIT3(&argc,&argv,&env);
2
3 if (!PL_do_undump) {
4     my_perl = perl_alloc();
5     if (!my_perl)
6         exit(1);
7     perl_construct(my_perl);
8     PL_perl_destruct_level = 0;
9 }
Line 1 is a macro, and its definition is dependent on your operating system. Line 3 references "PL_do_undump", a global variable - all
global variables in Perl start with "PL_". This tells you whether the current running program was created with the "-u" flag to perl and
then undump, which means it's going to be false in any sane context.
Line 4 calls a function in perl.c to allocate memory for a Perl interpreter. It's quite a simple function, and the guts of it looks like
this:
my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
Here you see an example of Perl's system abstraction, which we'll see later: "PerlMem_malloc" is either your system's "malloc", or
Perl's own "malloc" as defined in malloc.c if you selected that option at configure time.
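The idea of routing every allocation through one macro can be sketched as follows. This is only an illustration of the technique, not code from perl.c: the names SKETCH_USE_OWN_MALLOC, sketch_own_malloc, SketchMem_malloc and SketchInterpreter are all made up for the example, and the "own" allocator simply delegates to the system malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here (SKETCH_USE_OWN_MALLOC, sketch_own_malloc, SketchMem_malloc,
 * SketchInterpreter) are hypothetical and do not come from the Perl source. */

/* Build-time switch, standing in for an answer given at configure time.
 * Remove this definition to route allocations to the system malloc. */
#define SKETCH_USE_OWN_MALLOC 1

/* Counts how often the interpreter's own allocator was used. */
size_t sketch_own_calls = 0;

/* Stand-in for an allocator shipped with the interpreter. To keep the sketch
 * short it delegates to the system malloc and only records the call. */
void *sketch_own_malloc(size_t size)
{
    assert(size > 0);
    sketch_own_calls++;
    return malloc(size);
}

/* Callers only ever see this macro, never a concrete allocator. */
#ifdef SKETCH_USE_OWN_MALLOC
#  define SketchMem_malloc(size) sketch_own_malloc(size)
#else
#  define SketchMem_malloc(size) malloc(size)
#endif

typedef struct sketch_interpreter {
    int destruct_level;
} SketchInterpreter;

/* Allocate an interpreter structure through the abstraction layer. */
SketchInterpreter *sketch_alloc(void)
{
    SketchInterpreter *interp =
        (SketchInterpreter *)SketchMem_malloc(sizeof(SketchInterpreter));
    if (interp != NULL)
        interp->destruct_level = 0;
    return interp;
}
```

Because sketch_alloc names only the macro, switching allocators is a matter of rebuilding with a different definition; none of the calling code changes.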
Next, in line 7, we construct the interpreter using perl_construct, also in perl.c; this sets up all the special variables that Perl
needs, the stacks, and so on.
Now we pass Perl the command line options, and tell it to go:
exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
if (!exitstatus)
    perl_run(my_perl);
exitstatus = perl_destruct(my_perl);
perl_free(my_perl);
"perl_parse" is actually a wrapper around "S_parse_body", as defined in perl.c, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls "yyparse" to parse it.
Parsing
The aim of this stage is to take the Perl source, and turn it into an op tree. We'll see what one of those looks like later. Strictly
speaking, there are three things going on here.
"yyparse", the parser, lives in perly.c, although you're better off reading the original YACC input in perly.y. (Yes, Virginia, there is
a YACC grammar for Perl!) The job of the parser is to take your code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.
The parser is nobly assisted by the lexer, which chunks up your input into tokens, and decides what type of thing each token is: a
variable name, an operator, a bareword, a subroutine, a core function, and so on. The main point of entry to the lexer is "yylex", and
that and its associated routines can be found in toke.c. Perl isn't much like other computer languages: it's highly context sensitive at
times, and it can be tricky to work out what sort of token something is, or where a token ends. As such, there's a lot of interplay
between the tokeniser and the parser, which can get pretty frightening if you're not used to it.
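To make "chunking input into tokens" concrete, here is a deliberately naive sketch of a tokeniser. It is not how toke.c works: it ignores context entirely, which is exactly the luxury the real lexer does not have. The token categories and every name in it (SketchToken, sketch_lex and so on) are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token categories; the real lexer distinguishes many more. */
typedef enum {
    TOK_END,        /* end of input */
    TOK_VARIABLE,   /* $name */
    TOK_NUMBER,     /* a run of digits */
    TOK_WORD,       /* bareword or function name */
    TOK_OPERATOR    /* any other single character */
} SketchTokenType;

typedef struct {
    SketchTokenType type;
    const char *start;   /* first character of the token in the input */
    size_t length;       /* number of characters in the token */
} SketchToken;

int sketch_is_word_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Read one token starting at *cursor and advance *cursor past it. */
SketchToken sketch_lex(const char **cursor)
{
    const char *p;
    SketchToken tok;

    assert(cursor != NULL && *cursor != NULL);
    p = *cursor;
    while (isspace((unsigned char)*p))
        p++;

    tok.start = p;
    if (*p == '\0') {
        tok.type = TOK_END;
    } else if (*p == '$' && sketch_is_word_char(p[1])) {
        tok.type = TOK_VARIABLE;
        p++;                              /* skip the sigil */
        while (sketch_is_word_char(*p))
            p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.type = TOK_NUMBER;
        while (isdigit((unsigned char)*p))
            p++;
    } else if (sketch_is_word_char(*p)) {
        tok.type = TOK_WORD;
        while (sketch_is_word_char(*p))
            p++;
    } else {
        tok.type = TOK_OPERATOR;
        p++;
    }
    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}
```

Called repeatedly on the string "$foo + 42", this yields a variable, an operator, a number and then the end marker. A lexer for real Perl cannot decide so locally: whether "/" starts a regular expression or means division, for instance, depends on what the parser was expecting at that point.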
As the parser understands a Perl program, it builds up a tree of operations for the interpreter to perform during execution. The
routines which construct and link together the various operations are to be found in op.c, and will be examined later.
Optimization
Now the parsing stage is complete, and the finished tree represents the operations that the Perl interpreter needs to perform to execute
our program. Next, Perl does a dry run over the tree looking for optimisations: constant expressions such as "3 + 4" will be computed
now, and the optimizer will also check whether a sequence of operations can be replaced with a single one. For instance, to fetch the variable
$foo, instead of grabbing the glob *foo and looking at the scalar component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is "peep" in op.c, and many ops have their own optimizing functions.
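To make the idea of folding constant expressions concrete, here is a minimal sketch in C. It is an assumption-laden toy, not Perl's
optimizer: "toy_node", "toy_fold" and the two demo functions are hypothetical names, and the tree knows only constants, variables and
addition.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_CONST, TOY_VAR, TOY_ADD } toy_kind;

typedef struct toy_node {
    toy_kind kind;
    long value;                      /* used by TOY_CONST only */
    struct toy_node *left, *right;   /* used by TOY_ADD only */
} toy_node;

/* Fold constant additions bottom-up, in place.
 * Returns the number of TOY_ADD nodes replaced by constants. */
int toy_fold(toy_node *n)
{
    int folded;

    if (n == NULL || n->kind != TOY_ADD)
        return 0;
    assert(n->left != NULL && n->right != NULL);

    folded = toy_fold(n->left) + toy_fold(n->right);
    if (n->left->kind == TOY_CONST && n->right->kind == TOY_CONST) {
        n->value = n->left->value + n->right->value;
        n->kind  = TOY_CONST;        /* the addition becomes a constant */
        n->left  = n->right = NULL;
        folded++;
    }
    return folded;
}

/* 3 + 4 folds to the single constant 7. */
long toy_fold_constant_demo(void)
{
    toy_node three = { TOY_CONST, 3, NULL, NULL };
    toy_node four  = { TOY_CONST, 4, NULL, NULL };
    toy_node sum   = { TOY_ADD, 0, &three, &four };

    toy_fold(&sum);
    return sum.kind == TOY_CONST ? sum.value : -1;
}

/* $x + (3 + 4): only the inner addition can be folded. */
int toy_fold_partial_demo(void)
{
    toy_node x     = { TOY_VAR, 0, NULL, NULL };
    toy_node three = { TOY_CONST, 3, NULL, NULL };
    toy_node four  = { TOY_CONST, 4, NULL, NULL };
    toy_node inner = { TOY_ADD, 0, &three, &four };
    toy_node outer = { TOY_ADD, 0, &x, &inner };
    int folded = toy_fold(&outer);

    return (outer.kind == TOY_ADD && inner.value == 7) ? folded : -1;
}
```

The second demo shows why the pass works from the leaves upward: the inner "3 + 4" collapses, but the outer addition has to stay
because one operand is only known at run time.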
Running
Now we're finally ready to go: we have compiled Perl byte code, and all that's left to do is run it. The actual execution is done by the
"runops_standard" function in run.c; more specifically, it's done by these three innocent looking lines:
while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
PERL_ASYNC_CHECK();
}
You may be more comfortable with the Perl version of that:
PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
Well, maybe not. Anyway, each op contains a function pointer, which stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for things like "if" which choose the next op dynamically at run
time. The "PERL_ASYNC_CHECK" makes sure that things like signals interrupt execution if required.
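If neither version helps, a stripped-down model might. The sketch below is a toy in C under stated assumptions: "toy_op", "toy_interp"
and the "toy_pp_" functions are made up for illustration and are much simpler than Perl's real op structures, but the loop has the same
shape: call the current op's function, and treat whatever it returns as the next op.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_op toy_op;

typedef struct toy_interp {
    toy_op *op;    /* current op, playing the role of PL_op */
    long    acc;   /* a single value for the ops to work on */
} toy_interp;

struct toy_op {
    toy_op *(*ppaddr)(toy_interp *);  /* function that carries out the op */
    toy_op *next;                     /* usual successor */
    toy_op *other;                    /* alternative successor for branches */
    long    arg;
};

/* Each "pp" function does its work and returns the next op to run. */
static toy_op *toy_pp_add(toy_interp *in)
{
    in->acc += in->op->arg;
    return in->op->next;
}

static toy_op *toy_pp_negate(toy_interp *in)
{
    in->acc = -in->acc;
    return in->op->next;
}

/* Like "if": the successor is chosen at run time. */
static toy_op *toy_pp_branch_if_negative(toy_interp *in)
{
    return in->acc < 0 ? in->op->other : in->op->next;
}

/* The dispatch loop: call ops until one returns NULL. */
static void toy_runops(toy_interp *in, toy_op *start)
{
    assert(start != NULL);
    in->op = start;
    while ((in->op = in->op->ppaddr(in)) != NULL) {
        /* Perl's loop runs PERL_ASYNC_CHECK() here */
    }
}

/* Computes abs(n) + 1 with a three-op program. */
long toy_abs_plus_one(long n)
{
    toy_op add    = { toy_pp_add, NULL, NULL, 1 };
    toy_op negate = { toy_pp_negate, &add, NULL, 0 };
    toy_op branch = { toy_pp_branch_if_negative, &add, &negate, 0 };
    toy_interp in = { NULL, n };

    toy_runops(&in, &branch);
    return in.acc;
}
```

Calling "toy_abs_plus_one" with a negative number takes the "other" branch through the negate op; with a non-negative number the branch
op returns its "next" pointer and the negate op is never executed.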
The actual functions called are known as PP code, and they're spread between four files: pp_hot.c contains the "hot" code, which is most
often used and highly optimized, pp_sys.c contains all the system-specific functions, pp_ctl.c contains the functions which implement
control structures ("if", "while" and the like) and pp.c contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.
Note that each "pp_" function is expected to return a pointer to the next op. Calls to perl subs (and eval blocks) are handled within
the same runops loop, and do not consume extra space on the C stack. For example, "pp_entersub" and "pp_entertry" just push a "CxSUB" or
"CxEVAL" block struct, which contains the address of the op following the sub call or eval, onto the context stack. They then return the
first op of that sub or eval block, and so execution continues in that sub or block. Later, a "pp_leavesub" or "pp_leavetry" op pops
the "CxSUB" or "CxEVAL", retrieves the return op from it, and returns it.
Exception handling
Perl's exception handling (i.e. "die" etc.) is built on top of the low-level "setjmp()"/"longjmp()" C-library functions. These basically
provide a way to capture the current PC and SP registers and later restore them; i.e. a "longjmp()" continues at the point in code
where a previous "setjmp()" was done, with anything further up on the C stack being lost. This is why code should always save values
using "SAVE_FOO" rather than in auto variables.
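For readers who have not met these functions, here is a small standalone sketch in plain C. It does not use any Perl macros; "toy_env",
"toy_catch" and the other names are hypothetical. It shows a "longjmp()" resuming at the "setjmp()" point and abandoning the frames in
between.

```c
#include <setjmp.h>

static jmp_buf toy_env;          /* hypothetical name */
static int     toy_calls_made;   /* static, so it survives the jump */

static void toy_inner(int code)
{
    toy_calls_made++;
    longjmp(toy_env, code);      /* never returns to toy_outer */
}

static void toy_outer(int code)
{
    toy_calls_made++;
    toy_inner(code);
    toy_calls_made = -1;         /* skipped: this frame is abandoned */
}

/* Returns the code passed to longjmp, as seen at the setjmp point. */
int toy_catch(int code)
{
    toy_calls_made = 0;
    switch (setjmp(toy_env)) {
    case 0:                      /* direct return from setjmp */
        toy_outer(code);
        return -1;               /* not reached in this sketch */
    case 2:
        return 2;
    case 3:
        return 3;
    default:
        return -2;               /* some other non-zero code */
    }
}

/* How many functions had been entered when the jump happened. */
int toy_calls_before_jump(void)
{
    return toy_calls_made;
}
```

The assignment after the call in "toy_outer" never runs, which is the point: nothing in an abandoned frame gets a chance to tidy up
after itself. The counter is kept in a static rather than an auto variable so that its value is still well defined after the jump.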
The perl core wraps "setjmp()" etc in the macros "JMPENV_PUSH" and "JMPENV_JUMP". The basic rule of perl exceptions is that "exit", and
"die" (in the absence of "eval") perform a JMPENV_JUMP(2), while "die" within "eval" does a JMPENV_JUMP(3).
Entry points to perl, such as "perl_parse()", "perl_run()" and "call_sv(cv, G_EVAL)", each do a "JMPENV_PUSH", then enter a runops
loop or whatever, and handle possible exception returns. For a 2 return, final cleanup is performed, such as popping stacks and calling
"CHECK" or "END" blocks. Amongst other things, this is how scope cleanup still occurs during an "exit".
If a "die" can find a "CxEVAL" block on the context stack, then the stack is popped to that level and the return op in that block is
assigned to "PL_restartop"; then a JMPENV_JUMP(3) is performed. This normally passes control back to the guard. In the case of
"perl_run" and "call_sv", a non-null "PL_restartop" triggers re-entry to the runops loop. This is the normal way that "die" or "croak" is
handled within an "eval".
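The guard-and-restart dance can be modelled in a few dozen lines of C. What follows is a sketch under heavy assumptions, not the code in
perl.c: every "toy_" name is invented, the context stack holds nothing but return ops, and the jump codes 2 and 3 are simply borrowed
from the description above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

typedef struct toy_op toy_op;
struct toy_op {
    toy_op *(*ppaddr)(toy_op *self);
    toy_op *next;
    toy_op *retop;   /* entertry only: op that follows the eval block */
    int     arg;
};

static toy_op *toy_cxstack[8];   /* return op of each open eval */
static int     toy_cxix;         /* number of open evals */
static toy_op *toy_restartop;    /* where to resume after a jump */
static jmp_buf toy_top_env;      /* the guard's setjmp buffer */
static int     toy_trace;        /* digits of the marks executed */

static toy_op *toy_pp_mark(toy_op *self)
{
    toy_trace = toy_trace * 10 + self->arg;
    return self->next;
}

static toy_op *toy_pp_entertry(toy_op *self)
{
    assert(toy_cxix < 8);
    toy_cxstack[toy_cxix++] = self->retop;
    return self->next;
}

static toy_op *toy_pp_leavetry(toy_op *self)
{
    (void)self;
    assert(toy_cxix > 0);
    return toy_cxstack[--toy_cxix];   /* pop and return the return op */
}

static toy_op *toy_pp_die(toy_op *self)
{
    (void)self;
    if (toy_cxix > 0) {               /* inside an eval: arrange a restart */
        toy_restartop = toy_cxstack[--toy_cxix];
        longjmp(toy_top_env, 3);
    }
    longjmp(toy_top_env, 2);          /* no eval: give up */
}

/* The guard: returns 0 on normal completion, 2 on an uncaught die. */
int toy_run(toy_op *start)
{
    toy_cxix = 0;
    toy_trace = 0;
    toy_restartop = start;
    for (;;) {
        switch (setjmp(toy_top_env)) {
        case 0: {
            toy_op *op = toy_restartop;
            toy_restartop = NULL;
            while (op != NULL)        /* the runops loop */
                op = op->ppaddr(op);
            return 0;
        }
        case 3:
            if (toy_restartop != NULL)
                break;                /* leave the switch and re-enter runops */
            return 2;
        default:
            return 2;                 /* final cleanup would happen here */
        }
    }
}

/* mark 1; eval { die; mark 2 }; mark 3 */
int toy_demo_die_in_eval(void)
{
    toy_op after   = { toy_pp_mark, NULL, NULL, 3 };
    toy_op leave   = { toy_pp_leavetry, NULL, NULL, 0 };
    toy_op skipped = { toy_pp_mark, &leave, NULL, 2 };
    toy_op die     = { toy_pp_die, &skipped, NULL, 0 };
    toy_op enter   = { toy_pp_entertry, &die, &after, 0 };
    toy_op first   = { toy_pp_mark, &enter, NULL, 1 };
    return toy_run(&first) == 0 ? toy_trace : -1;
}

/* mark 1; die; mark 2 */
int toy_demo_uncaught_die(void)
{
    toy_op skipped = { toy_pp_mark, NULL, NULL, 2 };
    toy_op die     = { toy_pp_die, &skipped, NULL, 0 };
    toy_op first   = { toy_pp_mark, &die, NULL, 1 };
    return toy_run(&first) == 2 ? toy_trace : -1;
}
```

In the first demo the trace reads 13: mark 2 is skipped, and the guard resumes at the op after the eval. In the second there is no eval
on the toy context stack, so the guard sees a 2 and returns without running anything further.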
Sometimes ops are executed within an inner runops loop, such as tie, sort or overload code. In this case, something like
sub FETCH { eval { die } }
would cause a longjmp right back to the guard in "perl_run", popping both runops loops, which is clearly incorrect. One way to avoid
this is for the tie code to do a "JMPENV_PUSH" before executing "FETCH" in the inner runops loop, but for efficiency reasons, perl in
fact just sets a flag, using "CATCH_SET(TRUE)". The "pp_require", "pp_entereval" and "pp_entertry" ops check this flag, and if true,
they call "docatch", which does a "JMPENV_PUSH" and starts a new runops level to execute the code, rather than doing it on the current
loop.
As a further optimisation, on exit from the eval block in the "FETCH", execution of the code following the block is still carried on in
the inner loop. When an exception is raised, "docatch" compares the "JMPENV" level of the "CxEVAL" with "PL_top_env" and if they
differ, just re-throws the exception. In this way any inner loops get popped.
Here's an example.
1: eval { tie @a, 'A' };
2: sub A::TIEARRAY {
3: eval { die };
4: die;
5: }
To run this code, "perl_run" is called, which does a "JMPENV_PUSH" then enters a runops loop. This loop executes the eval and tie ops on
line 1, with the eval pushing a "CxEVAL" onto the context stack.
The "pp_tie" does a "CATCH_SET(TRUE)", then starts a second runops loop to execute the body of "TIEARRAY". When it executes the entertry
op on line 3, "CATCH_GET" is true, so "pp_entertry" calls "docatch" which does a "JMPENV_PUSH" and starts a third runops loop, which
then executes the die op. At this point the C call stack looks like this:
Perl_pp_die
Perl_runops # third loop
S_docatch_body
S_docatch
Perl_pp_entertry
Perl_runops # second loop
S_call_body
Perl_call_sv
Perl_pp_tie
Perl_runops # first loop
S_run_body
perl_run
main
and the context and data stacks, as shown by "-Dstv", look like:
STACK 0: MAIN
CX 0: BLOCK =>
CX 1: EVAL => AV() PV("A"