PERLINTERP(1) Perl Programmers Reference Guide PERLINTERP(1)
NAME
perlinterp - An overview of the Perl interpreter
DESCRIPTION
This document provides an overview of how the Perl interpreter works at the level of C code, along with pointers to the relevant C source
code files.
ELEMENTS OF THE INTERPRETER
The work of the interpreter has two main stages: compiling the code into the internal representation, or bytecode, and then executing it.
"Compiled code" in perlguts explains exactly how the compilation stage happens.
Here is a short breakdown of perl's operation:
Startup
The action begins in perlmain.c (or miniperlmain.c for miniperl). This is very high-level code, enough to fit on a single screen, and it
resembles the code found in perlembed; most of the real action takes place in perl.c.
perlmain.c is generated by "ExtUtils::Miniperl" from miniperlmain.c at make time, so you need to build perl before you can follow along.
First, perlmain.c allocates some memory and constructs a Perl interpreter, along these lines:
1 PERL_SYS_INIT3(&argc,&argv,&env);
2
3 if (!PL_do_undump) {
4     my_perl = perl_alloc();
5     if (!my_perl)
6         exit(1);
7     perl_construct(my_perl);
8     PL_perl_destruct_level = 0;
9 }
Line 1 is a macro, and its definition is dependent on your operating system. Line 3 references "PL_do_undump", a global variable - all
global variables in Perl start with "PL_". This tells you whether the current running program was created with the "-u" flag to perl and
then undump, which means it's going to be false in any sane context.
Line 4 calls a function in perl.c to allocate memory for a Perl interpreter. It's quite a simple function, and the guts of it looks like
this:
my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));
Here you see an example of Perl's system abstraction, which we'll see later: "PerlMem_malloc" is either your system's "malloc", or Perl's
own "malloc" as defined in malloc.c if you selected that option at configure time.
Next, in line 7, we construct the interpreter using perl_construct, also in perl.c; this sets up all the special variables that Perl needs,
the stacks, and so on.
Now we pass Perl the command line options, and tell it to go:
exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
if (!exitstatus)
    perl_run(my_perl);
exitstatus = perl_destruct(my_perl);
perl_free(my_perl);
"perl_parse" is actually a wrapper around "S_parse_body", as defined in perl.c, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls "yyparse" to parse it.
Parsing
The aim of this stage is to take the Perl source, and turn it into an op tree. We'll see what one of those looks like later. Strictly
speaking, there are three things going on here.
"yyparse", the parser, lives in perly.c, although you're better off reading the original YACC input in perly.y. (Yes, Virginia, there is a
YACC grammar for Perl!) The job of the parser is to take your code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.
The parser is nobly assisted by the lexer, which chunks up your input into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so on. The main point of entry to the lexer is "yylex", and that and its
associated routines can be found in toke.c. Perl isn't much like other computer languages; it's highly context sensitive at times, so it can
be tricky to work out what sort of token something is, or where a token ends. As such, there's a lot of interplay between the tokeniser and
the parser, which can get pretty frightening if you're not used to it.
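To make the context sensitivity concrete, here is a minimal sketch in plain C of one classic case: deciding whether a "/" is a division operator or the start of a pattern. It is a toy under stated assumptions, not code from toke.c; the names "classify_slash", "Expect" and "SlashKind" are invented for this illustration, and the rule it applies (a slash after a finished term is division, otherwise it opens a pattern) is far cruder than what the real lexer does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token classes for this sketch; not Perl's real token types. */
typedef enum { TOK_DIVIDE, TOK_REGEX_START } SlashKind;

/* What the toy lexer expects next: an operand (term) or an operator. */
typedef enum { EXPECT_TERM, EXPECT_OPERATOR } Expect;

/*
 * Scan src up to index pos (which must hold a '/') and decide how that
 * slash should be tokenised. Assumption of this sketch: only variable
 * names, numbers, ')' and ']' count as completing a term.
 */
SlashKind classify_slash(const char *src, size_t pos)
{
    Expect expect = EXPECT_TERM;   /* a program starts where a term may appear */
    size_t i;

    assert(src != NULL && src[pos] == '/');

    for (i = 0; i < pos; i++) {
        unsigned char c = (unsigned char)src[i];

        if (isspace(c))
            continue;                  /* whitespace does not change the state */
        if (isalnum(c) || c == '_' || c == '$' || c == ')' || c == ']')
            expect = EXPECT_OPERATOR;  /* we are in, or just finished, a term */
        else
            expect = EXPECT_TERM;      /* operator or opening bracket seen */
    }
    return expect == EXPECT_OPERATOR ? TOK_DIVIDE : TOK_REGEX_START;
}
```

With this rule the slash in "$a / 2" is classified as division and the first slash in "$x = /foo/" as the start of a pattern. The sketch deliberately ignores barewords and named operators that take a pattern argument, which is exactly the kind of case that forces the real tokeniser and parser to cooperate.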
As the parser understands a Perl program, it builds up a tree of operations for the interpreter to perform during execution. The routines
which construct and link together the various operations are to be found in op.c, and will be examined later.
Optimization
Now the parsing stage is complete, and the finished tree represents the operations that the Perl interpreter needs to perform to execute
our program. Next, Perl does a dry run over the tree looking for optimisations: constant expressions such as "3 + 4" will be computed now,
and the optimizer will also see if any multiple operations can be replaced with a single one. For instance, to fetch the variable $foo,
instead of grabbing the glob *foo and looking at the scalar component, the optimizer fiddles the op tree to use a function which directly
looks up the scalar in question. The main optimizer is "peep" in op.c, and many ops have their own optimizing functions.
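The constant-folding part of this can be sketched on a toy expression tree. This is an illustration only: the "Node" type and "fold_constants" function are hypothetical and much simpler than Perl's real op structures and the routines in op.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds; real Perl ops are far richer. */
typedef enum { NODE_CONST, NODE_VAR, NODE_ADD } NodeType;

typedef struct Node {
    NodeType type;
    int value;            /* meaningful only when type == NODE_CONST */
    struct Node *left;    /* meaningful only when type == NODE_ADD */
    struct Node *right;
} Node;

/*
 * Walk the tree bottom-up. Any NODE_ADD whose two children are both
 * constants is rewritten in place as a single constant node.
 * Returns the number of additions that were folded away.
 */
int fold_constants(Node *n)
{
    int folded = 0;

    assert(n != NULL);
    if (n->type != NODE_ADD)
        return 0;

    folded += fold_constants(n->left);
    folded += fold_constants(n->right);

    if (n->left->type == NODE_CONST && n->right->type == NODE_CONST) {
        n->value = n->left->value + n->right->value;
        n->type = NODE_CONST;
        n->left = n->right = NULL;
        folded++;
    }
    return folded;
}
```

Given a tree for "$x + (3 + 4)", the inner addition collapses to the constant 7 while the outer addition survives, because one of its operands is only known at run time.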
Running
Now we're finally ready to go: we have compiled Perl byte code, and all that's left to do is run it. The actual execution is done by the
"runops_standard" function in run.c; more specifically, it's done by these three innocent looking lines:
while ((PL_op = PL_op->op_ppaddr(aTHX))) {
    PERL_ASYNC_CHECK();
}
You may be more comfortable with the Perl version of that:
PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};
Well, maybe not. Anyway, each op contains a function pointer, which stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for things like "if" which choose the next op dynamically at run time.
The "PERL_ASYNC_CHECK" makes sure that things like signals interrupt execution if required.
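If it helps to see the shape of such a loop outside the Perl source, here is a self-contained toy in plain C. Everything in it is an assumption made for illustration: "Op", "Machine", "pp_add", "pp_lt_branch" and "run_ops" are invented names, the machine has a single accumulator rather than a stack, and a counter stands in for "PERL_ASYNC_CHECK".

```c
#include <assert.h>
#include <stddef.h>

typedef struct Op Op;
typedef struct Machine Machine;

struct Op {
    Op *(*ppaddr)(Machine *);  /* function that carries out this op */
    Op *next;                  /* default successor */
    Op *other;                 /* alternative successor, used by branches */
    int arg;
};

struct Machine {
    Op *op;            /* op being executed, playing the role of PL_op */
    int acc;           /* a single accumulator instead of a real stack */
    int async_checks;  /* counts how often the loop body ran */
};

/* Add the op's argument to the accumulator, then fall through. */
Op *pp_add(Machine *m)
{
    m->acc += m->op->arg;
    return m->op->next;
}

/* Choose the next op at run time: next while acc < arg, else other. */
Op *pp_lt_branch(Machine *m)
{
    return m->acc < m->op->arg ? m->op->next : m->op->other;
}

/* The loop itself: call the current op, which hands back its successor. */
int run_ops(Machine *m, Op *start)
{
    assert(m != NULL);
    m->op = start;
    while (m->op != NULL && (m->op = m->op->ppaddr(m)) != NULL) {
        m->async_checks++;     /* stand-in for PERL_ASYNC_CHECK() */
    }
    return m->acc;
}
```

Wiring a "pp_lt_branch" op with argument 10 to a "pp_add" op with argument 3, and pointing the add back at the branch, gives a loop that stops with the accumulator at 12. The loop in "run_ops" never needs to know that it is running a loop: the branch op simply returns a different successor.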
The actual functions called are known as PP code, and they're spread between four files: pp_hot.c contains the "hot" code, which is most
often used and highly optimized, pp_sys.c contains all the system-specific functions, pp_ctl.c contains the functions which implement
control structures ("if", "while" and the like) and pp.c contains everything else. These are, if you like, the C code for Perl's built-in
functions and operators.
Note that each "pp_" function is expected to return a pointer to the next op. Calls to perl subs (and eval blocks) are handled within the
same runops loop, and do not consume extra space on the C stack. For example, "pp_entersub" and "pp_entertry" just push a "CxSUB" or
"CxEVAL" block struct onto the context stack; this struct contains the address of the op following the sub call or eval. They then return
the first op of that sub or eval block, and so execution continues with that sub or block. Later, a "pp_leavesub" or "pp_leavetry" op pops
the "CxSUB" or "CxEVAL", retrieves the return op from it, and returns it.
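The following toy shows the same idea in plain C: a sub call is just an op that saves a return address on an explicit context stack and returns the first op of the sub body, so the one loop keeps running. The types and functions here ("Interp", "cxstack", "run" and the simplified "pp_" functions) are invented for this sketch and only borrow their names from the real ones.

```c
#include <assert.h>
#include <stddef.h>

#define CX_MAX 16

typedef struct Op Op;
typedef struct Interp Interp;

struct Op {
    Op *(*ppaddr)(Interp *);
    Op *next;        /* op that follows this one in the same body */
    Op *sub_start;   /* for entersub: first op of the called sub */
    int arg;
};

struct Interp {
    Op *op;                /* current op */
    Op *cxstack[CX_MAX];   /* saved return ops, one per active sub call */
    int cxix;              /* number of context entries in use */
    int max_depth;         /* deepest context stack seen so far */
    int acc;
};

Op *pp_add(Interp *in)
{
    in->acc += in->op->arg;
    return in->op->next;
}

/* Push the return op, then hand back the first op of the sub body. */
Op *pp_entersub(Interp *in)
{
    assert(in->cxix < CX_MAX);
    in->cxstack[in->cxix++] = in->op->next;
    if (in->cxix > in->max_depth)
        in->max_depth = in->cxix;
    return in->op->sub_start;
}

/* Pop the context entry and resume at the op saved by entersub. */
Op *pp_leavesub(Interp *in)
{
    assert(in->cxix > 0);
    return in->cxstack[--in->cxix];
}

/* One flat loop runs both the caller and the callee. */
int run(Interp *in, Op *start)
{
    assert(in != NULL);
    in->op = start;
    while (in->op != NULL && (in->op = in->op->ppaddr(in)) != NULL)
        ;
    return in->acc;
}
```

Note that "pp_entersub" never calls "run" recursively. The depth of the C stack stays constant however deeply the toy subs nest; only "cxix" grows.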
Exception handling
Perl's exception handling (i.e. "die" etc.) is built on top of the low-level "setjmp()"/"longjmp()" C-library functions. These basically
provide a way to capture the current PC and SP registers and later restore them; i.e. a "longjmp()" continues at the point in code where a
previous "setjmp()" was done, with anything further up on the C stack being lost. This is why code should always save values using
"SAVE_FOO" rather than in auto variables.
The perl core wraps "setjmp()" etc in the macros "JMPENV_PUSH" and "JMPENV_JUMP". The basic rule of perl exceptions is that "exit", and
"die" (in the absence of "eval") perform a JMPENV_JUMP(2), while "die" within "eval" does a JMPENV_JUMP(3).
Entry points to perl, such as "perl_parse()", "perl_run()" and "call_sv(cv, G_EVAL)", each do a "JMPENV_PUSH", then enter a runops loop
or similar, and handle possible exception returns. For a 2 return, final cleanup is performed, such as popping stacks and calling "CHECK"
or "END" blocks. Amongst other things, this is how scope cleanup still occurs during an "exit".
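The guard pattern can be sketched with nothing but the C library. This is a simplified model, not the real macros: "JmpEnv", "top_env", "env_jump", "body" and "run_guarded" are names made up for the illustration, and the status values 2 and 3 are borrowed from the description above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* One guard record per entry point; records form a linked stack. */
typedef struct JmpEnv {
    jmp_buf buf;
    struct JmpEnv *prev;
} JmpEnv;

JmpEnv *top_env = NULL;   /* innermost guard, in the spirit of PL_top_env */

typedef enum { ACT_RETURN, ACT_EXIT, ACT_DIE_IN_EVAL } Action;

/* Jump to the innermost guard; frames above it on the C stack are lost. */
static void env_jump(int status)
{
    assert(top_env != NULL);
    longjmp(top_env->buf, status);
}

/* Stand-in for a runops loop that may or may not raise an exception. */
static void body(Action act)
{
    if (act == ACT_EXIT)
        env_jump(2);
    if (act == ACT_DIE_IN_EVAL)
        env_jump(3);
}

/* Returns 0 for a normal return, or the status passed to env_jump. */
int run_guarded(Action act)
{
    JmpEnv env;
    int status;

    env.prev = top_env;          /* push the guard */
    top_env = &env;

    switch (setjmp(env.buf)) {
    case 0:
        body(act);               /* does not return if it jumps */
        status = 0;
        break;
    case 2:
        status = 2;              /* exit: final cleanup would happen here */
        break;
    case 3:
        status = 3;              /* die inside eval: resume at a restart op */
        break;
    default:
        status = -1;
        break;
    }

    top_env = env.prev;          /* pop the guard */
    return status;
}
```

The guard is popped on every path, including the exceptional ones, which mirrors how cleanup still happens after an "exit". Nothing local to "body" survives the jump, which is the reason the real core saves values on its own stacks rather than in C auto variables.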
If a "die" can find a "CxEVAL" block on the context stack, then the stack is popped to that level and the return op in that block is
assigned to "PL_restartop"; then a JMPENV_JUMP(3) is performed. This normally passes control back to the guard. In the case of "perl_run"
and "call_sv", a non-null "PL_restartop" triggers re-entry to the runops loop. This is the normal way that "die" or "croak" is handled
within an "eval".
Sometimes ops are executed within an inner runops loop, such as tie, sort or overload code. In this case, something like
sub FETCH { eval { die } }
would cause a longjmp right back to the guard in "perl_run", popping both runops loops, which is clearly incorrect. One way to avoid this
is for the tie code to do a "JMPENV_PUSH" before executing "FETCH" in the inner runops loop, but for efficiency reasons, perl in fact just
sets a flag, using "CATCH_SET(TRUE)". The "pp_require", "pp_entereval" and "pp_entertry" ops check this flag, and if true, they call
"docatch", which does a "JMPENV_PUSH" and starts a new runops level to execute the code, rather than doing it on the current loop.
As a further optimisation, on exit from the eval block in the "FETCH", execution of the code following the block is still carried on in the
inner loop. When an exception is raised, "docatch" compares the "JMPENV" level of the "CxEVAL" with "PL_top_env" and if they differ, just
re-throws the exception. In this way any inner loops get popped.
Here's an example.
1: eval { tie @a, 'A' };
2: sub A::TIEARRAY {
3:     eval { die };
4:     die;
5: }
To run this code, "perl_run" is called, which does a "JMPENV_PUSH" then enters a runops loop. This loop executes the eval and tie ops on
line 1, with the eval pushing a "CxEVAL" onto the context stack.
The "pp_tie" does a "CATCH_SET(TRUE)", then starts a second runops loop to execute the body of "TIEARRAY". When it executes the entertry op
on line 3, "CATCH_GET" is true, so "pp_entertry" calls "docatch" which does a "JMPENV_PUSH" and starts a third runops loop, which then
executes the die op. At this point the C call stack looks like this:
Perl_pp_die
Perl_runops # third loop
S_docatch_body
S_docatch
Perl_pp_entertry
Perl_runops # second loop
S_call_body
Perl_call_sv
Perl_pp_tie
Perl_runops # first loop
S_run_body
perl_run
main
and the context and data stacks, as shown by "-Dstv", look like:
STACK 0: MAIN
CX 0: BLOCK =>
CX 1: EVAL => AV() PV("A"