PERLINTERP(1)          Perl Programmers Reference Guide          PERLINTERP(1)

NAME
       perlinterp - An overview of the Perl interpreter

DESCRIPTION
       This document provides an overview of how the Perl interpreter works
       at the level of C code, along with pointers to the relevant C source
       code files.

ELEMENTS OF THE INTERPRETER
       The work of the interpreter has two main stages: compiling the code
       into the internal representation, or bytecode, and then executing it.
       "Compiled code" in perlguts explains exactly how the compilation
       stage happens.

       Here is a short breakdown of perl's operation:

   Startup
       The action begins in perlmain.c (or miniperlmain.c for miniperl).
       This is very high-level code, enough to fit on a single screen, and
       it resembles the code found in perlembed; most of the real action
       takes place in perl.c.

       perlmain.c is generated by "ExtUtils::Miniperl" from miniperlmain.c
       at make time, so you should make perl to follow this along.

       First, perlmain.c allocates some memory and constructs a Perl
       interpreter, along these lines:

           1 PERL_SYS_INIT3(&argc,&argv,&env);
           2
           3 if (!PL_do_undump) {
           4     my_perl = perl_alloc();
           5     if (!my_perl)
           6         exit(1);
           7     perl_construct(my_perl);
           8     PL_perl_destruct_level = 0;
           9 }

       Line 1 is a macro, and its definition is dependent on your operating
       system. Line 3 references "PL_do_undump", a global variable - all
       global variables in Perl start with "PL_". This tells you whether the
       current running program was created with the "-u" flag to perl and
       then undump, which means it's going to be false in any sane context.

       Line 4 calls a function in perl.c to allocate memory for a Perl
       interpreter. It's quite a simple function, and the guts of it look
       like this:

           my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

       Here you see an example of Perl's system abstraction, which we'll
       return to later: "PerlMem_malloc" is either your system's "malloc",
       or Perl's own "malloc" as defined in malloc.c if you selected that
       option at configure time.
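       The compile-time choice of allocator described above can be sketched
       in a few lines of C. This is a toy illustration, not the code in
       perl.c: "ToyInterpreter", "ToyMem_malloc", "ToyMem_free",
       "toy_alloc", "toy_free" and the "TOY_USE_MYMALLOC" switch are all
       hypothetical names invented for the example, and zeroing the struct
       is a choice made here, not a claim about perl_alloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* TOY_USE_MYMALLOC is a hypothetical build-time switch for this sketch. */
#ifdef TOY_USE_MYMALLOC
/* Hypothetical private allocator: a tiny bump arena that never frees. */
typedef union { long long ll; long double ld; void *p; } toy_align_t;
static toy_align_t toy_arena[64];
static size_t toy_arena_used;            /* counted in toy_align_t units */

static void *toy_arena_malloc(size_t n)
{
    size_t units = (n + sizeof(toy_align_t) - 1) / sizeof(toy_align_t);
    void *p;

    if (units > 64 - toy_arena_used)
        return NULL;                     /* arena exhausted */
    p = &toy_arena[toy_arena_used];
    toy_arena_used += units;
    return p;
}
#  define ToyMem_malloc(n) toy_arena_malloc(n)
#  define ToyMem_free(p)   ((void)(p))
#else
/* Default: the abstraction is just the system allocator. */
#  define ToyMem_malloc(n) malloc(n)
#  define ToyMem_free(p)   free(p)
#endif

/* Stand-in for the interpreter struct; the real one is far larger. */
typedef struct { int destruct_level; } ToyInterpreter;

/* Allocate through the abstraction, so callers never name malloc directly. */
ToyInterpreter *toy_alloc(void)
{
    ToyInterpreter *interp =
        (ToyInterpreter *)ToyMem_malloc(sizeof(ToyInterpreter));

    if (!interp)
        return NULL;
    memset(interp, 0, sizeof(*interp));
    return interp;
}

void toy_free(ToyInterpreter *interp)
{
    assert(interp != NULL);
    ToyMem_free(interp);
}
```

       A caller would write "ToyInterpreter *i = toy_alloc();" and later
       "toy_free(i);" and get whichever allocator the build selected,
       without changing any call site.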
       Next, in line 7, we construct the interpreter using perl_construct,
       also in perl.c; this sets up all the special variables that Perl
       needs, the stacks, and so on.

       Now we pass Perl the command line options, and tell it to go:

           exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
           if (!exitstatus)
               perl_run(my_perl);

           exitstatus = perl_destruct(my_perl);

           perl_free(my_perl);

       "perl_parse" is actually a wrapper around "S_parse_body", as defined
       in perl.c, which processes the command line options, sets up any
       statically linked XS modules, opens the program and calls "yyparse"
       to parse it.

   Parsing
       The aim of this stage is to take the Perl source, and turn it into an
       op tree. We'll see what one of those looks like later. Strictly
       speaking, there are three things going on here.

       "yyparse", the parser, lives in perly.c, although you're better off
       reading the original YACC input in perly.y. (Yes, Virginia, there is
       a YACC grammar for Perl!) The job of the parser is to take your code
       and "understand" it, splitting it into sentences, deciding which
       operands go with which operators and so on.

       The parser is nobly assisted by the lexer, which chunks up your input
       into tokens, and decides what type of thing each token is: a variable
       name, an operator, a bareword, a subroutine, a core function, and so
       on. The main point of entry to the lexer is "yylex", and that and its
       associated routines can be found in toke.c. Perl isn't much like
       other computer languages; it's highly context sensitive at times, and
       it can be tricky to work out what sort of token something is, or
       where a token ends. As such, there's a lot of interplay between the
       tokeniser and the parser, which can get pretty frightening if you're
       not used to it.

       As the parser understands a Perl program, it builds up a tree of
       operations for the interpreter to perform during execution. The
       routines which construct and link together the various operations are
       to be found in op.c, and will be examined later.
   Optimization
       Now the parsing stage is complete, and the finished tree represents
       the operations that the Perl interpreter needs to perform to execute
       our program. Next, Perl does a dry run over the tree looking for
       optimisations: constant expressions such as "3 + 4" will be computed
       now, and the optimizer will also see if any multiple operations can
       be replaced with a single one. For instance, to fetch the variable
       $foo, instead of grabbing the glob *foo and looking at the scalar
       component, the optimizer fiddles the op tree to use a function which
       directly looks up the scalar in question. The main optimizer is
       "peep" in op.c, and many ops have their own optimizing functions.

   Running
       Now we're finally ready to go: we have compiled Perl byte code, and
       all that's left to do is run it. The actual execution is done by the
       "runops_standard" function in run.c; more specifically, it's done by
       these three innocent looking lines:

           while ((PL_op = PL_op->op_ppaddr(aTHX))) {
               PERL_ASYNC_CHECK();
           }

       You may be more comfortable with the Perl version of that:

           PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

       Well, maybe not. Anyway, each op contains a function pointer, which
       stipulates the function which will actually carry out the operation.
       This function will return the next op in the sequence - this allows
       for things like "if" which choose the next op dynamically at run
       time. The "PERL_ASYNC_CHECK" makes sure that things like signals
       interrupt execution if required.

       The actual functions called are known as PP code, and they're spread
       between four files: pp_hot.c contains the "hot" code, which is most
       often used and highly optimized, pp_sys.c contains all the
       system-specific functions, pp_ctl.c contains the functions which
       implement control structures ("if", "while" and the like) and pp.c
       contains everything else. These are, if you like, the C code for
       Perl's built-in functions and operators.
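       To make the shape of the runops loop concrete, here is a minimal,
       self-contained sketch of the same dispatch idea. It is not Perl's
       code: the "toy_op" struct, the "toy_pp_*" functions, the integer
       stack and "toy_demo" are hypothetical stand-ins. Each op holds a
       function pointer, each function returns the next op to run, and a
       conditional op picks its successor at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Toy op: every name here is hypothetical, not one of Perl's real structs. */
typedef struct toy_op toy_op;
typedef toy_op *(*toy_ppaddr)(toy_op *self);

struct toy_op {
    toy_ppaddr op_ppaddr;   /* function that carries out this op */
    toy_op    *op_next;     /* default successor */
    toy_op    *op_other;    /* alternative successor for conditionals */
    int        value;       /* constant payload for toy_pp_const */
};

#define TOY_STACK_MAX 16
static int toy_stack[TOY_STACK_MAX];
static int toy_sp = 0;

static void toy_push(int v) { assert(toy_sp < TOY_STACK_MAX); toy_stack[toy_sp++] = v; }
static int  toy_pop(void)   { assert(toy_sp > 0); return toy_stack[--toy_sp]; }

/* Push a constant, then continue with the next op. */
static toy_op *toy_pp_const(toy_op *op) { toy_push(op->value); return op->op_next; }

/* Pop two values, push their sum. */
static toy_op *toy_pp_add(toy_op *op)
{
    int right = toy_pop();
    int left  = toy_pop();
    toy_push(left + right);
    return op->op_next;
}

/* Choose the next op dynamically, the way an "if" would. */
static toy_op *toy_pp_cond(toy_op *op)
{
    return toy_pop() ? op->op_next : op->op_other;
}

/* Same shape as the loop in run.c: run an op, get the next one, repeat. */
static void toy_runops(toy_op *op)
{
    while ((op = op->op_ppaddr(op))) {
        /* the real loop calls PERL_ASYNC_CHECK() here */
    }
}

/* Builds the chain for "a + b ? 1 : 0" and runs it. */
int toy_demo(int a, int b)
{
    toy_op then_op = { toy_pp_const, NULL,     NULL,     1 };
    toy_op else_op = { toy_pp_const, NULL,     NULL,     0 };
    toy_op cond    = { toy_pp_cond,  &then_op, &else_op, 0 };
    toy_op add     = { toy_pp_add,   &cond,    NULL,     0 };
    toy_op c2      = { toy_pp_const, &add,     NULL,     b };
    toy_op c1      = { toy_pp_const, &c2,      NULL,     a };

    toy_sp = 0;
    toy_runops(&c1);
    return toy_pop();
}
```

       Calling "toy_demo(3, 4)" runs six ops' worth of structure through one
       loop, with no recursion on the C stack; the conditional op simply
       returns a different pointer depending on what it popped.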
       Note that each "pp_" function is expected to return a pointer to the
       next op. Calls to perl subs (and eval blocks) are handled within the
       same runops loop, and do not consume extra space on the C stack. For
       example, "pp_entersub" and "pp_entertry" just push a "CxSUB" or
       "CxEVAL" block struct onto the context stack, which contains the
       address of the op following the sub call or eval. They then return
       the first op of that sub or eval block, and so execution continues
       with that sub or block. Later, a "pp_leavesub" or "pp_leavetry" op
       pops the "CxSUB" or "CxEVAL", retrieves the return op from it, and
       returns it.

   Exception handling
       Perl's exception handling (i.e. "die" etc.) is built on top of the
       low-level "setjmp()"/"longjmp()" C-library functions. These basically
       provide a way to capture the current PC and SP registers and later
       restore them; i.e. a "longjmp()" continues at the point in code where
       a previous "setjmp()" was done, with anything further up on the C
       stack being lost. This is why code should always save values using
       "SAVE_FOO" rather than in auto variables.

       The perl core wraps "setjmp()" etc in the macros "JMPENV_PUSH" and
       "JMPENV_JUMP". The basic rule of perl exceptions is that "exit", and
       "die" (in the absence of "eval") perform a JMPENV_JUMP(2), while
       "die" within "eval" does a JMPENV_JUMP(3).

       Entry points to perl, such as "perl_parse()", "perl_run()" and
       "call_sv(cv, G_EVAL)", each do a "JMPENV_PUSH", then enter a runops
       loop or whatever, and handle possible exception returns. For a 2
       return, final cleanup is performed, such as popping stacks and
       calling "CHECK" or "END" blocks. Amongst other things, this is how
       scope cleanup still occurs during an "exit".

       If a "die" can find a "CxEVAL" block on the context stack, then the
       stack is popped to that level and the return op in that block is
       assigned to "PL_restartop"; then a JMPENV_JUMP(3) is performed. This
       normally passes control back to the guard.
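       The push/jump pattern can be sketched with plain "setjmp()" and
       "longjmp()". This is a simplified illustration under stated
       assumptions, not the JMPENV macros themselves: "toy_jmpenv",
       "toy_top_env", "toy_guard", "toy_jump" and the body functions are
       hypothetical names, and the sketch leaves out the context stack and
       "PL_restartop" entirely. The codes 2 and 3 follow the convention
       described above.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* One saved jump target, linked to the guard that was active before it. */
typedef struct toy_jmpenv {
    struct toy_jmpenv *prev;
    jmp_buf            buf;
} toy_jmpenv;

static toy_jmpenv *toy_top_env = NULL;   /* innermost active guard */

/* Rough analogue of JMPENV_JUMP(code): unwind the C stack to the guard. */
static void toy_jump(int code)
{
    assert(toy_top_env != NULL);
    longjmp(toy_top_env->buf, code);
}

void toy_body_ok(void)          { /* completes normally */ }
void toy_body_exit(void)        { toy_jump(2); }  /* like exit, or die without eval */
void toy_body_die_in_eval(void) { toy_jump(3); }  /* like die inside eval */

/* Rough analogue of an entry point: push a guard, run the body, pop the
 * guard, and report how the body ended (0 = normal, 2 or 3 = exception). */
int toy_guard(void (*body)(void))
{
    toy_jmpenv env;
    int ret;

    env.prev = toy_top_env;
    toy_top_env = &env;

    switch (setjmp(env.buf)) {
    case 0:  body(); ret = 0; break;   /* first pass: run the code */
    case 2:  ret = 2; break;           /* arrived here via toy_jump(2) */
    case 3:  ret = 3; break;           /* arrived here via toy_jump(3) */
    default: ret = -1; break;
    }

    toy_top_env = env.prev;            /* pop the guard in every case */
    return ret;
}

/* A body that installs its own inner guard and survives the inner jump. */
int toy_inner_result = -1;
void toy_body_nested(void)
{
    toy_inner_result = toy_guard(toy_body_die_in_eval);
}
```

       In "toy_body_nested" the jump is caught by the inner guard, so the
       outer "toy_guard" sees a normal return. Anything the body had placed
       in ordinary local variables between the guard and the jump is simply
       abandoned, which is the reason the text above recommends "SAVE_FOO"
       over auto variables.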
       In the case of "perl_run" and "call_sv", a non-null "PL_restartop"
       triggers re-entry to the runops loop. This is the normal way that
       "die" or "croak" is handled within an "eval".

       Sometimes ops are executed within an inner runops loop, such as tie,
       sort or overload code. In this case, something like

           sub FETCH { eval { die } }

       would cause a longjmp right back to the guard in "perl_run", popping
       both runops loops, which is clearly incorrect. One way to avoid this
       is for the tie code to do a "JMPENV_PUSH" before executing "FETCH" in
       the inner runops loop, but for efficiency reasons, perl in fact just
       sets a flag, using "CATCH_SET(TRUE)". The "pp_require",
       "pp_entereval" and "pp_entertry" ops check this flag, and if true,
       they call "docatch", which does a "JMPENV_PUSH" and starts a new
       runops level to execute the code, rather than doing it on the current
       loop.

       As a further optimisation, on exit from the eval block in the
       "FETCH", execution of the code following the block is still carried
       on in the inner loop. When an exception is raised, "docatch" compares
       the "JMPENV" level of the "CxEVAL" with "PL_top_env" and if they
       differ, just re-throws the exception. In this way any inner loops get
       popped.

       Here's an example.

           1: eval { tie @a, 'A' };
           2: sub A::TIEARRAY {
           3:     eval { die };
           4:     die;
           5: }

       To run this code, "perl_run" is called, which does a "JMPENV_PUSH"
       then enters a runops loop. This loop executes the eval and tie ops on
       line 1, with the eval pushing a "CxEVAL" onto the context stack.

       The "pp_tie" does a "CATCH_SET(TRUE)", then starts a second runops
       loop to execute the body of "TIEARRAY". When it executes the
       entertry op on line 3, "CATCH_GET" is true, so "pp_entertry" calls
       "docatch" which does a "JMPENV_PUSH" and starts a third runops loop,
       which then executes the die op.
       At this point the C call stack looks like this:

           Perl_pp_die
           Perl_runops      # third loop
           S_docatch_body
           S_docatch
           Perl_pp_entertry
           Perl_runops      # second loop
           S_call_body
           Perl_call_sv
           Perl_pp_tie
           Perl_runops      # first loop
           S_run_body
           perl_run
           main

       and the context and data stacks, as shown by "-Dstv", look like:

           STACK 0: MAIN
             CX 0: BLOCK  =>
             CX 1: EVAL   => AV()  PV("A"