PERLGUTS(1) Perl Programmers Reference Guide PERLGUTS(1)
NAME
perlguts - Introduction to the Perl API
DESCRIPTION
This document attempts to describe how to use the Perl API, as well as to provide some info on the basic workings of the Perl core. It is
far from complete and probably contains many errors. Please refer any questions or comments to the author below.
Variables
Datatypes
Perl has three typedefs that handle Perl's three main data types:
SV Scalar Value
AV Array Value
HV Hash Value
Each typedef has specific routines that manipulate the various data types.
What is an "IV"?
Perl uses a special typedef IV which is a simple signed integer type that is guaranteed to be large enough to hold a pointer (as well as an
integer). Additionally, there is the UV, which is simply an unsigned IV.
Perl also uses two special typedefs, I32 and I16, which will always be at least 32 bits and 16 bits long, respectively. (Again, there are
U32 and U16, as well.) They will usually be exactly 32 and 16 bits long, but on Crays they will both be 64 bits.
Working with SVs
An SV can be created and loaded with one command. There are five types of values that can be loaded: an integer value (IV), an unsigned
integer value (UV), a double (NV), a string (PV), and another scalar (SV).
The seven routines are:
SV* newSViv(IV);
SV* newSVuv(UV);
SV* newSVnv(double);
SV* newSVpv(const char*, STRLEN);
SV* newSVpvn(const char*, STRLEN);
SV* newSVpvf(const char*, ...);
SV* newSVsv(SV*);
"STRLEN" is an integer type (Size_t, usually defined as size_t in config.h) guaranteed to be large enough to represent the size of any
string that perl can handle.
In the unlikely case of a SV requiring more complex initialisation, you can create an empty SV with newSV(len). If "len" is 0 an empty SV
of type NULL is returned, else an SV of type PV is returned with len + 1 (for the NUL) bytes of storage allocated, accessible via SvPVX.
In both cases the SV has value undef.
SV *sv = newSV(0); /* no storage allocated */
SV *sv = newSV(10); /* 10 (+1) bytes of uninitialised storage allocated */
To change the value of an already-existing SV, there are eight routines:
void sv_setiv(SV*, IV);
void sv_setuv(SV*, UV);
void sv_setnv(SV*, double);
void sv_setpv(SV*, const char*);
void sv_setpvn(SV*, const char*, STRLEN);
void sv_setpvf(SV*, const char*, ...);
void sv_vsetpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
void sv_setsv(SV*, SV*);
Notice that you can choose to specify the length of the string to be assigned by using "sv_setpvn", "newSVpvn", or "newSVpv", or you may
allow Perl to calculate the length by using "sv_setpv" or by specifying 0 as the second argument to "newSVpv". Be warned, though, that
Perl will determine the string's length by using "strlen", which depends on the string terminating with a NUL character.
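The difference between a caller-supplied length and a "strlen"-derived one matters as soon as the data contains an embedded NUL. The
following is a minimal plain-C sketch of that rule only; it does not call the Perl API, and "toy_effective_len" and "toy_data" are
hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Perl API: choose the byte count the
 * way the text describes, using the caller's length when one is given and
 * falling back to strlen() when the length is 0. */
static size_t toy_effective_len(const char *s, size_t len)
{
    assert(s != NULL);
    return len ? len : strlen(s);
}

/* Five bytes of data with an embedded NUL: 'a' 'b' NUL 'c' 'd'.
 * The string literal contributes one more terminating NUL (6 bytes total). */
static const char toy_data[] = "ab\0cd";
```

With a length of 0 the helper stops at the embedded NUL and reports 2 bytes; with an explicit length of 5 all five bytes are kept. This is
why binary data should always be passed with an explicit length.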
The arguments of "sv_setpvf" are processed like "sprintf", and the formatted output becomes the value.
"sv_vsetpvfn" is an analogue of "vsprintf", but it allows you to specify either a pointer to a variable argument list or the address and
length of an array of SVs. The last argument points to a boolean; on return, if that boolean is true, then locale-specific information has
been used to format the string, and the string's contents are therefore untrustworthy (see perlsec). This pointer may be NULL if that
information is not important. Note that this function requires you to specify the length of the format.
The "sv_set*()" functions are not generic enough to operate on values that have "magic". See "Magic Virtual Tables" later in this
document.
All SVs that contain strings should be terminated with a NUL character. If a string is not NUL-terminated, there is a risk of core dumps
and corruption from code that passes the string to C functions or system calls that expect a NUL-terminated string. Perl's own functions
typically add a trailing NUL for this reason. Nevertheless, you should be very careful when you pass a string stored in an SV to a C
function or system call.
To access the actual value that an SV points to, you can use the macros:
SvIV(SV*)
SvUV(SV*)
SvNV(SV*)
SvPV(SV*, STRLEN len)
SvPV_nolen(SV*)
which will automatically coerce the actual scalar type into an IV, UV, double, or string.
In the "SvPV" macro, the length of the string returned is placed into the variable "len" (this is a macro, so you do not use &len). If you
do not care what the length of the data is, use the "SvPV_nolen" macro. Historically the "SvPV" macro with the global variable "PL_na" has
been used in this case. But that can be quite inefficient because "PL_na" must be accessed in thread-local storage in threaded Perl. In
any case, remember that Perl allows arbitrary strings of data that may contain embedded NULs and might not be terminated by a NUL.
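Also remember that C doesn't allow you to safely say "foo(SvPV(s, len), len);", because the order in which function arguments are
evaluated is unspecified and "len" may be read before "SvPV" has set it. It might work with your compiler, but it won't work for
everyone. Break this sort of statement up into separate assignments: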
SV *s;
STRLEN len;
char * ptr;
ptr = SvPV(s, len);
foo(ptr, len);
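The same pattern can be tried outside of Perl. The sketch below is plain C and does not use the Perl API: "struct toy_str", "TOY_SVPV",
"toy_count_byte" and "toy_count_in_sv" are hypothetical stand-ins, with "TOY_SVPV" imitating only the one property of "SvPV" that matters
here, namely that it is a macro which assigns to its length argument.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an SV that holds a string. */
struct toy_str {
    const char *pv;   /* string bytes */
    size_t      cur;  /* string length in bytes */
};

/* Like SvPV, this is a macro that assigns to its second argument
 * instead of taking its address. */
#define TOY_SVPV(sv, lenvar) ((lenvar) = (sv)->cur, (sv)->pv)

/* A consumer that needs both the pointer and the length. */
static size_t toy_count_byte(const char *ptr, size_t len, char wanted)
{
    size_t i, n = 0;

    assert(ptr != NULL);
    for (i = 0; i < len; i++)
        if (ptr[i] == wanted)
            n++;
    return n;
}

/* Safe pattern: complete the macro in its own statement, so that len has
 * been assigned before it is read as a function argument. */
static size_t toy_count_in_sv(const struct toy_str *sv, char wanted)
{
    size_t len;
    const char *ptr;

    ptr = TOY_SVPV(sv, len);
    return toy_count_byte(ptr, len, wanted);
}
```

Writing "toy_count_byte(TOY_SVPV(sv, len), len, wanted)" instead would read "len" in one argument while another argument assigns it, which
is exactly the construct the text warns against.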
If you want to know if the scalar value is TRUE, you can use:
SvTRUE(SV*)
Although Perl will automatically grow strings for you, if you need to force Perl to allocate more memory for your SV, you can use the macro
SvGROW(SV*, STRLEN newlen)
which will determine if more memory needs to be allocated. If so, it will call the function "sv_grow". Note that "SvGROW" can only
increase, not decrease, the allocated memory of an SV and that it does not automatically add a byte for a trailing NUL (perl's own
string functions typically do "SvGROW(sv, len + 1)").
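The two properties described above (grow-only allocation, and the caller being responsible for the extra NUL byte) can be modelled in
plain C. This is a sketch under stated assumptions, not Perl's implementation: "struct toy_buf", "toy_grow" and "toy_append" are
hypothetical names, and "realloc" stands in for whatever allocator perl actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a string buffer; not Perl's SV layout. */
struct toy_buf {
    char  *pv;   /* allocated storage, or NULL if none yet */
    size_t cur;  /* bytes in use, not counting the trailing NUL */
    size_t len;  /* bytes allocated */
};

/* Grow-only: reallocate when newlen exceeds the current allocation and
 * never shrink. Returns the buffer pointer, or NULL if allocation fails. */
static char *toy_grow(struct toy_buf *b, size_t newlen)
{
    assert(b != NULL);
    if (newlen > b->len) {
        char *p = realloc(b->pv, newlen);
        if (p == NULL)
            return NULL;
        b->pv = p;
        b->len = newlen;
    }
    return b->pv;
}

/* Append n bytes and keep the buffer NUL-terminated. toy_grow() adds no
 * room for the NUL, so the caller requests cur + n + 1 bytes itself.
 * Returns 0 on success, -1 on allocation failure. */
static int toy_append(struct toy_buf *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (toy_grow(b, b->cur + n + 1) == NULL)
        return -1;
    memcpy(b->pv + b->cur, src, n);
    b->cur += n;
    b->pv[b->cur] = '\0';
    return 0;
}
```

Asking "toy_grow" for fewer bytes than are already allocated leaves the buffer untouched, mirroring the grow-only behaviour the text
attributes to "SvGROW".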
If you have an SV and want to know what kind of data Perl thinks is stored in it, you can use the following macros to check the type of SV
you have.
SvIOK(SV*)
SvNOK(SV*)
SvPOK(SV*)
You can get and set the current length of the string stored in an SV with the following macros:
SvCUR(SV*)
SvCUR_set(SV*, I32 val)
You can also get a pointer to the end of the string stored in the SV with the macro:
SvEND(SV*)
But note that these last three macros are valid only if "SvPOK()" is true.
If you want to append something to the end of the string stored in an "SV*", you can use the following functions:
void sv_catpv(SV*, const char*);
void sv_catpvn(SV*, const char*, STRLEN);
void sv_catpvf(SV*, const char*, ...);
void sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **, I32, bool *);
void sv_catsv(SV*, SV*);
The first function calculates the length of the string to be appended by using "strlen". In the second, you specify the length of the
string yourself. The third function processes its arguments like "sprintf" and appends the formatted output. The fourth function works
like "vsprintf". You can specify the address and length of an array of SVs instead of the va_list argument. The fifth function extends the
string stored in the first SV with the string stored in the second SV. It also forces the second SV to be interpreted as a string.
The "sv_cat*()" functions are not generic enough to operate on values that have "magic". See "Magic Virtual Tables" later in this
document.
If you know the name of a scalar variable, you can get a pointer to its SV by using the following:
SV* get_sv("package::varname", FALSE);
This returns NULL if the variable does not exist.
If you want to know if this variable (or any other SV) is actually "defined", you can call:
SvOK(SV*)
The scalar "undef" value is stored in an SV instance called "PL_sv_undef".
Its address can be used whenever an "SV*" is needed. Make sure that you don't try to compare a random sv with &PL_sv_undef. For example,
when interfacing with Perl code, it'll work correctly for:
foo(undef);
But won't work when called as:
$x = undef;
foo($x);
So, to repeat: always use SvOK() to check whether an sv is defined.
Also you have to be careful when using &PL_sv_undef as a value in AVs or HVs (see "AVs, HVs and undefined values").
There are also the two values "PL_sv_yes" and "PL_sv_no", which contain boolean TRUE and FALSE values, respectively. Like "PL_sv_undef",
their addresses can be used whenever an "SV*" is needed.
Do not be fooled into thinking that "(SV *) 0" is the same as &PL_sv_undef. Take this code:
SV* sv = (SV*) 0;
if (I-am-to-return-a-real-value) {
sv = sv_2mortal(newSViv(42));
}
sv_setsv(ST(0), sv);
This code tries to return a new SV (which contains the value 42) if it should return a real value, or undef otherwise. Instead it has
returned a NULL pointer which, somewhere down the line, will cause a segmentation violation, bus error, or just weird results. Change the
zero to &PL_sv_undef in the first line and all will be well.
To free an SV that you've created, call "SvREFCNT_dec(SV*)". Normally this call is not necessary (see "Reference Counts and Mortality").
Offsets
Perl provides the function "sv_chop" to efficiently remove characters from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the pointer. The efficiency comes by means of a little hack: instead of actually
removing the characters, "sv_chop" sets the flag "OOK" (offset OK) to signal to other functions that the offset hack is in effect, and it
puts the number of bytes chopped off into the IV field of the SV. It then moves the PV pointer (called "SvPVX") forward that many bytes,
and adjusts "SvCUR" and "SvLEN".
Hence, at this point, the start of the buffer that we allocated lives at "SvPVX(sv) - SvIV(sv)" in memory and the PV pointer is pointing
into the middle of this allocated storage.
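The pointer arithmetic behind the offset hack can be modelled in a few lines of plain C. This is only a sketch of the bookkeeping
described above, not the code of "sv_chop": "struct toy_ook_sv", "toy_chop" and "toy_buffer_start" are hypothetical names, and the
"offset" member stands in for the IV field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the fields involved; not Perl's SV layout. */
struct toy_ook_sv {
    char  *pv;      /* current start of the string (what SvPVX would return) */
    size_t cur;     /* string length */
    size_t len;     /* bytes allocated, counted from pv */
    size_t offset;  /* bytes chopped so far (stands in for the IV field) */
    int    ook;     /* nonzero once the offset hack is in effect */
};

/* Discard everything before ptr without moving any bytes: record the
 * offset, advance pv, and shrink cur and len by the same amount. */
static void toy_chop(struct toy_ook_sv *sv, char *ptr)
{
    size_t delta;

    assert(sv != NULL && ptr >= sv->pv && ptr <= sv->pv + sv->cur);
    delta = (size_t)(ptr - sv->pv);
    if (delta == 0)
        return;               /* nothing to chop */
    sv->ook = 1;
    sv->offset += delta;
    sv->pv = ptr;
    sv->cur -= delta;
    sv->len -= delta;
}

/* The originally allocated buffer starts offset bytes before pv. */
static char *toy_buffer_start(const struct toy_ook_sv *sv)
{
    return sv->pv - sv->offset;
}
```

Chopping one byte from a buffer holding "12345" leaves "pv" pointing at "2345" with an offset of 1, while "toy_buffer_start" still
recovers the address that was originally allocated.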
This is best demonstrated by example:
% ./perl -Ilib -MDevel::Peek -le '$a="12345"; $a=~s/.//; Dump($a)'
SV = PVIV(0x8128450) at 0x81340f0
REFCNT = 1
FLAGS = (POK,OOK,pPOK)
IV = 1 (OFFSET)
PV = 0x8135781 ( "1" . ) "2345"