CentOS 7.0 - man page for stap (centos section 1)

STAP(1) 										  STAP(1)

       stap - systemtap script translator/driver

       stap [ OPTIONS ] FILENAME [ ARGUMENTS ]
       stap [ OPTIONS ] - [ ARGUMENTS ]
       stap [ OPTIONS ] -e SCRIPT [ ARGUMENTS ]
       stap [ OPTIONS ] -l PROBE [ ARGUMENTS ]
       stap [ OPTIONS ] -L PROBE [ ARGUMENTS ]

       The  stap program is the front-end to the Systemtap tool.  It accepts probing instructions
       (written in a simple scripting language), translates those instructions into C code,  com-
       piles this C code, and loads the resulting module into a running Linux kernel or a DynInst
       user-space mutator, to perform the requested system trace/probe functions.  You can supply
       the script in a named file (FILENAME), from standard input (use - instead of FILENAME), or
       from the command line (using -e SCRIPT).  The program runs until the user interrupts it,
       the script voluntarily invokes the exit() function, or a sufficient number of soft errors
       accumulate.

       The language, which is described in a later section, is strictly typed, declaration  free,
       procedural,  and inspired by awk.  It allows source code points or events in the kernel to
       be associated with handlers, which are subroutines that are executed synchronously.  It is
       somewhat similar conceptually to "breakpoint command lists" in the gdb debugger.

       The  systemtap  translator supports the following options.  Any other option prints a list
       of supported options.  Options may be given on the command line, as usual.  If the file
       $SYSTEMTAP_DIR/rc exists, options are also loaded from there and interpreted first.
       ($SYSTEMTAP_DIR defaults to $HOME/.systemtap if unset.)

       -      Use standard input instead of a given FILENAME as probe language input,  unless  -e
	      SCRIPT is given.

       -h --help
	      Show help message.

       -V --version
	      Show version message.

       -p NUM Stop  after  pass  NUM.	The passes are numbered 1-5: parse, elaborate, translate,
	      compile, run.  See the PROCESSING section for details.

       -v     Increase verbosity for all passes.  Each repetition of the option produces a larger
	      volume of informational output.

       --vp ABCDE
	      Increase	verbosity  on  a per-pass basis.  For example, "--vp 002" adds 2 units of
	      verbosity to pass 3 only.  The combination "-v --vp 00004" adds 1 unit of verbosity
	      for all passes, and 4 more for pass 5.
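
	      The arithmetic behind the two examples above can be modelled in a few lines.  This
	      is only an illustrative Python sketch of the documented behavior, not stap's own
	      option parser, and the function name pass_verbosity is made up for the example:

```python
def pass_verbosity(v_count=0, vp=""):
    """Model the verbosity of passes 1-5 given the number of -v flags and a --vp string."""
    levels = [v_count] * 5           # each -v raises every pass by one unit
    for i, digit in enumerate(vp[:5]):
        levels[i] += int(digit)      # the i-th digit of --vp adds to pass i+1 only
    return levels

print(pass_verbosity(vp="002"))                # [0, 0, 2, 0, 0]
print(pass_verbosity(v_count=1, vp="00004"))   # [1, 1, 1, 1, 5]
```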

       -k     Keep  the temporary directory after all processing.  This may be useful in order to
	      examine the generated C code, or to reuse the compiled kernel object.

       -g     Guru mode.  Enable parsing of unsafe expert-level constructs like embedded C.

       -P     Prologue-searching mode.	Activate heuristics to work  around  incorrect	debugging
	      information for $target variables.

       -u     Unoptimized mode.  Disable unused code elision during elaboration.

       -w     Suppressed warnings mode.  Disables all warning messages.

       -W     Treat all warnings as errors.

       -b     Use bulk mode (percpu files) for kernel-to-user data transfer.

       -t     Collect timing information: the number of times each probe executes and the average
	      amount of time spent in each probe point.  Also show the derivation of each probe
	      point.

       -sNUM  Use NUM megabyte buffers for kernel-to-user data transfer.  On a multiprocessor  in
	      bulk mode, this is a per-processor amount.

       -I DIR Add  the	given  directory  to the tapset search directory.  See the description of
	      pass 2 for details.

       -D NAME=VALUE
	      Add the given C preprocessor directive to the module Makefile.  These can  be  used
	      to override limit parameters described below.

       -B NAME=VALUE
	      Add  the	given make directive to the kernel module build's make invocation.  These
	      can be used to add or override kconfig options.

       -a ARCH
	      Use a cross-compilation mode for the  given  target  architecture.   This  requires
	      access  to the cross-compiler and the kernel build tree, and goes along with the -B
	      CROSS_COMPILE=arch-tool-prefix- and -r /build/tree options.

       --modinfo NAME=VALUE
	      Add the name/value pair as a MODULE_INFO macro call to the generated module.   This
	      may be useful to inform or override various module-related checks in the kernel.

       -G NAME=VALUE
	      Sets  the  value	of  global  variable NAME to VALUE when staprun is invoked.  This
	      applies to scalar variables declared global in the script/tapset.

       -R DIR Look for the systemtap runtime sources in the given directory.

       -r /DIR
	      Build for kernel in given build tree. Can also be set  with  the	SYSTEMTAP_RELEASE
	      environment variable.

       -r RELEASE
	      Build  for  kernel  in build tree /lib/modules/RELEASE/build.  Can also be set with
	      the SYSTEMTAP_RELEASE environment variable.

       -m MODULE
	      Use the given name for the generated kernel object module, instead of a unique
	      randomized name.  The generated kernel object module is copied to the current
	      directory.

       -d MODULE
	      Add symbol/unwind information for the given module into the kernel  object  module.
	      This  may  enable  symbolic tracebacks from those modules/programs, even if they do
	      not have an explicit probe placed into them.

       --ldd  Add symbol/unwind information for all shared libraries suspected by ldd to be
	      necessary for user-space binaries being probed or listed with the -d option.
	      Caution: this can make the probe modules considerably larger.

	      Equivalent to specifying "-dkernel" and a "-d" for each kernel module that is  cur-
	      rently loaded.  Caution: this can make the probe modules considerably larger.

       -o FILE
	      Send  standard  output  to  named  file. In bulk mode, percpu files will start with
	      FILE_ (FILE_cpu with -F) followed by the cpu  number.   This  supports  strftime(3)
	      formats for FILE.
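
	      The naming rule can be pictured with a small Python model.  It is a sketch under
	      assumptions, not stap's implementation: the helper name bulk_output_names is
	      hypothetical, and expanding the strftime(3) formats before the per-CPU suffix is
	      appended is an assumption made for the example.

```python
import time

def bulk_output_names(file_template, ncpus, background=False, when=None):
    """Model the per-CPU file names produced by -o FILE in bulk mode (-b)."""
    when = time.localtime() if when is None else when
    base = time.strftime(file_template, when)         # FILE may hold strftime(3) formats
    prefix = base + ("_cpu" if background else "_")   # "FILE_cpu" with -F, else "FILE_"
    return [prefix + str(cpu) for cpu in range(ncpus)]

# A fixed timestamp keeps the example deterministic.
fixed = time.strptime("2014-07-01 12:30:00", "%Y-%m-%d %H:%M:%S")
print(bulk_output_names("trace-%Y%m%d.log", 2, when=fixed))
# ['trace-20140701.log_0', 'trace-20140701.log_1']
```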

       -c CMD Start the probes, run CMD, and exit when CMD finishes.  This also has the effect of
	      setting target() to the pid of the command that was run.

       -x PID Sets target() to PID.  This allows scripts to be written that filter on a specific
	      process.

       -e SCRIPT
	      Run the given SCRIPT specified on the command line.

       -l PROBE
	      Instead  of  running  a probe script, just list all available probe points matching
	      the given single probe point.  The pattern may include wildcards and  aliases,  but
	      not  comma-separated  multiple probe points.  The process result code will indicate
	      failure if there are no matches.

       -L PROBE
	      Similar to "-l", but list probe points and script-level local variables.

       -F     Without -o option, load module and start probes, then detach from the module  leav-
	      ing  the probes running.	With -o option, run staprun in background as a daemon and
	      show its pid.

       -S size[,N]
	      Sets the maximum size of the output file and the maximum number of output files.
	      If the size of the output file would exceed size, systemtap switches output to the
	      next file.  If the number of output files exceeds N, systemtap removes the oldest
	      output file.  The second argument may be omitted.
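
	      The rotation policy can be simulated as follows.  This Python model is only a
	      sketch of the rule as worded above: sizes are in arbitrary units, the function name
	      rotate_output is hypothetical, and real output is not written in discrete chunks
	      like this.

```python
from collections import deque

def rotate_output(chunk_sizes, max_size, max_files=None):
    """Model -S size[,N]: return the chunk sizes held by each surviving file, oldest first."""
    files = deque([[]])                      # each file is a list of chunk sizes
    for chunk in chunk_sizes:
        if files[-1] and sum(files[-1]) + chunk > max_size:
            files.append([])                 # the write would exceed max_size: switch files
            if max_files is not None and len(files) > max_files:
                files.popleft()              # more than N files: remove the oldest
        files[-1].append(chunk)
    return [list(f) for f in files]

print(rotate_output([4, 4, 4, 4, 4], max_size=8, max_files=2))   # [[4, 4], [4]]
print(rotate_output([4, 4, 4, 4, 4], max_size=8))                # [[4, 4], [4, 4], [4]]
```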

	      Ignore  unresolvable or run-time-inaccessible context variables and substitute with
	      0, without errors.

	      Wrap all probe handlers into something like this

	      try { ... } catch { next }

	      block, which causes any runtime errors to be quietly suppressed.	Suppressed errors
	      do not count against MAXERRORS limits.  In this mode, the MAXSKIPPED limits are al-
	      so suppressed, so that many errors and skipped probes may be accumulated	during	a
	      script's runtime.  Any overall counts will still be reported at shutdown.

       --compatible VERSION
	      Suppress recent script language or tapset changes which are incompatible with given
	      older version of systemtap.  This may be useful if a much  older	systemtap  script
	      fails to run.  See the DEPRECATION section for more details.

	      This option is used to check if the active script has any constructs that may be
	      systemtap version specific.  See the DEPRECATION section for more details.

	      This option prunes stale entries from the cache directory.  This is  normally  done
	      automatically after successful runs, but this option will trigger the cleanup manu-
	      ally and then exit.  See the CACHING section for more details about cache limits.

       --color[=WHEN], --colour[=WHEN]
	      This option controls coloring of error messages. WHEN can be either  "never",  "al-
	      ways", or "auto" (i.e. enable only if at a terminal). If WHEN is missing, then "al-
	      ways" is assumed. If the option is missing, then "auto" is assumed.

	      Colors can be modified using the SYSTEMTAP_COLORS environment variable. The  format
	      must  be of the form key1=val1:key2=val2:key3=val3 ...etc.  Valid keys are "error",
	      "warning", "source", "caret", and "token".  Values constitute Select Graphic Rendi-
	      tion (SGR) parameter(s). Consult the documentation of your terminal for the SGRs it
	      supports.  As  an   example,   the   default   colors   would   be   expressed   as
	      error=01;31:warning=00;33:source=00;34:caret=01:token=01.   If  SYSTEMTAP_COLORS is
	      absent or empty, the default colors will be used. If it  is  invalid,  coloring  is
	      turned off.
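
	      The SYSTEMTAP_COLORS rules can be summarised by a short Python model.  It is a
	      sketch, not stap's parser: the function names are hypothetical, treating a value
	      as valid when it is a ';'-separated list of digits is an assumption, and so is
	      letting keys that are not listed keep their default colors.

```python
DEFAULT_COLORS = "error=01;31:warning=00;33:source=00;34:caret=01:token=01"
VALID_KEYS = {"error", "warning", "source", "caret", "token"}

def _parse(spec):
    """Parse key=val:key=val pairs; return None if any pair is malformed."""
    colors = {}
    for item in spec.split(":"):
        key, sep, value = item.partition("=")
        sgr_ok = all(part.isdigit() for part in value.split(";"))
        if not sep or key not in VALID_KEYS or not sgr_ok:
            return None
        colors[key] = value
    return colors

def effective_colors(env_value):
    """Return the key -> SGR mapping in effect, or None when coloring is turned off."""
    colors = _parse(DEFAULT_COLORS)
    if env_value:                    # absent or empty: the defaults are used
        custom = _parse(env_value)
        if custom is None:           # invalid: coloring is turned off
            return None
        colors.update(custom)        # assumption: unlisted keys keep their defaults
    return colors

print(effective_colors("error=01;35")["error"])   # 01;35
print(effective_colors("error=red"))              # None
```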

	      This  option disables all use of the cache directory.  No files will be either read
	      from or written to the cache.

	      This option treats files in the cache directory as invalid.  No files will be  read
	      from  the  cache,  but  resulting  files from this run will still be written to the
	      cache.  This is meant as a troubleshooting aid when stap's caching seems to be
	      misbehaving.

       --privilege[=stapusr | =stapsys | =stapdev]
	      This option instructs stap to examine the script, looking for constructs which are
	      not allowed for the specified privilege level (see UNPRIVILEGED USERS).
	      Compilation fails if any such constructs are used.  If stapusr or stapsys is
	      specified when using a compile server (see --use-server), the server will examine
	      the script and, if compilation succeeds, cryptographically sign the resulting
	      kernel module, certifying that it is safe for use by users at the specified
	      privilege level.

	      If --privilege has not been specified, -pN has not been specified with N < 5, and
	      the invoking user is not root and is not a member of the group stapdev, then stap
	      will automatically add the appropriate --privilege option to the options already
	      specified.
	      This option is equivalent to --privilege=stapusr.

       --use-server[=HOSTNAME[:PORT] | =IP_ADDRESS[:PORT] | =CERT_SERIAL]
	      Specify compile-server(s) to be used for compilation  and/or  in	conjunction  with
	      --list-servers  and  --trust-servers  (see below). If no argument is supplied, then
	      the default in unprivileged mode (see --privilege) is to select compatible  servers
	      which  are  trusted as SSL peers and as module signers and currently online. Other-
	      wise the default is to select compatible servers which are trusted as SSL peers and
	      currently  online.   --use-server  may be specified more than once, in which case a
	      list of servers is accumulated in the order specified. Servers may be specified  by
	      host   name,   ip   address,  or	by  certificate  serial  number  (obtained  using
	      --list-servers).	The latter is most commonly used when adding or revoking trust in
	      a  server  (see --trust-servers below). If a server is specified by host name or ip
	      address, then an optional port number may be specified. This is useful for  access-
	      ing servers which are not on the local network or to specify a particular server.

	      IP addresses may be IPv4 or IPv6 addresses.

	      If  a  particular IPv6 address is link local and exists on more than one interface,
	      the intended interface may be specified by appending the	address  with  a  percent
	      sign    (%)    followed	 by   the   intended   interface   name.   For	 example,

	      In order to specify a port number with an IPv6 address, it is necessary to  enclose
	      the  IPv6 address in square brackets ([]) in order to separate the port number from
	      the  rest  of  the  address.  For  example,  "[fe80::5eff:35ff:fe07:55ca]:5000"  or

	      If --use-server has not been specified, -pN has not been specified with N < 5, and
	      the invoking user is not root, is not a member of the group stapdev, but is a member
	      of the group stapusr, then stap will automatically add --use-server to the options
	      already specified.
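
	      The host, interface and port parts of the address forms described for --use-server
	      can be separated as in the Python sketch below.  It only illustrates the bracket
	      and percent-sign conventions and is not stap's parser: the function name, the host
	      name example.com and the interface name eth0 are hypothetical, and the CERT_SERIAL
	      form is deliberately not handled.

```python
def split_server_spec(spec):
    """Split a HOSTNAME[:PORT] or IP_ADDRESS[:PORT] string into (host, interface, port)."""
    port = None
    if spec.startswith("["):                   # bracketed IPv6, optionally with :PORT
        host, _, rest = spec[1:].partition("]")
        if rest.startswith(":"):
            port = int(rest[1:])
    elif spec.count(":") == 1:                 # host name or IPv4 address with :PORT
        host, _, port_text = spec.partition(":")
        port = int(port_text)
    else:                                      # bare host name, IPv4 or IPv6 address
        host = spec
    host, _, interface = host.partition("%")   # link-local interface suffix, if any
    return host, interface or None, port

print(split_server_spec("[fe80::5eff:35ff:fe07:55ca]:5000"))
# ('fe80::5eff:35ff:fe07:55ca', None, 5000)
print(split_server_spec("fe80::5eff:35ff:fe07:55ca%eth0"))
# ('fe80::5eff:35ff:fe07:55ca', 'eth0', None)
```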

	      Instructs stap to retry compilation of a script using a compile server if  compila-
	      tion on the local host fails in a manner which suggests that it might succeed using
	      a server.  If this option is not specified, the default is no.  If no  argument  is
	      provided, then the default is yes. Compilation will be retried for certain types of
	      errors (e.g. insufficient data or resources) which may not occur during re-compila-
	      tion  by	a  compile server. Compile servers will be selected automatically for the
	      re-compilation attempt as if --use-server was specified with no arguments.

       --list-servers[=SERVERS]
	      Display the status of the requested SERVERS, where SERVERS is a comma-separated
	      list of server attributes. The list of attributes is combined to filter the list of
	      servers displayed. Supported attributes are:

	      all    specifies all known servers (trusted SSL peers, trusted module signers,  on-
		     line servers).

	      specified
		     specifies servers specified using --use-server.

	      online filters the output by retaining information about servers which are current-
		     ly online.

		     filters the output by retaining information about servers which are  trusted
		     as SSL peers.

	      signer filters  the output by retaining information about servers which are trusted
		     as module signers (see --privilege).

		     filters the output by retaining information about servers which are compati-
		     ble with the current kernel release and architecture.

	      If  no  argument	is  provided,  then the default is specified.  If no servers were
	      specified using --use-server, then the default servers for --use-server are listed.

	      Note that --list-servers uses the avahi-daemon service to detect online servers. If
	      this  service  is not available, then --list-servers will fail to detect any online
	      servers. In order for --list-servers to detect servers listening on IPv6 addresses,
	      the  avahi-daemon  configuration	file /etc/avahi/avahi-daemon.conf must contain an
	      active "use-ipv6=yes" line. The service must be restarted after adding this line in
	      order for IPv6 to be enabled.

       --trust-servers[=TRUST_SPEC]
	      Grant or revoke trust in compile-servers, specified using --use-server as specified
	      by TRUST_SPEC, where TRUST_SPEC is a  comma-separated  list  specifying  the  trust
	      which is to be granted or revoked. Supported elements are:

	      ssl    trust the specified servers as SSL peers.

	      signer trust  the specified servers as module signers (see --privilege).	Only root
		     can specify signer.

	      all-users
		     grant trust as an ssl peer for all users on the local host. The default is
		     to  grant	trust as an ssl peer for the current user only. Trust as a module
		     signer is always granted for all users. Only root can specify all-users.

	      revoke revoke the specified trust. The default is to grant it.

	      no-prompt
		     do not prompt the user for confirmation before carrying out the requested
		     action. The default is to prompt the user for confirmation.

	      If  no argument is provided, then the default is ssl.  If no servers were specified
	      using --use-server, then no trust will be granted or revoked.

	      Unless no-prompt has been specified, the user will be prompted to confirm the trust
	      to be granted or revoked before the operation is performed.

	      Dumps  a	list  of supported probe types. If --privilege=stapusr is also specified,
	      the list will be limited to probe types available to unprivileged users.

       --remote URL
	      Set the execution target to the given host.  This option may be repeated to  target
	      multiple	execution  targets.   Passes 1-4 are completed locally as normal to build
	      the script, and then pass 5 will copy the module to the target and run it.  Accept-
	      able URL forms include:

		     This  mode uses ssh, optionally using a username not matching your own. If a
		     custom ssh_config file is in use, add SendEnv LANG to retain  international-
		     ization functionality.

	      libvirt://DOMAIN, libvirt://DOMAIN/LIBVIRT_URI
		     This  mode  uses  stapvirt to execute the script on a domain managed by lib-
		     virt. Optionally, LIBVIRT_URI may be specified  to  connect  to  a  specific
		     driver and/or a remote host. For example, to connect to the local privileged
		     QEMU driver, use:

		     --remote libvirt://MyDomain/qemu:///system

		     See the page at <http://libvirt.org/uri.html> for supported URIs.	Also  see
		     stapvirt(1) for more information on how to prepare the domain for stap probing.

		     This mode connects to a UNIX socket. This can be used with a QEMU virtio-se-
		     rial port for executing scripts inside a running virtual machine.

		     Special loopback mode to run on the local host.

	      Prefix  each  line  of remote output with "N: ", where N is the index of the remote
	      execution target from which the given line originated.

       --download-debuginfo[=OPTION]
	      Enable, disable or set a timeout for the automatic debuginfo downloading feature
	      offered by abrt as specified by OPTION, where OPTION is one of the following:

	      yes    enable  automatic downloading of debuginfo with no timeout. This is the same
		     as not providing an OPTION value to --download-debuginfo

	      no     explicitly disable automatic downloading of debuginfo. This is the  same  as
		     not using the option at all.

	      ask    show abrt output, and ask before continuing download. No timeout will be
		     set.

		     specify a timeout as a positive number to stop the download if it is  taking
		     too long.

	      Specify the maximum size of the process's virtual memory (address space), in bytes.
	      If nothing is specified, no limits are imposed.

	      Specify the CPU time limit, in seconds. If nothing is specified, no limits are
	      imposed.

	      Specify  the  maximum number of processes that can be created. If nothing is speci-
	      fied, no limits are imposed.

	      Specify the maximum size of the process stack, in bytes. If nothing  is  specified,
	      no limits are imposed.

	      Specify the maximum size of files that the process may create, in bytes. If nothing
	      is specified, no limits are imposed.

	      Specify sysroot directory where target files (executables,  libraries,  etc.)   are
	      located.	 With -r RELEASE, the sysroot will be searched for the appropriate kernel
	      build directory.	With -r /DIR, however, the sysroot will not be used to	find  the
	      kernel build.

	      Provide  an alternate value for an environment variable where the value on a remote
	      system differs.  Path variables (e.g. PATH, LD_LIBRARY_PATH) are assumed to be rel-
	      ative to the directory provided by --sysroot, if provided.

	      Disable -DSTP_NO_OVERLOAD -MAXACTION -MAXTRYLOCK options.  This option requires gu-
	      ru mode.

	      Set the pass-5 runtime mode.  Valid options are kernel (default) and dyninst.   See
	      ALTERNATE RUNTIMES below for more information.

	      Shorthand for --runtime=dyninst.

       Any additional arguments on the command line are passed to the script parser for substitu-
       tion.  See below.

       The systemtap script language resembles awk.  There are	two  main  outermost  constructs:
       probes and functions.  Within these, statements and expressions use C-like operator syntax
       and precedence.

       Whitespace is ignored.  Three forms of comments are supported:
	      # ... shell style, to the end of line, except for $# and @#
	      // ... C++ style, to the end of line
	      /* ... C style ... */
       Literals are either strings enclosed in double-quotes (passing through the usual C  escape
       codes  with  backslashes, and with adjacent string literals glued together, also as in C),
       or integers (in decimal, hexadecimal, or octal, using the same notation	as  in	C).   All
       strings	are  limited  in length to some reasonable value (a few hundred bytes).  Integers
       are 64-bit signed quantities, although the parser also accepts (and wraps  around)  values
       above positive 2**63.
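
       The wrap-around of oversized integer literals can be pictured with a short Python model.
       It assumes ordinary two's-complement folding into 64 bits, which the text above implies
       but does not spell out; the function name wrap_int64 is hypothetical.

```python
def wrap_int64(value):
    """Fold an arbitrary integer into the signed 64-bit range, two's-complement style."""
    value &= (1 << 64) - 1                        # keep only the low 64 bits
    return value - (1 << 64) if value >= (1 << 63) else value

print(wrap_int64(2**63 - 1))   # 9223372036854775807 (largest value that does not wrap)
print(wrap_int64(2**63))       # -9223372036854775808
print(wrap_int64(2**64 - 1))   # -1
```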

       In  addition,  script arguments given at the end of the command line may be inserted.  Use
       $1 ... $<NN> for insertion unquoted, @1 ... @<NN> for insertion as a string literal.   The
       number of arguments may be accessed through $# (as an unquoted number) or through @# (as a
       quoted number).	These may be used at any place a token may begin,  including  within  the
       preprocessing stage.  Reference to an argument number beyond what was actually given is an
       error.

       A simple conditional preprocessing stage is run as a part of parsing.  The general form is
       similar to the cond ? exp1 : exp2 ternary operator:

	      %( CONDITION %? TRUE-TOKENS %)
	      %( CONDITION %? TRUE-TOKENS %: FALSE-TOKENS %)

       The CONDITION is either an expression whose format is determined by its first keyword, or
       a comparison of string literals or of numeric literals.  It can also be composed of
       several CONDITIONs of those forms joined by || (alternative) and && (conjunction).
       Parentheses are not supported yet, so it is important to remember that conjunction takes
       precedence over alternative.
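
       The effect of that precedence rule can be checked with a small Python model, in which
       already-evaluated sub-conditions are combined without parentheses.  This is an
       illustration only, not the stap preprocessor, and the function name eval_condition is
       hypothetical.

```python
def eval_condition(tokens):
    """Evaluate a flat list such as [True, "||", False, "&&", False]."""
    alternatives, current = [], []
    for tok in tokens:
        if tok == "||":                  # || separates alternatives (lowest precedence)
            alternatives.append(current)
            current = []
        elif tok != "&&":                # && joins the terms inside one alternative
            current.append(tok)
    alternatives.append(current)
    return any(all(group) for group in alternatives)

# Read as a || (b && c), which is true; (a || b) && c would have been false.
print(eval_condition([True, "||", False, "&&", False]))   # True
```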

       If  the	first part is the identifier kernel_vr or kernel_v to refer to the kernel version
       number, with ("2.6.13-1.322FC3smp") or without ("2.6.13") the release  code  suffix,  then
       the  second part is one of the six standard numeric comparison operators <, <=, ==, !=, >,
       and >=, and the third part is a string literal that contains an RPM-style  version-release
       value.  The condition is deemed satisfied if the version of the target kernel (as
       optionally overridden by the -r option) compares to the given version string in the way
       the operator requires.  The comparison is performed by the glibc function strverscmp.
       As a special case, if the operator is for
       simple equality (==), or inequality (!=), and the third part contains any wildcard charac-
       ters  (*  or ? or [), then the expression is treated as a wildcard (mis)match as evaluated
       by fnmatch.
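
       The wildcard special case can be tried out with Python's fnmatch module, which follows
       the same shell-style pattern rules for *, ? and [.  This is only a stand-in for the C
       fnmatch call mentioned above, and the helper name kernel_matches is hypothetical.

```python
from fnmatch import fnmatchcase

def kernel_matches(op, version, pattern):
    """Model `kernel_vr == pattern` and `kernel_vr != pattern` as a wildcard (mis)match."""
    if op not in ("==", "!="):
        raise ValueError("only == and != are treated as wildcard tests")
    matched = fnmatchcase(version, pattern)
    return matched if op == "==" else not matched

print(kernel_matches("==", "2.6.13-1.322FC3smp", "2.6.13*smp"))   # True
print(kernel_matches("!=", "2.6.13-1.322FC3smp", "2.6.13*smp"))   # False
print(kernel_matches("==", "2.6.13-1.322FC3", "2.6.13*smp"))      # False
```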

       If, on the other hand, the first part is the identifier arch to refer to the processor ar-
       chitecture (as named by the kernel build system ARCH/SUBARCH), then the second part is one
       of the two string comparison operators == or !=, and the third part is  a  string  literal
       for matching it.  This comparison is a wildcard (mis)match.

       Similarly,  if  the first part is an identifier like CONFIG_something to refer to a kernel
       configuration option, then the second part is == or !=, and the third  part  is	a  string
       literal for matching the value (commonly "y" or "m").  Nonexistent or unset kernel
       configuration options are represented by the empty string.  This comparison is also a
       wildcard (mis)match.

       If the first part is the identifier systemtap_v, the test refers to the systemtap
       compatibility version, which may be overridden for old scripts with the --compatible flag.
       The comparison operators are the same as for kernel_v, and the right operand is a version
       string.  See
       also the DEPRECATION section below.

       If the first part is the identifier systemtap_privilege, the test refers to the	privilege
       level  that  the  systemtap script is compiled with. Here the second part is == or !=, and
       the third part is a string literal, either "stapusr" or "stapsys" or "stapdev".

       If the first part is the identifier runtime, the test  refers  to  the  systemtap  runtime
       mode.  See  ALTERNATE RUNTIMES below for more information on runtimes.  The second part is
       one of the two string comparison operators == or !=, and the third part is a string liter-
       al for matching it.  This comparison is a wildcard (mis)match.

       Otherwise, the CONDITION is expected to be a comparison between two string literals or two
       numeric literals.  In this case, the arguments are the only variables usable.

       The TRUE-TOKENS and FALSE-TOKENS are zero or more general parser tokens (possibly  includ-
       ing  nested preprocessor conditionals), and are passed into the input stream if the condi-
       tion is true or false.  For example, the following code induces a parse error  unless  the
       target kernel version is newer than 2.6.5:

	      %( kernel_v <= "2.6.5" %? **ERROR** %) # invalid token sequence

       The following code might adapt to hypothetical kernel version drift:

	      probe kernel.function (
		%( kernel_v <= "2.6.12" %? "__mm_do_fault" %:
		   %( kernel_vr == "2.6.13*smp" %? "do_page_fault" %:
		      UNSUPPORTED %) %)
	      ) { /* ... */ }

	      %( arch == "ia64" %?
		 probe syscall.vliw = kernel.function("vliw_widget") {}
	      %)

       The preprocessor also supports a simple macro facility, run as a separate pass before con-
       ditional preprocessing.

       Macros are defined using the following construct:

	      @define NAME %( BODY %)
	      @define NAME(PARAM_1, PARAM_2, ...) %( BODY %)

       Macros, and parameters inside a macro body, are both invoked by prefixing the  macro  name
       with an @ symbol:

	      @define foo %( x %)
	      @define add(a,b) %( ((@a)+(@b)) %)

		 @foo = @add(2,2)

       Macro  expansion is currently performed in a separate pass before conditional compilation.
       Therefore, both TRUE- and FALSE-tokens in conditional expressions  will	be  macroexpanded
       regardless of how the condition is evaluated. This can sometimes lead to errors:

	      // The following results in a conflict:
	      %( CONFIG_UTRACE == "y" %?
		  @define foo %( process.syscall %)
	      %:
		  @define foo %( **ERROR** %)
	      %)

	      // The following works properly as expected:
	      @define foo %(
		%( CONFIG_UTRACE == "y" %? process.syscall %: **ERROR** %)
	      %)

       The  first example is incorrect because both @defines are evaluated in a pass prior to the
       conditional being evaluated.

       Normally, a macro definition is local to the file it occurs in.  Thus, defining a macro in
       a tapset does not make it available to the user of the tapset.  Publicly available
       library macros can be defined by including .stpm files on the tapset search path.  These
       files may only contain @define constructs, which become visible across all tapsets and
       user scripts.

       Identifiers for variables and functions are an alphanumeric sequence, and may include  "_"
       and  "$" characters.  They may not start with a plain digit, as in C.  Each variable is by
       default local to the probe or function statement block within which it is  mentioned,  and
       therefore its scope and lifetime are limited to a particular probe or function invocation.

       Scalar variables are implicitly typed as either string or integer.  Associative arrays al-
       so have a string or integer value, and a tuple of strings and/or  integers  serving  as	a
       key.  Here are a few basic expressions.

	      var1 = 5
	      var2 = "bar"
	      array1 [pid()] = "name"	  # single numeric key
	      array2 ["foo",4,i++] += 5   # vector of string/num/num keys
	      if (["hello",5,4] in array2) println ("yes")  # membership test

       The  translator	performs  type	inference on all identifiers, including array indexes and
       function parameters.  Inconsistent type-related use of identifiers signals an error.

       Variables may be declared global, so that they are shared amongst all probes and  live  as
       long  as  the  entire systemtap session.  There is one namespace for all global variables,
       regardless of which script file they are found within.  Concurrent access to global  vari-
       ables  is automatically protected with locks, see the SAFETY AND SECURITY section for more
       details.  A global declaration may be written at the outermost level anywhere, not  within
       a  block of code.  Global variables which are written but never read will be displayed au-
       tomatically at session shutdown.  The translator will infer for each its value  type,  and
       if  it  is used as an array, its key types.  Optionally, scalar globals may be initialized
       with a string or number literal.  The following declaration marks variables as global.

	      global var1, var2, var3=4

       Global variables can also be set as module options.  This can be done either with the -G
       option, or by first compiling the module with stap -p4 and then setting the global
       variables on the command line when calling staprun on the generated module.  See
       staprun(8) for more information.

       Arrays  are  limited  in size by the MAXMAPENTRIES variable -- see the SAFETY AND SECURITY
       section for details.  Optionally, global arrays may be declared with  a	maximum  size  in
       brackets,  overriding  MAXMAPENTRIES for that array only.  Note that this doesn't indicate
       the type of keys for the array, just the size.

	      global tiny_array[10], normal_array, big_array[50000]

       Arrays may be configured for wrapping using the '%' suffix.  This causes older elements to
       be  overwritten if more elements are inserted than the array can hold. This works for both
       associative and statistics typed arrays.

	      global wrapped_array1%[10], wrapped_array2%
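
       The behavior of a wrapping array can be modelled in Python as below.  This is a sketch,
       not the systemtap runtime: the class name is hypothetical, and taking "older" to mean
       the element inserted earliest is an assumption.

```python
from collections import OrderedDict

class WrappedArray:
    """Model of `global name%[size]`: inserting into a full array evicts the oldest entry."""

    def __init__(self, size):
        self.size = size
        self.data = OrderedDict()

    def __setitem__(self, key, value):
        if key not in self.data and len(self.data) >= self.size:
            self.data.popitem(last=False)    # drop the element inserted earliest
        self.data[key] = value               # updating an existing key evicts nothing

    def keys(self):
        return list(self.data)

arr = WrappedArray(3)
for pid in (101, 102, 103, 104):
    arr[pid] = "name"
print(arr.keys())   # [102, 103, 104]
```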

       Statements enable procedural control flow.  They may occur within functions and probe han-
       dlers.	The  total number of statements executed in response to any single probe event is
       limited to some number defined by a macro in the translated C code, and is in  the  neigh-
       bourhood of 1000.

       EXP    Execute the string- or integer-valued expression and throw away the value.

       { STMT1 STMT2 ... }
	      Execute each statement in sequence in this block.  Note that separators or termina-
	      tors are generally not necessary between statements.

       ;      Null statement, do nothing.  It is useful as an optional separator  between  state-
	      ments to improve syntax-error detection and to handle certain grammar ambiguities.

       if (EXP) STMT1 [ else STMT2 ]
	      Compare integer-valued EXP to zero.  Execute the first (non-zero) or second STMT
	      (zero).

       while (EXP) STMT
	      While integer-valued EXP evaluates to non-zero, execute STMT.

       for (EXP1; EXP2; EXP3) STMT
	      Execute EXP1 as initialization.  While EXP2 is non-zero, execute STMT, then the it-
	      eration expression EXP3.

       foreach (VAR in ARRAY [ limit EXP ]) STMT
	      Loop  over  each	element  of the named global array, assigning current key to VAR.
	      The array may not be modified within the statement.  By adding a single + or -  op-
	      erator after the VAR or the ARRAY identifier, the iteration will proceed in a sort-
	      ed order, by ascending or descending index or value.  If the array contains statis-
	      tics  aggregates, adding the desired @operator between the ARRAY identifier and the
	      + or - will specify the sorting aggregate function.  See the STATISTICS section be-
	      low  for	the ones available.  Default is @count.  Using the optional limit keyword
	      limits the number of loop iterations to EXP times.  EXP is evaluated  once  at  the
	      beginning of the loop.

       foreach ([VAR1, VAR2, ...] in ARRAY [ limit EXP ]) STMT
	      Same as above, used when the array is indexed with a tuple of keys.  A sorting suf-
	      fix may be used on at most one VAR or ARRAY identifier.

       foreach (VALUE = VAR in ARRAY [ limit EXP ]) STMT
	      This variant of foreach saves current value into VALUE on each iteration, so it  is
	      the same as ARRAY[VAR].  This also works with a tuple of keys.  Sorting suffixes on
	      VALUE have the same effect as on ARRAY.

       break, continue
	      Exit or iterate the innermost nesting loop (while or for or foreach) statement.

       return EXP
	      Return EXP value from enclosing function.  If the function's  value  is  not  taken
	      anywhere,  then a return statement is not needed, and the function will have a spe-
	      cial "unknown" type with no return value.

       next   Return now from enclosing probe handler.	This is especially useful in probe alias-
	      es that apply event filtering predicates.

       try { STMT1 } catch { STMT2 }
	      Run  the	statements in the first block.	Upon any run-time errors, abort STMT1 and
	      start executing STMT2.  Any errors in  STMT2  will  propagate  to  outer	try/catch
	      blocks, if any.

       try { STMT1 } catch(VAR) { STMT2 }
	      Same as above, plus assign the error message to the string scalar variable VAR.

       delete ARRAY[INDEX1, INDEX2, ...]
	      Remove  from  ARRAY  the	element  specified by the index tuple.	The value will no
	      longer be available, and subsequent iterations will not report the element.  It  is
	      not an error to delete an element that does not exist.

       delete ARRAY
	      Remove all elements from ARRAY.

       delete SCALAR
	      Removes  the value of SCALAR.  Integers and strings are cleared to 0 and "" respec-
	      tively, while statistics are reset to the initial empty state.
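
       The following sketch combines several of the statements above: a sorted foreach with a
       limit, delete on a whole array, and try/catch.  The array name hits and the five-second
       reporting interval are assumptions made for illustration; error() is the tapset function
       that raises a run-time error.

	      global hits
	      probe timer.ms(10) {
		hits[execname()]++
	      }
	      probe timer.s(5) {
		# visit at most three entries, largest value first
		foreach (name in hits- limit 3)
		  printf("%-16s %d\n", name, hits[name])
		delete hits                     # remove all elements
		try {
		  error("demonstration error")  # raise a run-time error
		} catch (msg) {
		  println("caught: ", msg)
		}
		exit()
	      }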

       Systemtap supports a number of operators that have the same general syntax, semantics, and
       precedence as in C and awk.  Arithmetic is performed as per typical C rules for signed in-
       tegers.	Division by zero or overflow is detected and results in an error.

       binary numeric operators
	      * / % + - >> << & ^ | && ||

       binary string operators
	      .  (string concatenation)

       numeric assignment operators
	      = *= /= %= += -= >>= <<= &= ^= |=

       string assignment operators
	      = .=

       unary numeric operators
	      + - ! ~ ++ --

       binary numeric, string comparison or regex matching operators
	      < > <= >= == != =~ !~

       ternary operator
	      cond ? exp1 : exp2

       grouping operator
	      ( exp )

       function call
	      fn ([ arg1, arg2, ... ])

       array membership check
	      exp in array
	      [exp1, exp2, ...] in array

       The scripting language has proof-of-concept support for regular expression  matching.  The
       basic syntax is as follows:

	      exp =~ regex
	      exp !~ regex

       (The  first  operand must be an expression evaluating to a string; the second operand must
       be a string literal containing a syntactically valid regular expression.)

       The regular expression syntax supports most of the features of POSIX Extended Regular
       Expressions, except for subexpression reuse ("\1") functionality.  The ability to capture
       and extract the contents of the matched string and subexpressions has not yet been
       implemented.
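
       As a small sketch, the script below reports read system calls made by processes whose
       name ends in "sh".  The pattern and the five-second run time are arbitrary choices for
       illustration.

	      probe syscall.read {
		# the right-hand operand must be a string literal
		if (execname() =~ "sh$")
		  printf("%s (pid %d) called read\n", execname(), pid())
	      }
	      probe timer.s(5) { exit() }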

       The main construct in the scripting language identifies probes.	Probes associate abstract
       events with a statement block ("probe handler") that is to be executed when any	of  those
       events occur.  The general syntax is as follows:

	      probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }

       Events  are  specified in a special syntax called "probe points".  There are several vari-
       eties of probe points defined by the translator, and tapset  scripts  may  define  further
       ones  using aliases.  Probe points may be wildcarded, grouped, or listed in preference se-
       quences, or declared optional.  More details on probe point syntax and semantics are list-
       ed on the stapprobes(3stap) manual page.
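
       For instance, one handler may be attached to several probe points.  In this sketch (the
       global name runs is illustrative) the handler runs once at startup and then once per
       second until it has run three times.

	      global runs
	      probe begin, timer.s(1) {
		runs++
		printf("handler run %d\n", runs)
		if (runs >= 3) exit()
	      }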

       The  probe handler is interpreted relative to the context of each event.  For events asso-
       ciated with kernel code, this context may include variables defined in the source code  at
       that  spot.  These "target variables" are presented to the script as variables whose names
       are prefixed with "$".  They may be accessed only if the kernel's compiler preserved  them
       despite optimization.  This is the same constraint that a debugger user faces when working
       with optimized code.   Some  other  events  have  very  little  context.   See  the  stap-
       probes(3stap)  man  pages  to see the kinds of context variables available at each kind of
       probe point.
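
       A hedged sketch of target variable access follows.  It assumes that the running kernel
       has a function named vfs_read with a parameter named count, and that kernel debugging
       information is installed; if the compiler optimized the parameter away, the translator
       is expected to reject the script.

	      probe kernel.function("vfs_read") {
		# $count refers to the C parameter "count" at this probe point
		printf("%s reads up to %d bytes\n", execname(), $count)
		exit()
	      }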

       New probe points may be defined using "aliases".  Probe point aliases look similar to
       probe definitions, but instead of activating a probe at the given point, an alias
       definition just defines a new probe point name as an alias to an existing one.  There are
       two types of alias, the prologue style and the epilogue style, which are identified by
       "=" and "+=" respectively.

       For a prologue style alias, the statement block that follows the alias definition is
       implicitly added as a prologue to any probe that refers to the alias.  For an epilogue
       style alias, the statement block is instead added as an epilogue to any probe that refers
       to the alias.  For example:

	      probe syscall.read = kernel.function("sys_read") {
		fildes = $fd
		if (execname() == "init") next	# skip rest of probe
	      }

       defines a new probe point syscall.read, which expands to kernel.function("sys_read"), with
       the given statement as a prologue, which is useful to predefine	some  variables  for  the
       alias user and/or to skip probe processing entirely based on some conditions.  And

	      probe syscall.read += kernel.function("sys_read") {
		if (tracethis) println ($fd)
	      }

       defines a new probe point with the given statement as an epilogue, which is useful to take
       actions based upon variables set or left over by the alias user.  Please note that in
       each case, the statements in the alias handler block are treated ordinarily, so that
       variables assigned there constitute mere initialization, not a macro substitution.

       An alias is used just like a built-in probe type.

	      probe syscall.read {
		printf("reading fd=%d\n", fildes)
		if (fildes > 10) tracethis = 1
	      }

       Systemtap scripts may define subroutines to factor out common work.   Functions	take  any
       number  of  scalar (integer or string) arguments, and must return a single scalar (integer
       or string).  An example function declaration looks like this:

	      function thisfn (arg1, arg2) {
		 return arg1 + arg2
	      }

       Note the general absence of type declarations, which are instead inferred by the  transla-
       tor.   However,	if  desired, a function definition may include explicit type declarations
       for its return value and/or its arguments.  This  is  especially  helpful  for  embedded-C
       functions.  In the following example, the type inference engine need only infer the type
       of arg2 (a string).

	      function thatfn:string (arg1:long, arg2) {
		 return sprint(arg1) . arg2
	      }

       Functions may call others or themselves recursively, up to a fixed  nesting  limit.   This
       limit is defined by a macro in the translated C code and is in the neighbourhood of 10.
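
       A short sketch of a recursive function follows; the name fact is illustrative.  The
       recursion depth of five stays well within the nesting limit, and the script should print
       "5! = 120".

	      function fact (n) {
		if (n <= 1) return 1
		return n * fact(n - 1)          # recursive call
	      }
	      probe begin {
		printf("5! = %d\n", fact(5))
		exit()
	      }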

       There are a set of function names that are specially treated by the translator.	They for-
       mat values for printing to the standard systemtap output stream in a more convenient  way.
       The sprint* variants return the formatted string instead of printing it.

       print, sprint
	      Print one or more values of any type, concatenated directly together.

       println, sprintln
	      Print values like print and sprint, but also append a newline.

       printd, sprintd
	      Take  a  string  delimiter and two or more values of any type, and print the values
	      with the delimiter interposed.  The delimiter must be a literal string constant.

       printdln, sprintdln
	      Print values with a delimiter like printd and sprintd, but also append a newline.

       printf, sprintf
	      Take a formatting string and a number of values of corresponding types,  and  print
	      them all.  The format must be a literal string constant.

       The printf formatting directives are similar to those of C, except that they are fully
       type-checked by the translator:

	      %b     Writes a binary blob of the value given, instead of ASCII text.   The  width
		     specifier	determines  the number of bytes to write; valid specifiers are %b
		     %1b %2b %4b %8b.  Default (%b) is 8 bytes.

	      %c     Character.

	      %d,%i  Signed decimal.

	      %m     Safely reads kernel memory at the given address, outputs its content.  The
		     precision specifier determines the number of bytes to read.  Default is 1
		     byte.

	      %M     Same as %m, but outputs in hexadecimal.  The minimal size of output is  dou-
		     ble the precision specifier.

	      %o     Unsigned octal.

	      %p     Unsigned pointer address.

	      %s     String.

	      %u     Unsigned decimal.

	      %x     Unsigned hex value, in all lower-case.

	      %X     Unsigned hex value, in all upper-case.

	      %%     Writes a %.

       The # flag selects the alternate forms.	For octal, this prefixes a 0.  For hex, this pre-
       fixes 0x or 0X, depending on case.  For characters, this escapes non-printing values  with
       either C-like escapes or raw octal.


	      a = "alice", b = "bob", p = 0x1234abcd, i = 123, j = -1, id[a] = 1234, id[b] = 4567
	      print("hello")
		   Prints: hello
	      println(b)
		   Prints: bob\n
	      println(a . " is " . sprint(16))
		   Prints: alice is 16
	      foreach (name in id)  printdln("|", strlen(name), name, id[name])
		   Prints: 5|alice|1234\n3|bob|4567
	      printf("%c is %s; %x or %X or %p; %d or %u\n",97,a,p,p,p,j,j)
		   Prints: a is alice; 1234abcd or 1234ABCD or 0x1234abcd; -1 or 18446744073709551615\n
	      printf("2 bytes of kernel buffer at address %p: %2m", p, p)
		   Prints: 2 bytes of kernel buffer at address 0x1234abcd: <binary data>
	      printf("%4b", p)
		   Prints (these values as binary data): 0x1234abcd
	      printf("%#o %#x %#X\n", 1, 2, 3)
		   Prints: 01 0x2 0X3
	      printf("%#c %#c %#c\n", 0, 9, 42)
		   Prints: \000 \t *

       It is often desirable to collect statistics in a way that avoids the penalty of repeatedly
       taking exclusive locks on the global variables that hold those numbers.  Systemtap
       provides a solution using a special operator to accumulate values, and several
       pseudo-functions to extract the statistical aggregates.

       The aggregation operator is <<<, and resembles an assignment, or  a  C++  output-streaming
       operation.   The  left operand specifies a scalar or array-index lvalue, which must be de-
       clared global.  The right operand is a numeric expression.  The meaning is intuitive:  add
       the  given  number to the pile of numbers to compute statistics of.  (The specific list of
       statistics to gather is given separately, by the extraction functions.)

	      foo <<< 1
	      stats[pid()] <<< memsize

       The extraction functions are also special.  For each appearance of a distinct extraction
       function operating on a given identifier, the translator arranges to compute a set of
       statistics that satisfy it.  The statistics system is thereby "on-demand".  Each execution
       of an extraction function causes the aggregation to be computed for that moment across all
       processors.

       Here is the set of extractor functions.	The first argument of each is the same	style  of
       lvalue  used  on  the left hand side of the accumulate operation.  The @count(v), @sum(v),
       @min(v), @max(v), @avg(v) extractor functions compute the number/total/minimum/maximum/av-
       erage  of  all  accumulated values.  The resulting values are all simple integers.  Arrays
       containing aggregates may be sorted and iterated.  See the foreach construct above.

       Histograms are also available, but are more complicated because they have a vector  rather
       than scalar value.  @hist_linear(v,start,stop,interval) represents a linear histogram from
       "start" to "stop" by increments of "interval".  The interval must be positive.  Similarly,
       @hist_log(v)  represents  a  base-2  logarithmic  histogram. Printing a histogram with the
       print family of functions renders a histogram object as a tabular "ASCII art" bar chart.

	      probe timer.profile {
		x[1] <<< pid()
		x[2] <<< uid()
		y <<< tid()
	      }
	      global x // an array containing aggregates
	      global y // a scalar
	      probe end {
		foreach ([i] in x @count+) {
		   printf ("x[%d]: avg %d = sum %d / count %d\n",
			   i, @avg(x[i]), @sum(x[i]), @count(x[i]))
		   println (@hist_log(x[i]))
		}
		println ("y:")
		println (@hist_log(y))
	      }
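
       As a further sketch, a linear histogram can summarize the spacing between timer firings.
       The names gap and last, the 10 ms period, and the bucket range of 9000 to 11000
       microseconds in steps of 250 are assumptions chosen for illustration.

	      global gap, last
	      probe timer.ms(10) {
		now = gettimeofday_us()
		if (last) gap <<< now - last    # microseconds since the previous firing
		last = now
	      }
	      probe timer.s(2) {
		printf("samples %d, min %d, avg %d, max %d\n",
		       @count(gap), @min(gap), @avg(gap), @max(gap))
		print(@hist_linear(gap, 9000, 11000, 250))
		exit()
	      }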

       Once a pointer has been saved into a script integer variable,  the  translator  loses  the
       type  information necessary to access members from that pointer.  Using the @cast() opera-
       tor tells the translator how to read a pointer.

	      @cast(p, "type_name"[, "module"])->member

       This will interpret p as a pointer to a struct/union named type_name and  dereference  the
       member  value.  Further ->subfield expressions may be appended to dereference more levels.
       NOTE: the same dereferencing operator -> is used to refer to either direct containment or
       pointer indirection.  Systemtap automatically determines which.  The optional module tells
       the translator where to look for information about that type.  Multiple modules may be
       specified as a list with : separators.  If the module is not specified, it will default
       either to the probe module for dwarf probes, or to "kernel" for functions and all other
       probe types.

       The translator can create its own module with type information from a header surrounded by
       angle brackets, in case normal debuginfo is not available.  For kernel headers, prefix it
       with "kernel" to use the appropriate build system.  All other headers are built with
       default GCC parameters into a user module.  Multiple headers may be specified in sequence
       to resolve a codependency.

	      @cast(tv, "timeval", "<sys/time.h>")->tv_sec
	      @cast(task, "task_struct", "kernel<linux/sched.h>")->tgid
	      @cast(task, "task_struct",

       Values acquired by @cast may be pretty-printed by the "$" and "$$" suffix operators, the
       same way as described in the CONTEXT VARIABLES section of the stapprobes(3stap) manual
       page.

       When  in guru mode, the translator will also allow scripts to assign new values to members
       of typecasted pointers.

       Typecasting is also useful in the case of void* members whose type may be determinable at
       runtime.

	      probe foo {
		if ($var->type == 1) {
		  value = @cast($var->data, "type1")->bar
		} else {
		  value = @cast($var->data, "type2")->baz
		}
	      }

       When  in  guru  mode, the translator accepts embedded code in the top level of the script.
       Such code is enclosed between %{ and %} markers,  and  is  transcribed  verbatim,  without
       analysis,  in some sequence, into the top level of the generated C code.  At the outermost
       level, this may be useful to add #include instructions, and any auxiliary definitions  for
       use by other embedded code.

       Another	place  where embedded code is permitted is as a function body.	In this case, the
       script language body is replaced entirely by a piece of C code enclosed again  between  %{
       and  %}	markers.  This C code may do anything reasonable and safe.  There are a number of
       undocumented but complex safety constraints on atomicity, concurrency,  resource  consump-
       tion, and run time limits, so this is an advanced technique.

       The  memory locations set aside for input and output values are made available to it using
       macros STAP_ARG_* and STAP_RETVALUE.  Here are some examples:

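	      function add_one (val) %{ STAP_RETVALUE = STAP_ARG_val + 1; %}
	      function add_one_str (val) %{
		strlcpy (STAP_RETVALUE, STAP_ARG_val, MAXSTRINGLEN);
		strlcat (STAP_RETVALUE, "one", MAXSTRINGLEN); %}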

       The function argument and return value types have to be inferred by  the  translator  from
       the  call  sites  in order for this to work.  The user should examine C code generated for
       ordinary script-language functions in order to write compatible embedded-C ones.

       The last place where embedded code is permitted is as an expression rvalue.  In this case,
       the  C  code  enclosed  between %{ and %} markers is interpreted as an ordinary expression
       value.  It is assumed to be a normal 64-bit signed number, unless the marker /* string  */
       is included, in which case it's treated as a string.

	      function add_one (val) {
		return val + %{ 1 %}
	      }
	      function add_string_two (val) {
		return val . %{ /* string */ "two" %}
	      }

       The embedded-C code may contain markers to assert optimization and safety properties.

       /* pure */
	      means  that  the C code has no side effects and may be elided entirely if its value
	      is not used by script code.

       /* unprivileged */
	      means that the C code is so safe that even unprivileged users are permitted to use
	      it.

       /* myproc-unprivileged */
	      means  that the C code is so safe that even unprivileged users are permitted to use
	      it, provided that the target of the current probe is within the user's own process.

       /* guru */
	      means that the C code is so unsafe that a systemtap  user  must  specify	-g  (guru
	      mode) to use this.

       /* unmangled */
	      in  an  embedded-C function, means that the legacy (pre-1.8) argument access syntax
	      should be made available inside the function. Hence, in  addition  to  STAP_ARG_foo
	      and  STAP_RETVALUE  one  can use THIS->foo and THIS->__retvalue respectively inside
	      the function. This is useful for quickly migrating code written for SystemTap  ver-
	      sion 1.7 and earlier.

       /* string */
	      in embedded-C expressions only, means that the expression has const char * type and
	      should be treated as a string value, instead of the default long numeric.
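
       As a sketch of how a marker is placed, the embedded-C function below is tagged as pure, so
       the translator may drop a call whose result is unused.  The function name my_page_size is
       hypothetical, the example assumes that the kernel's PAGE_SIZE macro is visible to the
       generated C code, and the script would need to be run in guru mode (-g).

	      function my_page_size:long () %{ /* pure */
		STAP_RETVALUE = PAGE_SIZE;      /* no side effects, so "pure" applies */
	      %}
	      probe begin {
		printf("page size: %d\n", my_page_size())
		exit()
	      }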

       A set of builtin probe point aliases are provided by the scripts installed in the directo-
       ry  specified  in  the stappaths(7) manual page.  The functions are described in the stap-
       probes(3stap) manual page.

       The translator begins pass 1 by parsing the given input script,	and  all  scripts  (files
       named *.stp) found in a tapset directory.  The directories listed with -I are processed in
       sequence, each processed in "guru mode".  For each directory, a number  of  subdirectories
       are also searched.  These subdirectories are derived from the selected kernel version (the
       -r option), in order to allow more kernel-version-specific scripts to override less
       specific ones.  For example, for a kernel version 2.6.12-23.FC3 the following patterns
       would be searched, in sequence: 2.6.12-23.FC3/*.stp, 2.6.12/*.stp, 2.6/*.stp, and finally
       *.stp.
       Stopping the translator after pass 1 causes it to print the parse trees.

       In  pass 2, the translator analyzes the input script to resolve symbols and types.  Refer-
       ences to variables, functions, and probe aliases that are unresolved internally are satis-
       fied  by  searching  through the parsed tapset script files.  If any tapset script file is
       selected because it defines an unresolved symbol, then the entirety of that file is  added
       to  the	translator's  resolution  queue.  This process iterates until all symbols are re-
       solved and a subset of tapset script files is selected.

       Next, all probe point descriptions are validated against the wide variety supported by the
       translator.   Probe  points  that refer to code locations ("synchronous probe points") re-
       quire the appropriate kernel debugging information to be  installed.   In  the  associated
       probe  handlers,  target-side  variables  (whose  names begin with "$") are found and have
       their run-time locations decoded.

       Next, all probes and functions are analyzed for optimization opportunities,  in	order  to
       remove variables, expressions, and functions that have no useful value and no side-effect.
       Embedded-C functions are assumed to have side-effects unless they include the magic string
       /* pure */.   Since  this optimization can hide latent code errors such as type mismatches
       or invalid $target variables, it sometimes may be useful to disable the optimizations with
       the -u option.

       Finally,  all variable, function, parameter, array, and index types are inferred from con-
       text (literals and operators).  Stopping the translator after pass 2 causes it to list all
       the  probes, functions, and variables, along with all inferred types.  Any inconsistent or
       unresolved types cause an error.

       In pass 3, the translator writes C code that represents the actions of all selected script
       files,  and creates a Makefile to build that into a kernel object.  These files are placed
       into a temporary directory.  Stopping the translator at this point causes it to print  the
       contents of the C file.

       In  pass 4, the translator invokes the Linux kernel build system to create the actual ker-
       nel object file.  This involves running make in the temporary directory,  and  requires	a
       kernel  module  build  system (headers, config and Makefiles) to be installed in the usual
       spot /lib/modules/VERSION/build.  Stopping the translator after pass 4 is the last  chance
       before running the kernel object.  This may be useful if you want to archive the file.

       In pass 5, the translator invokes the systemtap auxiliary program staprun for the
       given kernel object.  This program arranges to load the module then communicates with it,
       copying trace data from the kernel into temporary files, until the user sends an interrupt
       signal.  Any run-time error encountered by the probe handlers, such as running out of
       memory, division by zero, exceeding nesting or runtime limits, results in a soft error
       indication.  Soft errors in excess of MAXERRORS block all subsequent probes (except
       error-handling probes), and terminate the session.  Finally, staprun unloads the module,
       and cleans up.

       One should avoid killing the stap process forcibly, for example with SIGKILL, because  the
       stapio  process	(a  child  process of the stap process) and the loaded module may be left
       running on the system.  If this happens, send SIGTERM or SIGINT to  any	remaining  stapio
       processes, then use rmmod to unload the systemtap module.

       See the stapex(3stap) manual page for a collection of samples.

       The  systemtap  translator  caches the pass 3 output (the generated C code) and the pass 4
       output (the compiled kernel module) if pass 4 completes successfully.  This cached  output
       is  reused if the same script is translated again assuming the same conditions exist (same
       kernel version, same systemtap version, etc.).  Cached files are stored	in  the  $SYSTEM-
       TAP_DIR/cache directory. The cache can be limited by having the file cache_mb_limit placed
       in the cache directory (shown above) containing only an	ASCII  integer	representing  how
       many  MiB the cache should not exceed. In the absence of this file, a default will be cre-
       ated with the limit set to 256MiB.  This is a 'soft' limit  in  that  the  cache  will  be
       cleaned	after  a new entry is added if the cache clean interval is exceeded, so the total
       cache size may temporarily exceed this limit. This interval can be specified by having the
       file cache_clean_interval_s placed in the cache directory (shown above) containing only an
       ASCII integer representing the interval in seconds. In the absence of this file, a default
       will be created with the interval set to 30 s.

       Systemtap  is  an administrative tool.  It exposes kernel internal data structures and po-
       tentially private user information.

       To actually run the kernel objects it builds, a user must be one of the following:

       o   the root user;

       o   a member of the stapdev and stapusr groups;

       o   a member of the stapsys and stapusr groups; or

       o   a member of the stapusr group.

       The root user or a user who is a member of both the stapdev and stapusr groups  can  build
       and run any systemtap script.

       A  user who is a member of both the stapsys and stapusr groups can only use pre-built mod-
       ules under the following conditions:

       o   The module has been signed by a trusted signer. Trusted signers are normally systemtap
	   compile-servers  which  sign  modules  when the --privilege option is specified by the
	   client. See the stap-server(8) manual page for more information.

       o   The module was built using the --privilege=stapsys or the --privilege=stapusr options.

       Members of only the stapusr group can use pre-built modules only under the following
       conditions:

       o   The module is located in the /lib/modules/VERSION/systemtap directory.  This directory
	   must be owned by root and not be world writable.


       o   The module has been signed by a trusted signer. Trusted signers are normally systemtap
	   compile-servers  which  sign  modules  when the --privilege option is specified by the
	   client. See the stap-server(8) manual page for more information.

       o   The module was built using the --privilege=stapusr option.

       The kernel modules generated by the stap program are run by the staprun program.  The latter
       is a part of the Systemtap package, dedicated to module loading and unloading (but only in
       the white zone), and kernel-to-user data transfer.  Since staprun does not perform any ad-
       ditional  security checks on the kernel objects it is given, it would be unwise for a sys-
       tem administrator to add untrusted users to the stapdev or stapusr groups.

       The translator asserts certain safety constraints.  It aims to ensure that no handler
       routine can run for very long, allocate memory, perform unsafe operations, or
       unintentionally interfere with the kernel.  Uses of script global variables are automatically
       read/write locked as appropriate, to protect against manipulation by concurrent probe han-
       dlers.  (Deadlocks are detected with timeouts.  Use the -t flag to receive reports of  ex-
       cessive	lock  contention.)   Use  of  guru mode constructs such as embedded C can violate
       these constraints, leading to kernel crash or data corruption.

       The resource use limits are set by macros in the generated C code.  These may be  overrid-
       den with the -D flag.  A selection of these is as follows:

	      Maximum  number  of  nested function calls.  Default determined by script analysis,
	      with a bonus 10 slots added for recursive scripts.

	      Maximum length of strings, default 128.

	      Maximum number of iterations to wait for locks on global variables before declaring
	      possible deadlock and skipping the probe, default 1000.

	      Maximum  number  of  statements to execute during any single probe hit (with inter-
	      rupts disabled), default 1000.

	      Maximum number of statements to execute during any single probe hit which  is  exe-
	      cuted with interrupts enabled (such as begin/end probes), default (MAXACTION * 10).

	      Maximum number of stack frames that will be processed by the stap runtime
	      unwinder as produced by the backtrace functions in the [u]context-unwind.stp tapsets,
	      default 20.

	      Default maximum number of rows in any single global array, default 2048.	Individu-
	      al arrays may be declared with a larger or smaller limit instead:

	      global big[10000],little[5]

	      or denoted with % to make them wrap-around automatically.

	      Maximum number of soft errors before an exit is triggered, default 0,  which  means
	      that  the  first	error  will  exit the script.  Note that with the --suppress-han-
	      dler-errors option, this limit is not enforced.

	      Maximum number of skipped probes before an exit is triggered, default 100.  Running
	      systemtap  with -t (timing) mode gives more details about skipped probes.  With the
	      default -DINTERRUPTIBLE=1 setting, probes skipped due to reentrancy are not accumu-
	      lated  against  this  limit.   Note that with the --suppress-handler-errors option,
	      this limit is not enforced.

	      Minimum number of free kernel stack bytes required in order to run a probe handler,
	      default  1024.   This  number  should  be  large enough for the probe handler's own
	      needs, plus a safety margin.

	      Maximum number of concurrently armed user-space probes (uprobes), default somewhat
	      larger than the number of user-space probe points named in the script.  This pool
	      needs to be potentially large because individual uprobe objects (about 64 bytes
	      each) are allocated for each process for each matching script-level probe.

	      Maximum amount of memory (in kilobytes) that the systemtap module should use,
	      default unlimited.  The memory size includes the size of the module itself, plus any
	      additional allocations.  This only tracks direct allocations by the systemtap
	      runtime.  This does not track indirect allocations (as done by kprobes/uprobes/etc.).

	      Size of procfs probe read buffers (in bytes).  Defaults to MAXSTRINGLEN.	This val-
	      ue can be overridden on a per-procfs file basis using the procfs read  probe  .max-
	      size(MAXSIZE) parameter.
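
       As a hedged illustration of these limits, the loop below is expected to exceed the default
       MAXACTION budget for a single probe hit and end the script with an error.  Running the
       same script with a larger limit, for example stap -DMAXACTION=100000, should let it
       finish; the value 100000 is an arbitrary choice that comfortably covers this loop.

	      probe timer.ms(100) {
		# each iteration consumes part of the per-hit statement budget
		for (i = 0; i < 5000; i++)
		  total += i
		println(total)
		exit()
	      }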

       With  scripts  that contain probes on any interrupt path, it is possible that those inter-
       rupts may occur in the middle of another probe handler.	The probe in the  interrupt  han-
       dler  would  be skipped in this case to avoid reentrance.  To work around this issue, exe-
       cute stap with the option -DINTERRUPTIBLE=0 to mask interrupts throughout the  probe  han-
       dler.   This does add some extra overhead to the probes, but it may prevent reentrance for
       common problem cases.  However, probes in NMI handlers and in the  callpath  of	the  stap
       runtime may still be skipped due to reentrance.

       Multiple scripts can write data into a relay buffer concurrently.  A host script provides
       an interface for accessing its relay buffer to guest scripts.  Then, the output of the
       guests is merged into the output of the host.  To run a script as a host, execute stap
       with the -DRELAYHOST[=name] option.  The name identifies your host script among several
       hosts.  While the host is running, execute stap with -DRELAYGUEST[=name] to add a guest
       script to the host.  Note that you must unload guests before unloading a host.  If any
       guests are still connected to the host, unloading the host will fail.

       In  case  something goes wrong with stap or staprun after a probe has already started run-
       ning, one may safely kill both user processes, and remove the active probe  kernel  module
       with rmmod.  Any pending trace messages may be lost.

       In addition to the methods outlined above, the generated kernel module also uses overload
       processing to make sure that probes can't run for too long.  If more than
       STP_OVERLOAD_THRESHOLD cycles (default 500000000) have been spent in all the probes on a
       single cpu during the last STP_OVERLOAD_INTERVAL cycles (default 1000000000), the probes
       have overloaded the system and an exit is triggered.
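
       As a rough illustration of the rule above, the following shell sketch redoes the
       arithmetic with the default values.  The function names is_overloaded and
       overload_budget_percent are invented for this sketch and are not part of systemtap;
       the real check runs inside the generated module.

```shell
# Sketch only: mirrors the overload rule described in the text.
STP_OVERLOAD_THRESHOLD=500000000     # default, in cycles
STP_OVERLOAD_INTERVAL=1000000000     # default, in cycles

# is_overloaded CYCLES: succeeds when the cycles spent in all probes on one
# cpu during the last interval are more than the threshold.
is_overloaded() {
    [ "$1" -gt "$STP_OVERLOAD_THRESHOLD" ]
}

# Share of each interval (in percent) that probes may use before an exit
# is triggered.
overload_budget_percent() {
    echo $(( STP_OVERLOAD_THRESHOLD * 100 / STP_OVERLOAD_INTERVAL ))
}
```

       With the defaults, overload_budget_percent prints 50, so probes may consume about half
       of the cycles in each interval on a cpu before the module exits.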

       By  default,  overload processing is turned on for all modules.	If you would like to dis-
       able overload processing, define STP_NO_OVERLOAD (or its alias STAP_NO_OVERLOAD).

       Systemtap exposes kernel internal data structures and potentially private user
       information.  Because of this, use of systemtap's full capabilities is restricted to root
       and to users who are members of the groups stapdev and stapusr.

       However, a restricted set of systemtap's features can be made available	to  trusted,  un-
       privileged  users.  These  users  are members of the group stapusr only, or members of the
       groups stapusr and stapsys.  These users can load systemtap modules which have  been  com-
       piled and certified by a trusted systemtap compile-server. See the descriptions of the op-
       tions --privilege and --use-server. See README.unprivileged in the systemtap  source  code
       for information about setting up a trusted compile server.
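
       The group rules above can be summarized with a small shell sketch.  The function name
       stap_privilege and the labels it prints (full, stapsys, stapusr, none) are invented
       for this illustration; it only restates the group combinations described in the text
       and does not cover root, which always has full access.

```shell
# stap_privilege "GROUP LIST": print the access level implied by the groups.
# The body runs in a subshell so its variables do not leak to the caller.
stap_privilege() (
    dev=no usr=no sys=no
    for g in $1; do
        case "$g" in
            stapdev) dev=yes ;;
            stapusr) usr=yes ;;
            stapsys) sys=yes ;;
        esac
    done
    if [ "$usr" = no ]; then
        echo none        # stapusr membership is required in every case
    elif [ "$dev" = yes ]; then
        echo full        # stapdev and stapusr: full capabilities
    elif [ "$sys" = yes ]; then
        echo stapsys     # stapusr and stapsys: restricted
    else
        echo stapusr     # stapusr only: most restricted
    fi
)
```

       A typical call for the current user would be stap_privilege "$(id -nG)".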

       The  restrictions  enforced  when --privilege=stapsys is specified are designed to prevent
       unprivileged users from:

	      o   harming the system maliciously.

       The restrictions enforced when --privilege=stapusr is specified are  designed  to  prevent
       unprivileged users from:

	      o   harming the system maliciously.

	      o   gaining  access  to information which would not normally be available to an un-
		  privileged user.

	      o   disrupting the performance of processes owned by other  users  of  the  system.
		  Some	overhead  to  the system in general is unavoidable since the unprivileged
		  user's probes will be triggered at the appropriate times. What we would like to
		  avoid is targeted interruption of another user's processes which would not nor-
		  mally be possible by an unprivileged user.

       A member of the groups stapusr and stapsys may use all probe points.

       A member of only the group stapusr may use only the following probes:

	      o   begin, begin(n)

	      o   end, end(n)

	      o   error(n)

	      o   never

	      o   process.*, where the target process is owned by the user.

	      o   timer.{jiffies,s,sec,ms,msec,us,usec,ns,nsec}(n)*

	      o   timer.hz(n)

       The following scripting language features are unavailable to all unprivileged users:

	      o   any feature enabled by the Guru Mode (-g) option.

	      o   embedded C code.

       The following runtime restrictions are placed upon all unprivileged users:

	      o   Only the default runtime code (see -R) may be used.

       Additional restrictions are placed on members of only the group stapusr:

	      o   Probing of processes owned by other users is not permitted.

	      o   Access of kernel memory (read and write) is not permitted.

       Some command line options provide access to features which must not be available to any
       unprivileged user:

	      o   -g may not be specified.

	      o   The following options may not be used by the compile-server client:

		      -a, -B, -D, -I, -r, -R

       The following environment variables must not be set for all unprivileged users:


       In general, tapset functions are only available for members of the group stapusr when they
       do not gather information that an ordinary program running  with  that  user's  privileges
       would be denied access to.

       There  are two categories of unprivileged tapset functions. The first category consists of
       utility functions that are unconditionally available to	all  users;  these  include  such
       things as:

	      cpu:long ()
	      exit ()
	      str_replace:string (prnt_str:string, srch_str:string, rplc_str:string)

       The  second  category  consists	of  so-called myproc-unprivileged functions that can only
       gather information within their own processes. Scripts that wish to  use  these	functions
       must test the result of the tapset function is_myproc and only call these functions if the
       result is 1. The script will exit immediately if any of these functions are called  by  an
       unprivileged  user  within a probe within a process which is not owned by that user. Exam-
       ples of myproc-unprivileged functions include:

	      print_usyms (stk:string)
	      user_int:long (addr:long)
	      usymname:string (addr:long)
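
       A minimal sketch of the is_myproc guard described above follows.  The shell function
       myproc_script is a hypothetical helper that only prints a script; the script assumes
       that the tapset function uaddr() is available to supply the current user-space
       address.

```shell
# Print a systemtap script that calls a myproc-unprivileged function only
# after checking is_myproc, as the text requires.
myproc_script() {
    cat <<'EOF'
probe process.function("main")
{
    # Guard: only gather data inside processes owned by the invoking user.
    if (is_myproc() == 1)
        println(usymname(uaddr()))
}
EOF
}
```

       The printed script could then be passed to stap, for example with
       stap -c COMMAND -e "$(myproc_script)".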

       A compile error is triggered when any function not in either of the  above  categories  is
       used by members of only the group stapusr.

       No other built-in tapset functions may be used by members of only the group stapusr.

       As  described above, systemtap's default runtime mode involves building and loading kernel
       modules, with various security tradeoffs presented.  Systemtap now includes a  new  proto-
       type  backend,  selected with --runtime=dyninst, which uses Dyninst to instrument a user's
       own processes at runtime. This backend does not use kernel modules, and does  not  require
       root  privileges,  but  is  restricted  with respect to the kinds of probes and other con-
       structs that a script may use.

       The dyninst runtime operates in target-attach mode, so it does require a -c COMMAND or  -x
       PID process.  For example:

	      stap --runtime=dyninst -c 'stap -V' \
		   -e 'probe process.function("main")
		       { println("hi from dyninst!") }'

       It may be necessary to disable a conflicting SELinux check with

	      # setsebool allow_execstack 1

       The  systemtap  translator  generally  returns  with  a success code of 0 if the requested
       script was processed and executed successfully through the requested pass.  Otherwise, er-
       rors may be printed to stderr and a failure code is returned.  Use -v or -vp N to increase
       (global or per-pass) verbosity to identify the source of the trouble.

       In listings mode (-l and -L), error messages are normally suppressed.  A success code of 0
       is returned if at least one matching probe was found.
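
       Because listing mode reports a match through its exit code, a shell script can test
       for a probe point before using it.  The sketch below assumes nothing beyond the -l
       behavior described above; the function name probe_exists and the STAP override
       variable are conventions of this example only.

```shell
# probe_exists PROBE: succeeds if "stap -l" finds at least one matching probe.
# STAP may name an alternative binary; it defaults to stap.
probe_exists() {
    "${STAP:-stap}" -l "$1" >/dev/null 2>&1
}
```

       For example, probe_exists 'kernel.function("vfs_read")' && echo available prints
       "available" only when a matching probe was found.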

       A script executing in pass 5 that is interrupted with ^C / SIGINT is considered to be
       successful.

       Over time, some features of the script language and the tapset library may undergo  incom-
       patible	changes,  so  that  a  script  written against an old version of systemtap may no
       longer run.  In these cases, it may help to run systemtap with  the  --compatible  VERSION
       flag,  specifying the last known working version.  Running systemtap with the --check-ver-
       sion flag will output a warning if any possible incompatible elements  have  been  parsed.
       Deprecation historical details may be found in the NEWS file.

       Important files and their corresponding paths can be located in the stappaths(7) manual
       page.


       Use the Bugzilla link of the project web page or our mailing list.
