hiprof(1) General Commands Manual hiprof(1)
NAME
hiprof - CPU-time and page-fault call-graph profiler for performance analysis
SYNOPSIS
hiprof [-pthread] [-fork] [-heapbase addr] [-sigdump signal] [-cycles | -faults] [-quiet | [-v]] [-output file] [-dirname path] [-pids]
[-threads] [-all] [-excobj lib]... [-incobj lib]... [-Lpath]... [-E proc]... [-run | -display | gprof-option...] program [argument...]
The first few options above may be essential for the correct execution of the program. See the start of the OPTIONS section below for
details, and specify any that are necessary.
The atom -tool hiprof interface is still available, for compatibility with earlier releases. However, it is now undocumented, and it will
be retired in a future major release.
DESCRIPTION
The hiprof command creates an instrumented version of a program (program.hiprof) that produces call-graph and flat profiles of one of a
range of performance statistics:

  o  The CPU time spent in each procedure (or optionally, each source line or instruction), measured by sampling the program counter
     about every millisecond (the default)
  o  The CPU time spent in each procedure and procedure call, measured as machine cycles, including the effects of any memory-access
     delays (with the -cycles option)
  o  The number of page faults suffered by each procedure and procedure call (with the -faults option)
If you specify program arguments (argument...) or -run, the instrumented program is also executed.
If you specify -display or any gprof-option, the hiprof command runs the instrumented program and then displays the profile by running the
gprof tool (with any specified gprof-option).
If you omit the program name, a usage message is printed.
The following example shows how to instrument, run, and display the profile for a multi-threaded program:

    cc *.c -pthread -L. -g1 -O2 -o program -lapp1 -lapp2
    hiprof -pthread -L. -all program data/*
The -all option requests that all shared libraries be profiled, but threads-related system libraries cannot be safely instrumented to
count the procedure calls that are needed to print a call graph. By default, these libraries are still sampled to provide flat CPU-time
profiles. The -cycles and -faults options cannot be used with threaded programs. In profiles made with those options, the displayed time
or page-fault count for a procedure includes the time or count for any procedures that it calls but that were not selected for
instrumentation--for example, any procedures in libraries not selected by the -all or -incobj options. This means that time is not lost
from these profiles by excluding shared libraries.
When shared libraries are profiled (or when -fork or -pthread is specified for a program that was linked -call_shared), hiprof places
instrumented versions of the selected shared libraries in the working directory, so the LD_LIBRARY_PATH environment variable is defined by
hiprof to tell the loader where to find the instrumented libraries.
OPERANDS
program
    File name of a fully linked call-shared or nonshared executable to be profiled. This program should be compiled with the -g or -gn
    option (n>=1) to obtain more complete profiling information. If the default symbol table level (-g0) is used, line number
    information, static procedure names, and file names are unavailable. Inlined procedure calls are also unavailable. Programs that
    are stripped or are optimized by spike or cc -om are not supported.
argument...
    All arguments following the program name are considered to be arguments needed by the instrumented program to execute the
    procedures, lines, and instructions of interest. Multiple arguments can be specified. They imply -run if any are specified, and
    they can be replaced by -run if none are needed.
OPTIONS
Options can be abbreviated to three characters. The gprof-options, which are provided as alternatives to the -display option, can be
abbreviated to one character.
For options that specify a procedure name (proc), C++ procedures can omit the argument type list, though this will match all overloaded
procedures with that name. To select a specific procedure, specify the full symbol name (as printed by the nm command). Symbol names
containing spaces, asterisks, and so on must be quoted.
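As a rough illustration of the quoting rule, the sketch below passes a hypothetical C++ symbol name to -E. The name Stack::push(const char *) is invented for this example; the spelling hiprof expects is whatever nm prints for the real program. The count_args helper is also illustration only: it shows that the quoted name reaches the command as a single argument, with its parentheses, space, and asterisk left untouched by the shell.

```shell
# Hypothetical symbol name (an assumption for this sketch, not taken from a real program).
proc='Stack::push(const char *)'

# Illustration-only helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

# Quoted, the command line splits into four arguments:
# hiprof, -E, the whole symbol name, and the program name.
count_args hiprof -E "$proc" program

# The equivalent command line, left as a comment because it needs hiprof installed:
#   hiprof -E 'Stack::push(const char *)' program
```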
Essential Options
Some or all of these options may be needed to prevent the instrumented program from malfunctioning:

-pthread
    Specify -pthread if the program or any of its libraries calls pthread_create(3) (for example, if it was compiled with either the
    -pthread option or the -threads compatibility option). This will make the collection of profile data thread-safe.
-fork
    Specify -fork if the program or any of its libraries calls fork(2). The -fork option ensures that forked multi-threaded programs
    are profiled in a thread-safe way, and it produces separate profiling data files for the forked subprocesses, including the
    process-id in their filenames as if -pids was specified. Failure to use -fork might lead to deadlock in the forked child
    processes. This option is usually not needed if the subprocesses call exec(2).

    For compatibility with earlier releases, a default level of fork support is provided if the executable is non-shared or if
    libc.so is instrumented. However, this approach can lead to deadlock and will be retired in a future release, so specifying -fork
    is recommended.
-heapbase addr
    By default, the hiprof code running in the program's process allocates memory for its own use at address 38000000000. If the
    program needs to use memory between 38000000000 and 3ff00000000, specify the address that the hiprof code should use.
-sigdump signal
    Specify -sigdump to force the instrumented program to write the current profile data to its file(s) on receipt of the named
    signal. By default, the program writes the profiling data file(s) only when the process terminates, but some processes never
    terminate normally, so this option lets you generate the file(s) on demand. After a file is written, the instruction counts of
    the profile are all set to zero; so by sending two signals, any interval of a test run can be profiled, with the second signal's
    file(s) overwriting the first. For example, to use the default kill pid command to signal the program, specify -sigdump TERM.
    Choose a signal that the program does not use for another purpose.
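The two-signal technique described for -sigdump might be scripted along the following lines. This is a sketch under stated assumptions: the program was instrumented with something like hiprof -sigdump USR1 program (USR1 is assumed to be accepted in the same form as the TERM example and to be unused by the program), the instrumented program is already running, and profile_interval is a hypothetical helper name. The helper itself uses only kill and sleep.

```shell
# Sketch only. Assumes the program was instrumented with something like
#   hiprof -sigdump USR1 program
# and that program.hiprof is already running with the process ID given in $1.
# profile_interval is a hypothetical helper name.
profile_interval() {
    pid=$1      # process ID of the running instrumented program
    sig=$2      # signal named with -sigdump, for example USR1
    secs=$3     # length of the interval to profile, in seconds

    # First signal: the current data is written and the counts are reset to zero.
    kill -s "$sig" "$pid" || return 1
    sleep "$secs"
    # Second signal: the file(s) are rewritten and now cover only the interval.
    kill -s "$sig" "$pid"
}
```

For example, profile_interval 12345 USR1 60 would profile one minute of the process with ID 12345.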
Profiling Statistics Options
-cycles
    Profiles CPU time by counting the machine cycles used in each procedure call. Use this option only for non-threaded programs.
-faults
    Profiles page faults suffered by each procedure instead of the default time spent in each procedure. Use this option only for
    non-threaded programs.
File Generating Options
-quiet
    Does not print informational and progress messages on the standard error stream.
-v
    Prints the command lines used to instrument the program and to execute the instrumented program. Prints the names of any
    procedures that were not instrumented.
-output file
    Names the instrumented program file instead of the default program.hiprof.
-dirname path
    Specifies the directory to which the instrumented program writes the profiling data file(s) for each test run. The default is the
    current directory.
-pids
    Adds the process-id of the instrumented program's test run to the name of the profiling data file produced (that is,
    program.pid.hiout). By default, the file is named program.hiout.
-threads
    When profiling a threaded program, specify -threads to produce a separate profile for each pthread in the program. The files are
    named program[.pid].sequence.hiout, where sequence is the thread sequence number assigned by pthread_create(3). The -threads
    option implies the -pthread option. If -sigdump is needed, -pthread is recommended instead of -threads, to avoid possible
    synchronization problems.
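The data file naming rules above (program.hiout, program.pid.hiout, and program[.pid].sequence.hiout) can be summarized in a small sketch. The hiout_name helper is hypothetical and is not part of hiprof; it only assembles the name that the rules imply, which may be useful in scripts that collect the files after a test run.

```shell
# hiout_name PROGRAM [PID] [SEQUENCE]
# Hypothetical helper: prints the data file name implied by the naming rules.
# Pass an empty string for PID when no process-id is added to the name.
hiout_name() {
    name=$1
    if [ -n "${2:-}" ]; then name="$name.$2"; fi    # -pids adds the process-id
    if [ -n "${3:-}" ]; then name="$name.$3"; fi    # -threads adds the thread sequence number
    echo "$name.hiout"
}

hiout_name program            # program.hiout
hiout_name program 4711       # program.4711.hiout
hiout_name program "" 2       # program.2.hiout
hiout_name program 4711 2     # program.4711.2.hiout
```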
Shared-Library Profiling Options
-all
    Profiles all the shared libraries in addition to the program's executable.
-excobj lib
    If -all was specified, does not profile the shared library lib. Can be repeated, to exclude multiple libraries.
-incobj lib
    Profiles the shared library lib. Can be repeated to include multiple libraries.
-Lpath
    Searches for shared libraries in the specified directory before searching the default directories. Can be repeated to make a
    search path. Use the same options that were used when linking the program with ld.
-E proc
    Does not instrument the procedure proc. This option can be used to exclude procedures that are uninteresting or that interfere
    with the instrumentation (such as non-standard assembly code).
Execution Control Options
-run
    Executes the instrumented program, even if no arguments are specified. By default, the program is just instrumented for later
    execution.

    Prints the tool's version number.
-display
    Executes the instrumented program, and runs gprof on the resulting file(s).

The following gprof options are supported:

    Profiles each instruction within selected procedures.
    Does not report on called procedures.
    Excludes procedure proc and its descendants from the profile, but totals all procedures.
    Includes only procedure proc and its descendants in the profile, but totals all procedures.
    Profiles procedures as an indexed call graph (default).
    Profiles source lines, listing the most heavily used first.
    Profiles source lines, in order within selected procedures.
    Merges all input files into file.
    Prints each procedure's starting line number.
    Profiles procedures, listing the most heavily used first (default).
    Profiles the whole executable and any shared libraries.
    Reports procedures that were never called.
NOTES
Temporary instrumentation files are created in /tmp. Set the TMPDIR environment variable to a different directory to create the files
elsewhere, for example in a disk partition with more space.
RESTRICTIONS
The default sampled profile only estimates the CPU time spent in each procedure call; profiles made with the -cycles and -faults options
measure their statistic (machine cycles or page faults) directly rather than by sampling.
When timing a program's procedures by measuring machine cycles (with the -cycles option), the 32-bit cycle-counting hardware will wrap if
no procedure call or return is executed by the program every few seconds -- for example, because of a long-running loop. If the counter
wraps, the profile will be incorrect. Using the -all or -incobj options to profile all non-system libraries and procedures can help avoid
this restriction.
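To see why "every few seconds" is the relevant scale, the wrap interval of a 32-bit counter can be worked out from the clock rate. The sketch below assumes that the counter advances once per clock cycle, which may not hold on every processor, and uses the 500-MHz clock rate quoted for the performance estimates in this section. The wrap_seconds helper is hypothetical.

```shell
# wrap_seconds CLOCK_HZ
# Hypothetical helper: seconds until a 32-bit cycle counter wraps,
# assuming one count per clock cycle (2^32 = 4294967296 counts).
wrap_seconds() {
    awk -v hz="$1" 'BEGIN { printf "%.2f\n", 4294967296 / hz }'
}

wrap_seconds 500000000    # 500 MHz: prints 8.59
```

Under that assumption, a loop that runs for more than about 8.6 seconds without a procedure call or return would wrap the counter on a 500-MHz system, and faster clocks shorten the interval proportionally.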
Approximate performance estimates are as follows but will vary according to the application and the machine's CPU count, type, and clock
rate. The hiprof instrumentation takes ~2 s per MB of program file on a 500-MHz EV6 (21264) Alpha system, using ~10 MB of memory plus
another ~10 MB per MB of the largest file. The instrumented files are ~20% larger than the originals, plus ~1 MB of hiprof code. They run
~4 times slower. By default, each profile data file is at least the size of the instrumented code (and uses this much memory), but these
files are very small for the -cycles and -faults options.
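The approximate figures above can be turned into a quick planning estimate. The sketch below is only arithmetic on the quoted numbers for the 500-MHz EV6 system; hiprof_estimate is a hypothetical helper, sizes are whole megabytes, and the ~1 megabyte of hiprof code is assumed to be added once to the program file.

```shell
# hiprof_estimate PROGRAM_MB LARGEST_FILE_MB
# Hypothetical helper applying the approximate figures quoted above;
# real numbers will vary with the application and the machine.
hiprof_estimate() {
    prog_mb=$1       # size of the program file, in megabytes
    largest_mb=$2    # size of the largest file being instrumented, in megabytes
    echo "instrumentation time: ~$(( 2 * prog_mb )) s"
    echo "instrumentation memory: ~$(( 10 + 10 * largest_mb )) MB"
    # ~20% larger than the original, plus ~1 MB of hiprof code (assumed added once).
    echo "instrumented size: ~$(( prog_mb * 120 / 100 + 1 )) MB"
}

hiprof_estimate 5 5
```

For a 5-megabyte program that is also the largest file, this prints an instrumentation time of about 10 s, about 60 MB of memory, and an instrumented file of about 7 MB.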
If a procedure contains interprocedural branches or interprocedural jumps, that procedure will not be instrumented with the -cycles or
-faults option, and no information will be reported about that procedure. Use the -v option to see which procedures were not instrumented.
Compilers can optimize return statements or non-returning function calls to interprocedural branches. To avoid this, recompile with the -O0
or -no_inline option.
FILES
program.hiprof
    Instrumented version of program produced by hiprof
program.hiout
    Profile data file produced by program.hiprof

Instrumented shared libraries produced by hiprof

Temporary file created and deleted in the current and -dirname path directories.
SEE ALSO
cc(1), dxprof(1), gprof(1), kill(1), ld(1), pixie(1), prof_intro(1), uprofile(1). (dxprof is available as an option.)
fork(2), pthread(3)
Programmer's Guide