prof_intro(1)						      General Commands Manual						     prof_intro(1)

NAME
       prof_intro - Introduction to application profilers, profiling, optimization, and performance analysis

DESCRIPTION
       Tru64 UNIX supports four approaches to performance improvement:

       Automatic and profile-directed optimizations. For example:

            cc -non_shared -O3 -om *.c

       Manual design and code optimizations. For example:

            hiprof -all -display program data/* | more
            uprofile -heavy program data/* | more

       Minimizing system-resource usage. For example:

            third -display program data/* | more

       Verifying significance of test cases. For example:

            pixie -testcoverage program data/* | more

       One approach might be enough, but more might be beneficial if no single approach addresses all aspects of a program's performance. The fol-
       lowing sections describe each approach and the tools provided by Tru64 UNIX to support them.

AUTOMATIC AND PROFILE-DIRECTED OPTIMIZATIONS
   Techniques
       Automatic and profile-directed optimizations are the simplest approaches to improving application performance.

       Some degree of automatic optimization can be achieved by using the compiler's and linker's optimization options. These can help in the gen-
       eration of minimal instruction sequences that make best use of the CPU architecture and cache memory.

       However, the compiler and linker can improve their optimizations if they are given information on  which  instructions  are  executed  most
       often  when  the  program  is run with its normal input data and environment. While the default optimizations give improved performance for
       most common situations, the optimizers can do even better if they can tune the program in favor of the heavily used  instruction  sequences
       as determined from a sample run.

       Tru64  UNIX  helps you provide the optimizers with this information on processing hot-spots by allowing a profiler's results to be fed back
       into a recompilation. This customized, profile-directed optimization can be used in conjunction with automatic optimization.

   Tools and Examples
       The cc compiler command's automatic optimization options are selected with the -O, -fast, -inline, -om, -cord, -feedback, and other related
       options. See cc(1) for details and Chapter 10 of the Programmer's Guide for more information on the many options and tradeoffs available.

       For example, this command selects a high degree of optimization in both the compiler and the linker:

            cc -non_shared -O3 -om *.c

       The  pixie profiler provides profile information that the cc compiler's -feedback and -om options can use to tune the generated instruction
       sequences to the demands placed on the program by particular sets of input data.

       The steps shown in the following example are: (1) preparing the program for profile-directed optimization, (2) creating an
       instrumented version of the program and running it to collect profiling statistics, and (3) feeding that information back to the
       compiler and linker to help them optimize the executable code:

            cc -non_shared -feedback program -o program -O3 *.c
            pixie -update program
            cc -non_shared -feedback program -o program -O3 -om *.c

       This technique is applicable only to executables. To apply profile-directed optimizations to shared libraries, use the alternative
       feedback file format and the -cord option. For example:

            cc -o libexample.so -shared -g1 -O3 lib*.c
            cc -o exerciser -O3 exerciser.c -L. -lexample
            pixie -L. -incobj libexample.so -run exerciser
            prof -pixie -feedback libexample.fb libexample.so exerciser.Counts
            cc -cord -feedback libexample.fb -o libexample.so -shared -g1 -O3 lib*.c

MANUAL DESIGN AND CODE OPTIMIZATIONS
   Techniques
       The effectiveness of the automatic optimizations described above is limited by the efficiency of the algorithms that the  program  uses.  A
       program's  performance  can  be	further improved by manually optimizing its algorithms and data structures. Such optimizations may include
       reducing complexity from N-squared to log-N, avoiding copying of data, and reducing the amount of data used. It may also extend	to  tuning
       the  algorithm  to the architecture of the particular machine it will be run on - for example, processing large arrays in small blocks such
       that each block remains in the data cache for all processing, instead of the whole array being read into  the  cache  for  each	processing
       phase.
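
       The blocking technique mentioned above can be sketched in C as follows. This is a minimal illustration, not code from the Tru64
       UNIX documentation: the function names and the BLOCK_ELEMS value are hypothetical, and a suitable block size depends on the data
       cache of the target machine. Both functions compute the same result; the blocked version runs both processing phases on one small
       block before moving to the next.

```c
#include <stddef.h>

/* Hypothetical block size: the number of doubles assumed to fit
 * comfortably in the data cache. Tune it for the target machine. */
#define BLOCK_ELEMS 1024

/* Unblocked: each phase sweeps the whole array, so a large array
 * is pulled through the cache once per phase. */
double scale_then_sum_naive(double *a, size_t n, double factor)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)      /* phase 1: scale */
        a[i] *= factor;
    for (size_t i = 0; i < n; i++)      /* phase 2: sum */
        sum += a[i];
    return sum;
}

/* Blocked: both phases run on one small block while it is still
 * likely to be in the cache, then the next block is processed. */
double scale_then_sum_blocked(double *a, size_t n, double factor)
{
    double sum = 0.0;
    for (size_t start = 0; start < n; start += BLOCK_ELEMS) {
        size_t end = start + BLOCK_ELEMS;
        if (end > n)
            end = n;
        for (size_t i = start; i < end; i++)    /* phase 1 on block */
            a[i] *= factor;
        for (size_t i = start; i < end; i++)    /* phase 2 on block */
            sum += a[i];
    }
    return sum;
}
```

       Whether the blocked form is actually faster should be confirmed with a profile of cache misses, because the benefit depends on the
       array size and the cache size.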

       Tru64 UNIX supports manual optimization with its profiling tools, which identify the parts of the application that use the most CPU
       resources - CPU cycles, cache misses, and so on. By evaluating different profiles of a program, you can identify which parts of the
       program use the most CPU resources, and you can then redesign or recode algorithms in those parts to use fewer resources. The
       profiles also make this exercise more cost-effective by helping you to focus on the most demanding code rather than on the least
       demanding.

   Tools and Examples
       (a) CPU-Time Profiling with Call-Graph

       A call-graph profile shows how much CPU time is used by each procedure, and how much is used by all the other  procedures  that	it  calls.
       This  can  show which phases or subsystems in a program spend most of the total CPU time, which can help in gaining a general understanding
       of the program's performance.

       The hiprof profiler instruments the program and records a call graph while the instrumented program executes. The hiprof profiler
       does not require that the program be compiled in any particular way, but the names of local (for example, static) procedures will
       be hidden if the cc command's default -g0 option was used, and procedures will be hidden if they are inlined. For example:

            cc -g1 -O2 -o program *.c
            hiprof -all -display program data/* | more

       By  default, hiprof uses a low-frequency sampling technique and estimates the cost of procedure calls. It can profile all the code executed
       by the program, including all selected libraries, though its call graph excludes procedures in threads-related  system  libraries.  It  can
       also provide detailed profiles at the level of source lines or machine instructions.

       For  non-threaded  programs,  hiprof can alternatively count the number of machine cycles used or page faults suffered by the program.  The
       cost of each procedure call is individually measured, and the CPU time or page-fault count reported for the instrumented routines  includes
       that  for  the  uninstrumented  routines  that  they call. This can summarize the costs and reduce the run-time overhead, but note that the
       machine-cycle counter wraps if no instrumented procedure is called at least every few seconds.

       The cc compiler's -pg option uses the same sampling technique as hiprof, but the program needs to be instrumented by compiling with
       the -pg option. Only the executable is profiled (not shared libraries), and few system libraries are instrumented to generate a
       call-graph profile; so, hiprof may be preferred. However, the cc command's -pg option and gprof are supported in a very similar
       way on different vendors' UNIX systems, so this may be an advantage. For example:

            cc -g1 -O2 -pg -o program *.c
            ./program data/*
            gprof program gmon.out | more

       The optional dxprof command provides a graphical display of various call-graph profiles.

       (b) CPU-Time/Event Profiles for Sourcelines/Instructions

       A  good	performance-improvement  strategy may start with a procedure-level profile of the whole program (perhaps with a call graph too, to
       give the big picture), but it will often progress to detailed profiling of individual source-lines and instructions.

       The uprofile profiler uses a sampling technique to generate a profile of the CPU-time or events such as cache misses associated	with  each
       procedure or source-line or instruction. The sampling frequency depends on the processor type and the statistic being sampled, but for CPU-
       time it is on the order of a millisecond. The profiler achieves this without modifying the target program at all, by using  hardware  coun-
       ters  that  are	built  into the Alpha CPU.  Running the uprofile command with no arguments yields a list of all the kinds of events that a
       particular machine can profile, depending on the nature of its architecture. The default is to profile machine cycles, resulting in
       a CPU-time profile. The following example shows how to display a profile of the source-lines that suffered the top 90% of data
       cache misses on an EV56 Alpha:

            cc -g1 -O2 -o program *.c
            uprofile -h -q 90cum% dcacheldmisses program data/* | more

       This technique has the advantage of very low run-time overhead. Also, the detailed information it can provide on  the  costs  of  executing
       individual instructions or source-lines is essential in identifying exactly which operation in a procedure is slowing the program down.

       The disadvantages of uprofile are that only executables can be profiled, only one program can be profiled with the hardware
       counters at one time, threads cannot be profiled individually, and the Alpha EV6 architecture's execution of instructions out of
       sequence can significantly reduce the accuracy of fine-grained profiles.

       If hiprof's call counting is not too intrusive, it can provide the same fine-grain profiles as uprofile (CPU time only), but for all
       the shared libraries of a program and for individual threads. For example:

            hiprof -h -all program data/* | more

       The cc compiler's -p option uses the same low-frequency sampling technique as hiprof. It is common to many  UNIX  systems,  and	(on  Tru64
       UNIX)  it  is  able to profile all the shared libraries used by a program. The program needs to be relinked with the -p option, but it does
       not need to be recompiled from source, so long as the original compilation used an acceptable debug level, such as the -g1 compiler option.
       For example, to profile individual instructions of a program:

            cc -p -o program *.o
            setenv PROFFLAGS '-all -stride 1'
            ./program data/*
            prof -all -asm -quit 5% program mon.out | more

       The pixie tool can also profile source-lines and instructions (including shared libraries), but note that when it displays counts
       of "Cycles", it is actually reporting counts of instructions executed, not machine cycles. For example:

            cc -g1 -O2 -o program *.c
            pixie -all -lines -quit 20 program data/* | more

       The optional dxprof command provides a graphical display of profiles collected by either pixie or the cc command's -p option.

MINIMIZING SYSTEM RESOURCE USAGE
   Techniques
       The above techniques can improve an application's use of just the CPU.  Further performance improvements can be made by improving the effi-
       ciency with which the application uses the other components of the computer system: heap memory, disk files, network connections, etc.

       As with CPU profiling, the first phase of a resource usage improvement process is to monitor how much memory, data I/O and disk
       space, elapsed time, and so on, is used. Then the throughput of the computer can be increased or tuned in ways that help the
       program, or the program's design can be tuned to make better use of the computer resources that are available. For example:

            Reduce the size of the data files that the program reads and writes.

            Use memory-mapped files instead of regular I/O.

            Allocate memory incrementally on demand instead of allocating at start-up the maximum that could be required.

            Fix heap leaks, and do not leave allocated memory unused.

       See the System Configuration and Tuning manual for a broader discussion of analyzing and tuning a Tru64 UNIX system.
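
       As one illustration of allocating memory incrementally on demand, the following C sketch grows a buffer only when it is full,
       instead of reserving the maximum possible size at start-up. The int_buf type and its functions are hypothetical names introduced
       for this example, and the initial capacity of 16 and the doubling policy are assumptions, not recommendations from this manual.

```c
#include <stdlib.h>

/* Hypothetical growable buffer of ints; names are illustrative only. */
struct int_buf {
    int    *data;
    size_t  len;    /* elements in use */
    size_t  cap;    /* elements allocated */
};

/* Append one value, doubling the allocation only when it is full.
 * Returns 0 on success, -1 if memory could not be obtained. */
int int_buf_push(struct int_buf *b, int value)
{
    if (b->len == b->cap) {
        size_t new_cap = b->cap ? b->cap * 2 : 16;
        int *p = realloc(b->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;      /* the old buffer is still valid */
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = value;
    return 0;
}

/* Release the buffer so memory is not left allocated but unused. */
void int_buf_free(struct int_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

       A caller starts with a zero-initialized struct int_buf, calls int_buf_push for each value, and calls int_buf_free when the data is
       no longer needed.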

   Tools and Examples
       (a) System Monitors

       The Tru64 UNIX base system commands ps u, swapon -s, and vmstat 3 can show the currently active processes' usage of system  resources  such
       as CPU-time, physical and virtual memory, swap space, page faults, and so on.

       The optional pview command provides a graphical display of similar information for the processes that comprise an application.

       The time commands provided by the Tru64 UNIX system and command shells provide an easy way to measure the total elapsed and CPU
       times for a program and its descendants.

       Performance Manager is an optional system performance monitoring and management tool with a graphical interface.

       Many other related commands are described in the System Configuration and Tuning manual.

       (b) Heap Memory Analyzers

       The third command reports heap memory leaks in a program, by instrumenting it with the Third Degree memory-usage checker, running
       it, and displaying a log of leaks detected at program exit. For example:

            third -display program data/* | more

       If you are interested only in leaks occurring during the normal operation of the program, not during startup or shutdown, you can
       specify additional places to check for previously unreported leaks. For example, the pre-shutdown leak report will give this
       information:

            third -display -after startup -before shutdown program data/* | more

       Third  Degree can also detect various kinds of bugs that may be affecting the correctness or performance of a program. See the Programmer's
       Guide for further details on debugging and leak-detection.
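
       For readers unfamiliar with heap leaks, the following C sketch shows the kind of defect that a leak checker is meant to find,
       together with a corrected version. The function names are hypothetical and the example is deliberately artificial; it does not
       show Third Degree output.

```c
#include <stdlib.h>
#include <string.h>

/* Leaky: the working copy is never freed, so every call loses
 * strlen(s) + 1 bytes of heap memory. */
size_t count_vowels_leaky(const char *s)
{
    size_t n = 0;
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, s);
    for (char *p = copy; *p != '\0'; p++)
        if (strchr("aeiouAEIOU", *p) != NULL)
            n++;
    return n;               /* the pointer to copy is lost here */
}

/* Fixed: the working copy is released before returning. */
size_t count_vowels_fixed(const char *s)
{
    size_t n = 0;
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return 0;
    strcpy(copy, s);
    for (char *p = copy; *p != '\0'; p++)
        if (strchr("aeiouAEIOU", *p) != NULL)
            n++;
    free(copy);             /* no memory is left allocated */
    return n;
}
```

       Both functions return the same count; only the leaky one leaves memory allocated after each call, which is why such defects tend
       to show up only in long-running tests.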

       The optional dxheap command provides a graphical display of Third Degree's heap and bug reports.

       The optional mview command provides a graphical analysis of heap usage over time. This view of a program's heap can clearly show the
       presence (if not the cause) of significant leaks or other undesirable trends such as wasted memory.

VERIFYING SIGNIFICANCE OF TEST CASES
   Techniques
       Most of the above profiling techniques are effective only if you profile and optimize or tune the parts of the program that are executed in
       the scenarios whose performance is important. Careful selection of the data used for the profiled test-runs is often  sufficient,  but  you
       may want a quantitative analysis of which code was and was not executed in a given set of tests.

   Tools and Examples
       The pixie command's -t[estcoverage] option reports lines of code that were not executed in a given test run. For example:

            pixie -t program data/* | more

       Conversely, pixie's -p[rocedure], -h[eavy], and -a[sm] options show which procedures, source lines, and instructions were executed.

       If multiple test runs are needed to build up a typical scenario, the prof command can be run separately on a set of profile data
       files:

            pixie -pids program
            ./program.pixie data1/*
            ./program.pixie data2/*
            prof -pixie -t program program.Counts.*

SEE ALSO
       Profiling:  cc(1), hiprof(1), pixie(1), third(1), uprofile(1)

       System Monitoring:  ps(1), swapon(1), vmstat(1)

       Performance Manager, available from the Tru64 UNIX Associated Products installation media:  pmgr(8X)

       Graphical  tools, available from the Graphical Program Analysis subset of the Tru64 UNIX Associated Products installation media, or as part
       of Compaq's Enterprise Toolkit for Windows/NT desktops with Microsoft's Visual Studio 97: dxheap(1), dxprof(1), mview(1), pview(1)

       Programmer's Guide

       System Configuration and Tuning

																     prof_intro(1)