
numa_intro(3)						     Library Functions Manual						     numa_intro(3)

NAME
       numa_intro - Introduction to NUMA support

DESCRIPTION
       NUMA, or Non-Uniform Memory Access, refers to a hardware architectural feature in modern multiprocessor platforms that attempts to
       address the growing disparity between the speed and bandwidth that processors require and the bandwidth that memory systems, including
       the interconnect between processors and memory, can deliver. NUMA systems address this problem by grouping resources--processors, I/O
       busses, and memory--into building blocks that balance an appropriate number of processors and I/O busses with a local memory system that
       delivers the necessary bandwidth. The local building blocks are combined into a larger system by means of a system-level interconnect
       with a platform-specific topology.

       The local processor and I/O components on a particular building block can access their own "local" memory with the lowest possible  latency
       for  a particular system design. The local building block can in turn access the resources (processors, I/O, and memory) of remote building
       blocks at the cost of increased access latency and decreased global access bandwidth. The term "Non-Uniform Memory Access"  refers  to  the
       difference in latency between "local" and "remote" memory accesses that can occur on a NUMA platform.

       Overall system throughput and individual application performance are optimized on a NUMA platform by maximizing the ratio of local
       resource accesses to remote accesses. This is achieved by recognizing and preserving the "affinity" that processes have for the various
       resources on the system building blocks. For this reason, the building blocks are called "Resource Affinity Domains" or RADs.
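
       As a rough illustration of why this ratio matters, the following sketch models the expected latency of a single memory access as a
       weighted average of a local and a remote latency. The function name and the latency figures are hypothetical and are not taken from
       any Tru64 UNIX platform; real systems may have more than two latency levels.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Expected latency of one access under a simple two-level model.

    local_fraction is the share of accesses served by the local RAD;
    the remaining accesses are assumed to cost remote_ns each.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Hypothetical latencies; actual values are platform specific.
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, average_latency_ns(fraction, 100.0, 300.0))
```

       In this model, every increase in the local fraction moves the average closer to the local latency, which is the effect that
       preserving affinity aims for.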

       RADs are supported only on a class of platforms known as Cache Coherent NUMA, or CC NUMA, where all memory is accessible and cache
       coherent with respect to all processors and I/O busses. The Tru64 UNIX operating system includes enhancements to optimize system
       throughput and application performance on CC NUMA platforms for legacy applications as well as those that use NUMA-aware APIs. System
       enhancements to support NUMA are discussed in the following subsections. Along with system performance monitoring and tuning
       facilities, these enhancements allow the operating system to make a "best effort" to optimize the performance of any given collection
       of applications or application components on a CC NUMA platform.

   NUMA Enhancements to Basic UNIX Algorithms and Default Behaviors
       For NUMA, modifications to basic UNIX algorithms (scheduling, memory allocation, and so forth) and to default behaviors maximize local
       accesses transparently to applications. These modifications, which include the following, directly benefit legacy and non-NUMA-aware
       applications that were designed for uniprocessors or Uniform Memory Access Symmetric Multiprocessors but run on CC NUMA platforms:

       Topology-aware placement of data
              The operating system attempts to allocate memory for application (and kernel) data on the RAD closest to where the data will be
              accessed; or, for data that is globally accessed, the operating system may allocate memory across the available RADs. When there
              is insufficient free memory on optimal RADs, the memory allocations for data may "overflow" onto nearby RADs.

       Replication of read-only code and data
              The operating system will attempt to make a local copy of read-only data, such as shared program and library code. Kernel code
              and kernel read-only data are replicated on all RADs at boot time. If insufficient free local memory is available, the operating
              system may choose to utilize a remote copy rather than wait for free local memory.

       Memory affinity-aware scheduling
              The operating system scheduler takes "cache affinity" into account when choosing a processor to run a process thread on
              multiprocessor platforms. Cache affinity assumes that a process thread builds a "memory footprint" in a particular processor's
              cache. On CC NUMA platforms, the scheduler also takes into account the fact that processes will have memory allocated on
              particular RADs, and will attempt to keep processes running on processors that are in the same RAD as their memory footprints.

       Load balancing
              To minimize the requirement for remote memory allocation (overflow), the scheduler will take into account memory availability on
              a RAD as well as the processor load average for the RAD. Although these two factors may at times conflict with one another, the
              scheduler will attempt to balance the load so that processes run where there are memory pages as well as processor cycles
              available. This balancing involves both the initial selection of a RAD at process creation and migration of processes or
              individual pages in response to changing loads as processes come and go or their resource requirements or access patterns change.
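
       The placement and load balancing behaviors described above can be pictured with a small toy model. The sketch below is not the Tru64
       UNIX algorithm; the function names, the distance table, and the tie-breaking rules are all assumptions made for illustration. It
       shows memory "overflowing" from a home RAD onto the nearest RADs with free pages, and an initial RAD choice that weighs memory
       availability together with load average.

```python
def place_pages(home_rad, pages_needed, free_pages, distance):
    """Take pages from home_rad first, then overflow onto nearer RADs.

    free_pages: dict of RAD id -> free page count (left unmodified)
    distance:   dict of (home_rad, other_rad) -> relative access cost
    Returns a dict of RAD id -> number of pages taken from that RAD.
    """
    # Visit RADs from closest to farthest; ties are broken by RAD id.
    order = sorted(free_pages, key=lambda rad: (distance[(home_rad, rad)], rad))
    placement = {}
    remaining = pages_needed
    for rad in order:
        if remaining == 0:
            break
        take = min(remaining, free_pages[rad])
        if take > 0:
            placement[rad] = take
            remaining -= take
    if remaining > 0:
        raise MemoryError("not enough free pages on any RAD")
    return placement


def pick_rad(free_pages, load_average, pages_needed):
    """Pick an initial RAD for a new process (hypothetical policy).

    RADs that can hold all requested pages locally are preferred;
    among those, the RAD with the lowest load average wins.
    """
    candidates = [rad for rad in free_pages if free_pages[rad] >= pages_needed]
    if not candidates:
        # No RAD can satisfy the request locally, so consider them all.
        candidates = list(free_pages)
    return min(candidates, key=lambda rad: (load_average[rad], -free_pages[rad], rad))


if __name__ == "__main__":
    free = {0: 10, 1: 50, 2: 50}
    dist = {(0, 0): 0, (0, 1): 1, (0, 2): 2}
    print(place_pages(0, 30, free, dist))
    print(pick_rad(free, {0: 0.1, 1: 2.0, 2: 0.5}, 30))
```

       In the example data, RAD 0 is the least loaded but cannot hold 30 pages, so the toy policy selects another RAD. This mirrors the
       conflict between memory availability and processor load that the scheduler has to balance.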

   NUMA Enhancements to Application Programming Interfaces
       Application programmers can use new or modified library routines to further increase local accesses on CC NUMA platforms. Using these
       APIs, programmers can write new applications or modify old ones to provide additional information to the operating system or to take
       explicit control over process, thread, or memory object placement, or some combination of these. NUMA-aware routines are included in
       the following libraries:

       The Standard C Library (libc)

       The POSIX Threads Library (libpthread)

       The NUMA Library (libnuma)

       The reference pages that document NUMA-aware APIs note their library location.

SEE ALSO
       Files: numa_types(4)

																     numa_intro(3)