NetBSD 6.1.5 - man page for jemalloc (netbsd section 3)

JEMALLOC(3)			   BSD Library Functions Manual 		      JEMALLOC(3)

     jemalloc -- the default system allocator

     Standard C Library (libc, -lc)

     const char *_malloc_options;

     The jemalloc allocator is a general-purpose concurrent malloc(3) implementation specifically
     designed to be scalable on modern multi-processor systems.  It is the default user space
     system allocator in NetBSD.

     When the first call is made to one of the memory allocation routines such as malloc() or
     realloc(), various flags that affect the workings of the allocator are set or reset.  These
     are described below.

     The ``name'' of the file referenced by the symbolic link named /etc/malloc.conf, the value
     of the environment variable MALLOC_OPTIONS, and the string pointed to by the global variable
     _malloc_options will be interpreted, in that order, character by character as flags.
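     The lookup order can be modeled in C as follows.  This is a minimal sketch, not the libc
     source: program_options is a hypothetical stand-in for _malloc_options so that the code
     runs on any POSIX system, and collect_option_string() is an illustrative name.  The flags
     come from the name the link points to, which is why readlink(2) is used instead of reading
     the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for the libc global _malloc_options. */
static const char *program_options = NULL;

/* Append s to the NUL-terminated buffer without overflowing it. */
static void append_flags(char *buf, size_t bufsize, const char *s)
{
    size_t len = strlen(buf);

    if (s != NULL && len + 1 < bufsize)
        strncat(buf, s, bufsize - len - 1);
}

/* Concatenate the option characters in the order they are consulted:
 * the symlink's target name, MALLOC_OPTIONS, then the program's string. */
static void collect_option_string(char *buf, size_t bufsize)
{
    char target[256];
    ssize_t n;

    buf[0] = '\0';

    /* The flags are the *name* the link points to; the target does not
     * need to exist, so readlink(2) is used rather than open(2). */
    n = readlink("/etc/malloc.conf", target, sizeof target - 1);
    if (n > 0) {
        target[n] = '\0';
        append_flags(buf, bufsize, target);
    }

    append_flags(buf, bufsize, getenv("MALLOC_OPTIONS"));
    append_flags(buf, bufsize, program_options);
}
```

     Because the characters are processed in this order, an on/off flag given in
     _malloc_options overrides the same flag from MALLOC_OPTIONS or /etc/malloc.conf.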

     Most flags are single letters.  Uppercase letters indicate that the behavior is set, or on,
     and lowercase letters mean that the behavior is not set, or off.  The following options are
     available:

	A     All warnings (except for the warning about unknown flags being set) become fatal.
	      The process will call abort(3) in these cases.

	H     Use madvise(2) when pages within a chunk are no longer in use, but the chunk as a
	      whole cannot yet be deallocated.  Because the madvise() system call has high
	      overhead, this is primarily useful when swapping is a real possibility.

	J     Each byte of new memory allocated by malloc() or realloc() will be initialized to
	      0xa5.  All memory returned by free() or realloc() will be initialized to 0x5a.
	      This is intended for debugging and will impact performance negatively.

	K     Double (K) or halve (k) the virtual memory chunk size.  The default chunk size is
	      1 MB.  This option can be specified multiple times.

	N     Double (N) or halve (n) the number of arenas.  The default number of arenas is
	      four times the number of CPUs, or one if there is a single CPU.  This option can
	      be specified multiple times.

	P     Various statistics are printed at program exit via an atexit(3) function.  This has
	      the potential to cause deadlock for a multi-threaded process that exits while one
	      or more threads are executing in the memory allocation functions.  Therefore, this
	      option should only be used with care; it is primarily intended as a performance
	      tuning aid during application development.

	Q     Double (Q) or halve (q) the size of the allocation quantum.  The default quantum
	      is the minimum allowed by the architecture (typically 8 or 16 bytes).  This
	      option can be specified multiple times.

	S     Double (S) or halve (s) the maximum size class that is a multiple of the quantum.
	      Above this size, power-of-two spacing is used for size classes.  The default
	      value is 512 bytes.  This option can be specified multiple times.

	U     Generate ``utrace'' entries for ktrace(1), for all operations.  Consult the source
	      for details on this option.

	V     Attempting to allocate zero bytes will return a NULL pointer instead of a valid
	      pointer.  (The default behavior is to make a minimal allocation and return a
	      pointer to it.)  This option is provided for System V compatibility.  This option
	      is incompatible with the X option.

	X     Rather than returning failure from any allocation function, display a diagnostic
	      message on stderr and cause the program to drop core (using abort(3)).  This
	      option should be set at compile time by including the following in the source code:

		    _malloc_options = "X";

	Z     Each byte of new memory allocated by malloc() or realloc() will be initialized to 0.
	      Note that this initialization only happens once for each byte, so realloc() does
	      not zero memory that was previously allocated.  This is intended for debugging and
	      will impact performance negatively.

     Extra care should be taken when enabling any of the options in production environments.  The
     A, J, and Z options are intended for testing and debugging.  An application which changes
     its behavior when these options are used is flawed.
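     The letter convention above can be sketched as a small parser.  The struct malloc_opts
     fields and the apply_flags() function are hypothetical, not NetBSD's internal names; the
     defaults assume a 16-byte quantum and that every on/off flag starts off, and the sketch
     does no range checking on the K, N, Q and S counters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the tunables; field names are illustrative. */
struct malloc_opts {
    bool abort_on_warning, use_madvise, junk, print_stats;  /* A H J P */
    bool utrace, sysv_zero, xmalloc, zero;                  /* U V X Z */
    int chunk_2pow;       /* K: log2 of the chunk size */
    int narenas_shift;    /* N: doublings (+) or halvings (-) of the arena count */
    int quantum_2pow;     /* Q: log2 of the quantum */
    int qspace_max_2pow;  /* S: log2 of the largest quantum-multiple class */
};

static struct malloc_opts default_opts(void)
{
    struct malloc_opts o = {0};   /* assumption: all on/off flags start off */

    o.chunk_2pow = 20;            /* 1 MB, per the K entry */
    o.quantum_2pow = 4;           /* assumes a 16-byte quantum */
    o.qspace_max_2pow = 9;        /* 512 bytes, per the S entry */
    return o;
}

/* Map a flag letter (either case) to its on/off field, if it has one. */
static bool *bool_flag(struct malloc_opts *o, char c)
{
    switch (toupper((unsigned char)c)) {
    case 'A': return &o->abort_on_warning;
    case 'H': return &o->use_madvise;
    case 'J': return &o->junk;
    case 'P': return &o->print_stats;
    case 'U': return &o->utrace;
    case 'V': return &o->sysv_zero;
    case 'X': return &o->xmalloc;
    case 'Z': return &o->zero;
    default:  return NULL;
    }
}

/* Map a flag letter (either case) to its doubling/halving counter. */
static int *count_flag(struct malloc_opts *o, char c)
{
    switch (toupper((unsigned char)c)) {
    case 'K': return &o->chunk_2pow;
    case 'N': return &o->narenas_shift;
    case 'Q': return &o->quantum_2pow;
    case 'S': return &o->qspace_max_2pow;
    default:  return NULL;
    }
}

/* Uppercase turns a flag on (or doubles a size), lowercase turns it off
 * (or halves it).  Characters are processed left to right. */
static void apply_flags(struct malloc_opts *o, const char *flags)
{
    for (const char *p = flags; *p != '\0'; p++) {
        bool upper = isupper((unsigned char)*p) != 0;
        bool *b = bool_flag(o, *p);
        int *n = count_flag(o, *p);

        if (b != NULL)
            *b = upper;
        else if (n != NULL)
            *n += upper ? 1 : -1;
        /* Anything else is an unknown flag; the real allocator warns. */
    }
}
```

     For example, applying "KKkAJ" and then "jn" to the defaults leaves a 2 MB chunk size
     (chunk_2pow of 21), A on, J off, and the arena count halved once.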

     The jemalloc allocator uses multiple arenas in order to reduce lock contention for threaded
     programs on multi-processor systems.  This works well with regard to threading scalability,
     but incurs some costs.  There is a small fixed per-arena overhead, and additionally, arenas
     manage memory completely independently of each other, which means a small fixed increase in
     overall memory fragmentation.  These overheads are not generally an issue, given the number
     of arenas normally used.  Note that using substantially more arenas than the default is not
     likely to improve performance, mainly due to reduced cache performance.  However, it may
     make sense to reduce the number of arenas if an application does not make much use of the
     allocation functions.
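     The default arena count from the N entry, and the effect of repeated n or N flags, can be
     computed as below.  This is a sketch: default_narenas() and online_cpus() are hypothetical
     helpers, sysconf(_SC_NPROCESSORS_ONLN) is a widely available but non-POSIX way to count
     CPUs, and never going below one arena is an assumption.

```c
#include <limits.h>
#include <unistd.h>

/* Default from the N entry: four arenas per CPU, or one on a single CPU.
 * shift > 0 doubles the count that many times (N), shift < 0 halves it (n). */
static unsigned default_narenas(long ncpus, int shift)
{
    unsigned n = (ncpus > 1) ? 4u * (unsigned)ncpus : 1u;

    for (; shift > 0 && n <= UINT_MAX / 2; shift--)
        n *= 2;
    for (; shift < 0 && n > 1; shift++)
        n /= 2;             /* never drops below one arena (assumption) */
    return n;
}

/* Number of online CPUs; _SC_NPROCESSORS_ONLN is common but not POSIX. */
static long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? n : 1;
}
```

     On a four-CPU machine this gives 16 arenas by default and 8 with a single n flag, which
     is the kind of reduction the paragraph above suggests for programs that allocate little.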

     Memory is conceptually broken into equal-sized chunks, where the chunk size is a power of
     two that is greater than the page size.  Chunks are always aligned to multiples of the chunk
     size.  This alignment makes it possible to find metadata for user objects very quickly.
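     A minimal sketch of why the alignment helps: because a chunk starts at a multiple of its
     power-of-two size, masking off the low bits of any address inside it yields the chunk's
     start, where per-chunk metadata can be kept.  The function names are illustrative, and the
     sketch works on integer addresses rather than real allocator pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Start of the chunk containing addr; chunksize must be a power of two. */
static uintptr_t chunk_base(uintptr_t addr, uintptr_t chunksize)
{
    assert(chunksize != 0 && (chunksize & (chunksize - 1)) == 0);
    return addr & ~(chunksize - 1);
}

/* Offset of addr from the start of its chunk. */
static uintptr_t chunk_offset(uintptr_t addr, uintptr_t chunksize)
{
    return addr & (chunksize - 1);
}
```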

     User objects are broken into three categories according to size:

	1.   Small objects are smaller than one page.

	2.   Large objects are smaller than the chunk size.

	3.   Huge objects are a multiple of the chunk size.

     Small and large objects are managed by arenas; huge objects are managed separately in a
     single data structure that is shared by all threads.  Huge objects are used by applications
     infrequently enough that this single data structure is not a scalability issue.

     Each chunk that is managed by an arena tracks its contents in a page map as runs of
     contiguous pages (unused, backing a set of small objects, or backing one large object).  The
     combination of chunk alignment and chunk page maps makes it possible to determine all
     metadata regarding small and large allocations in constant time.
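     The constant-time lookup can be sketched as follows, assuming 4 KB pages and the default
     1 MB chunk.  The page_map_entry and chunk_header types are hypothetical, not jemalloc's
     actual layout.  In a real allocator the header would be found from the address with the
     chunk-alignment mask; here it is passed in so the sketch runs without reserving an
     aligned chunk.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12   /* assumes 4 KB pages */
#define SKETCH_CHUNK_SHIFT 20   /* the 1 MB default chunk size */
#define SKETCH_PAGES_PER_CHUNK ((size_t)1 << (SKETCH_CHUNK_SHIFT - SKETCH_PAGE_SHIFT))

/* Hypothetical page-map entry describing the run a page belongs to. */
enum page_use { PAGE_UNUSED, PAGE_SMALL_RUN, PAGE_LARGE_RUN };

struct page_map_entry {
    enum page_use use;
    size_t run_first_page;   /* index of the run's first page */
    size_t run_npages;       /* run length in pages */
};

/* Hypothetical per-chunk header: one map entry per page. */
struct chunk_header {
    struct page_map_entry map[SKETCH_PAGES_PER_CHUNK];
};

/* Constant time: one mask, one shift and one array index, no search. */
static const struct page_map_entry *
lookup_page(const struct chunk_header *hdr, uintptr_t addr)
{
    uintptr_t offset = addr & (((uintptr_t)1 << SKETCH_CHUNK_SHIFT) - 1);

    return &hdr->map[offset >> SKETCH_PAGE_SHIFT];
}
```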

     Small objects are managed in groups by page runs.  Each run maintains a bitmap that tracks
     which regions are in use.  Allocation requests can be grouped as follows:

	o   Allocation requests that are no more than half the quantum (see the Q option) are
	    rounded up to the nearest power of two (typically 2, 4, or 8).

	o   Allocation requests that are more than half the quantum, but no more than the maximum
	    quantum-multiple size class (see the S option) are rounded up to the nearest multiple
	    of the quantum.

	o   Allocation requests that are larger than the maximum quantum-multiple size class, but
	    no larger than one half of a page, are rounded up to the nearest power of two.

	o   Allocation requests that are larger than half of a page, but small enough to fit in
	    an arena-managed chunk (see the K option), are rounded up to the nearest run size.

	o   Allocation requests that are too large to fit in an arena-managed chunk are rounded
	    up to the nearest multiple of the chunk size.
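     The rounding rules can be written as a single function.  This is a sketch under stated
     assumptions: a 16-byte quantum, 4 KB pages, the 512-byte S default and the 1 MB K default.
     Run sizes are approximated as whole pages, and "fits in an arena-managed chunk" is
     approximated as "smaller than the chunk size", ignoring the chunk's own header.  The
     2-byte minimum follows the "typically 2, 4, or 8" note above.

```c
#include <stddef.h>

/* Illustrative parameters (assumptions, see above). */
static const size_t quantum    = 16;
static const size_t qspace_max = 512;        /* S default */
static const size_t page_size  = 4096;
static const size_t chunk_size = 1u << 20;   /* K default: 1 MB */
static const size_t tiny_min   = 2;

/* Smallest power of two >= n (n > 0). */
static size_t pow2_ceil(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/* Round n up to a multiple of align, which must be a power of two. */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Size reserved for a request of the given size (overflow ignored). */
static size_t size_class(size_t size)
{
    if (size == 0)
        size = 1;                       /* default: a minimal allocation */
    if (size <= quantum / 2) {          /* tiny: power of two */
        size_t c = pow2_ceil(size);
        return c < tiny_min ? tiny_min : c;
    }
    if (size <= qspace_max)             /* quantum-spaced */
        return align_up(size, quantum);
    if (size <= page_size / 2)          /* sub-page: power of two */
        return pow2_ceil(size);
    if (size < chunk_size)              /* large: whole pages (approximation) */
        return align_up(size, page_size);
    return align_up(size, chunk_size);  /* huge: whole chunks */
}
```

     Under these assumptions a 100-byte request is rounded to 112 bytes, a 600-byte request
     to 1024 bytes, and a 5000-byte request to two pages (8192 bytes).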

     Allocations are packed tightly together, which can be an issue for multi-threaded
     applications.  If you need to ensure that allocations do not suffer from cache line sharing,
     round your allocation requests up to the nearest multiple of the cache line size.
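     Following that advice might look like the sketch below.  The 64-byte line size is an
     assumption (check the target CPU), and malloc_line_padded() is a hypothetical helper.
     Rounding only chooses the size class; whether the returned block also starts on a line
     boundary depends on how the allocator lays out regions of that class.

```c
#include <stdlib.h>

#define CACHE_LINE_SIZE ((size_t)64)   /* assumption: check the target CPU */

/* Round n up to a whole number of cache lines. */
static size_t round_to_cache_line(size_t n)
{
    return (n + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
}

/* Hypothetical helper: allocate n bytes padded to whole cache lines,
 * e.g. for a per-thread counter block. */
static void *malloc_line_padded(size_t n)
{
    return malloc(round_to_cache_line(n == 0 ? 1 : n));
}
```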

     When debugging memory allocation problems, the first thing to do is to set the A option.
     This option forces a coredump (if possible) at the first sign of trouble, rather than the
     normal policy of trying to continue if at all possible.

     It is probably also a good idea to recompile the program with suitable options and symbols
     for debugger support.

     If the program starts to give unusual results, dump core, or otherwise behave differently
     without emitting any of the diagnostic messages described below, it likely depends on the
     storage being filled with zero bytes.  Try running it with the Z option set; if that
     improves the situation, this diagnosis has been confirmed.  If the program still
     misbehaves, the likely problem is accessing memory outside the allocated area.

     Alternatively, if the symptoms are not easy to reproduce, setting the J option may help
     provoke the problem.  In truly difficult cases, the U option, if supported by the kernel,
     can provide a detailed trace of all calls made to these functions.
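     When the J option cannot be used, for example on another platform or to target a single
     data structure, the same junk-filling technique can be approximated with a wrapper.  This
     is a hypothetical sketch: junk_malloc() and junk_free() are illustrative names, and the
     0xa5/0x5a patterns are borrowed from the J entry above.  A size header is stored in front
     of each block so that junk_free() knows how many bytes to overwrite.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header stored before each block; the union keeps the user pointer
 * aligned for common types. */
union junk_header {
    size_t size;
    long double ld;
    void *vp;
    long long ll;
};

/* Like malloc(), but fills new memory with 0xa5 (as the J option does). */
static void *junk_malloc(size_t size)
{
    union junk_header *h;

    if (size > SIZE_MAX - sizeof *h)
        return NULL;
    h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    memset(h + 1, 0xa5, size);
    return h + 1;
}

/* Like free(), but overwrites the block with 0x5a first. */
static void junk_free(void *p)
{
    union junk_header *h;

    if (p == NULL)
        return;
    h = (union junk_header *)p - 1;
    memset(p, 0x5a, h->size);
    free(h);
}
```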

     Unfortunately, jemalloc does not provide much detail about the problems it detects; the
     performance impact of storing such information would be prohibitive.  There are a number of
     allocator implementations available on the Internet which focus on detecting and pinpointing
     problems by trading performance for extra sanity checks and detailed diagnostics.

     The following environment variables affect the execution of the allocation functions:

     MALLOC_OPTIONS  If the environment variable MALLOC_OPTIONS is set, the characters it
		     contains will be interpreted as flags to the allocation functions.

     To dump core whenever a problem occurs:

	   ln -s 'A' /etc/malloc.conf

     To specify in the source that a program does no return value checking on calls to these
     functions:

	   _malloc_options = "X";
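     The X option applies to every allocation in the program.  A sketch of a per-call-site
     equivalent follows; xmalloc_or_abort() is a hypothetical name, and NetBSD's emalloc(3) may
     already provide a suitable checked allocator.  A zero-byte request may legitimately return
     NULL (see the V option), so it is not treated as a failure.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: behave like the X option for this call site only, by
 * reporting on stderr and calling abort(3) instead of returning NULL. */
static void *xmalloc_or_abort(size_t size)
{
    void *p = malloc(size);

    if (p == NULL && size != 0) {
        fputs("xmalloc_or_abort: out of memory\n", stderr);
        abort();
    }
    return p;
}
```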

     If any of the memory allocation/deallocation functions detect an error or warning condition,
     a message will be printed to file descriptor STDERR_FILENO.  Errors will result in the
     process dumping core.  If the A option is set, all warnings are treated as errors.

     The _malloc_message variable allows the programmer to override the function which emits the
     text strings forming the errors and warnings if for some reason the stderr file descriptor
     is not suitable for this.  Please note that doing anything which tries to allocate memory in
     this function is likely to result in a crash or deadlock.
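     A replacement message function should therefore stick to primitives that never allocate,
     such as write(2).  The sketch below shows the idea; this page does not give the real
     _malloc_message signature, so write_malloc_message(), its fd parameter and its array of
     message parts are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical message writer: concatenates the parts onto fd using only
 * write(2), so it never allocates memory. */
static void write_malloc_message(int fd, const char *const parts[], size_t nparts)
{
    for (size_t i = 0; i < nparts; i++) {
        const char *s = parts[i];
        size_t len = strlen(s);

        while (len > 0) {
            ssize_t n = write(fd, s, len);

            if (n < 0 && errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            if (n <= 0)
                return;         /* give up quietly on other errors */
            s += n;
            len -= (size_t)n;
        }
    }
}
```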

     All messages are prefixed by ``<progname>: (malloc)''.

     emalloc(3), malloc(3), memory(3), memoryallocators(9)

     Jason Evans, "A Scalable Concurrent malloc(3) Implementation for FreeBSD", BSDCan,
     http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf, April 16, 2006.

     Poul-Henning Kamp, "Malloc(3) revisited", Proceedings of the FREENIX Track: 1998 USENIX
     Annual Technical Conference, USENIX Association,
     http://www.usenix.org/publications/library/proceedings/usenix98/freenix/kamp.pdf, June
     15-19, 1998.

     Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles, "Dynamic Storage
     Allocation: A Survey and Critical Review", University of Texas at Austin,
     ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps, 1995.

     The jemalloc allocator became the default system allocator first in FreeBSD 7.0 and then in
     NetBSD 5.0.  In both systems it replaced the older so-called ``phkmalloc'' implementation.

     Jason Evans <jasone@canonware.com>

BSD					  June 21, 2011 				      BSD
