Memory Barriers for (Ubuntu) Linux (i686): Post 302430462 by Corona688 on Thursday 17th of June 2010 05:25:35 PM
P.S. On a two-core, single-CPU system, here is the overhead of XCHG vs. LOCK XCHG with five separate processes running concurrently:

Code:
$ ./a.out & ./a.out & ./a.out & ./a.out & ./a.out &
12225 !Lock     time = 0 M 8 S 657 ms 205 us = 0.116 Mops/s
12229 !Lock     time = 0 M 8 S 801 ms 676 us = 0.114 Mops/s
12227 !Lock     time = 0 M 8 S 896 ms 459 us = 0.112 Mops/s
12228 !Lock     time = 0 M 8 S 958 ms 739 us = 0.112 Mops/s
12226 !Lock     time = 0 M 9 S 157 ms 723 us = 0.109 Mops/s
12228 Lock      time = 0 M 8 S 610 ms 749 us = 0.116 Mops/s
12227 Lock      time = 0 M 8 S 719 ms 860 us = 0.115 Mops/s
12225 Lock      time = 0 M 9 S 49 ms 622 us = 0.111 Mops/s
12226 Lock      time = 0 M 8 S 608 ms 304 us = 0.116 Mops/s
12229 Lock      time = 0 M 9 S 48 ms 352 us = 0.111 Mops/s

The code is a million loops of this:
Code:
                        "LOOP1:                 \n"
                        "       xchg    %ebx, a \n"
                        "       xchg    %ebx, a \n"
                        "       xchg    %ebx, a \n"
                        "       xchg    %ebx, a \n"
                        "       xchg    %ebx, a \n"
                        "       loop    LOOP1   \n"

except that it was run once with LOCK XCHG and once with plain XCHG. There was no significant difference.

---------- Post updated at 03:25 PM ---------- Previous update was at 03:14 PM ----------

Quote:
Originally Posted by gorga
I originally used it with existing thread-pools, but found I needed more control over the allocation of "tasks" to "cores"

How so?

Quote:
hence I'm making my own.

I don't see how using a different structure excludes pthreads. You wanted to avoid pthreads because it uses atomic ops, yet you are prepared to use atomic ops directly instead? It's best to write portably if possible anyway.

Last edited by Corona688; 06-17-2010 at 06:42 PM.
 

ATOMIC_DEPRECATED(3)					   BSD Library Functions Manual 				      ATOMIC_DEPRECATED(3)

NAME
OSAtomicAdd32, OSAtomicAdd32Barrier, OSAtomicIncrement32, OSAtomicIncrement32Barrier, OSAtomicDecrement32, OSAtomicDecrement32Barrier, OSAtomicOr32, OSAtomicOr32Barrier, OSAtomicOr32Orig, OSAtomicOr32OrigBarrier, OSAtomicAnd32, OSAtomicAnd32Barrier, OSAtomicAnd32Orig, OSAtomicAnd32OrigBarrier, OSAtomicXor32, OSAtomicXor32Barrier, OSAtomicXor32Orig, OSAtomicXor32OrigBarrier, OSAtomicAdd64, OSAtomicAdd64Barrier, OSAtomicIncrement64, OSAtomicIncrement64Barrier, OSAtomicDecrement64, OSAtomicDecrement64Barrier, OSAtomicCompareAndSwapInt, OSAtomicCompareAndSwapIntBarrier, OSAtomicCompareAndSwapLong, OSAtomicCompareAndSwapLongBarrier, OSAtomicCompareAndSwapPtr, OSAtomicCompareAndSwapPtrBarrier, OSAtomicCompareAndSwap32, OSAtomicCompareAndSwap32Barrier, OSAtomicCompareAndSwap64, OSAtomicCompareAndSwap64Barrier, OSAtomicTestAndSet, OSAtomicTestAndSetBarrier, OSAtomicTestAndClear, OSAtomicTestAndClearBarrier, OSMemoryBarrier -- deprecated atomic add, increment, decrement, or, and, xor, compare and swap, test and set, test and clear, and memory barrier

SYNOPSIS
#include <libkern/OSAtomic.h>

int32_t OSAtomicAdd32(int32_t theAmount, volatile int32_t *theValue);
int32_t OSAtomicAdd32Barrier(int32_t theAmount, volatile int32_t *theValue);
int32_t OSAtomicIncrement32(volatile int32_t *theValue);
int32_t OSAtomicIncrement32Barrier(volatile int32_t *theValue);
int32_t OSAtomicDecrement32(volatile int32_t *theValue);
int32_t OSAtomicDecrement32Barrier(volatile int32_t *theValue);

int32_t OSAtomicOr32(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicOr32Barrier(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicAnd32(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicAnd32Barrier(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicXor32(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicXor32Barrier(uint32_t theMask, volatile uint32_t *theValue);

int32_t OSAtomicOr32Orig(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicOr32OrigBarrier(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicAnd32Orig(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicAnd32OrigBarrier(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicXor32Orig(uint32_t theMask, volatile uint32_t *theValue);
int32_t OSAtomicXor32OrigBarrier(uint32_t theMask, volatile uint32_t *theValue);

int64_t OSAtomicAdd64(int64_t theAmount, volatile OSAtomic_int64_aligned64_t *theValue);
int64_t OSAtomicAdd64Barrier(int64_t theAmount, volatile OSAtomic_int64_aligned64_t *theValue);
int64_t OSAtomicIncrement64(volatile OSAtomic_int64_aligned64_t *theValue);
int64_t OSAtomicIncrement64Barrier(volatile OSAtomic_int64_aligned64_t *theValue);
int64_t OSAtomicDecrement64(volatile OSAtomic_int64_aligned64_t *theValue);
int64_t OSAtomicDecrement64Barrier(volatile OSAtomic_int64_aligned64_t *theValue);

bool OSAtomicCompareAndSwapInt(int oldValue, int newValue, volatile int *theValue);
bool OSAtomicCompareAndSwapIntBarrier(int oldValue, int newValue, volatile int *theValue);
bool OSAtomicCompareAndSwapLong(long oldValue, long newValue, volatile long *theValue);
bool OSAtomicCompareAndSwapLongBarrier(long oldValue, long newValue, volatile long *theValue);
bool OSAtomicCompareAndSwapPtr(void* oldValue, void* newValue, void* volatile *theValue);
bool OSAtomicCompareAndSwapPtrBarrier(void* oldValue, void* newValue, void* volatile *theValue);
bool OSAtomicCompareAndSwap32(int32_t oldValue, int32_t newValue, volatile int32_t *theValue);
bool OSAtomicCompareAndSwap32Barrier(int32_t oldValue, int32_t newValue, volatile int32_t *theValue);
bool OSAtomicCompareAndSwap64(int64_t oldValue, int64_t newValue, volatile OSAtomic_int64_aligned64_t *theValue);
bool OSAtomicCompareAndSwap64Barrier(int64_t oldValue, int64_t newValue, volatile OSAtomic_int64_aligned64_t *theValue);

bool OSAtomicTestAndSet(uint32_t n, volatile void *theAddress);
bool OSAtomicTestAndSetBarrier(uint32_t n, volatile void *theAddress);
bool OSAtomicTestAndClear(uint32_t n, volatile void *theAddress);
bool OSAtomicTestAndClearBarrier(uint32_t n, volatile void *theAddress);

bool OSAtomicEnqueue(OSQueueHead *list, void *new, size_t offset);
void* OSAtomicDequeue(OSQueueHead *list, size_t offset);

void OSMemoryBarrier(void);

DESCRIPTION
These are deprecated interfaces for atomic and synchronization operations, provided for compatibility with legacy code. New code should use the C11 <stdatomic.h> interfaces described in stdatomic(3).

These functions are thread and multiprocessor safe. For each function, there is a version which incorporates a memory barrier and another version which does not. Barriers strictly order memory access on a weakly-ordered architecture such as ARM. All loads and stores executed in sequential program order before the barrier will complete before any load or store executed after the barrier. On some platforms, such as ARM, the barrier operation can be quite expensive.

Most code will want to use the barrier functions to ensure that memory shared between threads is properly synchronized. For example, if you want to initialize a shared data structure and then atomically increment a variable to indicate that the initialization is complete, then you must use OSAtomicIncrement32Barrier() to ensure that the stores to your data structure complete before the atomic add. Likewise, the consumer of that data structure must use OSAtomicDecrement32Barrier(), in order to ensure that their loads of the structure are not executed before the atomic decrement. On the other hand, if you are simply incrementing a global counter, then it is safe and potentially much faster to use OSAtomicIncrement32(). If you are unsure which version to use, prefer the barrier variants as they are safer.

The logical (and, or, xor) and bit test operations are layered on top of the OSAtomicCompareAndSwap() primitives. There are four versions of each logical operation, depending on whether or not there is a barrier, and whether the return value is the result of the operation (e.g., OSAtomicOr32()) or the original value before the operation (e.g., OSAtomicOr32Orig()).

The memory address theValue must be "naturally aligned", i.e. 32-bit aligned for 32-bit operations and 64-bit aligned for 64-bit operations.
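The initialize-then-signal example described above can be sketched with the C11 <stdatomic.h> interfaces that this page recommends for new code. This is an illustrative sketch, not a drop-in replacement for the OSAtomic calls: the structure, its fields, and the function names are hypothetical, and release/acquire ordering is used where the deprecated functions would issue a full barrier. That weaker ordering is sufficient for this one-way publish pattern.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared structure; the fields are illustrative only. */
struct config {
    int width;
    int height;
};

static struct config shared_config;   /* plain, non-atomic data */
static atomic_int    ready_count;     /* 0 until the data is published */

/* Producer: fill in the structure, then bump the counter with release
 * ordering so the stores above cannot be reordered after the increment. */
static void publish_config(int width, int height)
{
    shared_config.width  = width;
    shared_config.height = height;
    atomic_fetch_add_explicit(&ready_count, 1, memory_order_release);
}

/* Consumer: read the counter with acquire ordering. Only if it is nonzero
 * is it safe to read the structure. Returns false if nothing is published. */
static bool try_read_config(struct config *out)
{
    if (atomic_load_explicit(&ready_count, memory_order_acquire) == 0)
        return false;
    *out = shared_config;
    return true;
}
```

In a real program publish_config() and try_read_config() would run on different threads. If both sides used memory_order_relaxed instead, the consumer could observe the counter change before the structure's contents, which is the hazard the barrier variants exist to prevent.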
Note that natural 64-bit alignment is not the default alignment of int64_t in the iOS ARMv7 ABI; the OSAtomic_int64_aligned64_t type can be used to declare variables with the required alignment.

The OSAtomicCompareAndSwap() operations compare oldValue to *theValue, and set *theValue to newValue if the comparison is equal. The comparison and assignment occur as one atomic operation.

OSAtomicTestAndSet() and OSAtomicTestAndClear() operate on bit (0x80 >> (n & 7)) of byte ((char*) theAddress + (n >> 3)). They set the named bit to either 1 or 0, respectively. theAddress need not be aligned.

The OSMemoryBarrier() function strictly orders memory accesses in a weakly ordered memory model such as with ARM, by creating a barrier. All loads and stores executed in sequential program order before the barrier will complete with respect to the memory coherence mechanism, before any load or store executed after the barrier. Used with an atomic operation, the barrier can be used to create custom synchronization protocols as an alternative to the spinlock or queue/dequeue operations. Note that this barrier does not order uncached loads and stores. On a uniprocessor, the barrier operation is typically optimized into a no-op.

RETURN VALUES
The arithmetic operations return the new value, after the operation has been performed. The boolean operations come in two styles, one of which returns the new value, and one of which (the "Orig" versions) returns the old. The compare-and-swap operations return true if the comparison was equal, i.e. if the swap occurred. The bit test and set/clear operations return the original value of the bit.

SEE ALSO
stdatomic(3), atomic(3), spinlock_deprecated(3)

HISTORY
Most of these functions first appeared in Mac OS 10.4 (Tiger). The "Orig" forms of the boolean operations, the "int", "long" and "ptr" forms of compare-and-swap first appeared in Mac OS 10.5 (Leopard).

Darwin                                   Mar 7, 2016                                   Darwin
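The DESCRIPTION says the logical and bit test operations are layered on compare-and-swap, and gives the bit numbering used by the test-and-set calls. The following is a rough sketch of how such layering can look in C11. It is not the Darwin implementation, the function names are hypothetical, and it takes _Atomic-qualified pointers where the real functions take volatile ones.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomic OR built from a compare-and-swap retry loop.
 * Returns the ORIGINAL value, like the "Orig" variants. */
static uint32_t or32_orig(uint32_t mask, _Atomic uint32_t *value)
{
    uint32_t old = atomic_load(value);
    /* On failure, atomic_compare_exchange_weak() reloads `old` with the
     * current contents, so the loop retries with fresh data. */
    while (!atomic_compare_exchange_weak(value, &old, old | mask))
        ;
    return old;
}

/* Same operation, returning the NEW value, like the non-"Orig" variants. */
static uint32_t or32_new(uint32_t mask, _Atomic uint32_t *value)
{
    return or32_orig(mask, value) | mask;
}

/* Test-and-set using the bit numbering from DESCRIPTION:
 * bit (0x80 >> (n & 7)) of byte (n >> 3).
 * Returns the original value of the bit. */
static bool test_and_set_bit(uint32_t n, _Atomic unsigned char *bytes)
{
    unsigned char bit = (unsigned char)(0x80u >> (n & 7));
    _Atomic unsigned char *byte = bytes + (n >> 3);
    unsigned char old = atomic_load(byte);

    while (!atomic_compare_exchange_weak(byte, &old,
                                         (unsigned char)(old | bit)))
        ;
    return (old & bit) != 0;
}
```

With this numbering, n = 0 names the most significant bit of the first byte, so n = 9 is the 0x40 bit of the second byte. New code would normally call atomic_fetch_or() directly instead of writing the loop by hand. The loop is shown only to make the layering concrete.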