Memory Barriers for (Ubuntu) Linux (i686)
Post 302430480 by gorga on Thursday 17th of June 2010 07:09:41 PM
Quote:
Originally Posted by Corona688
No significant difference.


Looks like you're right, judging by those results. But if I ran it on 4/8/16/32 etc. cores, would it still be the case? I have a 4-core machine at work; I'll try it tomorrow. Although if LOCK just causes a "re-ordering" of bus accesses, I suppose theoretically it should impact throughput.

Quote:
How so?
I want to control which queue, and ultimately which pthread, runs which task, based on the fact that in the upper application some tasks communicate very frequently and some never. Also, there is much scope for assigning equal workloads across cores (think of an n-ary tree structure where the n paths are of equal length and communication is restricted to nodes on the same path). I looked at Threading Building Blocks "tasks" at first, but found them too blunt a tool for what I want.
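A minimal sketch of that placement idea, assuming one queue per worker pthread and a task record that carries the id of its tree path. All names here (`struct task`, `worker_for_path`, `submit`, `NUM_WORKERS`) are made up for illustration and are not from the poster's code.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* assumed: one queue per pthread, one pthread per core */
#define QUEUE_CAP   64  /* assumed fixed capacity, to keep the sketch short */

/* Hypothetical task record: path_id names the tree path the task sits on. */
struct task {
    unsigned path_id;
    void (*run)(void *arg);
    void *arg;
};

struct queue {
    struct task items[QUEUE_CAP];
    size_t count;
};

static struct queue queues[NUM_WORKERS];

/* Tasks on the same path always map to the same worker, so tasks that talk
 * to each other often end up on the same pthread. */
static unsigned worker_for_path(unsigned path_id)
{
    return path_id % NUM_WORKERS;
}

/* Append a task to its worker's queue. Returns the worker index, or -1 if
 * that queue is full. Meant for single-threaded setup, before workers start. */
static int submit(struct task t)
{
    unsigned w = worker_for_path(t.path_id);
    struct queue *q;

    assert(w < NUM_WORKERS);
    q = &queues[w];
    if (q->count == QUEUE_CAP)
        return -1;
    q->items[q->count++] = t;
    return (int)w;
}
```

With equal-length paths, the modulo mapping also spreads the work evenly as long as the number of paths is a multiple of the number of workers; otherwise some other balancing rule would be needed.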

Quote:
I don't see how using a different structure excludes pthreads. You wanted to avoid pthreads since it used atomic ops, and are prepared to use atomic ops instead?
Not really. I originally wanted to use pthreads, but they didn't offer the high number of threads and "lightweightness" I needed (on the order of tens of thousands, with many short-lived threads), while user-level threads like GNU threads don't exploit multiple cores because the kernel isn't involved. So what I've done, with some inspiration from "protothreads", is build an abstraction on top of pthreads that provides what I need in the form of lightweight tasks.
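For readers who haven't met protothreads, here is a rough sketch of what a stackless lightweight task can look like. It is a hand-written version of the switch-based resume that protothreads generate with macros, not the poster's actual abstraction; `struct ltask`, `ltask_step` and `run_all` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lightweight task: no stack of its own, only a resume point
 * plus whatever state it needs to keep between steps. */
struct ltask {
    int resume_at;   /* which step to run next; -1 once finished */
    int counter;     /* example per-task state */
};

enum { LT_YIELDED = 0, LT_DONE = 1 };

/* Runs one slice of the task and returns. The switch jumps back to where the
 * previous call left off. */
static int ltask_step(struct ltask *t)
{
    assert(t != NULL);
    switch (t->resume_at) {
    case 0:
        t->counter = 0;
        t->resume_at = 1;
        return LT_YIELDED;
    case 1:
        if (++t->counter < 3)
            return LT_YIELDED;   /* stay in this step for now */
        t->resume_at = 2;
        return LT_YIELDED;
    default:
        return LT_DONE;
    }
}

/* Round-robin over a set of tasks until all are done. A worker pthread would
 * run a loop like this over its own queue. Returns the number of steps taken. */
static int run_all(struct ltask *tasks, size_t n)
{
    int steps = 0;
    size_t live = n;

    while (live > 0) {
        live = 0;
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].resume_at < 0)
                continue;                    /* already finished */
            steps++;
            if (ltask_step(&tasks[i]) == LT_DONE)
                tasks[i].resume_at = -1;
            else
                live++;
        }
    }
    return steps;
}
```

Each task costs only a small struct, which is why tens of thousands of them are feasible on top of a handful of pthreads.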

I wanted to avoid the pthreads synchronisation structures like mutexes because of their overhead and to keep things scalable. There are ways to distribute work such that mutexes aren't necessary as long as an ordering of instructions can be guaranteed; hence, following your advice, I'll try those atomic instructions from GCC.
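One common way to do this without a mutex is a single-producer/single-consumer ring, sketched below under stated assumptions: exactly one pthread pushes and exactly one pops, GCC's `__sync_synchronize()` full barrier is available, and aligned word-sized loads and stores are atomic on the target (true on i686). The names `struct spsc_ring`, `ring_push` and `ring_pop` are hypothetical; this is not the poster's code.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8   /* assumed capacity; must be a power of two */

/* Only the producer writes head; only the consumer writes tail. */
struct spsc_ring {
    void *slots[RING_SIZE];
    volatile size_t head;   /* next slot to write */
    volatile size_t tail;   /* next slot to read  */
};

/* Producer side. Returns 1 on success, 0 if the ring is full. */
static int ring_push(struct spsc_ring *r, void *item)
{
    size_t head;

    assert(r != NULL);
    head = r->head;
    if (head - r->tail == RING_SIZE)
        return 0;
    r->slots[head & (RING_SIZE - 1)] = item;
    __sync_synchronize();   /* publish the slot before the new head */
    r->head = head + 1;
    return 1;
}

/* Consumer side. Returns 1 and stores the item, or 0 if the ring is empty. */
static int ring_pop(struct spsc_ring *r, void **item)
{
    size_t tail;

    assert(r != NULL && item != NULL);
    tail = r->tail;
    if (r->head == tail)
        return 0;
    __sync_synchronize();   /* read head before reading the slot */
    *item = r->slots[tail & (RING_SIZE - 1)];
    __sync_synchronize();   /* finish reading before freeing the slot */
    r->tail = tail + 1;
    return 1;
}
```

The full barrier is stronger than strictly needed on x86, but it keeps the ordering argument simple: a slot is always written before the head that exposes it, and always read before the tail that releases it.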

thanks!
 

PARALLELCPU(1p) 					User Contributed Perl Documentation					   PARALLELCPU(1p)

NAME
PDL::ParallelCPU - Parallel Processor MultiThreading Support in PDL (Experimental)

DESCRIPTION
PDL has support (currently experimental) for splitting up numerical processing between multiple parallel processor threads (or pthreads) using the set_autopthread_targ and set_autopthread_size functions. This can improve processing performance (by greater than 2-4X in most cases) by taking advantage of multi-core and/or multi-processor machines.

SYNOPSIS
    use PDL;

    # Set target of 4 parallel pthreads to create, with a lower limit of
    # 5Meg elements for splitting processing into parallel pthreads.
    set_autopthread_targ(4);
    set_autopthread_size(5);

    $a = zeroes(5000,5000); # Create 25Meg element array

    $b = $a + 5; # Processing will be split up into multiple pthreads

    # Get the actual number of pthreads for the last
    # processing operation.
    $actualPthreads = get_autopthread_actual();

Terminology
The use of the term threading can be confusing with PDL, because it can refer to PDL threading, as defined in the PDL::Threading docs, or to processor multi-threading. To reduce confusion with the existing PDL threading terminology, this document uses pthreading to refer to processor multi-threading, which is the use of multiple processor threads to split up numerical processing into parallel operations.

Functions that control PDL PThreads
This is a brief listing and description of the PDL pthreading functions; see the PDL::Core docs for detailed information.

set_autopthread_targ
    Set the target number of processor-threads (pthreads) for multi-threaded processing. Setting auto_pthread_targ to 0 means that no pthreading will occur. See PDL::Core for details.

set_autopthread_size
    Set the minimum size (in Meg-elements or 2**20 elements) of the largest PDL involved in a function where auto-pthreading will be performed. For small PDLs, it probably isn't worth starting multiple pthreads, so this function is used to define a minimum threshold below which auto-pthreading won't be attempted. See PDL::Core for details.

get_autopthread_actual
    Get the actual number of pthreads executed for the last pdl processing function. See PDL::get_autopthread_actual for details.

Global Control of PDL PThreading using Environment Variables
PDL PThreading can be globally turned on, without modifying existing code, by setting the environment variables PDL_AUTOPTHREAD_TARG and PDL_AUTOPTHREAD_SIZE before running a PDL script.
These environment variables are checked when PDL starts up, and calls to the set_autopthread_targ and set_autopthread_size functions are made with the environment variables' values. For example, if the environment var PDL_AUTOPTHREAD_TARG is set to 3, and PDL_AUTOPTHREAD_SIZE is set to 10, then any pdl script will run as if the following lines were at the top of the file:

    set_autopthread_targ(3);
    set_autopthread_size(10);

How It Works
The auto-pthreading process works by analyzing threaded array dimensions in PDL operations and splitting up processing based on the thread dimension sizes and desired number of pthreads (i.e. the pthread target or pthread_targ). The offsets and increments that PDL uses to step thru the data in memory are modified for each pthread so each one sees a different set of data when performing processing.

Example

    $a = sequence(20,4,3); # Small 3-D Array, size 20,4,3

    # Setup auto-pthreading:
    set_autopthread_targ(2); # Target of 2 pthreads
    set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded

    # This will be split up into 2 pthreads
    $c = maximum($a);

For the above example, the maximum function has a signature of "(a(n); [o]c())", which means that the first dimension of $a (size 20) is a Core dimension of the maximum function. The other dimensions of $a (size 4,3) are threaded dimensions (i.e. will be threaded-over in the maximum function). The auto-pthreading algorithm examines the threaded dims of size (4,3) and picks the 4 dimension, since it is evenly divisible by the autopthread_targ of 2. The processing of the maximum function is then split into two pthreads on the size-4 dimension, with dim indexes 0,2 processed by one pthread and dim indexes 1,3 processed by the other pthread.

Limitations

Must have POSIX Threads Enabled
Auto-PThreading only works if your PDL installation was compiled with POSIX threads enabled. This is normally the case if you are running on linux, or other unix variants.

Non-Threadsafe Code
Not all the libraries that PDL interfaces to are thread-safe, i.e. they aren't written to operate in a multi-threaded environment without crashing or causing side-effects. Some examples in the PDL core are the fft function and the pnmout functions. To operate properly with these types of functions, the PPCode flag NoPthread has been introduced to indicate a function as not being pthread-safe. See PDL::PP docs for details.

Size of PDL Dimensions and PThread Target
Due to the way a PDL is split up for operation using multiple pthreads, the size of a dimension must be evenly divisible by the pthread target. For example, if a PDL has threaded dimension sizes of (4,3,3) and the auto_pthread_targ has been set to 2, then the first threaded dimension (size 4) will be picked to be split up into two pthreads of size 2 and 2. However, if the threaded dimension sizes are (3,3,3) and the auto_pthread_targ is still 2, then pthreading won't occur, because no threaded dimensions are divisible by 2.

The algorithm that picks the actual number of pthreads has some smarts (but could probably be improved) to adjust down from the auto_pthread_targ to get a number of pthreads that can evenly divide one of the threaded dimensions. For example, if a PDL has threaded dimension sizes of (9,2,2) and the auto_pthread_targ is 4, the algorithm will see that no dimension is divisible by 4, then adjust down the target to 3, resulting in splitting up the first threaded dimension (size 9) into 3 pthreads.

Speed improvement might be less than you expect.
If you have an 8-core machine and call set_autopthread_targ with 8 to generate 8 parallel pthreads, you probably won't get an 8X improvement in speed, due to memory bandwidth issues. Even though you have 8 separate CPUs crunching away on data, you will have (for most common machine architectures) common RAM that now becomes your bottleneck. For simple calculations (e.g. simple additions) you can run into a performance limit at about 4 pthreads. For more complex calculations the limit will be higher.

COPYRIGHT
Copyright 2011 John Cerney. You can distribute and/or modify this document under the same terms as the current Perl license.

See: http://dev.perl.org/licenses/

perl v5.14.2                                  2012-01-02                                  PARALLELCPU(1p)
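
The splitting rules described in the man page (pick a threaded dimension that the pthread target divides evenly, adjust the target down if none does, then hand out indexes of that dimension to the pthreads) can be sketched in C as follows. This is only a guess that reproduces the documented examples; it is not taken from the PDL source, and `struct split`, `pick_split` and `owner_of_index` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the assumed selection rule: how many pthreads to use and which
 * threaded dimension to split. nthreads == 1 means "do not pthread". */
struct split {
    int nthreads;
    int dim;        /* index into the threaded dims, or -1 */
};

/* Try the target first, then adjust down until some threaded dimension is
 * evenly divisible. A target of 0 or 1 never enters the loop, so no
 * pthreading happens. */
static struct split pick_split(const int *dims, size_t ndims, int target)
{
    struct split s = { 1, -1 };

    for (int n = target; n >= 2; n--) {
        for (size_t d = 0; d < ndims; d++) {
            if (dims[d] % n == 0) {
                s.nthreads = n;
                s.dim = (int)d;
                return s;
            }
        }
    }
    return s;
}

/* Which pthread handles index i of the split dimension, assuming the
 * interleaved layout from the documented example (indexes 0,2 go to one
 * pthread and 1,3 to the other). */
static int owner_of_index(int i, int nthreads)
{
    assert(nthreads > 0);
    return i % nthreads;
}
```

For the documented cases this gives 2 pthreads on the first dimension for threaded dims (4,3) or (4,3,3) with a target of 2, no pthreading for (3,3,3) with a target of 2, and 3 pthreads on the first dimension for (9,2,2) with a target of 4.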
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.