Multiprocessing Help


 
# 1  
Old 11-01-2012

Hello all,
I recently wrote a simple script for the analysis jobs I do at work. I have to run multiple files through 5 different stages of an analysis program, and the script simply runs all of the files through each stage automatically. My question is this: the computer I'm using has 12 cores, each with two virtual cores, so I am working with 24 virtual processors. Is there a way to either 1) assign different jobs to different processors, or 2) get all of the processors to collaborate on one job? Does this happen automatically? To clarify, the script currently runs the files through the analysis one at a time, and we are looking to speed up the process. Does anyone have any ideas?
Thanks!

PS: The machine is a new Mac Pro, OS 10.7
# 2  
Old 11-01-2012
The simplest way would be to divide the set of files into 24 subsets and execute your script on those 24 subsets in parallel. Out of curiosity... What processor(s) do you have in that computer?
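A minimal sketch of the splitting step in Python, assuming the input files are known by name. The file names and the helper `split_into_subsets` are made up for illustration; each resulting subset would then be handed to its own copy of your script.

```python
from typing import List, Sequence


def split_into_subsets(files: Sequence[str], n_subsets: int) -> List[List[str]]:
    """Distribute files round-robin into at most n_subsets non-empty groups."""
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    # files[i::n_subsets] takes every n_subsets-th file, starting at index i.
    subsets = [list(files[i::n_subsets]) for i in range(n_subsets)]
    # With fewer files than subsets, some groups are empty; drop them.
    return [subset for subset in subsets if subset]


if __name__ == "__main__":
    # Hypothetical input names, for illustration only.
    files = [f"sample_{i:03d}.dat" for i in range(50)]
    for index, subset in enumerate(split_into_subsets(files, 24)):
        print(index, subset)
```

Round-robin assignment keeps the subset sizes within one file of each other, which helps only if the files take roughly similar time to process.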
# 3  
Old 11-01-2012
That makes sense. So if I just send 24 through at once, the CPU will allocate the tasks automatically? Seems simple enough. I was slightly wrong in my post above: the machine has two 2.4GHz 6-Core Intel Xeon E5645 processors, which with Hyper-Threading give 24 virtual cores.
# 4  
Old 11-01-2012
Quote:
Originally Posted by Tyler_92
That makes sense, so if I just send 24 through at once, the CPU will allocate the tasks automatically?
The operating system will schedule the tasks on the CPUs. You might want to start with a slightly lower number of jobs, say 20, and see what the CPU utilization looks like. If you still see some idle CPU time, you can add more jobs. Of course, this assumes that the jobs are CPU bound, i.e. that they perform no heavy disk I/O.
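One way to cap the number of simultaneous jobs, sketched in Python under the assumption that each job is an external command. The helper `run_jobs` and the placeholder commands are hypothetical; the real per-file command would take their place.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_jobs(commands: Sequence[Sequence[str]], max_jobs: int = 20) -> List[int]:
    """Run each command as its own OS process, at most max_jobs at a time.

    The threads only wait for their child process to finish; the operating
    system decides which CPU each child actually runs on.
    """
    def run_one(command: Sequence[str]) -> int:
        return subprocess.run(command).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # map() keeps the exit codes in the same order as the commands.
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Placeholder jobs: each starts a Python interpreter that exits at once.
    # A real job would look like ["./analysis.sh", filename] (hypothetical).
    jobs = [[sys.executable, "-c", "pass"] for _ in range(4)]
    print(run_jobs(jobs, max_jobs=2))
```

Raising or lowering `max_jobs` while watching CPU utilization is then a one-argument change.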
# 5  
Old 11-01-2012
Quote:
Originally Posted by Tyler_92
That makes sense, so if I just send 24 through at once, the CPU will allocate the tasks automatically? Seems simple enough. I was slightly wrong in my post above, the machine has two 2.4GHz 6-Core Intel Xeon E5645 processors, which with hyper threading has 24 virtual cores.
24 virtual cores are not 24 actual cores. They can't truly do everything simultaneously; the two virtual cores on each physical core only overlap those parts of two tasks that don't compete for the same CPU resources.
# 6  
Old 11-01-2012
Hi.

See GNU Parallel (GNU Project, Free Software Foundation) for a utility that makes it easy to control the number of processes run in parallel -- it's a bit like xargs, but on steroids. This might help in finding the optimum number of simultaneously running processes for your application, if, for some reason, that number is not 24.
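Independently of GNU Parallel, the search for a good job count can be sketched in Python by timing the same batch at several concurrency levels. This is only an illustration: `time_batch`, `sweep`, and the placeholder job are assumptions, not part of any existing tool.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def time_batch(commands, max_jobs):
    """Return wall-clock seconds needed to run all commands, max_jobs at once."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        # list() forces every job to finish before the clock is stopped.
        list(pool.map(lambda command: subprocess.run(command, check=True), commands))
    return time.perf_counter() - start


def sweep(commands, job_counts):
    """Time the same batch of commands at several concurrency levels."""
    return {count: time_batch(commands, count) for count in job_counts}


if __name__ == "__main__":
    # Placeholder CPU-bound job; substitute the real analysis command.
    job = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
    for count, seconds in sweep([job] * 8, [1, 2, 4, 8]).items():
        print(f"{count:2d} jobs at once: {seconds:.2f} s")
```

The job count where the timings stop improving is a reasonable starting point; timings from a single run are noisy, so repeat the sweep before trusting small differences.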

Best wishes ... cheers, drl
# 7  
Old 11-01-2012
Quote:
Originally Posted by Tyler_92
My question is this: The computer I'm using has 12 cores, each with two virtual cores, so I am working with 24 virtual processors. Is there a way to either 1) assign different jobs to different processors, or 2) get all of the processors to collaborate on one job? Does this happen automatically?
Ha - I never thought my work with massively parallel computers would ever be of help here!

In principle, this is possible. The whole point of having several CPUs instead of one is to work on several tasks at the same time - in parallel.

There are some restrictions to this, though: first, as my venerable colleague Corona688 already told you, 24 virtual processors are not 24 real processors. But even setting this point aside, there are two classes of problems:

Suppose I give you a number and ask you to multiply it by 3. It will take you a certain time to compute the result. Now, if I give 50 of these numbers to 50 people, the time to come up with all the answers will be more or less the same (save for some slack), because every one of them computes one result only. This is a problem which "scales" very well.

Now let us alter the problem a bit: I give you a number and ask you to multiply it by 3, the result again by 3, and so forth, 50 times. This time 50 people wouldn't help at all, because every result depends on the previous one. Computing the final result will therefore take the same time, regardless of how many people work on it. This is a problem which scales very badly.
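The two cases can be sketched in Python as follows. The function names are made up for the illustration, and multiplying by 3 is of course far too cheap to be worth distributing in practice; the point is only the shape of the dependencies.

```python
from concurrent.futures import ProcessPoolExecutor


def triple(x: int) -> int:
    return 3 * x


def triple_all(numbers, workers=4):
    """Independent tasks: each result needs only its own input,
    so the work can be spread across several worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(triple, numbers))


def triple_chain(start: int, steps: int) -> int:
    """Dependent chain: every step needs the result of the previous one,
    so the steps must run one after another however many workers exist."""
    value = start
    for _ in range(steps):
        value = triple(value)
    return value


if __name__ == "__main__":
    print(triple_all(range(1, 51)))  # 50 numbers, each tripled once
    print(triple_chain(1, 50))       # one number, tripled 50 times in sequence
```

Running one analysis file per process corresponds to the first case, as long as the files do not depend on each other's results.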

Real-world problems are somewhere in between these extremes and usually contain parts which scale well and parts which scale less well. The art of programming massively parallel computers is to identify the parts that scale well and optimize them as far as possible.
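A common way to put numbers on this mix is Amdahl's law, which I have not named above: if a fraction p of the run time scales perfectly and the rest does not scale at all, n workers give a speedup of at most 1 / ((1 - p) + p / n). A small sketch, with the fractions chosen purely for illustration:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only parallel_fraction of the run time scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed fractions, not measurements of any real analysis program.
    for fraction in (0.5, 0.9, 0.99):
        bounds = [round(amdahl_speedup(fraction, n), 2) for n in (2, 12, 24)]
        print(fraction, bounds)
```

Treat the result as an optimistic bound: it ignores communication overhead, and it counts every worker as a full processor, which hyper-threaded virtual cores are not.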

OK, so much for theory, which will not directly solve your problem. It does answer the second part of your question, though: whether all the processors can be brought to work together on one analysis depends on how well the analysis program was written with SMP in mind. As we don't know anything about your program, you will have to simply test it. Try it on a two-processor system (leaving 1 processor for the OS, to eliminate effects from that side) and on your 12-processor system, and see whether the processing times differ.

I am no OSX expert, but as far as I know, setting processor affinity is not supported there. It might work to use OpenGrid or similar products, but honestly, I have no idea if this even runs under OSX. Maybe someone else can fill the gaps here. You might consider transferring the analysis to a Linux workstation, where such tools do exist.

I hope this helps.

bakunin
 