
Massively parallel on single core?

High Performance Computing


#1  02-11-2010, Andre_Merzky (Registered User)

Hi all,

I am not sure how many people actually follow the HPC forum on unix.com, but you may be interested in discussing the following (academic) problem:

Assume you want to run a *very* large number (say 100,000) of very lightweight synchronous operations. As an example, assume that you want to run 100,000 instances of


Code:
sleep (3600); // that's one hour of sleep

The trivial (aka braindead) approach would be


Code:
for ( int i = 0; i < 100000; i++ )
{
  ::sleep (3600);
}

Takes about 11 years to finish ;-)

One could start 1000 threads, and run 100 of the sleeps back to back in each of them. That reduces the runtime to 100 hours - still more than 4 days, and the system is totally idle the whole time.
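
Just to be concrete, here is a minimal sketch of that batching idea (the thread and job counts match the numbers above; error handling omitted, and the sleeps stand in for the real blocking calls):

Code:
/* minimal sketch of the batching idea: 1000 threads, each running
   100 of the one-hour sleeps back to back; error handling omitted */
#include <pthread.h>
#include <unistd.h>

#define NTHREADS 1000
#define NJOBS    100000

static void * batch (void * arg)
{
    (void) arg;
    for ( int i = 0; i < NJOBS / NTHREADS; i++ )
        sleep (3600);
    return NULL;
}

int main (void)
{
    pthread_t threads[NTHREADS];

    for ( int i = 0; i < NTHREADS; i++ )
        pthread_create (&threads[i], NULL, batch, NULL);

    for ( int i = 0; i < NTHREADS; i++ )
        pthread_join (threads[i], NULL);

    return 0;
}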

So, using more threads? Won't work, as the max-threads-per-process limit will be hit at some point.

So, spawn 100 processes which spawn 1000 threads each?
On Linux, the max-threads-per-process limit is close to the max-threads-per-system limit, so that won't work either. On other Unixes that is different, but I don't think you get 100,000 threads on a normal single-CPU system. Do you?
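
For reference, those limits can be checked programmatically. A small sketch, assuming Linux (RLIMIT_NPROC and /proc/sys/kernel/threads-max are standard there; the actual values differ per system):

Code:
/* print the per-user and the system-wide thread limits on Linux */
#include <stdio.h>
#include <sys/resource.h>

int main (void)
{
    struct rlimit rl;
    FILE * f;
    long   tmax;

    /* per-user limit on processes/threads (what 'ulimit -u' shows);
       RLIM_INFINITY prints as a huge number */
    if ( getrlimit (RLIMIT_NPROC, &rl) == 0 )
        printf ("RLIMIT_NPROC: soft=%llu hard=%llu\n",
                (unsigned long long) rl.rlim_cur,
                (unsigned long long) rl.rlim_max);

    /* system-wide cap on the number of threads */
    if ( (f = fopen ("/proc/sys/kernel/threads-max", "r")) != NULL )
    {
        if ( fscanf (f, "%ld", &tmax) == 1 )
            printf ("kernel.threads-max: %ld\n", tmax);
        fclose (f);
    }

    return 0;
}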

So, what would your approach be?

I am not looking for a sleep replacement, so saying that I should set an alarm or something similar is not of much use. Sleep is obviously only an example here - replace it with an extremely lightweight job, like running a very time-consuming synchronous remote operation.

I am looking forward to the ideas you guys can come up with! :-)

Cheers, Andre.

#2  02-15-2010, Neo (Administrator)

Seems overly academic.....

In practice, most people who need to run 100,000 parallel applications would turn to some distributed processing package, for example cluster management software.

Hardware and existing distributed processing software are cheaper (and more practical) than attempting to design a single-core solution (the title of this thread).

In general, you should design your HPC application as a distributed architecture and make the centralized approach a special case of a distributed architecture.
#3  02-15-2010, Andre_Merzky (Registered User)

Hi Neo,

thanks for your reply!

I agree with your remark about distributed architectures. This is my day job, and I like it a lot :-)

I think I did not make the problem clear enough: the workload I am talking about consists of mostly idle jobs, so the CPU and memory load of each job is *very* low. Yes, I can beat the problem with more cores or nodes, but that seems very much like a waste, as those would all be idling most of the time.

Assume you plan for 1000 threads per core and use quad-core nodes - that would require 25 nodes which all idle all day long :-(

Some more detail, if that helps: the idle processes/threads are basically watchers, which represent a CPU/memory-heavy remote job they spawned, and whose state they are watching. Only when that state changes do they become active, kicking off data movements or spawning new jobs.

We can't control the design of the remote job startup API very well (third party, synchronous API only), so our technical options for obtaining state information about those jobs are limited. They boil down to:

Code:
void * run_job (void * data)
{
   // this call runs a remote job, and blocks for hours
   remote_api_call (data);
   store_output_data (data);
   return NULL;
}

#define NJOBS 100000

int main ()
{
  pthread_t threads[NJOBS];

  for ( int i = 0; i < NJOBS; i++ )
  {
     pthread_create (&threads[i], NULL, run_job, NULL /* per-job data */);
  }

  for ( int i = 0; i < NJOBS; i++ )
  {
     pthread_join (threads[i], NULL);
  }

  return 0;
}
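
(For what it's worth: a sketch like that needs gcc's -pthread flag to build, and on a normal single-CPU Linux box pthread_create () starts failing with EAGAIN long before i reaches 100,000 - which is exactly the wall described above.)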

So, I can throw 25 nodes at that large for loop, and that is basically what we do - but what a waste...

The *real* workload is 100,000 CPU/memory-heavy remote jobs, which have sufficient resources to run concurrently. I am talking only about the management side (our workflow engine).

Thanks, Andre.