

High Performance Computing: Message Passing Interface (MPI) programming and tuning, MPI library installation and management, parallel administration tools, cluster monitoring, cluster optimization, and more HPC topics.

I/O bound computing clusters

#1 (08-25-2012)
figaro, Registered User (joined Jan 2007; 842 posts)

I/O bound computing clusters

I want to build a computing cluster and have been looking into grid solutions. My understanding of grid solutions is that participating nodes have to sign up explicitly to take part in a computation, and that an isolated piece of work is sent to a node in response to a request from that node (pull). By that reasoning, would a solution in which a controlling machine sends work to whichever node is available not qualify as a grid solution (push)?

The problem we will be solving is data-intensive, so we will be looking at an I/O-bound workload. What methodology is used when the data sits on one machine and the nodes work on that data? Could a partitioning of the database work, whereby each node works only on the data in its own partition and no other?
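As a sketch of the push model I have in mind (node and partition names are made up; in practice the partition IDs would map to database partitions or LUNs):

```python
from queue import Queue
from threading import Thread

# Hypothetical partition IDs standing in for slices of the database.
partitions = ["part_2010", "part_2011", "part_2012"]

def worker(node_name, work_queue, results):
    # Each node takes its next assignment from the controller's queue;
    # the controller "pushes" by deciding what goes into the queue.
    while True:
        part = work_queue.get()
        if part is None:            # sentinel: no more work for this node
            work_queue.task_done()
            break
        # ... run the computation against this partition's data ...
        results.append((node_name, part))
        work_queue.task_done()

work_queue = Queue()
results = []
nodes = [Thread(target=worker, args=(f"node{i}", work_queue, results))
         for i in range(2)]
for t in nodes:
    t.start()
for part in partitions:             # push: the controller hands out work
    work_queue.put(part)
for _ in nodes:                     # one sentinel per node
    work_queue.put(None)
work_queue.join()
for t in nodes:
    t.join()
print(sorted(p for _, p in results))
```

The threads here stand in for remote nodes; the same shape works with a message queue between machines.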
#2 (08-26-2012)
jim mcnamara, Forum Staff (joined Feb 2004; 11,024 posts; NM)
Is it a database or a LUN? Either way you can get to the data, but you do have a wonderful chance of becoming I/O bound.

There are ways around this: create logical partitions, either as tables or LUNs, each on a separate physical LUN/tablespace file/device.

Can you give us more information?
#3 (08-27-2012)
figaro, Registered User (joined Jan 2007; 842 posts)
Thank you for your response.
The application runs on a database which is several hundred GB in size and resides on one node only for now. The nodes are either on a local area network or a wide area network and are idle the vast majority of the time. Almost all nodes are multi-core machines and currently do not have a database. In other words, the nodes could be put to good use if they had easy access to the data (for instance by each having its own partition or LUN), or if data were sent across via HTTP (which is likely to be detrimental to performance, as I/O becomes the bottleneck).
Putting the logical partitions on each of the nodes seems like a fruitful route, given that the results are no more than a few GB in size. At the same time, it also seems fairly rigid: if one of the nodes goes down, the system needs to be aware that the missing results must be recalculated somewhere else.
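Something like this bookkeeping is what I mean (partition and node names are hypothetical):

```python
# If a node is reported down, its partitions go back into the
# pending pool so another node can recompute the missing results.
def reassign(assignments, pending, dead_node):
    """Move every partition owned by dead_node back to pending."""
    for part, node in list(assignments.items()):
        if node == dead_node:
            del assignments[part]
            pending.append(part)
    return assignments, pending

assignments = {"part_a": "node1", "part_b": "node2", "part_c": "node1"}
pending = []
reassign(assignments, pending, "node1")
print(assignments)        # only node2's work remains assigned
print(sorted(pending))    # node1's partitions await recalculation
```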
#4 (08-27-2012)
jim mcnamara, Forum Staff (joined Feb 2004; 11,024 posts; NM)
If you have results in some static form, you can dynamically mount NFS shares as needed to access those directories.

Is your "backbone" 1Gb/s or 10Gb/s? If you can create a subnet for the fast NICs and each UNIX box has a 10Gb NIC, this works very well; it is what we do now. We create a job's data, notify the other box, it NFS-mounts the dataset read-only, and away we go.
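A rough sketch of that mount step (host and paths are made up; adjust for your site, and the real run needs root):

```python
import subprocess

def mount_dataset(host, export, mountpoint, dry_run=True):
    """Build (and optionally run) a read-only NFS mount command.
    host/export/mountpoint are hypothetical placeholders."""
    cmd = ["mount", "-t", "nfs", "-o", "ro",
           f"{host}:{export}", mountpoint]
    if dry_run:
        return " ".join(cmd)
    subprocess.run(cmd, check=True)   # requires root privileges
    return " ".join(cmd)

print(mount_dataset("datanode", "/export/job42", "/mnt/job42"))
```

The read-only option matters here: the consuming nodes only ever read the dataset, so there is no cache-coherency worry across mounts.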

There is another issue to consider. Even though you may get great throughput, some boxes have issues: Solaris with older QLogic cards takes a hit on interrupts, because the CPU does a lot of work for the NIC.

All of this is a case of limiting factors, something you see in science a lot. When you raise the bar on one limit (CPU), some other resource becomes limiting (I/O in this case, or the interrupt stack). Just as it is not economically feasible to build highways that completely handle rush-hour traffic, so it is with computers. As long as it does not hurt production and you get more processing power, you are okay. You did pay for the hardware, so use it.
#5 (08-27-2012)
figaro, Registered User (joined Jan 2007; 842 posts)
Thank you again for your response.
Our backbone is 1Gb/s as far as I know, but I would have to check. The bigger issue is with nodes on the WAN: we would be lucky to sustain 1MB/s on those lines. That means we should consider compression/decompression for the results.
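A quick sketch of that compression round trip with Python's gzip module (the payload is a stand-in for real results):

```python
import gzip

# Results of a few GB can shrink considerably before crossing a
# ~1MB/s WAN link; tabular text like this compresses very well.
payload = b"node,partition,value\n" * 10000   # stand-in for real results
compressed = gzip.compress(payload, compresslevel=6)
restored = gzip.decompress(compressed)

assert restored == payload
print(f"{len(payload)} bytes -> {len(compressed)} bytes")
```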
I will have our administrator look into setting up NFS exports on the available nodes.
Our application stack is fairly standard: FreeBSD 8.x with a C++/MySQL/Python application. This should eliminate the diversity problem, but that doesn't mean we will not run into hardware issues. We may even have to assign the jobs greedily to the node with the fastest CPU first.
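A sketch of what I mean by greedy assignment (node speeds and job costs are invented): each job goes to the node that would finish it earliest, so the fastest CPU naturally gets work first.

```python
import heapq

# Hypothetical per-node CPU scores (higher = faster).
speeds = {"node1": 3.2, "node2": 2.4, "node3": 1.6}

def assign(jobs, speeds):
    # Heap of (estimated finish time, -speed, node): ties on finish
    # time break in favor of the faster node.
    heap = [(0.0, -speed, node) for node, speed in speeds.items()]
    heapq.heapify(heap)
    schedule = {node: [] for node in speeds}
    for job, cost in jobs:
        finish, neg_speed, node = heapq.heappop(heap)
        schedule[node].append(job)
        # Node becomes free again after cost/speed more time units.
        heapq.heappush(heap, (finish + cost / -neg_speed, neg_speed, node))
    return schedule

jobs = [("j1", 10), ("j2", 10), ("j3", 10), ("j4", 10)]
print(assign(jobs, speeds))
```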