I/O bound computing clusters

# 1  
I want to build a computing cluster and have been looking into grid solutions. My understanding of grid solutions is that participating nodes have to sign up to take part in a computation, and that an isolated piece of work is sent to a node in response to a request from that node (pull). Following that reasoning, would a setup in which a controlling machine sends work to whichever node is available (push) not be a grid solution?
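To make the pull/push distinction in the question concrete, here is a minimal sketch in Python. The function names, node names, and task labels are hypothetical, and the round-robin choice in the push case is only one possible policy; it illustrates the two dispatch styles and does not settle whether either one counts as a "grid".

```python
from collections import deque

def pull_schedule(tasks, node_requests):
    """Pull: work is handed out only when a node asks for it.

    node_requests is the order in which nodes request work.
    """
    queue = deque(tasks)
    assignment = []
    for node in node_requests:
        if not queue:
            break  # nothing left to hand out
        assignment.append((node, queue.popleft()))
    return assignment

def push_schedule(tasks, idle_nodes):
    """Push: the controller decides which idle node gets each task (round-robin here)."""
    return [(idle_nodes[i % len(idle_nodes)], task) for i, task in enumerate(tasks)]
```

In the pull case a busy or offline node simply never asks; in the push case the controller has to track which nodes are available.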

The problem we will be solving is data-intensive, so we will be looking at an I/O bound problem. What methodology is used whereby the data sits on one machine and the nodes use that data? Could a partitioning of the database work, whereby a node only works on the data in the partition and no other?
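The partitioning idea can be sketched as follows, assuming the data has a numeric key that can be split into contiguous ranges (an assumption; the thread does not describe the schema). The function name and node names are made up for illustration.

```python
def partition_ranges(min_id, max_id, nodes):
    """Split the inclusive key range [min_id, max_id] into one contiguous chunk per node."""
    total = max_id - min_id + 1
    base, extra = divmod(total, len(nodes))
    ranges = {}
    start = min_id
    for i, node in enumerate(nodes):
        # The first `extra` nodes take one additional key so the split stays even.
        size = base + (1 if i < extra else 0)
        ranges[node] = (start, start + size - 1)
        start += size
    return ranges
```

Each node would then query or store only the rows whose key falls inside its own range, so no node touches another node's partition.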
# 2  
Is it a database or a LUN? Either way you can get to the data, but you have a wonderful chance of becoming I/O bound.

There are ways around this: create logical partitions, either as tables or as LUNs, each on a separate physical LUN, tablespace file, or device.

Can you give us more information?
# 3  
Thank you for your response.
The application runs on a database that is several hundred GB in size and resides on one node only for now. The nodes are on either a local area network or a wide area network and are idle the vast majority of the time. Almost all nodes are multi-core machines and currently do not have a database. In other words, the nodes could be put to good use if they had easy access to the data, for instance by each having its own partition or LUN, or by having the data sent across via HTTP (which is likely to be detrimental to performance, as I/O becomes the bottleneck).
Putting the logical partitions on each of the nodes seems like a fruitful route, given that the results are no more than a few GB in size. At the same time, it also seems fairly rigid, because if one of the nodes is down, the system needs to be aware that the missing results will need to be recalculated somewhere else.
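The recalculation concern could be handled with a small bookkeeping step like the one below. This is a sketch under assumptions: the coordinator is assumed to know which partition went to which node, which partitions have delivered results, and which nodes are currently alive. All names are hypothetical.

```python
def reassign_missing(assignments, completed, alive_nodes):
    """Return {partition: new_node} for partitions that have no results and whose node is down.

    assignments: {partition: node}, completed: set of partitions with results,
    alive_nodes: list of nodes currently reachable.
    """
    if not alive_nodes:
        raise ValueError("no live nodes to reassign work to")
    redo = {}
    i = 0
    for part, node in sorted(assignments.items()):
        if part in completed or node in alive_nodes:
            continue  # results exist, or the original node can still deliver them
        redo[part] = alive_nodes[i % len(alive_nodes)]  # spread the rework round-robin
        i += 1
    return redo
```

Note that the replacement node still needs access to the failed node's partition of the data, which is where a shared or replicated copy would come in.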
# 4  
If you have results in some static form, you can dynamically mount NFS shares as needed to access those directories.

Is your "backbone" 1 Gbit or 10 Gbit? If you can create a subnet for the fast NICs and each UNIX box has a 10 Gbit NIC, this is very acceptable; it is what we do now. We create a job's data and notify the other box, it NFS-mounts the dataset read-only, and away we go.

There is another issue to consider. Even though you may get great throughput, some boxes have issues. Solaris with older QLogic cards takes a hit on interrupts, because the CPU does a lot of work for the NIC.

All of this is a case of limiting factors, something you see in science a lot. When you raise the bar on one limit (CPU), some other resource becomes limiting (I/O in this case, or the interrupt stack). Just as it is not economically feasible to build highways that completely handle rush-hour traffic, so it is with computers. As long as it does not hurt production and you get more processing power, you are okay. You did pay for the hardware, so use it.
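The limiting-factor argument can be put into rough numbers. The sketch below assumes the backbone figure refers to link speed in bits per second, and the per-stage rates in the usage note are invented placeholders, not measurements from this thread.

```python
def transfer_seconds(size_bytes, link_bits_per_s):
    """Best-case time to move size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_s

def bottleneck(stage_rates):
    """Return (name, rate) of the slowest stage; rates are in bytes per second."""
    return min(stage_rates.items(), key=lambda kv: kv[1])
```

For example, transfer_seconds(100e9, 1e9) gives 800.0, so moving 100 GB over a 1 Gbit/s link takes at least about 13 minutes. With placeholder rates such as {"cpu": 400e6, "disk": 150e6, "network": 125e6}, bottleneck() picks the network stage, which is the "raise one limit and the next one appears" effect described above.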
# 5  
Thank you again for your response.
Our backbone is 1 Gbit as far as I know, but I would have to check. The bigger issue is with the nodes on the WAN, where we would be lucky to sustain 1MB on those lines. That means we should consider compressing and decompressing the results.
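A minimal sketch of the compression step, using Python's standard zlib module. The function names are hypothetical, and the compression level is an arbitrary choice; how much is actually saved depends entirely on how compressible the results are.

```python
import zlib

def pack_results(raw: bytes, level: int = 6) -> bytes:
    """Compress a result payload before sending it over the slow WAN link."""
    return zlib.compress(raw, level)

def unpack_results(blob: bytes) -> bytes:
    """Restore the original payload on the receiving side."""
    return zlib.decompress(blob)
```

Repetitive data such as delimited text tends to shrink considerably, while already-compressed or random data will not, so it is worth measuring on a sample of real results first.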
I will have our administrator look into setting up NFS mounts on the available nodes.
Our application stack is fairly standard: FreeBSD 8.x with a C++/MySQL/Python application. This should eliminate the diversity problem, but that doesn't mean we will not run into hardware issues. We may even have to assign the jobs greedily, to the node with the fastest CPU first.
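The greedy idea in the last sentence might look like this. It is a sketch, assuming each idle node has some comparable speed score (here an invented number per node) and takes one job at a time; the names are hypothetical.

```python
def assign_fastest_first(jobs, node_speeds):
    """Pair pending jobs with idle nodes, fastest node first, one job per node.

    node_speeds: {node: speed score}, higher means faster.
    Jobs beyond the number of idle nodes stay unassigned until a node frees up.
    """
    ranked = sorted(node_speeds, key=node_speeds.get, reverse=True)
    return list(zip(ranked, jobs))
```

For instance, with speeds {"a": 2.4, "b": 3.2, "c": 1.8} and two pending jobs, the first job goes to "b" and the second to "a", while "c" stays idle.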