Full Discussion: I/O bound computing clusters
Post 302692271 by jim mcnamara, Monday 27th of August 2012, 10:50 AM
If you have results in some static form, you can dynamically mount NFS shares as needed to access those directories.

Is your "backbone" 1GB or 10GB? If you can create a subnet for the fast NICs and each UNIX box has a 10GB NIC, this is very acceptable - what we do now. We create a job's data, notify the other box, it NFS mounts the dataset readonly, and away we go.

There is another issue to consider. Even though you may get great throughput, some boxes have issues. Solaris with older QLogic cards takes a hit on interrupts, because the CPU does a lot of work for the NIC.
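A quick way to see whether interrupt handling is what is eating the CPU on a Solaris box (a generic check, not something specific to our setup) is to watch the per-CPU interrupt columns:

  mpstat 5        # watch the intr/ithr and sys columns for each CPU
  intrstat 5      # per-device interrupt counts and the CPU time they consume (Solaris 10 and later)

If one CPU is pegged servicing interrupts for the NIC, throughput numbers alone will not show you where the time is going.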

All of this is a case of limiting factors, something you see in science a lot. When you raise the bar on one limit (CPU), some other resource becomes limiting (I/O in this case, or the interrupt stack). Just as it is not economically feasible to build highways that completely handle rush-hour traffic, so it is with computers. As long as it does not hurt production and you get more processing power, you are okay. You did pay for the hardware, so use it.
 

9 More Discussions You Might Find Interesting

1. High Performance Computing

question about clusters

hello all... first off let me say hi, and I'm really glad to be a part of this community. Tried to join a while back but I couldn't for some reason. I'm a high school student and I'm eager to learn, and what I'm trying to learn now is clusters. I have 3 computers in my room all connected on a simple hub ... (1 Reply)
Discussion started by: hexadecimal0011
1 Reply

2. Virtualization and Cloud Computing

Event Cloud Computing - IBM Turning Data Centers Into 'Computing Cloud'

Tim Bass, Thu, 15 Nov 2007 23:55:07 +0000 - I predict we may experience fewer debates on the use of the term "event cloud" related to CEP in the future, now that both IBM and Google have made announcements about "cloud computing" and "computing cloud". IBM Turning Data Centers Into 'Computing... (0 Replies)
Discussion started by: Linux Bot
0 Replies

3. Solaris

List zones bound to a pool

How to get the list of zones which are bound to a pool, say appPool, rather than logging in to each zone and then checking with the pool stat command. (3 Replies)
Discussion started by: fugitive
3 Replies

4. Programming

env not bound: BEDEWORK

I was trying to test dump data on the bedework jxi console; however, I got the error below. I'm using Debian as my OS and installed quickstart bedework on it. Please advise what I am missing. Thanks. Caused by: javax.naming.NameNotFoundException: env not bound at... (1 Reply)
Discussion started by: lhareigh890
1 Reply

5. Solaris

VCS Clusters

Hi, can someone please explain VCS clustering and where do we need VCS clusters? Thanks in advance. (1 Reply)
Discussion started by: amitbisht9
1 Reply

6. Linux

Memory bound error...

Hi all, I am getting the below error for a job that is run in our system: error code: 114, pc=0, call=1, seg=0 - 114 Attempt to access item beyond bounds of memory (Signal 11). This job uses a COBOL program and, as far as I know, the problem is related to this COBOL program. What does this... (1 Reply)
Discussion started by: das.somik
1 Reply

7. Solaris

Bound, Unbound, Idle, Listening,

Hi Guys, I am studying netstat and I am getting confused a lot. I will be glad if someone will be kind enough to explain to me: 1) bound port, 2) unbound port, 3) idle, 4) listening. I will very much appreciate it. Thanks guys. We have a special forum with special rules for homework (3 Replies)
Discussion started by: cjashu
3 Replies

8. Emergency UNIX and Linux Support

How to fix the CPU bound issues on AIX?

Hi All, Can you please answer my question? I see a lot of CPU utilization on AIX LPARs. I am able to find the cause of the problem, but I do not know how to mitigate or fix it. For instance, I found the process which is consuming most of the CPU and informed the responsible team. How... (7 Replies)
Discussion started by: System Admin 77
7 Replies

9. Shell Programming and Scripting

Awk: get upper and lower bound per group

Hi all, I've data as:
22 51018157 51018157 exonic CHKB nonsynonymous SNV
22 51018204 51018204 exonic CHKB nonsynonymous SNV
22 51018428 51018428 exonic CHKB nonsynonymous SNV
22 51018814 51018814 ... (4 Replies)
Discussion started by: genome
4 Replies
in.mpathd(1M)

NAME
     in.mpathd - daemon for network adapter (NIC) failure detection, recovery, automatic failover and failback

SYNOPSIS
     /usr/lib/inet/in.mpathd

DESCRIPTION
     The in.mpathd daemon performs Network Interface Card (NIC) failure and repair detection. In the event of a NIC failure, it causes IP network access from the failed NIC to failover to a standby NIC, if available, or to any other operational NIC that has been configured as part of the same network multipathing group. Once the failed NIC is repaired, all network access is restored to the repaired NIC.

     The in.mpathd daemon can detect NIC failure and repair through two methods: by monitoring the IFF_RUNNING flag for each NIC (link-based failure detection), and by sending and receiving ICMP echo requests and replies on each NIC (probe-based failure detection). Link-based failure detection requires no explicit configuration and thus is always enabled (provided the NIC driver supports the feature); probe-based failure detection must be enabled through the configuration of one or more test addresses (described below), but has the benefit of testing the entire NIC send and receive path.

     If only link-based failure detection is enabled, then the health of the interface is determined solely from the state of the IFF_RUNNING flag. Otherwise, the interface is considered failed if either of the two methods indicates a failure, and repaired once both methods indicate the failure has been corrected. Not all interfaces in a group need to be configured with the same failure detection methods.

     As mentioned above, in order to perform probe-based failure detection in.mpathd needs a special test address on each NIC for the purpose of sending and receiving probes on the NIC. Use the ifconfig command -failover option to configure these test addresses. See ifconfig(1M). The test address must belong to a subnet that is known to the hosts and routers on the link.

     The link state on some models of NIC is indicated by the IFF_RUNNING flag, allowing for faster failure detection when the link goes down. The in.mpathd daemon considers a NIC to have failed if either of the above two methods indicates failure. A NIC is considered to be repaired only if both methods indicate the NIC is repaired.

     The in.mpathd daemon sends the ICMP echo request probes to on-link routers. If no routers are available, it sends the probes to neighboring hosts. Thus, for network failure detection and repair, there must be at least one neighbor on each link that responds to ICMP echo request probes.

     in.mpathd works on both IPv4 and IPv6. If IPv4 is plumbed on a NIC, an IPv4 test address is configured on the NIC, and the NIC is configured as part of a network multipathing group, then in.mpathd will start sending ICMP probes on the NIC using IPv4.

     In the case of IPv6, the link-local address must be configured as the test address. The in.mpathd daemon will not accept a non-link-local address as a test address. If the NIC is part of a multipathing group, and the test address has been configured, then in.mpathd will probe the NIC for failures using IPv6.

     Even if both the IPv4 and IPv6 protocol streams are plumbed, it is sufficient to configure only one of the two, that is, either an IPv4 test address or an IPv6 test address on a NIC. If only an IPv4 test address is configured, it probes using only ICMPv4. If only an IPv6 test address is configured, it probes using only ICMPv6. If both types of test addresses are configured, it probes using both ICMPv4 and ICMPv6.

     The in.mpathd daemon accesses three variable values in /etc/default/mpathd: FAILURE_DETECTION_TIME, FAILBACK, and TRACK_INTERFACES_ONLY_WITH_GROUPS.

     The FAILURE_DETECTION_TIME variable specifies the NIC failure detection time for the ICMP echo request probe method of detecting NIC failure. The shorter the failure detection time, the greater the volume of probe traffic. The default value of FAILURE_DETECTION_TIME is 10 seconds; this means that NIC failure will be detected by in.mpathd within 10 seconds. NIC failures detected by the IFF_RUNNING flag being cleared are acted on as soon as the in.mpathd daemon notices the change in the flag. The NIC repair detection time cannot be configured; however, it is defined as double the value of FAILURE_DETECTION_TIME.

     By default, in.mpathd does failure detection only on NICs that are configured as part of a multipathing group. You can set TRACK_INTERFACES_ONLY_WITH_GROUPS to no to enable failure detection by in.mpathd on all NICs, even if they are not part of a multipathing group. However, in.mpathd cannot do failover from a failed NIC if it is not part of a multipathing group.

     The in.mpathd daemon will restore network traffic back to the previously failed NIC after it has detected a NIC repair. To disable this, set the value of FAILBACK to no in /etc/default/mpathd.

FILES
     /etc/default/mpathd     Contains default values used by the in.mpathd daemon.

ATTRIBUTES
     See attributes(5) for descriptions of the following attributes:

     ATTRIBUTE TYPE     ATTRIBUTE VALUE
     Availability       SUNWcsr

SEE ALSO
     ifconfig(1M), attributes(5), icmp(7P), icmp6(7P)

DIAGNOSTICS
     Test address address is not unique; disabling probe based failure detection
          In order for in.mpathd to perform probe-based failure detection, each configured test address on the system must be unique. Since the IPv6 test address is a link-local address derived from the ethernet address, each NIC must have a unique MAC address.

     NIC interface_name of group group_name is not plumbed for IPv[4|6] and may affect failover capability
          All NICs in a multipathing group must be homogeneously plumbed. For example, if a NIC is plumbed for IPv4, then all NICs in the group must be plumbed for IPv4. The streams modules pushed on all NICs must be identical.

     No test address configured on interface interface_name; disabling probe-based failure detection on it
          In order for in.mpathd to perform probe-based failure detection on a NIC, it must be configured with a test address: IPv4, IPv6, or both.

     The link has come up on interface_name more than 2 times in the last minute; disabling failback until it stabilizes.
          In order to prevent interfaces with intermittent hardware, such as a bad cable, from causing repeated failovers and failbacks, in.mpathd does not fail back to interfaces with frequently fluctuating link states.

     Invalid failure detection time, assuming default 10000
          An invalid value was encountered for FAILURE_DETECTION_TIME in the /etc/default/mpathd file.

     Too small failure detection time of time, assuming minimum 100
          The minimum value that can be specified for FAILURE_DETECTION_TIME is currently 100 milliseconds.

     Invalid value for FAILBACK value
          Valid values for the boolean variable FAILBACK are yes or no.

     Invalid value for TRACK_INTERFACES_ONLY_WITH_GROUPS value
          Valid values for the boolean variable TRACK_INTERFACES_ONLY_WITH_GROUPS are yes or no.

     Cannot meet requested failure detection time of time ms on (inet[6] interface_name); new failure detection is time ms
          The round trip time for ICMP probes is higher than necessary to maintain the current failure detection time. The network is probably congested or the probe targets are loaded. in.mpathd automatically increases the failure detection time to whatever it can achieve under these conditions.

     Improved failure detection time time ms on (inet[6] interface_name)
          The round trip time for ICMP probes has now decreased and in.mpathd has lowered the failure detection time correspondingly.

     NIC failure detected on interface_name
          in.mpathd has detected NIC failure on interface_name, and has set the IFF_FAILED flag on NIC interface_name.

     Successfully failed over from NIC interface_name1 to NIC interface_name2
          in.mpathd has caused the network traffic to failover from NIC interface_name1 to NIC interface_name2, which is part of the multipathing group.

     NIC repair detected on interface_name
          in.mpathd has detected that NIC interface_name is repaired and operational. If the IFF_FAILED flag on the NIC was previously set, it will be reset.

     Successfully failed back to NIC interface_name
          in.mpathd has restored network traffic back to NIC interface_name, which is now repaired and operational.

     The link has gone down on interface_name
          in.mpathd has detected that the IFF_RUNNING flag for NIC interface_name has been cleared, indicating the link has gone down.

     The link has come up on interface_name
          in.mpathd has detected that the IFF_RUNNING flag for NIC interface_name has been set, indicating the link has come up.

4 May 2004                                                        in.mpathd(1M)
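For context, a minimal sketch of the configuration the man page above describes; the interface name, group name, and addresses are hypothetical examples, not part of the page above:

  # /etc/default/mpathd - the three variables in.mpathd reads (shown with the documented defaults;
  # per the DIAGNOSTICS above, FAILURE_DETECTION_TIME is in milliseconds)
  FAILURE_DETECTION_TIME=10000
  FAILBACK=yes
  TRACK_INTERFACES_ONLY_WITH_GROUPS=yes

  # Place an interface in a multipathing group and give it a test address that will not
  # fail over, so in.mpathd can do probe-based failure detection (Solaris ifconfig syntax):
  ifconfig hme0 group production
  ifconfig hme0 addif 192.168.10.21 netmask 255.255.255.0 -failover deprecated up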