FENCED(8) cluster FENCED(8)
NAME
fenced - the I/O Fencing daemon
SYNOPSIS
fenced [OPTIONS]
DESCRIPTION
The fencing daemon, fenced, fences cluster nodes that have failed. Fencing a node generally means rebooting it or otherwise preventing it
from writing to storage, e.g. disabling its port on a SAN switch. Fencing involves interacting with a hardware device, e.g. network power
switch, SAN switch, storage array. Different "fencing agents" are run by fenced to interact with various hardware devices.
Software related to sharing storage among nodes in a cluster, e.g. GFS, usually requires fencing to be configured to prevent corruption of
the storage in the presence of node failure and recovery. GFS will not allow a node to mount a GFS file system unless the node is running
fenced.
Once started, fenced waits for the fence_tool(8) join command to be run, telling it to join the fence domain: a group of nodes that will
fence group members that fail. When the cluster does not have quorum, fencing operations are postponed until quorum is restored. If a
failed fence domain member is reset and rejoins the cluster before the remaining domain members have fenced it, the fencing is no longer
needed and will be skipped.
fenced uses the corosync cluster membership system, its closed process group library (libcpg), and the cman quorum and configuration
libraries (libcman, libccs).
The cman init script usually starts the fenced daemon and runs fence_tool join and leave.
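For reference, joining and leaving the domain by hand looks roughly like the sketch below (normally the init script does this; the
service name and the fence_tool subcommands available vary between cluster versions, see fence_tool(8)):
# service cman start (start corosync, fenced and related daemons)
# fence_tool join (join the fence domain)
# fence_tool leave (leave the domain before stopping the cluster)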
Node failure
When a fence domain member fails, fenced runs an agent to fence it. The specific agent to run and the agent parameters are all read from
the cluster.conf file (using libccs) at the time of fencing. The fencing operation against a failed node is not considered complete until
the exec'ed agent exits. The exit value of the agent indicates the success or failure of the operation. If the operation failed, fenced
will retry (possibly with a different agent, depending on the configuration) until fencing succeeds. Other systems such as DLM and GFS
wait for fencing to complete before starting their own recovery for a failed node. Information about fencing operations will also appear
in syslog.
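To illustrate the exit-value contract, a toy agent might look like the following sketch (purely hypothetical; real agents typically
read name=value arguments on standard input and support several actions, see their individual man pages):
#!/bin/sh
# fence_example - hypothetical agent sketch, not a real agent
# fenced passes arguments as name=value lines on stdin.
while read line; do
    case "$line" in
        port=*)   port=${line#port=} ;;
        action=*) action=${line#action=} ;;
    esac
done
# ... contact the switch and disable or power-cycle $port here ...
exit 0   # exit 0 = success; any non-zero exit makes fenced retry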
When a domain member fails, the actual fencing operation can be delayed by a configurable number of seconds (cluster.conf post_fail_delay
or -f). Within this time, the failed node could be reset and rejoin the cluster to avoid being fenced. This delay is 0 by default to
minimize the time that other systems are blocked.
Domain startup
When the fence domain is first created in the cluster (by the first node to join it) and subsequently enabled (by the cluster gaining
quorum), any nodes listed in cluster.conf that are not presently members of the corosync cluster are fenced. The status of these nodes is
unknown, and to be safe they are assumed to need fencing. This startup fencing can be disabled, but it's only truly safe to do so if an
operator is present to verify that no cluster nodes are in need of fencing.
The following example illustrates why startup fencing is important. Take a three node cluster with nodes A, B and C; all three have a GFS
file system mounted. All three nodes experience a low-level kernel hang at about the same time. A watchdog triggers a reboot on nodes A
and B, but not C. A and B reboot, form the cluster again, gain quorum, join the fence domain, _don't_ fence node C which is still hung and
unresponsive, and mount the GFS fs again. If C were to come back to life, it could corrupt the fs. So, A and B need to fence C when they
reform the fence domain since they don't know the state of C. If C _had_ been reset by a watchdog like A and B, but was just slow in
rebooting, then A and B might be fencing C unnecessarily when they do startup fencing.
The first way to avoid fencing nodes unnecessarily on startup is to ensure that all nodes have joined the cluster before any of the nodes
start the fence daemon. This method is difficult to automate.
A second way to avoid fencing nodes unnecessarily on startup is to use the cluster.conf post_join_delay setting (or -j option). This is the
number of seconds fenced will delay before actually fencing any victims after nodes join the domain. This delay gives nodes that have been
tagged for fencing a chance to join the cluster and avoid being fenced. A delay of -1 here will cause the daemon to wait indefinitely for
all nodes to join the cluster and no nodes will actually be fenced on startup.
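For example, to make the daemon wait indefinitely for all nodes at startup, the setting could be (illustrative fragment, in the same
format as the FILES section below):
<fence_daemon post_join_delay="-1"/>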
To disable fencing at domain-creation time entirely, the cluster.conf clean_start setting (or -c option) can be used to declare that all
nodes are in a clean or safe state to start. This setting/option should not generally be used since it risks not fencing a node that needs
it, which can lead to corruption in other applications (like GFS) that depend on fencing.
Avoiding unnecessary fencing at startup is primarily a concern when nodes are fenced by power cycling. If nodes are fenced by disabling
their SAN access, then unnecessarily fencing a node is usually less disruptive.
Fencing override
If a fencing device fails, the agent may repeatedly return errors as fenced tries to fence a failed node. In this case, the admin can
manually reset the failed node, and then use fence_ack_manual(8) to tell fenced to continue without fencing the node.
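The override sequence is therefore roughly (sketch only; the exact fence_ack_manual arguments differ between cluster versions, see
fence_ack_manual(8)):
# (reset or power off the failed node by hand, e.g. node2 from the examples below)
# fence_ack_manual node2 (tell fenced to skip fencing the node and continue)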
OPTIONS
Command line options override a corresponding setting in cluster.conf.
-D Enable debugging to stderr and don't fork.
See also fence_tool dump in fence_tool(8).
-L Enable debugging to log file.
See also logging in cluster.conf(5).
-g num groupd compatibility mode, 0 off, 1 on. Default 0.
-r path
Register a directory that needs to be empty for the daemon to start. Use a dash (-) to skip default directories /sys/fs/gfs,
/sys/fs/gfs2, /sys/kernel/dlm.
-c All nodes are in a clean state to start. Do no startup fencing.
-s Skip startup fencing of nodes with no defined fence methods.
-j secs
Post-join fencing delay. Default 6.
-f secs
Post-fail fencing delay. Default 0.
-R secs
Number of seconds to wait for a manual override after a failed fencing attempt before the next attempt. Default 3.
-O path
Location of a FIFO used for communication between fenced and fence_ack_manual.
-h Print a help message describing available options, then exit.
-V Print program version information, then exit.
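As an example, a foreground debugging run with a longer post-join delay and no startup fencing of nodes without fence methods could be
started as (illustrative values only):
# fenced -D -j 30 -s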
FILES
cluster.conf(5) is usually located at /etc/cluster/cluster.conf. It is not read directly. Other cluster components load the contents into
memory, and the values are accessed through the libccs library.
Configuration options for fenced are added to the <fence_daemon /> section of cluster.conf, within the top level <cluster> section.
post_join_delay
is the number of seconds the daemon will wait before fencing any victims after a node joins the domain. Default 6.
<fence_daemon post_join_delay="6"/>
post_fail_delay
is the number of seconds the daemon will wait before fencing any victims after a domain member fails. Default 0.
<fence_daemon post_fail_delay="0"/>
clean_start
is used to prevent any startup fencing the daemon might do. It indicates that the daemon should assume all nodes are in a clean
state to start. Default 0.
<fence_daemon clean_start="0"/>
override_path
is the location of a FIFO used for communication between fenced and fence_ack_manual. Default shown.
<fence_daemon override_path="/var/run/cluster/fenced_override"/>
override_time
is the number of seconds to wait for administrator intervention between fencing attempts following fence agent failures. Default 3.
<fence_daemon override_time="3"/>
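The settings above can also be combined on a single element, for example (values are illustrative only):
<fence_daemon post_join_delay="30" post_fail_delay="5" override_time="10"/>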
Per-node fencing settings
The per-node fencing configuration is partly dependent on the specific agent/hardware being used. The general framework begins like this:
<clusternodes>
<clusternode name="node1" nodeid="1">
<fence>
</fence>
</clusternode>
<clusternode name="node2" nodeid="2">
<fence>
</fence>
</clusternode>
</clusternodes>
The simple fragment above is a valid configuration, but it provides no way to fence these nodes. If one of these nodes is in the fence domain and
fails, fenced will repeatedly fail in its attempts to fence it. The admin will need to manually reset the failed node and then use
fence_ack_manual to tell fenced to continue without fencing it (see override above).
There is typically a single method used to fence each node (the name given to the method is not significant). A method refers to a
specific device listed in the separate <fencedevices> section, and then lists any node-specific parameters related to using the device.
<clusternodes>
<clusternode name="node1" nodeid="1">
<fence>
<method name="1">
<device name="myswitch" foo="x"/>
</method>
</fence>
</clusternode>
<clusternode name="node2" nodeid="2">
<fence>
<method name="1">
<device name="myswitch" foo="y"/>
</method>
</fence>
</clusternode>
</clusternodes>
Fence device settings
This section defines properties of the devices used to fence nodes. There may be one or more devices listed. The per-node fencing
sections above reference one of these fence devices by name.
<fencedevices>
<fencedevice name="myswitch" agent="..." something="..."/>
</fencedevices>
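A concrete instance using the APC power switch agent as an example (fence_apc(8) is one of the standard agents; the address and
credentials below are placeholders, and the full parameter list is in the agent's own man page):
<fencedevices>
<fencedevice name="myswitch" agent="fence_apc" ipaddr="apc1.example.com" login="apc" passwd="apc"/>
</fencedevices>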
Multiple methods for a node
In more advanced configurations, multiple fencing methods can be defined for a node. If fencing fails using the first method, fenced will
try the next method, and continue to cycle through methods until one succeeds.
<clusternode name="node1" nodeid="1">
<fence>
<method name="1">
<device name="myswitch" foo="x"/>
</method>
<method name="2">
<device name="another" bar="123"/>
</method>
</fence>
</clusternode>
<fencedevices>
<fencedevice name="myswitch" agent="..." something="..."/>
<fencedevice name="another" agent="..."/>
</fencedevices>
Dual path, redundant power
Sometimes fencing a node requires disabling two power ports or two I/O paths. This is done by specifying two or more devices within a
method. fenced will run the agent for the device twice, once for each device line, and both must succeed for fencing to be considered
successful.
<clusternode name="node1" nodeid="1">
<fence>
<method name="1">
<device name="sanswitch1" port="11"/>
<device name="sanswitch2" port="11"/>
</method>
</fence>
</clusternode>
When using power switches to fence nodes with dual power supplies, the agents must be told to turn off both power ports before restoring
power to either port. The default off-on behavior of the agent could result in the power never being fully disabled to the node.
<clusternode name="node1" nodeid="1">
<fence>
<method name="1">
<device name="nps1" port="11" action="off"/>
<device name="nps2" port="11" action="off"/>
<device name="nps1" port="11" action="on"/>
<device name="nps2" port="11" action="on"/>
</method>
</fence>
</clusternode>
Hardware-specific settings
Documentation for configuring a specific device can be found in the man page for the corresponding fence agent.
SEE ALSO
fence_tool(8), fence_ack_manual(8), fence_node(8), cluster.conf(5)
cluster 2009-12-21 FENCED(8)