Making Grid Engine Highly Available


 
Posted 12-16-2009

Ashutosh Tripathi, Senior Software Engineer, Sun Microsystems, presents at the Open Source Grid & Cluster Conference, Oakland CA, May 2008

SGE_SHADOWD(8)          Sun Grid Engine Administrative Commands          SGE_SHADOWD(8)

NAME
       sge_shadowd - Sun Grid Engine shadow master daemon

SYNOPSIS
       sge_shadowd

DESCRIPTION
       sge_shadowd is a "lightweight" process which can be run on so-called
       shadow master hosts in a Sun Grid Engine cluster to detect failure of
       the current Sun Grid Engine master daemon, sge_qmaster(8), and to start
       up a new sge_qmaster(8) on the host on which the sge_shadowd runs. If
       multiple shadow daemons are active in a cluster, they run a protocol
       which ensures that only one of them will start up a new master daemon.

       The hosts suitable for being used as shadow master hosts must have
       shared root read/write access to the directory
       $SGE_ROOT/$SGE_CELL/common as well as to the master daemon spool
       directory (by default $SGE_ROOT/$SGE_CELL/spool/qmaster). The names of
       the shadow master hosts need to be contained in the file
       $SGE_ROOT/$SGE_CELL/common/shadow_masters.

RESTRICTIONS
       sge_shadowd may only be started by root.

ENVIRONMENT VARIABLES
       SGE_ROOT
              Specifies the location of the Sun Grid Engine standard
              configuration files.

       SGE_CELL
              If set, specifies the default Sun Grid Engine cell. To address a
              Sun Grid Engine cell sge_shadowd uses (in the order of
              precedence):

                     The name of the cell specified in the environment
                     variable SGE_CELL, if it is set.

                     The name of the default cell, i.e. default.

       SGE_DEBUG_LEVEL
              If set, specifies that debug information should be written to
              stderr. In addition the level of detail in which debug
              information is generated is defined.

       SGE_QMASTER_PORT
              If set, specifies the tcp port on which sge_qmaster(8) is
              expected to listen for communication requests. Most
              installations will use a services map entry for the service
              "sge_qmaster" instead to define that port.

       SGE_DELAY_TIME
              This variable controls the interval in which sge_shadowd pauses
              if a takeover bid fails. This value is used only when there are
              multiple sge_shadowd instances and they are contending to be the
              master. The default is 600 seconds.

       SGE_CHECK_INTERVAL
              This variable controls the interval in which the sge_shadowd
              checks the heartbeat file (60 seconds by default).

       SGE_GET_ACTIVE_INTERVAL
              This variable controls the interval when a sge_shadowd instance
              tries to take over when the heartbeat file has not changed.

FILES
       <sge_root>/<cell>/common
              Default configuration directory

       <sge_root>/<cell>/common/shadow_masters
              Shadow master hostname file.

       <sge_root>/<cell>/spool/qmaster
              Default master daemon spool directory

       <sge_root>/<cell>/spool/qmaster/heartbeat
              The heartbeat file.

SEE ALSO
       sge_intro(1), sge_conf(5), sge_qmaster(8), Sun Grid Engine Installation
       and Administration Guide.

COPYRIGHT
       See sge_intro(1) for a full statement of rights and permissions.

SGE 6.2u5                            $Date$                       SGE_SHADOWD(8)
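
The cell precedence, file locations, and interval defaults described in the man page above can be sketched in a few lines of Python. This is only an illustration for checking a prospective shadow master host, not part of Grid Engine: the function names are hypothetical, the shadow_masters file is assumed to hold one hostname per line, and a heartbeat "change" is assumed to mean that the file's content differs between two reads. No default is given above for SGE_GET_ACTIVE_INTERVAL, so none is filled in here.

```python
import os
from pathlib import Path


def resolve_cell(env):
    """Cell precedence from the man page: SGE_CELL if set, else 'default'."""
    return env.get("SGE_CELL") or "default"


def sge_paths(env):
    """Build the locations listed in the FILES section from SGE_ROOT and the cell."""
    cell_dir = Path(env["SGE_ROOT"]) / resolve_cell(env)
    common = cell_dir / "common"
    spool = cell_dir / "spool" / "qmaster"
    return {
        "common": common,
        "shadow_masters": common / "shadow_masters",
        "qmaster_spool": spool,
        "heartbeat": spool / "heartbeat",
    }


def interval_seconds(env, name, default):
    """Read an interval variable, falling back to the given default if unset."""
    raw = env.get(name)
    return int(raw) if raw else default


def is_listed_shadow_host(shadow_masters_file, hostname):
    """Assumption: shadow_masters contains one hostname per line."""
    with open(shadow_masters_file) as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    return hostname in hosts


def heartbeat_changed(heartbeat_file, previous):
    """Assumption: a content difference between two reads means qmaster is alive.

    Returns (changed, current_content) so the caller can keep the last value.
    """
    current = Path(heartbeat_file).read_text().strip()
    return current != previous, current


if __name__ == "__main__":
    if "SGE_ROOT" in os.environ:
        for label, path in sge_paths(os.environ).items():
            print(f"{label}: {path}")
        # Defaults below are the ones stated in the man page.
        print("check interval:", interval_seconds(os.environ, "SGE_CHECK_INTERVAL", 60))
        print("delay time:", interval_seconds(os.environ, "SGE_DELAY_TIME", 600))
    else:
        print("SGE_ROOT is not set")
```

Run on a candidate host with SGE_ROOT (and optionally SGE_CELL) exported, the script prints where sge_shadowd would look for its files, which makes it easy to confirm that those directories are on shared storage before adding the host to shadow_masters.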