Child killing parent process and how to set up SMF


 
# 1  
09-07-2012

Hello,

A little background on what we are doing first. We are running several applications from a CLI, and not all of them are fully functional. They occasionally dump core, which by itself is not a problem. We are running a service that takes a screen scrape of those apps and displays them in a more user-friendly Java window. So this service is the parent process of all of these applications when they are run in the GUI.

When one of the applications dumps core, the service goes down and restarts, and after four restarts in quick succession it doesn't come back up because it is "restarting too quickly." We can deal with the core dumps on their own, but having the whole service go down stops other users from running other applications through the GUI.

My question is: how do I set it up so that when a child process errors out, the parent ignores it and keeps on chugging? If that is not possible, the next solution, although not ideal, would be to change the restarter so that it does not throw the service into maintenance mode after four restarts.

Any help or direction would be amazing.

Thank you,
Bryan

I have tried adding the following to the manifest, but so far it does not seem to make a difference.
Code:
<property_group name='startd' type='framework'>
        <propval name='ignore_error' type='astring'
                value='core,signal' />
        <propval name='duration' type='astring'
                value='transient' />
</property_group>

Also, this is all on Solaris 10, and the program is called infonet, if that is relevant.

Last edited by Bryan.Eidson; 09-07-2012 at 01:27 PM.. Reason: more information
# 2  
09-07-2012
In order to help we need:

What CLI are we talking about here?
Are you on Solaris 10?
What shell do you use?
# 3  
09-07-2012
Thanks for your response.

Yes, I am using Solaris 10.

I use /bin/sh, some others use ksh.

I am not sure what you mean by "what CLI", but it is an old (~25 year?) ERP system that we fix in-house as it breaks.
# 4  
09-07-2012
So, you are asking us how to ignore signals in an old ERP. The ERP apps I have seen mostly run on their own in something like realtime and have a database.

Are there shell scripts that invoke your code (and that would be called by the GUI app)?
If there are, then you have to modify the ones you run in DEV to add a trap command
Code:
trap 'exit 1' 11

This just returns an error when the process dumps core. NOTE: I am assuming you are not getting SIGILL or SIGBUS signals, just segfaults.

Otherwise, should this XML fragment
Code:
<propval name='ignore_error' type='astring'
value='core,signal' />

use the signal name(s)? core is not a signal name. SEGV or SIGSEGV is a signal name; it may also be 11, which is the signal number on Solaris for a segfault. (I am guessing here, not about signals but about your ERP GUI.)
This User Gave Thanks to jim mcnamara For This Post:
# 5  
09-07-2012

There aren't any shell scripts launching it, so I took the second route you suggested. I had some issues getting svccfg to import my new manifest, but manually entering those three entries into the configuration instead of core seems to have done it, or at least allowed it to keep working past the 5 failures it previously took to put it in maintenance mode.

Many thanks!
# 6  
09-13-2012
An update to where I am:

I made the above changes to the manifest, and it would not import. I then used svccfg to manually add the startd and ignore_error entries. At the time I thought it was working.

Next I wanted to figure out why the manifest I was given was not working, which I will get back to. Also I wanted to make sure if the system restarted that the service would retain those settings. It turns out it did, but the service would not start.

The problem with the manifest as written was that it defined another instance that we weren't using, with everything else nested inside it. So there were no start and stop methods for the default instance, and my property_group settings were not being applied to the default. I rewrote the manifest by getting rid of the defined instance, declaring a single_instance, and adding the property group to the default instance.

Swell, I thought I solved the problem because the manifest imported and the service started fine. Except I am back at square one, because the
Code:
 
startd/ignore_error                astring  SIGSEGV,SIGABRT,SIGILL,SIGQUIT,SIGSYS

values that clearly show up in svcprop do not seem to be doing what they are supposed to: the service went back into maintenance mode after 4 failures. I think the only reason it seemed to work before is that the lack of start and stop methods made it impossible for the service to be brought down. I am not sure how the system even allowed that configuration, but it's the only explanation I could think of.

After all that, any idea how to stop it from going down after a failure?

---------- Post updated 09-13-12 at 01:06 PM ---------- Previous update was 09-12-12 at 01:44 PM ----------

Not sure if there is any extra information I can provide. It's not perfect, but I think I have purged the obvious flaws in the manifest:

Code:
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<!--
    Production Infonet Manifest
    July 23, 2012 03:31:42 PDT dgz
-->
<service_bundle type='manifest' name='site:infonet'>
<service
        name='site/infonet'
        type='service'
        version='1'>
        <single_instance />
        <dependency name='paths'
            grouping='require_all'
            restart_on='error'
            type='path'>
                <service_fmri value='file://localhost/opt/unisyn/infonet/' />
                <service_fmri value='file://localhost/opt/unisyn/infonet/lib/system.conf' />
        </dependency>
        <dependency name='network'
            grouping='require_any'
            restart_on='error'
            type='service'>
                <service_fmri value='svc:/network/service' />
        </dependency>
        <dependent
                name='infonet_multi-user'
                grouping='optional_all'
                restart_on='none'>
                <service_fmri value='svc:/milestone/multi-user' />
        </dependent>
        <!--
                The timeout needs to be large enough to wait for startup.
        -->
        <exec_method
            type='method'
            name='start'
            exec='/lib/svc/method/infonet start'
            timeout_seconds='60' />
        <exec_method
            type='method'
            name='stop'
            exec='/lib/svc/method/infonet stop'
            timeout_seconds='60' />
        <property_group name='startd' type='framework'>
                <propval name='ignore_error' type='astring'
                        value='SIGSEGV,SIGABRT,SIGILL,SIGQUIT,SIGSYS' />
        </property_group>
        <instance name='default' enabled='true'>
         <property_group name='startd' type='framework'>
           <propval name='ignore_error' type='astring'
                value='SIGSEGV,SIGABRT,SIGILL,SIGQUIT,SIGSYS' />
         </property_group>
        </instance>
        <template>
                <common_name>
                        <loctext xml:lang='C'>
                        Production Infonet (httpd)
                        </loctext>
                </common_name>
        </template>
 
</service>
</service_bundle>

Also, the svcprop output after it went into maintenance mode:
Code:
# svcprop infonet:default
startd/ignore_error astring SIGSEGV,SIGABRT,SIGILL,SIGQUIT,SIGSYS
general/enabled boolean true
general/single_instance boolean true
paths/entities fmri [URL removed because < 5 posts]
paths/grouping astring require_all
paths/restart_on astring error
paths/type astring path
network/entities fmri svc:/network/service
network/grouping astring require_any
network/restart_on astring error
network/type astring service
dependents/infonet_multi-user fmri svc:/milestone/multi-user
start/exec astring /lib/svc/method/infonet\ start
start/timeout_seconds count 60
start/type astring method
stop/exec astring /lib/svc/method/infonet\ stop
stop/timeout_seconds count 60
stop/type astring method
tm_common_name/C ustring Production\ Infonet\ \(httpd\)
restarter/start_pid count 972
restarter/start_method_timestamp time 1347486995.549591000
restarter/start_method_waitstatus integer 0
restarter/transient_contract count
restarter/logfile astring /var/svc/log/site-infonet:default.log
restarter/auxiliary_state astring restarting_too_quickly
restarter/next_state astring none
restarter/state astring maintenance
restarter/state_timestamp time 1347487004.979156000
restarter/contract count
restarter_actions/refresh integer
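
For reference, the svc.startd(1M) man page lists only two legal values for startd/ignore_error, core and signal. These name classes of contract events rather than individual signals, so a list of signal names such as the one shown in the svcprop output above may simply not be recognized. A minimal sketch of the instance-level property group under that reading, reusing the instance and property names from the manifest above (an untested illustration, not a confirmed fix for infonet):

```xml
<!-- Sketch only: the 'core,signal' value follows svc.startd(1M).
     Instance and property group names are copied from the manifest above. -->
<instance name='default' enabled='true'>
  <property_group name='startd' type='framework'>
    <!-- 'core' and 'signal' are event classes, not signal names -->
    <propval name='ignore_error' type='astring'
         value='core,signal' />
  </property_group>
</instance>
```

After re-importing, a service that is already in maintenance would normally also need `svcadm refresh site/infonet:default` followed by `svcadm clear site/infonet:default` before the new value takes effect.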


Last edited by Bryan.Eidson; 09-14-2012 at 01:01 PM..