I have a ksh script (dtksh Version M-12/28/93d on Solaris 10) that is run daily by cron and sometimes hangs forever. I need to detect whether an old copy is still hung before I start the new run, and if so, send an email and exit the script. Here is part of the code:
Here is the problem, and I am really stumped:
Even when this script is the only instance, $count still equals 2.
Debugging shows that:
pgrep emits one PID
wc -l returns "       1 " (seven spaces, the digit 1, a single space; I need to trim the spaces before the test, hence the awk command)
and then awk returns 2
??? !!! ???
I see the same thing when using these versions of the command:
When I test these checks at the shell against existing processes, the result is always correct. I have checked and double-checked, and it seems that the digit 1 surrounded by spaces is being piped into either tr or awk, and both return the digit 2 with no spaces.
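The trimming step on its own behaves as expected. A minimal sketch, runnable in any POSIX shell (dtksh included), that feeds a simulated copy of wc's padded output (seven spaces, the digit 1, one space) through awk and through tr, plus a variant in which awk counts the lines itself so there is no padding to strip. The printf input is an assumption standing in for the real wc output; the variable names are illustrative.

```shell
# Simulate what Solaris wc -l prints: seven spaces, the digit 1, one space.
padded='       1 '

# awk's default field splitting discards the surrounding blanks.
via_awk=$(printf '%s\n' "$padded" | awk '{print $1}')

# tr -d deletes every space character from the stream.
via_tr=$(printf '%s\n' "$padded" | tr -d ' ')

echo "awk=[$via_awk] tr=[$via_tr]"    # prints awk=[1] tr=[1]

# Alternative: let awk count its input lines (NR in the END block), which
# prints a bare number with no padding; empty input prints 0.
count=$(printf 'pid1\npid2\n' | awk 'END {print NR}')
echo "count=[$count]"                 # prints count=[2]
```

If a pipeline ending in wc -l and awk reports 2, the input to wc already contained two lines, which points back at the process listing rather than at awk or tr.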
Any help would be really appreciated.
Last edited by vbe; 12-14-2010 at 01:59 PM..
Reason: code tags please
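For comparison only, here is a sketch of a different technique from the pgrep count used in this thread: record the running instance in a lock directory, so the script never has to scan the process table. mkdir either creates the directory or fails, atomically, so two runs cannot both acquire it. LOCKDIR, its default path, the function name and the mail recipient are hypothetical. kill -0 only tests whether the recorded PID still exists: it fails for processes owned by another user (not an issue for a root cron job), and a PID reused by an unrelated process would look like a live run.

```shell
# Hypothetical lock location; override LOCKDIR before calling if needed.
LOCKDIR=${LOCKDIR:-/var/tmp/check_portals.lock}

# acquire_lock: return 0 and record our PID if no earlier run is alive;
# return 1 (with the other run's PID left in $oldpid) if one still is.
acquire_lock() {
    if mkdir "$LOCKDIR" 2>/dev/null; then
        echo $$ > "$LOCKDIR/pid"
        return 0
    fi
    # The directory already exists: is the recorded owner still running?
    oldpid=$(cat "$LOCKDIR/pid" 2>/dev/null)
    if [ -n "$oldpid" ] && kill -0 "$oldpid" 2>/dev/null; then
        return 1
    fi
    # No live owner recorded (stale lock from a run that died): take it over.
    echo $$ > "$LOCKDIR/pid"
    return 0
}

# Intended use at the top of the cron job (recipient is a placeholder):
#   if ! acquire_lock; then
#       echo "previous run (PID $oldpid) still active" |
#           mailx -s "check_portals: earlier run still running" root
#       exit 1
#   fi
#   trap 'rm -rf "$LOCKDIR"' EXIT
```

The trap is set only after the lock is acquired, so a run that bails out never deletes the lock held by the earlier, still-running copy.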
I would expect that when the script runs:
The script will create a child process (that's what the round brackets do) so that it can run the pgrep, wc, awk, etc. A child process is created as a copy of its parent, so it shows up with the same name as the parent process, i.e. $PROGNAME. Therefore you will always have more than one process...
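A related detail makes that child hard to filter out: in POSIX shells, dtksh included, $$ expands to the PID of the script itself even inside ( ... ) and $( ... ), so a filter such as grep -v $$ cannot exclude the subshell. A minimal demonstration; the variable names are illustrative.

```shell
# $$ is the PID of the top-level shell; subshells inherit that value
# instead of reporting their own PID.
outer=$$
inner_cmdsub=$(echo $$)     # inside a command substitution
inner_paren=$( (echo $$) )  # inside an explicit ( ... ) subshell as well

echo "outer=$outer cmdsub=$inner_cmdsub paren=$inner_paren"
# All three values are identical.
```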
Both of these set count to 1, not 2, even though they run in a subshell:
It is only when I add the pipe to tr or awk that I get '2'.
---------- Post updated at 05:49 PM ---------- Previous update was at 01:46 PM ----------
I learned something, but I don't understand it. Any explanation would be really appreciated.
I took the existing code and removed the parens, getting rid of the subshell issue, if any. Then I added a tee to each part of the pipeline, like in each of these:
pgrep -l -x -z global $PROGNAME | tee n1 | wc -l | tee n2 | awk '{print $1}'
ps -z global -e -f | grep $PROGNAME | grep -v grep | tee n3 | wc -l | tee n4 | awk '{print $1}'
Looking at the files n1, n2, n3, and n4 shows the following:
1) When I removed the first line "#!/usr/dt/bin/dtksh" and just executed the script by name from the command line, it actually ran, and none of the ps commands found a match.
2) When I removed the first line "#!/usr/dt/bin/dtksh" and sourced the script from the command line, like so: " . /mnt/cplog/scripts/check_portals", it ran and matched two instances of /usr/dt/bin/dtksh.
3) When I restored the first line "#!/usr/dt/bin/dtksh" and ran the script normally, it matched two copies of itself, one the parent of the other:
root 9143 9127 0 23:41:00 pts/1 0:00 /usr/dt/bin/dtksh /mnt/cplog/scripts/check_portals
root 9127 9105 0 23:40:59 pts/1 0:00 /usr/dt/bin/dtksh /mnt/cplog/scripts/check_portals
This was without the subshell-creating expressions "$(...)". Why does running a script create these two instances?
I thought it was perhaps because I executed the script from the dtksh shell, and then the script forked and exec'd another dtksh shell to run in, but when I tested that theory by calling the script from tcsh instead of dtksh, I got the same thing. Any ideas?
I thought I understood ksh pretty well, but now I am confused.
Last edited by Scott; 12-14-2010 at 03:52 PM..
Reason: Please use code tags
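One likely explanation, offered as a hypothesis rather than a diagnosis: to run each stage of a pipeline, the shell forks a copy of itself, and until that copy calls exec it still appears in the process table with the script's command line (/usr/dt/bin/dtksh /mnt/cplog/scripts/check_portals). If ps or pgrep reads the table at that moment, it counts the copy as a second instance whose parent is the script, which matches the parent/child pair above; more pipeline stages mean more such copies to catch, which would fit the count changing only once tr or awk was added. A minimal sketch that sidesteps this: take the snapshot with a single command into a file, then filter the file, skipping the script's own PID. The function name, the snapshot path and the ps -e -o pid= -o args= format are assumptions; on Solaris 10 the awk -v option may require nawk or /usr/xpg4/bin/awk instead of /usr/bin/awk.

```shell
# count_matches FILE NAME SELFPID
# FILE holds "PID ARGS..." lines, e.g. written by: ps -e -o pid= -o args=
# Prints how many lines mention NAME, ignoring the line whose PID is SELFPID.
count_matches() {
    awk -v name="$2" -v self="$3" '
        $1 == self      { next }    # skip the running script itself
        index($0, name) { n++ }     # any other line containing NAME
        END             { print n + 0 }
    ' "$1"
}

# Intended use in the script (snapshot path is a placeholder). The ps line
# runs at top level, not inside $( ... ), so, barring background jobs, the
# only process the script has forked at that moment is ps itself:
#   snap=/tmp/check_portals.ps.$$
#   ps -e -o pid= -o args= > "$snap"
#   count=$(count_matches "$snap" check_portals $$)
#   rm -f "$snap"
#   if [ "$count" -gt 0 ]; then
#       # send the email and exit here
#   fi
```

Because the script's own line is excluded, any nonzero count means another instance was running when the snapshot was taken.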
Oracle Linux 5.6 64-bit (derivative of RHEL)
Dear Ann Landers,
This is about as bizarre as anything I've ever seen.
I have a little test script I've been working with. When I redirect stdout to a file, no file. Make a copy of the script to another name. Execute it and redirect stdout, and...
Hi all,
Below is a script I'm writing that is giving me an error:
#!/usr/bin/sh
if ; then
echo "Success!"
else
echo "Failure!"
fi
Normally, if I run ps -ef|grep dw.sap|wc -l, it gives me an output of 18, so my script checks whether it's greater than 17 and echoes success, otherwise failure.
...
Ok, so I have been struggling with this for a few days and I think I need an explanation of a few things before I go any further. I'm not sure it's possible to do what I'm trying, so before I pull my hair out, here is what I'm doing:
I have written a program in LiveCode that sits on our...
I have a C program in which, for the life of me, I can't see any possible stack corruption, but for some reason it corrupts a local variable to 0 when the variable is not referenced. My compiler is gcc 4.3.4.
But my question's not really a code question, it's a compiler question. The glitch is weirdly specific: ...
Hi all.
I have a really really weird problem that I've been working on for days.
The problem manifested as users cannot connect to our web servers via SSH when they're using our wireless network. Here's where it gets weird:
- Clients from anywhere other than the wireless subnet can...
A server I host is having very rare glitches where a file the user downloads will have incorrect contents. This almost never happens when I am looking, I caught it once and only once -- a user messaged me saying his antivirus had given him a warning about an image file downloaded from his...
I'm trying to write a function that opens a file pointer and writes one of the function parameters into the file, but for some reason I'm getting a core dump error.
The code is as below
void WriteToFile(char *file_name, char *data)
{
FILE *fptr;
/*disk_name_size is a...
Is there a way to monitor certain processes and kill them if they hang too long, while letting certain scripts that are expected to take a long time keep running?
Thank you
Richard
I have a simple script that I want to notify me whenever anything other than exactly one instance of a particular process is running. I've always used the script:
DPID_DW=$(ps -ef | grep | wc -l)
if
then
echo "The data warehouse manager for DB is down"
elif
then
...
I'm trying to use the while statement to increment a positive number with a leading "0". When I pass it through, it seems to come out with a negative value, and all the increments remain negative.
This is what I have:
i=010986294184
j=010986988888
while ; do
echo $i
i=(($i + 1))
done...