Folks, I suck at a lot of things, and performance issues are one of them.
After upgrading from 5300-06-03 to 5300-12-04 we started seeing runaway processes. The pattern varies: some of these processes have a TTY associated with them and some do not. Any idea of what to look for would be most appreciated. I contacted IBM support and provided a perfpmr, but all I have so far is "your machine is CPU bound, here are the top processes, contact your applications people." Well, I kind of already knew that. I was expecting some guidance on what might have changed to cause it.
"ps aux" yields:
Looking at specifics about the processes being run by this user.
There have been no new processes spun off by the main initmenu.42r process since yesterday at 13:39. I am thinking the user still has the session open on his/her computer, but what is it doing to use that much CPU?
The second scenario with no TTY associated with the process looks like this.
Thanks for any guidance you could provide or should I say will provide.
If I am missing some data you might need please let me know.
IBM can't tell you more than what they see in the snaps and other OS data gathered by perfpmr. Since they didn't see anything strange there, they point you to the application people, because they have no way of knowing how the application works. That's fair enough.
Are you sure it didn't look the same before the upgrade? The question might sound stupid, but just to make sure.
A high C value shows the processes are currently active, while the CPU percentage is an average since process start. Can you talk with some of the users and ask what they are doing, or whether they are doing anything different than usual? Sometimes there is a periodic run of other tasks for business reasons, such as gathering end-of-month statistics, that could produce a peak. You or your users would know that better than I.
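To pick out the processes with a high C value, something like the following might help. It is only a sketch: the function name `high_c` and the default threshold of 50 are my own choices, and it assumes the usual `ps -ef` layout where PID is the second column and C is the fourth.

```shell
#!/bin/sh
# high_c [MIN]: read "ps -ef" output on stdin and print "PID C" for every
# process whose C column (4th field) is at least MIN (default 50).
# The name and the threshold are illustrative assumptions, not AIX defaults.
high_c() {
    awk -v min="${1:-50}" 'NR > 1 && $4 + 0 >= min + 0 { print $2, $4 }'
}
```

Usage would be `ps -ef | high_c 100`. Running it a few times in a row shows whether the same PIDs stay pinned at a high C value or whether it is just a short burst.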
Is the box being pressed against the wall vmstat-wise?
If you kill or stop the application for one user and start it anew, does C immediately rise that high again?
You could check (awful work) what enhancements and fixes were introduced between 5300-06-03 and 5300-12-04.
Sorry to have no better idea at the moment to help you.
Do you maybe have nmon monitoring in place, so you can compare pre-update data with the current CPU and process figures? If not, it could be helpful in the future.
Since you have no reliable baseline performance records from the previous AIX TL, it will be no easy task to determine whether the application or the OS is the cause.
For now, just collect performance statistics at various intervals and compare the statistics gathered throughout the day.
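A minimal sketch of such periodic collection could look like this. The function names `top_cpu` and `snapshot` and the log file path are made up for illustration, and it assumes your `ps` accepts the XPG4-style `-o pid,pcpu,comm` format.

```shell
#!/bin/sh
# top_cpu [N]: read "PID %CPU COMMAND" lines (with a header row) on stdin
# and print the N highest CPU consumers as "%CPU PID COMMAND" (default 5).
top_cpu() {
    awk 'NR > 1 { print $2, $1, $3 }' | LC_ALL=C sort -rn | head -n "${1:-5}"
}

# snapshot LOGFILE: append a timestamp and the current top 5 to LOGFILE.
# Assumes "ps -eo pid,pcpu,comm" is supported on the system.
snapshot() {
    {
        date '+%Y-%m-%d %H:%M:%S'
        ps -eo pid,pcpu,comm | top_cpu 5
    } >> "$1"
}
```

Calling `snapshot /tmp/cpu_top.log` from cron every few minutes (the path is just an example) gives you a plain-text history that you can compare across the day, or against the server you keep on the old level.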
I apologize for the delay in responding and want to thank you both for your replies. I am positive it did not look the same before the upgrade. I actually rolled one server back to AIX 5.3 TL06 so I would have something to compare against. All is well on that server, and it's a full-time job keeping the runaway processes killed on the other 7 servers so they do not crash or become unresponsive.
Unless I am reading the vmstat output wrong, yes, the box is being pressed pretty hard. I played with the headers a bit, trying to line up the columns for easier reading.
I provided IBM support with perfpmr data. It took them a bit, but they came back with a possible bug. After I got them a core dump of the process, they confirmed that there is an APAR in the works from a previous PMR. Below is the APAR description. It matches the report they sent me from the perfpmr data.
A SIGHUP'D PROCESS HANGS, REPEATEDLY CALLING PTHREAD_YIELD
An ifix is currently in the works. I just hope and pray this is the issue.
Not sure whether this applies to anyone else, but I can post an update once the ifix is applied if that is preferred.
Thanks for the feedback. Indeed, the vmstat output looks bad CPU-wise, and there is lots of unused memory.
By the way, you can use vmstat's -w switch to have the columns aligned. If you also add -t, you'll get a time stamp (sometimes helpful).
We are still on 5300-11-04-1015, so I can't report any bad experience with your update level.
Glad to hear they found something. Usually they are fast with hotfixes once you... persuaded them to have another look ^^ (at least that was my experience too, way back, with some other IBM software).
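If you want to pull the CPU-bound samples out of a longer vmstat run, a small filter along these lines might do. This is a sketch, not a tested AIX tool: the function name `busy_samples` and the default idle limit of 5 are my own, and it assumes the column header row starts with "r" and contains an "id" column, which it looks up by name instead of by position.

```shell
#!/bin/sh
# busy_samples [MAX_IDLE]: read vmstat output on stdin and print the run
# queue (r) and idle percentage (id) of every sample whose idle value is
# at most MAX_IDLE (default 5). The "id" column is located by its header.
busy_samples() {
    awk -v max="${1:-5}" '
        # Column header row: remember which field holds the idle percentage.
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") idc = i; next }
        # Data rows start with a number; report the ones with little idle time.
        idc && $1 ~ /^[0-9]+$/ && $idc + 0 <= max + 0 { print "r=" $1, "id=" $idc }
    '
}
```

For example, `vmstat -w -t 5 12 | busy_samples 5` would list the intervals with 5% idle or less. The time stamp added by -t sits at the end of each line, so it does not shift the columns the filter looks at.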