I am having a problem on an AIX server running a WebSphere MQ instance. Sometimes it seems to reach the process limit, but I cannot find the processes themselves.
What I see: I can log in (as root from the console, or as a nonprivileged user via ssh). Running almost any command, even a simple "ls", results in the message "Killed." However, "ps -ef" is able to run. The MQ monitoring scripts get killed. vmstat cannot run, but lsps shows no paging activity, so there should be enough memory. topas is also able to run, and it shows very little CPU activity.
In my experience, this "Killed." message is usually the result of reaching the maxuproc limit, but maxuproc is set to 4096 and ps -ef shows only ~90 processes. However, when I raise the maxuproc parameter, everything works fine again.
Well, my question is: how can I monitor whether I am reaching the maxuproc limit? Or: where are the processes that are not listed by the ps command?
If you are hitting a limit on the number of processes you're running, ps may be exempt from the limit because it usually runs set-UID root.
If you're running ps -ef | grep "<username>" or some other pipeline, even though ps might be exempt from the limit, the pipeline is not exempt and the output you're seeing could be truncated if the grep is killed due to the process limit.
Are you seeing this problem consistently? Or does it vary with time of day, or at times when cron or at jobs might be expected to be running? You say you're seeing about 90 processes running. Are they all things that you expect to be running? Are any of them things that hang around running for a while and then kick off a bunch of other processes to perform certain tasks when certain conditions arise?
Could network traffic be kicking off jobs that are being run by processes running under your account?
Do you have a bunch of MQ monitoring scripts running in the background? What are they doing? How many of them are there?
Obviously, with no access to your system, we can only make wild guesses. I agree that it sounds like you're running enough processes that AIX isn't letting you start any more until one or more of the jobs that are running terminate, but that doesn't help much if we don't know what is running and why it is running.
Is process accounting enabled on your system? Can your sysadmin help you track down what jobs you're running during times when your processes are being killed?
Well, my question is: how can I monitor whether I am reaching the maxuproc limit? Or: where are the processes that are not listed by the ps command?
I can't tell you where your processes are, but I can tell you how to find out all user properties (including, but not limited to, maxuproc):
The output is in "attribute=value" format, separated by blanks. You can also use the -f switch to get stanza format or -c to get colon-separated format. You need to run it as root to get all attributes; as an ordinary user you only get a small subset.
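The command itself did not survive in the post above. The description of its output matches AIX's lsuser, so the sketch below assumes that is what was meant. The helper name split_attrs is made up for illustration; it only reflows the blank-separated "attribute=value" line into one attribute per line, and it will split values that themselves contain blanks.

```shell
# Assumption: the elided command is AIX's lsuser.
# split_attrs (hypothetical helper): read one "name attr=value attr=value ..."
# line on stdin and print one attr=value pair per line.
split_attrs() {
    tr -s ' ' '\n' | grep '='
}

# Only touch the live system on AIX, where lsuser exists.
if [ "$(uname -s)" = "AIX" ]; then
    user=$(id -un)
    lsuser "$user" | split_attrs   # default blank-separated format, reflowed
    lsuser -f "$user"              # stanza format, as described above
fi
```

Run as root to see the full attribute set; as an ordinary user the list will be shorter.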
Thanks for the replies. Let me try to be more specific.
There is an MQ server running on the host; running ps -ef at any time shows about 90 lines of output. This is quite normal, and it includes the processes belonging to AIX itself, the MQ server, and the monitoring scripts (5 at most at any given moment).
This morning I found that ps -ef shows the same number of processes as it usually does. Most of them stay alive for an extended period, so every application that managed to connect earlier is still able to use the service. New connections cannot be created: in this configuration, each new connection requires a new process to handle the client.
Also I am unable to run any command that is not setuid root.
Now, raising the maxuproc value from 4096 to 5000 seems to solve the problem. Well, there is not a single user in the system trying to run 4000 processes, as I see 90 processes altogether. Why?
A couple of hours later the problem showed up again in the same way. Raising maxuproc again solved the problem, or at least seemed to. Something is accumulating in the background and I do not see what it might be. So, when I run into this maxuproc problem and maxuproc is set to 4096, I would like to see that something really is at 4096. What kind of objects are counted? Entries in the process table? Threads? Or something else?
Well, I know how to list user parameters.
The relevant parameters of the relevant user are:
Well, yes, maybe I was on the wrong track and the limit was not the number of processes but some other limit. In that case my question is: why did raising maxuproc suppress the problem?
--Trifo
Last edited by jim mcnamara; 04-01-2019 at 10:55 AM..
On AIX, I would expect the count to just be the number of processes in the process table. (On a Linux system, it could easily be the number of threads.)
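If the count really is process-table entries per user, it can be compared against maxuproc directly. The following is a rough sketch, not a tested monitoring script: the function names and the 80 percent threshold are invented for illustration, and it assumes that ps -e -o user= prints one owner per process and that lsattr -El sys0 -a maxuproc prints the limit in its second column.

```shell
# count_by_user (hypothetical helper): read one user name per line on stdin,
# print "count user" with the busiest user first.
count_by_user() {
    sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

# over_threshold LIMIT PERCENT (hypothetical helper): read "count user" lines,
# print "user count/LIMIT" for users at or above PERCENT of LIMIT.
over_threshold() {
    awk -v limit="$1" -v pct="$2" \
        '$1 * 100 >= limit * pct {print $2, $1 "/" limit}'
}

# Only query the live system on AIX (Linux has an unrelated lsattr command).
if [ "$(uname -s)" = "AIX" ]; then
    limit=$(lsattr -El sys0 -a maxuproc | awk '{print $2}')
    ps -e -o user= | count_by_user | over_threshold "$limit" 80
fi
```

Run from cron as root, this would print nothing while every user stays below the threshold, which makes any output worth a look.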
Note that if one of your processes forks and execs other processes and doesn't reap them when they die you could easily get a condition like this, but you should see zombie processes in the process table in this case. (Note that a zombie process is a process that was running and has died. The process table slot is still consumed by the process even though all of its other resources have been freed because the process slot can't be released until its parent reaps its exit status with a call like wait(), waitid(), or waitpid().) But, zombies should show up in ps -ef output.
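To check for unreaped children, the defunct entries in ps -ef output can be grouped by parent PID, which points at the process that is failing to call wait(). This is only a sketch: defunct_by_parent is a made-up name, and it assumes the usual ps -ef layout in which the third column is the PPID and zombies are marked "defunct".

```shell
# defunct_by_parent (hypothetical helper): read `ps -ef` output on stdin and
# print "PPID count" for every parent that has defunct children.
defunct_by_parent() {
    awk '/defunct/ {n[$3]++} END {for (p in n) print p, n[p]}' | sort -n
}

# Usage on a live system:
#   ps -ef | defunct_by_parent
```

An empty result means no zombies were visible at that moment.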
I suppose it is possible that you have a process that is creating threads and not waiting for them to finish (i.e., calling pthread_join() to free up the thread ID). I don't know if AIX would kill processes that can't get a new thread ID due to unreaped threads, but it seems plausible. On AIX, threads would not show up in ps -ef output.
Maybe bakunin can suggest a way to determine thread limits on AIX and a way to look for zombie threads?
Well, zombie processes, if there were any, would show up in ps -ef output as "defunct". This time there were none.
Threads in AIX can be listed using ps -efo THREAD, but counting all the threads gave ~500 entries, which is far fewer than the value of maxuproc.
Let's look at the problem from another angle:
- monitoring shows that the monitoring scripts are unable to finish
- logging in as a nonprivileged user succeeds, but running most commands results in a "Killed." message
- the host seems to have plenty of free memory and CPU resources
- no messages in errpt
Well, what would you do for problem determination?
Before your last post, I thought you were saying that one (non-root) user was having problems. Do you mean that all non-root users are having jobs killed by the system?
Does AIX have a fixed process table size? If so, what fixed process table size is currently configured and how many processes does ps -ef show running for all users?