Determine threshold for CPU


 
# 1  
Old 02-24-2017

I'm writing an application that should display whether a system is running “fine” (normal activity) or has reached a critical level, and indicate this in a graphical interface using a green-yellow-red color scheme. The servers in question run AIX on POWER (it shouldn't differ much across UNIX systems, but the architecture is worth noting). The solution will be applied both to single servers with 100% CPU capacity and to clusters that allow utilization above 100%.

I'm well aware that thresholds like these are most commonly determined through a lot of trial and error and testing, but I would like to arrive at the most appropriate thresholds with some facts to back them up.

Which leads me to the following question: how do I set these thresholds in a theoretical way? For example, if it should turn red and raise a critical alert at 90%, why 90%? Why not 85%?
There are also possible spikes in CPU usage, so should it only indicate critical after 2 minutes of usage above 85%?
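
One way to make the "only after 2 minutes above 85%" idea concrete is to count consecutive samples above the limit. This is a minimal sketch, not an established method: the function name `sustained_state` and its two parameters are made up for illustration, and it assumes one busy-percentage sample per line on standard input.

```shell
#!/bin/sh
# sustained_state LIMIT NEEDED   (hypothetical helper)
# Reads one CPU busy percentage per line on stdin and prints a state for
# each sample: "critical" once the last NEEDED consecutive samples were
# all above LIMIT, otherwise "ok". A single sample at or below LIMIT
# resets the streak, so short spikes never reach "critical".
sustained_state() {
    awk -v limit="$1" -v needed="$2" '
        {
            if ($1 + 0 > limit + 0)
                streak++
            else
                streak = 0
            if (streak >= needed + 0)
                print "critical"
            else
                print "ok"
        }'
}
```

With one sample every 5 seconds, 2 minutes corresponds to NEEDED=24. The window length is a tuning choice like the threshold itself: a longer window suppresses more spikes but delays the alert by the same amount.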


My main question is: are there any algorithms or past works that have done something similar? Any research papers or books that you know of? I've tried to research this without much success; most of what I could find was related to the x86 architecture and not POWER. Even if the two architectures differ a bit, there are also many similarities, so some methods may work for both.
# 2  
Old 02-24-2017
The truth is that only the admin responsible for the machine can tell you. It depends on knowing what runs on a machine and how, so it will not necessarily be the same on another machine. Furthermore, the contention will differ too: one workload may be a memory hog, another an intensive CPU consumer because of laborious calculation algorithms.
A generic threshold will only be valid on a generic box.
# 3  
Old 02-24-2017
There are other considerations too, such as: is your CPU allocation fixed or variable? It might sound odd, but an LPAR can (as a configuration choice) use more CPU if it is available on the whole server and other LPARs are not fully using theirs (or indeed there is some unallocated). You also need to know whether you have a share of processors or whole CPUs allocated. That can really skew the figures too.

You would need to better clarify what you have.

What output do you get from something like vmstat 5 3?



Robin
# 4  
Old 03-16-2017
vmstat measures over a given interval, and then you get the average CPU usage for that interval.
That means your check must wait until the interval has finished.
For example
Code:
vmstat 5 2

The second value line is the average over the 5-second interval.
(The first value line is the average since the system was booted - not very useful.)
"Normal" thresholds for usr%, system%, iowait% are 75, 55, 30 for warning and 90, 70, 40 for critical.
Another measurement is the loadavg, which is the run queue length. The run queue gets longer when the scheduler is too busy to run tasks on schedule.
The advantage of the loadavg is that the system provides the measurement interval; there are even 3 intervals: 1 minute, 5 minutes, 15 minutes.
The command line tool for this is uptime.
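
As a rough illustration of the two checks described in this post, here is a sketch of how they could be scripted. Everything named here is hypothetical (`last_cpu_sample`, `classify_cpu`, `classify_load`). The CPU thresholds are the ones quoted above, while the per-CPU load limits are placeholder parameters. It assumes the usual vmstat header with `us sy id wa` columns and an uptime line containing `load average:` with a dot as the decimal separator.

```shell
#!/bin/sh
# last_cpu_sample: reads vmstat output on stdin and prints "us sy wa" from
# the last data line. Column positions come from the header line, because
# platforms append different columns after "wa" and "sy" appears twice.
last_cpu_sample() {
    awk '
        {
            u_here = 0; w_here = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "us") u_here = i
                if ($i == "wa") w_here = i
            }
            if (u_here && w_here) { u = u_here; w = w_here; next }
            if (u && $1 ~ /^[0-9]+$/) {
                us = $u; sy = $(u + 1); wa = $w; seen = 1
            }
        }
        END { if (seen) print us, sy, wa; else exit 1 }'
}

# classify_cpu USR SYS WA: applies the thresholds quoted in the post
# (warning 75/55/30, critical 90/70/40) and prints ok, warning or critical.
classify_cpu() {
    awk -v us="$1" -v sy="$2" -v wa="$3" 'BEGIN {
        state = "ok"
        if (us + 0 >= 75 || sy + 0 >= 55 || wa + 0 >= 30) state = "warning"
        if (us + 0 >= 90 || sy + 0 >= 70 || wa + 0 >= 40) state = "critical"
        print state
    }'
}

# classify_load NCPU WARN CRIT: reads one uptime line on stdin, divides the
# 5-minute load average by NCPU (must be positive) and compares the result
# with WARN and CRIT, which are assumed per-CPU limits, not from the post.
classify_load() {
    awk -v ncpu="$1" -v warn="$2" -v crit="$3" '
        /load average/ {
            sub(/.*load averages?: */, "")
            gsub(/,/, " ")
            split($0, la, " ")
            per_cpu = la[2] / ncpu
            state = "ok"
            if (per_cpu >= warn + 0) state = "warning"
            if (per_cpu >= crit + 0) state = "critical"
            printf "%s %.2f\n", state, per_cpu
        }'
}

# Possible usage (not executed here):
#   set -- $(vmstat 5 2 | last_cpu_sample)
#   classify_cpu "$@"
#   uptime | classify_load 4 1 2
```

Dividing the load average by the CPU count is a design choice so that one pair of limits can be reused on boxes of different sizes. Which CPU count to use on a virtualized system is a separate question.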
In the "infrastructure monitoring" sub-forum I have provided some Nagios plugin scripts that work on many platforms. Even if you do not have Nagios, you can see the commands in the code. Specifically, check_load5.sh uses uptime and check_cpu_stats.sh uses vmstat.
# 5  
Old 03-17-2017
Quote:
I'm writing an application that should display whether a system is running “fine” (normal activity) or has reached a critical level, and indicate this in a graphical interface using a green-yellow-red color scheme. The servers in question run AIX on POWER (it shouldn't differ much across UNIX systems, but the architecture is worth noting). The solution will be applied both to single servers with 100% CPU capacity and to clusters that allow utilization above 100%.
What is a machine? In "Openstack" terms - is the machine the host, or the virtual machine?

100% of what? On POWER virtualization, is it 100% of a processor, or of entitlement (which can get as high as 2000%, yes 2000! although 1000 is the more typical ridiculous number)?

Or are you looking at an lcpu percentage? 25% lcpu could mean 100% of all the virtual processors, operating in single-threaded 'scheduling'.

The other thing to be aware of is that AIX stats are based on the PURR (processor utilization resource register). These are processor (hardware) counters, not time-based metrics. A program like vmstat might say 95% user plus 5% system while that is only 1% of the physical processor (i.e. physical usage was 1%, and 95% of that 1% was user time, "user%").
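
To put numbers on the three "100% of what?" cases above, here is a small arithmetic sketch. The helper names are invented for illustration. The inputs (physical processors consumed, entitlement, SMT threads per virtual processor) are assumed to be known from the LPAR configuration; the sketch does not read them from the system.

```shell
#!/bin/sh
# pct_of_entitlement USED ENT: physical processors consumed as a percentage
# of the entitlement, e.g. 1.0 processor used on a 0.05 entitlement.
pct_of_entitlement() {
    awk -v used="$1" -v ent="$2" 'BEGIN { printf "%.0f\n", used / ent * 100 }'
}

# lcpu_to_vp_pct LCPU_PCT SMT: upper bound on the share of virtual
# processors that are busy, ASSUMING the work runs one thread per virtual
# processor while the other hardware threads stay idle. Capped at 100.
lcpu_to_vp_pct() {
    awk -v l="$1" -v smt="$2" 'BEGIN {
        v = l * smt
        if (v > 100) v = 100
        printf "%.0f\n", v
    }'
}

# physical_user_pct USER_PCT PHYS_PCT: user time as a share of the physical
# processor when USER_PCT is only a share of the PHYS_PCT actually consumed.
physical_user_pct() {
    awk -v u="$1" -v p="$2" 'BEGIN { printf "%.2f\n", u * p / 100 }'
}

pct_of_entitlement 1.0 0.05   # prints 2000
lcpu_to_vp_pct 25 4           # prints 100
physical_user_pct 95 1        # prints 0.95
```

The same vmstat-style "95% user" figure can therefore mean anything from under 1% of a physical processor to many times the entitlement, which is why a color threshold needs to state which base it is measured against.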

So working from data alone can be very difficult. You will need advice from someone who knows the expected workload and the reasons for the "virtual" sizing decisions.

Great ambition, but it is difficult to define the meaning of the variables. As in all things performance, there is a sauce called "it depends" that flavors the numbers you observe.