Question about details for the whole machine


 
# 1  
Old 03-19-2013

Hi folks,

As I continue my self-torture, I've come upon an interesting issue.
I now have a script that uses top on a Solaris box to gather performance data into a file for use in tracking overall performance.

And it even works 99.99% of the time.

But it glitches eventually and leaves a process still running and burning cpu cycles.

The unix admin who has been "helping me" just smiled as if he expected that and said I should use utilities which are native to Solaris.

Well, what I need from top are:
- the load averages
- the CPU Idle state data
- and the memory
from the displayed results below:
Code:
   last pid: 25033;  load avg:  0.11,  0.77,  0.59;  up 21+07:05:28       16:25:24
   104 processes: 103 sleeping, 1 on cpu
   CPU states: 96.6% idle,  1.5% user,  2.0% kernel,  0.0% iowait,  0.0% swap
   Kernel: 1181 ctxsw, 28 trap, 1072 intr, 1549 syscall, 26 flt
   Memory: 10G phys mem, 4408M free mem, 9083M total swap, 9083M free swap

Now I can get the load averages easy with:
# prstat 1 1 | grep load | awk '{print $8, $9, $10}'

Sadly, prstat gives most of its data 'per process', where I need amalgamated data for the entire box (as top gives).

When I investigated getting the CPU and memory data from vmstat, I could find no way of getting the values for "iowait" or "swap". The man page says there should also be a "wa" value, which I do not get from my vmstat, as shown:
Code:
# vmstat 1 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s3 --   in   sy   cs us sy id
 3 0 0 11893120 5017848 7 46 1  0  0  0  0  0  2 -0  0  934 1029 1080 16  1 82

As for memory, the only values I see in vmstat are Swap and Free. Nothing about Physical memory, free or otherwise.

I investigated using sar, but our customer has it disabled and does not wish to enable it.

I am looking into my options, which are broadening as I look at things like "kstat", which I do not know how to use, and mpstat (using the -a flag). But this also does not get me the complete set of data that top does.

On the basis that this is just a box following instructions, I am assuming there is another way to get what top does from a Solaris box and I am hoping that someone here can give me some direction in getting it?

I've even looked at the source for vmstat(vmstat.c) in order to see if I can figure out what it does[and no, I do not prog in c at all] and that led me to a bunch of ".h" files I'm exploring as I burn more and more work time.

Can anyone guide me to an answer for solaris?

Thanks and sorry for rambling.

Marc
# 2  
Old 03-19-2013
Quote:
Originally Posted by Marc G
Sadly, prstat gives most of its data 'per process', where I need amalgamated data for the entire box (as top gives).
You might want to use "top -Z" which gives some statistics per zone.
Quote:
When I investigated getting the CPU and memory data from vmstat, I could find no method of getting the values for "iowait" or "swap".
That's no surprise.
- Solaris stopped reporting "iowait" many years ago because the figure was confusing, largely meaningless and commonly misinterpreted.
- There is no "swap" CPU state so it should always display 0% here.
Quote:
As for memory, the only values I see in vmstat are Swap and Free. Nothing about Physical memory, free or otherwise.
The vmstat "free" column is definitely about physical memory. The "swap" column, however, has little to do with what top reports: here "swap" means available virtual memory. What top reports is swap area usage, which you can get with "swap -l" on Solaris.
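As a minimal sketch of turning "swap -l" output into the two swap figures top prints, the function below sums the blocks and free columns and converts them to megabytes. It assumes the usual Solaris layout (swapfile, dev, swaplo, blocks, free, with sizes in 512-byte blocks). The name swap_totals_mb is made up for illustration, and on Solaris you may need nawk or /usr/xpg4/bin/awk in place of awk.

```shell
# Sum the "blocks" and "free" columns of `swap -l` output read on stdin
# and print "total_mb free_mb". Assumes sizes are in 512-byte blocks,
# so 2048 blocks make one megabyte. The first line (header) is skipped.
swap_totals_mb() {
    awk 'NR > 1 { blocks += $4; free += $5 }
         END    { printf "%d %d\n", blocks / 2048, free / 2048 }'
}

# Intended use on the Solaris box (not run here):
#   /usr/sbin/swap -l | swap_totals_mb
```

Because the function sums every data line, it should also cope with a box that has more than one swap device.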
Quote:
I am looking into my options, which are broadening as I look at things like "kstat", which I do not know how to use
Most of the statistics-gathering commands (e.g. vmstat, iostat, mpstat, netstat, ...) use the kstat interface to get part or all of their input data. The kstat command lets you get at the low-level data from which they build a more readable representation.

Quote:
But this also does not get me a complete set of the data that top does.
No command will. "top" gathers data from different sources (mostly kstat and /proc) and consolidates them in its own way.

Quote:
On the basis that this is just a box following instructions, I am assuming there is another way to get what top does from a Solaris box and I am hoping that someone here can give me some direction in getting it?
That's a wrong assumption if you expect a single alternative tool that provides the same set of statistics. If you want top just use it and fix whatever doesn't work in the way you call it. Otherwise, you'll have to aggregate data from different commands or process kstat output, if you are not interested in process specific information.
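For the "process kstat output" route, here is a rough sketch rather than a finished tool. It assumes the parseable form printed by "kstat -p" (one "module:instance:name:statistic" key, a tab, then the value per line) and assumes the avenrun_* load counters are scaled by 256. The helper names kstat_value and avenrun_to_load are invented for this example, and on Solaris nawk or /usr/xpg4/bin/awk is needed for the -v option.

```shell
# Print the value of one statistic from `kstat -p` output read on stdin.
# Each input line is expected to look like "module:instance:name:stat<TAB>value".
kstat_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Convert a raw avenrun_* counter into a load average with two decimals.
# Assumption: the kernel scales these counters by 256.
avenrun_to_load() {
    awk -v raw="$1" 'BEGIN { printf "%.2f\n", raw / 256 }'
}

# Intended use on the Solaris box (not run here):
#   key=unix:0:system_misc:avenrun_1min
#   raw=`kstat -p $key | kstat_value $key`
#   avenrun_to_load "$raw"
#   kstat -p unix:0:system_pages:physmem    # total physical memory, in pages
```

The same kstat_value helper can be pointed at unix:0:system_pages:physmem, which should cover the "total physical memory" figure without calling top. The result is in pages, so it still has to be multiplied by the page size.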
# 3  
Old 03-20-2013
Hi jlliagre!

First, thanks for responding!

So taking from your response,
The "iowait" and "swap" CPU state are items I should not worry about

I can use the vmstat "free" column for "Free" physical memory, but how do I test
for the total Physical memory?

I have checked and like the results from "swap -l". Thank you!

In the end, I don't want one single utility to do this. Or rather, once I understand how to get what I want I can build my script which will be the "one single utility" I use and can share.

Ultimately, the output needs to be:
Box Name, Mar-20-13,10:45:05,97.5%,10G,4473M,9083M,9083M,0.06, 0.05, 0.05
Where:
97.5% = CPU Idle [ This I still need * See below* ]
10G = Total phys memory [ This I can get from "vmstat 1 1" per your reply ]
4473M = Phys Memory in use [ This I still need ]
9083M = total Swap [ This and the next I can get from "swap -l" per your reply ]
9083M = Swap in use (this is a lab machine for design)
0.06, 0.05, 0.05 is the cpu load [This I can get from "prstat 1 1"]

Regarding the CPU Idle,
Top shows "97.5%"
where vmstat shows "83", being 83 percent.

Is this just an artifact of top not being accurate in a Solaris OS or is there something I am missing?

And on the "Phys Memory in use", how do I get that?
I am researching this, but came here when I found myself trying to read up on vmstat, prstat, kstat and several other things all at the same time. If I had one path, I could walk it myself. But I need direction on which to use, or I'll be balancing on one toe on each of many paths rather than walking one with both feet.

Thanks for your help.

Marc

---------- Post updated at 12:01 PM ---------- Previous update was at 11:06 AM ----------

An additional discovery I've made is that:
vmstat 1 1
gives a "free" of 5001808
where
kstat -n system_pages | grep availrmem
gives a "availrmem" of 1207874

So as I continue to research, I am finding that each of the roads I am walking gives answers which appear similar but are vastly different.

Marc

---------- Post updated at 04:43 PM ---------- Previous update was at 12:01 PM ----------

With help from you folks and my own research I found:

/usr/sbin/prtdiag | grep "Memory size" | awk '{print $3}'
gets me Phys memory size

/usr/bin/vmstat 1 1 | grep -v free | grep -v faults | awk '{print $5}'
gets me Phys Memory free

prstat 1 1 | grep load | awk '{print $8, $9, $10}'
gets me the load averages

/usr/sbin/swap -l
gets me the swap size and free

/usr/bin/vmstat 1 1 | grep -v swap | awk '{print $22}'| sed '/^$/d'
gets me the CPU Idle

Thanks for the guidance!!!!

Marc
# 4  
Old 03-20-2013
Quote:
Originally Posted by Marc G
Regarding the CPU Idle,
Top shows "97.5%"
where vmstat shows "83", being 83 percent.

Is this just an artifact of top not being accurate in a Solaris OS or is there something I am missing?
You are missing that the first set of statistics reported by most of the *stat commands is an average since last boot.

Instead of "vmstat 1 1", you should run "vmstat 1 2" and use the last line values.
Quote:
An additional discovery I've made is that:
vmstat 1 1
gives a "free" of 5001808
where
kstat -n system_pages | grep availrmem
gives a "availrmem" of 1207874

So as I continue to research, I am finding that each of the roads I am walking gives answers which appear similar but are vastly different.
The fact that they cover different periods of time and are reported in different units (KB vs 4 or 8 KB pages, depending on the architecture) is worth considering.
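To make the unit difference concrete, here is a small sketch that converts a page count such as availrmem into KB so it can be set beside the vmstat figure. The function name pages_to_kb is hypothetical. It needs a shell with $(( )) arithmetic (ksh or bash, not the old Bourne /bin/sh) and assumes the page size is a multiple of 1024 bytes.

```shell
# Convert a page count (as reported by the system_pages kstat) to KB.
# $1 = number of pages, $2 = page size in bytes (e.g. 4096 or 8192).
# The page size is divided first to keep the intermediate value small.
pages_to_kb() {
    echo $(( $1 * ( $2 / 1024 ) ))
}

# Intended use on the Solaris box (not run here):
#   pages_to_kb 1207874 `pagesize`
```

With the numbers from the post, 1207874 pages comes to 4831496 KB at 4 KB per page and 9662992 KB at 8 KB per page, so the two readings are only comparable once the page size of the box is known.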
# 5  
Old 03-21-2013
1. get a new top program that does not get stuck
2. if it still gets stuck, call it through a timeout wrapper
Code:
perl -e "alarm 3600; exec @ARGV" top ...

This one will kill top after 1 hour.
# 6  
Old 03-25-2013
Quote:
Originally Posted by jlliagre
You are missing that the first set of statistics reported by most of the *stat commands is an average since last boot.

Instead of "vmstat 1 1", you should run "vmstat 1 2" and use the last line values.
The fact that they cover different periods of time and are reported in different units (KB vs 4 or 8 KB pages, depending on the architecture) is worth considering.
ok....
I've been reading here and also doing a lot of research

As jlliagre says,

if I use the command:
Code:
"vmstat 1 2 | cut -b77-78 | grep -v id"

I get the following output:

Code:
85
99

now "85" is based on the period "since the device was booted"
so "99" is what I want

So how do I grab only the second value?

I tried a test like this:
Code:
   set data=`vmstat 1 2 | cut -b77-78 | grep -v id`
   echo $data

But that only hung at the prompt until the process ran, and gave me no data.

Thanks for your continuing help.

Marc

---------- Post updated at 02:52 PM ---------- Previous update was at 02:50 PM ----------

Quote:
Originally Posted by MadeInGermany
1. get a new top program that does not get stuck
2. if it still gets stuck, call it through a timeout wrapper
Code:
perl -e "alarm 3600; exec @ARGV" top ...

This one will kill top after 1 hour.
Actually, it continued to get stuck, so I put a command after the top call to check whether top was still running and kill it.

That failed to resolve the issue to the sys admin's satisfaction.

---------- Post updated at 03:53 PM ---------- Previous update was at 02:52 PM ----------

additionally...

When I run the following from the command line:
I get:
Code:
bash-3.2# vmstat 1 2 | grep -v id | cut -b77-78

85
98
bash-3.2#

Note, the blank line is part of the output

but when I create a script: "test.sh" containing:
Code:
bash-3.2# cat test.sh
/usr/bin/ksh

vmstat 1 2 | grep -v id | cut -b77-78

and I run that, I get:
Code:
bash-3.2# ./test.sh
#

I then have to type "exit" to allow the script to complete execution as follows:
Code:
bash-3.2# ./test.sh
# exit

85
98
bash-3.2#

So the more I try, the more I am both learning and digging my wheels in deeper.

I'm hoping someone can help me out.

Marc

Last edited by Marc G; 03-25-2013 at 05:01 PM..
# 7  
Old 03-25-2013
Quote:
Originally Posted by Marc G
if I use the command:
Code:
"vmstat 1 2 | cut -b77-78 | grep -v id"

I get the following output:

Code:
85
99

now "85" is based on the period "since the device was booted"
so "99" is what I want

So how do I grab only the second value?
Assuming you want the last column (idle cpu), here is one way to get it:

Code:
vmstat 1 2 | nawk 'NR==4{print $NF}'

That would be quite inefficient, though, if you run it just to extract a single value, since I understand you want several columns.
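One way to pull several columns out of a single vmstat run is sketched below. It assumes the Solaris layout shown earlier in the thread, where "free" is the fifth field and the CPU us/sy/id figures are the last three fields, so the number of disk columns does not matter. The function name vmstat_last_sample and the output order are made up for illustration. On Solaris, use nawk or /usr/xpg4/bin/awk in place of awk.

A side note on the earlier test.sh: its first line reads /usr/bin/ksh rather than #!/usr/bin/ksh, so it is executed as an ordinary command and starts an interactive ksh. That is why the "#" prompt appears and vmstat only runs after "exit".

```shell
#!/bin/sh
# Read `vmstat 1 2` output on stdin and print "free_kb,us,sy,id"
# taken from the last sample, i.e. the most recent interval.
vmstat_last_sample() {
    awk '
        # Only data lines start with a number; header lines are skipped.
        # Each data line overwrites the previous one, so the values left
        # at the end belong to the last sample rather than the
        # since-boot average on the first data line.
        $1 ~ /^[0-9]+$/ { free = $5; us = $(NF - 2); sy = $(NF - 1); id = $NF }
        END { printf "%s,%s,%s,%s\n", free, us, sy, id }
    '
}

# Intended use on the Solaris box (not run here):
#   vmstat 1 2 | vmstat_last_sample
```

Keeping the parsing in one awk call means vmstat runs once per sample, and the comma-separated result can be appended directly to the line the script writes to its log file.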