Solaris 10 10/09 s10s_u8wos_08a SPARC 16cpus 128MB, uptime 150+ days,
2 db zones (Oracle 9 & 10), 3 application zones.
This is from a system that was crawling: it took 60 seconds to execute a
single command. I had to reboot to clear it. The data is from runs of
prstat, top, and iostat. The system is fine after the reboot.
Most of the waits were for oracle remote user processes in a
single db zone.
I ran dtrace and mdb to find cpu issues and file locks, found very few.
We also lost a SAN controller (on a Windows fileserver SAN that is not
attached to this box at all), and this problem occurred several hours
later.
Note: the CPUs are not actually busy, but the load averages
are absurd. Context switches were low, less than 100/sec, per dtrace.
iostat shows two disks with excessively high svc_t times, but not that
much transfer of data.
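
For anyone wanting to pull devices like these out of a saved iostat capture, here is a rough sketch of an awk filter. It assumes the classic Solaris `iostat -x` column layout (device, r/s, w/s, kr/s, kw/s, wait, actv, svc_t, %w, %b). The function name `slow_disks` and the 100 ms default threshold are made up for illustration and are not from the original post.

```shell
#!/bin/sh
# slow_disks: read `iostat -x` style output on stdin and print devices
# whose svc_t (column 8 in the assumed layout) exceeds a threshold in ms.
# The name and the default threshold are illustrative assumptions.
# On Solaris 10, set AWK=nawk (the old /usr/bin/awk has no -v option).
slow_disks() {
    threshold=${1:-100}
    "${AWK:-awk}" -v t="$threshold" '
        # Data rows have 10 fields and a numeric svc_t; headers do not.
        NF == 10 && $8 ~ /^[0-9.]+$/ && $8 + 0 > t + 0 {
            printf "%s svc_t=%s kr/s=%s kw/s=%s\n", $1, $8, $4, $5
        }'
}

# Usage on the host:  iostat -x 5 3 | slow_disks 100
```

Printing kr/s and kw/s next to svc_t makes the "slow but not moving much data" pattern easy to spot.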
Low priority processes are often in waits; this is normal.
I have historical sar data, sarcheck does not see any problems other than
ssd18 and ssd27 have excessive waits.
I had to reboot so this is what I now have to work with....
Any ideas? What would cause this:
The ssdnn devices are SAN LUNs.
Thanks for any comments.
Last edited by jim mcnamara; 02-25-2014 at 04:57 PM..
Well, in situations like this (reboot performed), one can only offer suggestions from experience.
With uptime at 150+ days, multiple zones and multiple Oracle instances, I would be looking at two things.
1. Check the content of /tmp directories on all zones to see if one of them has five million files in it. If so, do we know why? Cleaning them up often clears the issue. If this is the problem (an O/S problem) then I would expect the problem to recur in the short term.
2. What is the setting of the parameter "pg_contig_disable" in the /etc/system files? With a long uptime and Oracle instances running, memory can become very fragmented, and if the Oracle DB requests contiguous memory then the system virtually hangs whilst working sets are shuffled to give Oracle what it wants. The cure is either to increase memory size or allow Oracle to use non-contiguous memory. If this is the problem (an Oracle problem) then I would expect the problem not to recur in the short term.
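
Both checks can be scripted. A rough sketch follows, with the helper names `count_files` and `system_param` invented for illustration. It assumes /etc/system uses the usual `set name=value` lines with `*` or `#` starting a comment, and that you run the file count inside each zone.

```shell
#!/bin/sh
# count_files: print the number of regular files under a directory,
# without crossing into other filesystems. (Hypothetical helper name.)
count_files() {
    find "$1" -xdev -type f 2>/dev/null | wc -l | tr -d ' '
}

# system_param: print the last value assigned to a tunable in an
# /etc/system style file, or nothing if it is not set there.
# (Hypothetical helper name; on Solaris 10 set AWK=nawk.)
system_param() {
    "${AWK:-awk}" -v p="$2" '
        /^[*#]/ { next }                      # skip comment lines
        $1 == "set" {
            line = $0
            sub(/^[ \t]*set[ \t]+/, "", line) # drop the leading "set"
            n = index(line, "=")
            if (n == 0) next
            name = substr(line, 1, n - 1)
            val  = substr(line, n + 1)
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", val)
            if (name == p) last = val
        }
        END { if (last != "") print last }' "$1"
}

# Usage:
#   count_files /tmp
#   system_param /etc/system pg_contig_disable
```

Note that /etc/system only shows what will be applied at the next boot. The value in the running kernel can usually be read as root with `echo 'pg_contig_disable/D' | mdb -k`.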
This really isn't very helpful I know, just thinking aloud.
Thanks!
@jlliagre - the system was rebooted and the problem cleared. Back then prstat -Z did not show any one zone using cpu resources. Nobody had cpu. As you saw, sys % time was low, too. So the kernel was not thrashing AFAIK.
/tmp gets cleaned up monthly, so maybe 200 files were out there.
@hicksd8 - pg_contig_disable = 0. I think this may have precipitated the problem. OTN has some similar information, we knew about it but decided against setting it. We rebooted, it is now set to 1. We also forced mgt to acquiesce to a periodic off-time reboot. We now are allowed reboots on the weekend. The whole thing is political, no technical person is allowed input in decisions like this until something goes South.
Hello,
I have noticed some unusual behavior while running the script.
When I use the script below, it gives output 355.23
#!/bin/bash
ONEDAY=`date +%Y%m%d --date="1 days ago"`
cat /opt/occ/var/performance/counters_`date -d "1 day ago" +%Y%m%d`*|grep "Gy,Gy-Gy-CCR"|awk -F"," '{print... (5 Replies)
I have made a simple script to zip a file then first copy it to a specific directory using cp command then move it to another directory. Files are getting generated at regular intervals in the dir. /one/two/three/four/. I have entry of my script in cron to run after every 2 min.
#!/bin/sh... (9 Replies)
# echo "size(JFJF" | awk -F"size(" '{print $1}'
awk: fatal: Unmatched ( or \(: /size(/
the delimiter is "size(" but i'm not sure if awk is the best tool to use to specify it.
i have tried:
# echo "size(JFJF" | awk -F"size\(" '{print $1}'
awk: warning: escape sequence `\(' treated as... (1 Reply)
Our comp-operator has come across a peculiar 'feature'. We have this directory where we save all the reports that were generated for a particular department for only one calendar year. Currently there are 45,869 files. When the operator tried to backup that drive it started to print a file-listing... (3 Replies)
what is wrong with the below script:
---------------------------------------------------------------------------------
#!/bin/bash
echo "Setting JrePath..."
grep -w "export JrePath" /etc/profile
Export_Status=$?
if
echo "JrePath declared"
elif
echo "JrePath not declared"
echo... (4 Replies)
is there anyway to make while run a command faster than per second?
timed=60
while
do
command
sleep 1
done
i need something that can run a script for me more than one time in one second. can someone help me out here? (3 Replies)
I'm writing a program which needs to get the following information about a server by calling some lib functions or system calls, so can anybody help to tell me those function names or where I can find the description of them?
CPU usage
Memory usage
Load procs per min
Swap usage
Page I/O
... (11 Replies)
I'm writing a program which needs to get the following information about a server by calling some lib functions or system calls, so can anybody help to tell me those function names or where I can find the description of them?
CPU usage
Memory usage
Load procs per min
Swap usage
Page I/O
Net I/O... (1 Reply)
Hi everyone,
I was doing some practising with Unix and accidentally created a file with the name --------------------
Yeah, it was UNINTENTIONALLY. I tried removing it various ways like
rm '--------------'
rm '-.*'
and all other sorts, but Unix keeps detecting that as an option stuff...
... (2 Replies)
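
For the dash-named file in the last question, quoting does not help because rm still sees a leading `-` and parses it as options. A minimal sketch of the usual workaround is below. The function name `remove_dash_file` is made up for illustration, and it assumes the name is relative to the current directory.

```shell
#!/bin/sh
# remove_dash_file: delete a file in the current directory whose name
# begins with '-'. Prefixing "./" means the argument no longer starts
# with a dash, so rm treats it as a path rather than an option.
# (Hypothetical helper name.)
remove_dash_file() {
    rm "./$1"
}

# Usage:
#   remove_dash_file '--------------------'
# An alternative on most systems is `rm -- NAME`, where "--" marks the
# end of options.
```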