Full Discussion: Unusual system bog down
Operating Systems Solaris Unusual system bog down Post 302890188 by jim mcnamara on Tuesday 25th of February 2014 03:41:00 PM

Solaris 10 10/09 s10s_u8wos_08a SPARC, 16 CPUs, 128GB, uptime 150+ days,
2 db zones (Oracle 9 & 10), 3 application zones.

This is from a system that was literally crawling: 60 seconds to execute a
single command. I had to reboot to clear it. The data below is from runs of
prstat, top, and iostat. The system is fine after the reboot.

Most of the waits were for oracle remote user processes in a
single db zone.

I ran dtrace and mdb to look for cpu issues and file locks, and found very few.
We lost a SAN controller (for a Windows fileserver SAN that is not attached
to this box at all), and this slowdown occurred several hours later.

Note: the CPUs are not actually busy (top shows 95.6% idle), but the load
averages are absurd. Context switches were low, less than 100/sec, per dtrace.

iostat shows two disks with excessively high svc_t times, but not that
much transfer of data.

Low priority processes are often in waits; this is normal.
I have historical sar data. sarcheck does not see any problems other than
excessive waits on ssd18 and ssd27.

I had to reboot so this is what I now have to work with....

Any ideas? What would cause this:
Code:
PRSTAT
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 20125 oracle   3772M 3769M wait    59    0   0:00:32 0.1% oracle/1
 18435 oracle   3762M 3759M wait    59    0   0:13:35 0.1% oracle/1
 18430 appworx    50M   47M sleep   59    0   0:06:27 0.1% uzpplpl/1
  7264 oracle   3781M 3764M wait     1    0   0:07:45 0.1% oracle/11
 12839 oracle   2551M 2535M wait    47    0   0:03:52 0.1% oracle/11
 16458 root     7688K 4864K cpu10   59    0   0:00:00 0.0% prstat/1
 18337 oracle   3762M 3759M sleep    1    0   0:04:54 0.0% oracle/1
 25080 vssrt     170M  157M sleep   59    2   0:00:44 0.0% MrepApp/1
 13886 oracle   2566M 2535M wait    38    0   0:00:06 0.0% oracle/1
 25011 oracle   3772M 3769M wait     1    0   0:00:30 0.0% oracle/1
 18334 appworx    15M   12M sleep   59    0   0:03:44 0.0% uapsogn/1
  7480 oracle   2584M 2554M wait    59    0   0:07:52 0.0% oracle/11
  7470 oracle   2584M 2556M wait    59    0   0:07:51 0.0% oracle/11
  5488 oracle   3772M 3769M wait    55    0   0:00:22 0.0% oracle/1
  8591 oracle   3762M 3759M wait    59    0   0:00:00 0.0% oracle/1
 23924 vssrt     206M  193M wait     1    2   0:04:24 0.0% DrepApp/1
 25129 oracle   3768M 3765M wait    59    0   0:00:02 0.0% oracle/1
 12857 oracle   2551M 2534M wait     1    0   0:03:53 0.0% oracle/11
  3803 oracle   3777M 3773M wait     1    0   0:00:11 0.0% oracle/15
  3751 oracle   3772M 3769M wait     1    0   0:00:28 0.0% oracle/1
 26066 oracle   2550M 2534M wait    21    0   0:06:54 0.0% oracle/1
 20904 oracle   3768M 3765M wait     1    0   0:00:05 0.0% oracle/1
  7464 oracle   2549M 2532M wait     1    0   0:06:42 0.0% oracle/1
  7266 oracle   3781M 3764M wait     1    0   0:04:45 0.0% oracle/11
  7256 oracle   3769M 3752M wait     1    0   0:06:39 0.0% oracle/1
 23930 oracle   2554M 2538M wait    59    0   0:03:07 0.0% oracle/11
 19553 oracle   3772M 3769M wait    59    0   0:00:10 0.0% oracle/1
  4058 oracle   3768M 3765M wait    60    0   0:00:14 0.0% oracle/1
 14899 oracle   3768M 3765M wait    59    0   0:00:05 0.0% oracle/1
  8670 oracle   2554M 2537M wait    58    0   0:01:35 0.0% oracle/11
 25086 oracle   2553M 2537M wait    59    0   0:00:29 0.0% oracle/11
 15891 oracle   3762M 3758M wait    57    0   0:00:00 0.0% oracle/1
 17399 oracle   3772M 3769M wait    59    0   0:00:19 0.0% oracle/1
 18260 oracle   3772M 3769M wait    59    0   0:02:05 0.0% oracle/1
  4805 oracle   3772M 3769M wait    60    0   0:00:04 0.0% oracle/1
 23116 oracle   3772M 3769M wait     1    0   0:00:14 0.0% oracle/1
 15228 oracle   3765M 3749M cpu11   59    0   0:04:44 0.0% oracle/1
  4946 oracle   3772M 3769M sleep    1    0   0:00:34 0.0% oracle/1
 29429 oracle   3772M 3769M sleep   55    0   0:00:11 0.0% oracle/1
 12875 oracle   2551M 2534M sleep   59    0   0:04:21 0.0% oracle/11
 12632 oracle   2552M 2535M sleep    1    0   0:02:30 0.0% oracle/14
 12594 oracle   2549M 2532M sleep   59    0   0:02:11 0.0% oracle/1
 11515 vssrt     196M  180M wait     1    0   0:01:57 0.0% TbApp/1
 21481 vssrt      76M   62M wait     1    2   0:01:37 0.0% BmanApp/1
 24837 vssrt     178M  165M sleep   59    2   0:01:13 0.0% MrepApp/1
 20360 oracle   3772M 3769M wait     1    0   0:00:22 0.0% oracle/1
 21726 oracle   3777M 3773M wait    57    0   0:00:34 0.0% oracle/11
Total: 1425 processes, 8621 lwps, load averages: 142.80, 134.91, 144.84

top
last pid: 18794;  load avg: 144.64,  133.78,  144.80;  up 154+00:35:28 12:16:18
1425 processes: 601 waiting, 801 sleeping, 3 on cpu                                                                          
CPU states: 95.6% idle,  3.0% user,  1.4% kernel,  0.0% iowait,  0.0% swap
Memory: 128G phys mem, 78G free mem, 32G total swap, 32G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 25326 oracle     1  59    0 3768M 3765M wait     0:10  0.42% oracle
 12632 oracle    14  59    0 2552M 2535M wait     2:31  0.25% oracle
 18435 oracle     1  59    0 3762M 3759M wait     3:47  0.15% oracle
 23924 vssrt      1   1    2  206M  193M wait     4:26  0.12% DrepApp
 18260 oracle     1  59    0 3772M 3769M wait     2:06  0.12% oracle
  7264 oracle    11   1    0 3781M 3764M wait     7:47  0.11% oracle
 18337 oracle     1   1    0 3762M 3759M wait     4:56  0.10% oracle
  8670 oracle    11  58    0 2554M 2537M wait     1:35  0.09% oracle
 25011 oracle     1   1    0 3772M 3769M wait     0:31  0.08% oracle
 23930 oracle    11  59    0 2554M 2538M wait     3:09  0.08% oracle
  8674 oracle    11  51    0 3770M 3753M wait     1:16  0.08% oracle
 13886 oracle     1  38    0 2564M 2535M wait     0:08  0.08% oracle
 18783 oracle     1  59    0 3762M 3758M wait     0:00  0.08% oracle
  7262 oracle     1   1    0 3960M 3943M wait     4:23  0.08% oracle
 18430 appworx    1  59    0   50M   47M sleep    6:30  0.08% uzpplpl
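
For anyone who wants to summarize a saved prstat listing like the one above, here is a minimal Python sketch that tallies the STATE column. It was not part of the original data collection; the function name `tally_states` and the idea of working from captured text are assumptions, and the sample rows are copied from the listing.

```python
from collections import Counter

def tally_states(prstat_text):
    """Count processes per STATE column in prstat-style text output."""
    counts = Counter()
    for line in prstat_text.splitlines():
        fields = line.split()
        # Data rows have 10 fields and start with a numeric PID;
        # this skips the header row and the "Total:" summary line.
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        state = fields[4]
        # Collapse cpu0, cpu10, ... into a single "on-cpu" bucket.
        counts["on-cpu" if state.startswith("cpu") else state] += 1
    return counts

# Sample rows copied from the prstat output in this post.
sample = """\
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
 20125 oracle   3772M 3769M wait    59    0   0:00:32 0.1% oracle/1
 18430 appworx    50M   47M sleep   59    0   0:06:27 0.1% uzpplpl/1
 16458 root     7688K 4864K cpu10   59    0   0:00:00 0.0% prstat/1
  7264 oracle   3781M 3764M wait     1    0   0:07:45 0.1% oracle/11
Total: 1425 processes, 8621 lwps, load averages: 142.80, 134.91, 144.84
"""

for state, n in sorted(tally_states(sample).items()):
    print(state, n)
```

Run against the full listing, this gives the same kind of breakdown that top prints in its "waiting / sleeping / on cpu" summary line.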

The ssdNN devices are SAN LUNs.
Code:
 iostat -xm
 device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b 
 sd0       0.4    0.5   19.1    2.0  0.0  0.0   25.2   0   0 
 sd1       0.4    0.7   19.1    2.1  0.0  0.0   24.0   0   1 
 sd2       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
 ssd0      0.9    0.3   36.5    1.5  0.0  0.0    3.8   0   0 
 ssd1      1.0    0.3   38.9    1.6  0.0  0.0    4.0   0   0 
 ssd2      1.3    0.3   44.6    5.8  0.0  0.0    3.4   0   0 
 ssd3      0.9    0.3   37.6    2.3  0.0  0.0    3.7   0   0 
 ssd5     88.0   27.0 3181.2  311.0  0.0  0.3    2.7   0   8 
 ssd7      0.0    0.0    0.0    0.0  0.0  0.0    0.9   0   0 
 ssd8      0.1    0.0    0.5    0.0  0.0  0.0    2.1   0   0 
 ssd9      0.1    0.0    0.6    0.0  0.0  0.0    2.1   0   0 
 ssd10     0.5    1.2   14.2   49.1  0.0  0.0    2.6   0   0 
 ssd11     0.1    0.0    0.8    0.0  0.0  0.0    2.0   0   0 
 ssd12     0.3    0.0    5.8    0.1  0.0  0.0    3.5   0   0 
 ssd13     5.1    2.5  395.8  270.8  0.0  0.1    8.7   0   1 
 ssd14     2.4   23.7   46.2  121.9  0.0  0.0    1.4   0   2 
 ssd15     0.0    0.0    0.0    0.0  0.0  0.0    0.6   0   0 
 ssd16     0.1    0.0    0.2    0.0  0.0  0.0    1.9   0   0 
 ssd17     0.0    0.0    0.0    0.0  0.0  0.0    1.1   0   0 
 ssd18    73.5   12.0 13469.7  132.1  0.0  1.5   17.1   0  10
 ssd19     2.0    1.7  133.5   18.9  0.0  0.0    4.7   0   0 
 ssd23     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
 ssd24     0.0    0.0    0.0    0.0  0.0  0.0    1.1   0   0 
 ssd25     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
 ssd26     0.0    0.0    0.0    0.0  0.0  0.0    0.8   0   0 
 ssd27   594.9   65.9 12204.8  669.7  0.0  4.3   86.6   0  74
 ssd28     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0 
 ssd29     0.1    0.2    3.1    0.4  0.0  0.0    2.5   0   0 
 ssd30     0.1    0.0    1.8    0.0  0.0  0.0    2.2   0   0 
 ssd31   140.6   25.2 11266.5  315.0  2.9  5.4   60.3   2  15
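
A rough Python sketch for picking the slow devices out of a saved `iostat -x` listing like the one above. The function name `slow_devices` and the 30 ms `svc_t_limit` are arbitrary assumptions for illustration, not a recommended threshold; the sample rows are copied from the table.

```python
def slow_devices(iostat_text, svc_t_limit=30.0):
    """Return (device, svc_t, %b) for rows whose svc_t exceeds the limit."""
    flagged = []
    for line in iostat_text.splitlines():
        fields = line.split()
        # Expected columns:
        # device r/s w/s kr/s kw/s wait actv svc_t %w %b
        if len(fields) != 10 or fields[0] == "device":
            continue
        try:
            svc_t = float(fields[7])
            busy = int(fields[9])
        except ValueError:
            continue  # not a data row
        if svc_t > svc_t_limit:
            flagged.append((fields[0], svc_t, busy))
    return flagged

# Sample rows copied from the iostat output in this post.
sample = """\
 device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
 ssd5     88.0   27.0 3181.2  311.0  0.0  0.3    2.7   0   8
 ssd18    73.5   12.0 13469.7  132.1  0.0  1.5   17.1   0  10
 ssd27   594.9   65.9 12204.8  669.7  0.0  4.3   86.6   0  74
 ssd31   140.6   25.2 11266.5  315.0  2.9  5.4   60.3   2  15
"""

for device, svc_t, busy in slow_devices(sample):
    print(device, svc_t, busy)
```

With these four sample rows and the assumed 30 ms limit, the rows flagged are ssd27 and ssd31; ssd18 stays under the limit at 17.1.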

Thanks for any comments.

Last edited by jim mcnamara; 02-25-2014 at 04:57 PM..
 
