Hi all,
We've been having issues with quite a few Solaris 10 VMs hanging after about a week of uptime. These VMs run on VMware ESXi 4.1 U1 hosts, and the problem is not tied to any specific host. We also run CentOS VMs and are not seeing any issues with those. The affected VMs are at a few different patch levels; I've seen it occur on 147441-15, 144489-17, and 142910-17.
These VMs run Tomcat (5.5.33, with some running 6.0.18) and PostgreSQL 9.0.3, and the Tomcat apps use JDK 1.6.0_26. When the hang occurs, all network services stop responding, and the console echoes what I type but never responds or returns a prompt.
I booted one of the affected VMs with the -k option to load the kernel debugger. When the hang occurred, I followed the instructions at http://docs.oracle.com/cd/E19082-01/819-2379/fvzni/index.html to force a system crash dump.
I analyzed the dump with the Solaris Crash Analysis Tool (SCAT) and this was the output:
bash-4.1# ./scat /var/crash/unknown/unix.3 /var/crash/unknown/vmcore.3
Oracle Solaris Crash Analysis Tool
Version 5.3 (SV5415, Jan 31 2012) for Oracle Solaris 10 64-bit x64
Copyright © 1989, 2011, Oracle and/or its affiliates. All rights reserved.
Please note: Do not submit any health, payment card or other sensitive
production data that requires protections greater than those specified in
the Oracle GCS Security Practices. Information on how to remove data from
your submission is available at:
Oracle proprietary - DO NOT RE-DISTRIBUTE!
opening /var/crash/unknown/unix.3 /var/crash/unknown/vmcore.3 ...dumphdr...symtab...core...done
loading core data: modules...symbols...CTF...done
core file: /var/crash/unknown/vmcore.3
user: Super-User (root:0)
release: 5.10 (64-bit)
version: Generic_147441-15
machine: i86pc
node name: test-server
domain: mydomain.com
system type: i86pc
hostid: 351d7bc
dump_conflags: 0x10000 (DUMP_KERNEL) on /dev/zvol/dsk/rpool/dump(1G)
boothowto: 0x20040 (DEBUG|KMDB)
time of crash: Mon Apr 30 20:14:43 EDT 2012
age of system: 9 days 3 hours 53 minutes 23.76 seconds
panic CPU: 0 (1 CPUs, 1.99G memory)
panic string: BAD TRAP: type=e (#pf Page fault) rp=fffffe80000b3890 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference
sanity checks: settings...
NOTE: /etc/system: module nfssrv not loaded for "set nfssrv:nfs_portmon=0x1"
vmem...
WARNING: CPU0 has cpu_intr_actv for 5
WARNING: PIL5 interrupt thread 0xfffffe80000b3c60 on CPU0 pinning SYS thread 0xfffffe8000351c60
WARNING: CPU0 has 9 threads in its dispatch queue
sysent...clock...misc...
WARNING: needfree is 80 pages
WARNING: freemem_wait is 80 (threads)
WARNING: page_create() throttled (freemem < throttlefree)
WARNING: hard swapping (avefree < minfree)
NOTE: nscan is 44505
NOTE: push_list_size is 256
WARNING: 15 expired realtime (max -16.924085995s) callouts (17 on expired lists)
done
CAT(/var/crash/unknown/vmcore.3/10X)> meminfo
                           pages       bytes
physinstalled             524175  2147020800 (1.99G)
physmem                   521102  2134433792 (1.98G)
total_pages               521102  2134433792 (1.98G)
freemem                     1754     7184384 (6.85M)
avefree                     1752     7176192 (6.84M)
avefree30                   1734     7102464 (6.77M)
needfree                      80      327680 (320K)
freemem_wait                  80 threads
availrmem (nonswapable)   137232   562102272 (536M)
availrmem_initial         521102  2134433792 (1.98G)
swapfs_minfree             65137   266801152 (254M)
sw_pending_size                         4096 (4K)
lotsfree                    8142    33349632 (31.8M)
desfree                     4071    16674816 (15.9M)
minfree                     2035     8335360 (7.94M)
throttlefree                2035     8335360 (7.94M)
pp_kernel(calculated)     375038  1536155648 (1.43G)
obp_pages                   1536     6291456 (6M)
kcage_on: 0
shared memory (SM)                         0 (0)
intimate SM (ISM)                   37289984 (35.5M)
dynamic ISM (DISM)                         0 (0)
locked DISM                    0           0 (0)
total locked SM                     37289984 (35.5M) (1.73% of memory)
spt_used (ISM)              9104    37289984 (35.5M)
segspt_minfree             21184    86769664 (82.7M)
WARNING: page_create() throttled (freemem < throttlefree)
WARNING: hard swapping (avefree < minfree)
anoninfo: (physical == disk-backed)
ani_max          - total reservable physical swap    524287 pages (1.99G)
ani_free         - unallocated physical and memory   337747 pages (1.28G)
ani_phys_resv    - reserved physical                 458542 pages (1.74G)
ani_mem_resv     - reserved memory                     9104 pages (35.5M)
ani_locked_swap  - swap locked in reserved mem swap    9104 pages (35.5M)
initial virtual swap available for reservation       980252 pages (3.73G)
    ani_max + MAX((availrmem_initial - swapfs_minfree), 0)
current virtual swap available for reservation       137840 pages (538M)
    (ani_max - ani_phys_resv) + MAX((availrmem - swapfs_minfree), 0)
swap device               pages             free
/dev/zvol/dsk/rpool/swap  524287 (1.99G)    458362 (1.74G)
tmpfs:
tmount              size  mount point
0xffffffff8315d018  684K  /etc/svc/volatile
0xffffffff848274c0    8K  /tmp
0xffffffff841fc1e8   28K  /var/run
ramdisk: (none)
CAT(/var/crash/unknown/vmcore.3/10X)>
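As a cross-check on the two memory WARNING lines, the thresholds in the meminfo output appear to match what I understand to be the Solaris 10 defaults from the Tunable Parameters Reference Manual: lotsfree is the greater of physmem/64 or 512 KB, desfree = lotsfree/2, minfree = desfree/2, and throttlefree = minfree. The rough Python sketch below assumes 4 KB pages and those default formulas (neither comes from the dump itself), recomputes the thresholds from physmem, and repeats the two comparisons SCAT flagged.

```python
# Rough sketch: re-derive the paging thresholds from the meminfo values and
# repeat the comparisons behind SCAT's WARNING lines.
# ASSUMPTIONS: 4 KB base pages on x86, and the Solaris 10 default formulas
# for lotsfree/desfree/minfree/throttlefree (no /etc/system overrides).
PAGESIZE = 4096

# Page counts copied from the SCAT meminfo output above.
physmem = 521102
freemem = 1754
avefree = 1752


def default_thresholds(physmem_pages):
    """Return the assumed default paging thresholds, in pages."""
    lotsfree = max(physmem_pages // 64, (512 * 1024) // PAGESIZE)
    desfree = lotsfree // 2
    minfree = desfree // 2
    throttlefree = minfree
    return {"lotsfree": lotsfree, "desfree": desfree,
            "minfree": minfree, "throttlefree": throttlefree}


t = default_thresholds(physmem)
for name, pages in t.items():
    print(f"{name:>12} {pages:>6} pages {pages * PAGESIZE / 2**20:6.2f} MiB")

# SCAT: "page_create() throttled (freemem < throttlefree)"
print("page_create() throttled:", freemem < t["throttlefree"])
# SCAT: "hard swapping (avefree < minfree)"
print("hard swapping:", avefree < t["minfree"])
```

If the derived values agree with the dump (8142, 4071, 2035, 2035 pages), the thresholds were never tuned on this VM, and the warnings come purely from free memory falling below the defaults.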
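Based on the SCAT output, I suspect the hangs are memory-related: freemem (6.85M) has dropped below throttlefree (7.94M), so page_create() is being throttled, avefree is below minfree, so the system is hard swapping, and 80 threads are waiting on free memory. These VMs have 2 GB of memory. Could running low on memory make Solaris hang completely? Shouldn't the kernel keep some memory in reserve for itself? There is nothing useful in /var/adm/messages when the hang occurs.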
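On the reservation question, if I'm reading the Solaris documentation correctly, swapfs_minfree is memory that swap reservations are not allowed to consume. Its default is the larger of 2 MB or 1/8 of physical memory, which would match the 65137 pages in the dump. The quick Python sketch below (assuming 4 KB pages and that default formula) re-evaluates the two virtual-swap formulas SCAT printed. It reproduces SCAT's totals and shows that only about 538 MB of virtual swap was still reservable when the dump was taken.

```python
# Quick sketch: re-evaluate the two virtual-swap formulas SCAT printed,
# using the page counts from the dump.
# ASSUMPTIONS: 4 KB pages on x86; swapfs_minfree at its documented default
# of max(2 MB, physmem / 8).
PAGESIZE = 4096

physmem = 521102
ani_max = 524287            # total reservable physical (disk) swap
ani_phys_resv = 458542      # disk swap already reserved
availrmem_initial = 521102
availrmem = 137232          # current non-swappable memory still available
swapfs_minfree = 65137      # memory held back from swap reservations


def mib(pages):
    """Convert a page count to MiB."""
    return pages * PAGESIZE / 2**20


# Assumed default: the larger of 2 MB or 1/8 of physical memory.
default_minfree = max((2 * 2**20) // PAGESIZE, physmem // 8)

initial = ani_max + max(availrmem_initial - swapfs_minfree, 0)
current = (ani_max - ani_phys_resv) + max(availrmem - swapfs_minfree, 0)

print(f"swapfs_minfree: {swapfs_minfree} pages (default would be {default_minfree})")
print(f"initial virtual swap: {initial} pages ({mib(initial):.0f} MiB)")
print(f"current virtual swap: {current} pages ({mib(current):.0f} MiB)")
```

The current figure splits into roughly 65745 pages of unreserved disk swap plus 72095 pages of memory above swapfs_minfree, so both pools were mostly used up.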
Thanks for any help you can provide.
Derek