Performance problem - waiting on cache


 
# 1  
Old 01-21-2011

My server is running HP-UX 11.23 and one Oracle database. The server has 8 CPUs and is mostly idle all the time. The buffer cache is set to 10% for both min and max, with 5GB of memory on the server.

I have a user complaining that a batch process is all of a sudden taking a long time to finish. The DBA gave me the PID of the offending query and when I look up that process in Glance it shows the reason for waiting is CACHE. I set the update interval to 1 second and it never changes from cache. The process is consuming little CPU but still waits on cache.

Eventually the process will finish and die, but it takes much, much longer than it did just a few days ago. I know of no other processes that exhibit this behavior, yet every day this one process does the same thing, even with a new PID.

My hypothesis is that the server thinks some data resides in cache when it does not, so the process waits and waits. This doesn't really make sense to me but I cannot think of anything else that might cause this.

I have tried scouring the net for answers and have found none. If someone has any kind of logical guess of what might cause this behavior I'd love to hear it. It might help me figure out what is really going on.

Thanks,
Kevin
# 2  
Old 01-21-2011
Something to read:
http://h21007.www2.hp.com/portal/dow...rfCookBook.pdf

What hardware? What discs? Any SANs? Any NFS?
What version of Oracle?

The Oracle DBA should be able to give you detailed statistics for activity of the Oracle database engine and detailed statistics for activity of the Oracle session concerned.
It may just be reading millions of records without using an index ... or something like that. Look very closely for Oracle sorts - these are a notorious bottleneck with default Oracle SGA settings because they use disc if they have not been given enough memory.
# 3  
Old 01-21-2011
Is this a typo... 8 CPUs and 5 GB of memory? 5 GB per CPU for a total of 40 GB? Or really just 5 GB? So your cache is just half a GB? And you're running Oracle?

Well, here's my guess... you need more memory. Lots more memory. We have 4 cpus and 32 GB on our (Linux) Oracle servers. Ask Oracle support to suggest how much memory you need. They will quickly tell you that 5 GB is not enough.
# 4  
Old 01-21-2011
You're right, it was a typo. I was just checking if you were paying attention. =)
Actually, I was on Glance and got the wrong information as I was talking to the end user about the problem and got my numbers mixed up. There is actually 32GB on the box.

The server is a Superdome. We have an EMC DMX4. No NFS. Oracle is 10.2.0.

Regarding your comments about Oracle and indexing, etc., wouldn't I see high disk usage if this were true? If the process were waiting on data sorts, then it should go to sleep until Oracle comes back with some data, correct? The disks are not very busy at all. Glance shows 23% busy and sar shows about 6K blks/s total usage. avwait is 0 and avserv is around 2-4.

We are already scheduling a reboot to see if by chance this will clear it up. Maybe it will, maybe it won't, but even if it does, I'd still like to be able to explain what is happening to the higher powers when they ask why we needed an emergency reboot.
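As a rough sanity check, the sar figure can be converted into throughput. This is only a sketch: it assumes sar reports blks/s in 512-byte units (worth confirming in the sar man page on the box) and reads "6K" as 6000 blocks per second. The function name is made up for illustration.

```python
# Rough conversion of the sar figure quoted above into throughput.
# Assumptions: blks/s is in 512-byte units, and "6K" means 6000 blocks/s.
BLOCK_BYTES = 512


def throughput_mib_per_s(blocks_per_s: float, block_bytes: int = BLOCK_BYTES) -> float:
    """Convert a blocks-per-second rate into MiB per second."""
    return blocks_per_s * block_bytes / (1024 * 1024)


rate = throughput_mib_per_s(6000)
print(f"{rate:.2f} MiB/s")  # prints "2.93 MiB/s"
```

Under those assumptions the whole server is moving roughly 3 MiB/s, which fits the observation that the disks are not very busy.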
# 5  
Old 01-21-2011
32 GB is a much better figure but still a bit low for 8 cpus. And I still say memory is the problem. How much of that is devoted to Oracle's shared memory segments? ipcs -mb should show that.

To read data from a filesystem the data is loaded into cache (unless it's already there) and then delivered to the process. If cache is full a cache buffer must be freed. Once the buffer is free the read can continue. When you write to a file you change the data in a cache buffer. The buffer is said to be "dirty". A dirty buffer must be written to disk before it can be freed. You need more cache buffers and this probably means more memory.
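The mechanism described above can be sketched as a toy model. This is not how the HP-UX kernel is implemented: the least-recently-used eviction policy, the class name and the counters are all assumptions made for illustration. It only shows why a full cache with dirty buffers adds work in front of a read.

```python
from collections import OrderedDict


class BufferCache:
    """Toy buffer cache: a full cache must free a buffer before a read can
    continue, and a dirty buffer must be written back before it is freed.
    LRU eviction is an assumption, not a claim about the real kernel."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffers = OrderedDict()  # block number -> dirty flag, oldest first
        self.disk_reads = 0
        self.writebacks = 0

    def _make_room(self) -> None:
        while len(self.buffers) >= self.capacity:
            _, dirty = self.buffers.popitem(last=False)  # evict the oldest buffer
            if dirty:
                self.writebacks += 1  # dirty buffer goes to disk before reuse

    def _load(self, block: int) -> None:
        if block in self.buffers:
            self.buffers.move_to_end(block)  # cache hit, mark as recently used
            return
        self._make_room()
        self.disk_reads += 1  # cache miss, fetch the block from disk
        self.buffers[block] = False

    def read(self, block: int) -> None:
        self._load(block)

    def write(self, block: int) -> None:
        self._load(block)  # simplification: the block is loaded before it is modified
        self.buffers[block] = True  # the buffer is now dirty


cache = BufferCache(capacity=2)
cache.write(1)  # miss; block 1 becomes dirty
cache.read(2)   # miss
cache.read(3)   # miss; cache is full, so block 1 is written back and evicted
print(cache.disk_reads, cache.writebacks)  # prints "3 1"
```

With a larger capacity the third read would not have to wait for the write-back, which is the argument for giving the cache more buffers.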

Locking cache at 10% is probably not a great idea. I would try min at 10% and max at 20%. That would give the kernel some more freedom in managing what memory it has.
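To put numbers on that suggestion, here is a quick calculation. It assumes the 32 GB of RAM stated earlier in the thread and that the percentages apply directly to physical memory (I believe dbc_min_pct and dbc_max_pct are the tunables involved, but verify that on your system). The helper name is hypothetical.

```python
# Buffer cache size implied by a percentage of physical memory.
# Assumptions: 32 GB of RAM (from the thread) and percentages that apply
# directly to physical memory.
RAM_GB = 32


def cache_gb(percent: float, ram_gb: float = RAM_GB) -> float:
    """Return the cache size in GB for a given percentage of RAM."""
    return ram_gb * percent / 100


for pct in (10, 20):
    print(f"{pct}% of {RAM_GB} GB = {cache_gb(pct):.1f} GB")
```

So the current setting pins the cache at about 3.2 GB, and a 20% maximum would let it grow to about 6.4 GB.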
# 6  
Old 01-21-2011
T    ID   KEY          MODE          OWNER    GROUP   SEGSZ
Shared Memory:
m    131  0x00000000   --rw-r-----   oracle   dba     2202075136
m    132  0x00000000   --rw-r-----   oracle   dba     2197815296
m    133  0x00000000   --rw-r-----   oracle   dba     1411452928
m    134  0x97604490   --rw-r-----   oracle   dba     16384
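
Adding up the SEGSZ column shows how much memory those segments hold in total. A minimal sketch, assuming SEGSZ is in bytes and is the last of seven columns as in the listing above (the function name is made up for illustration):

```python
# Sum the SEGSZ column of the ipcs -mb listing above.
# Assumption: SEGSZ is in bytes and is the last of seven columns.
IPCS_OUTPUT = """\
m 131 0x00000000 --rw-r----- oracle dba 2202075136
m 132 0x00000000 --rw-r----- oracle dba 2197815296
m 133 0x00000000 --rw-r----- oracle dba 1411452928
m 134 0x97604490 --rw-r----- oracle dba 16384
"""


def total_segment_bytes(text: str) -> int:
    """Add up the sizes of all shared memory rows (those starting with 'm')."""
    total = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0] == "m":
            total += int(fields[-1])
    return total


total = total_segment_bytes(IPCS_OUTPUT)
print(total, f"{total / 2**30:.2f} GiB")  # prints "5811359744 5.41 GiB"
```

That is roughly 5.4 GiB of Oracle shared memory on a box with 32 GB.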


We just got through rebooting the server and ran another process and it is exhibiting the same behavior.

I can look into increasing the DBC but that may be difficult to get through and take some time with our change management.

What's strange is that this very process has been running fine for years. Then, within the past week, it has all of a sudden been acting strangely. I am trying to figure out what has changed but so far it appears nothing has.

Someone recommended I run truss against the process. I've never run truss before so I do not know what to expect from it but it's worth a shot.

Also, I am not certain that what I am seeing in Glance is a problem. I think it is abnormal to see a process waiting on cache but this may be absolutely normal and running the way it should based on the query. The problem could be somewhere else but I have no idea where else to look. The issue we're trying to solve is "a certain query is all of a sudden taking a long time to run".


Thanks everyone for the help so far.

---------- Post updated at 01:52 PM ---------- Previous update was at 01:41 PM ----------

Quick update: I ran the truss (HP's tusc command) and all I see is line after line of:

read *****************
lseek *****************
read *****************
lseek *****************
read *****************
lseek *****************
read *****************
lseek *****************
read *****************
lseek *****************

and on and on and on. I'm inclined to believe now that there is not anything wrong with the server but getting our DBAs to delve deeper can be quite an issue. My job is to prove that nothing is wrong with the server or else find the problem and fix it.
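
For a long trace, tallying the calls is easier than reading them line by line. A small sketch, assuming each trace line begins with the system call name (real tusc output may carry extra fields depending on the options used). The sample reuses the redacted lines above and the function name is hypothetical.

```python
import re
from collections import Counter

# Sample taken from the redacted trace above; a real run would read a saved trace file.
SAMPLE = """\
read *****************
lseek *****************
read *****************
lseek *****************
"""


def tally_syscalls(trace: str) -> Counter:
    """Count trace lines by their leading system call name."""
    counts = Counter()
    for line in trace.splitlines():
        match = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)", line)
        if match:
            counts[match.group(1)] += 1
    return counts


counts = tally_syscalls(SAMPLE)
for name, count in counts.most_common():
    print(name, count)
```

An even split between lseek and read, with nothing else in the tally, would support the impression that the process is simply doing a great many small reads rather than being stuck.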
# 7  
Old 01-21-2011
Check that the query is correctly designed
Check your memory sizing
Check your I/O configuration (asynchronous?)
Check your OS kernel parameters as well as your Oracle parameters

Do you have direct I/O?
(also have a look at post #29 of this thread)

Any error messages in the alert<DB_SID>.log?

Get in touch with your Oracle DBA and ask him for the corresponding Oracle Installation Guide (or get it on your own). You will find some sizing requirements in it, as well as rules of thumb and advice for parameter tuning (including some of your OS kernel parameters).


Are any other applications running on that Superdome?

Is this Oracle instance a new install, or has it been running for a long time and suddenly developed a performance issue?

Are you running PA-RISC or Itanium?

You might also read this, and read carefully the OS Tuning and Kernel Tuning part on page 18 (and of course the rest of the document as well).

You might want to work closely with your DBA if you are not familiar with Oracle, because this kind of troubleshooting may require both OS and DBA knowledge.

Also ask your DBA whether he has activated automatic SGA and PGA management (he should, since those functionalities are available in your Oracle version).

Last edited by ctsgnb; 01-21-2011 at 04:34 PM..