Lots of page faults and free memory


 
# 1  
Old 12-15-2009

Hello,

I've been reading your forums for quite a while, and the great amount of information I find here always comes in handy. This time, however, I need some specific help...

I have a question about an AIX server whose behaviour I'm failing to understand, as I'm new to its concept of memory management...

Straight to the point: I have a server which shows a high number of page faults even though it has plenty of available memory.

This server runs a file-reading-intensive program and an Oracle database. I have no serious performance problems so far, but these page faults started to worry me, as we plan to put some more tasks onto this 40GB memory server.

I first did my homework and read about the AIX VMM (this is my first time with AIX servers :) ) and got a glimpse of its peculiar way of paging everything, files and programs alike, and of the way it uses a daemon to steal and clean pages whenever it runs short. Well... I come from Solaris, where a memory shortage causes page faults, page faults cause a scan rate, and a scan rate most likely means paging. When I run vmstat and see the 'sr' column with 4-digit numbers on a production server, it feels... wrong...

I'll paste my stats below. Could someone give me more perspective on what I'm seeing?

System (from nmon startup):
Code:
│                               6 - CPUs currently                   │
│                               6 - CPUs configured                  │
│                            1900 - MHz CPU clock rate               │
│                  PowerPC_POWER5 - Processor                        │
│                          64 bit - Hardware                         │
│                          64 bit - Kernel                           │
│                         Dynamic - Logical Partition                │
│                    5.3.7.1 ML07 - AIX Kernel Version               │

Code:
$ vmstat 10 5
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
 3 1 6892117 4855669 0 1 0 543 3695 0 3333 26670 12425 82 5 13 0 2.30 91.9
 5 0 6892111 4855614 0 0 0 439 3895 0 3245 74660 12366 77 13 10 0 2.37 94.9
 3 0 6891110 4856689 0 0 0 635 4806 0 3129 44884 12170 80 6 14 0 2.29 91.4
 4 0 6891517 4856185 0 0 0 504 4241 0 3208 41366 13178 80 6 14 0 2.29 91.5
 3 0 6891105 4856693 0 0 0 388 2059 0 3162 27696 13502 82 5 12 0 2.33 93.1

Code:
$ vmstat -v
             12582896 memory pages
             11905361 lruable pages
              4858699 free pages
                    1 memory pools
              3055581 pinned pages
                 80.0 maxpin percentage
                 10.0 minperm percentage
                 20.0 maxperm percentage
                 19.9 numperm percentage
              2379986 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 19.9 numclient percentage
                 20.0 maxclient percentage
              2379986 client pages
                    0 remote pageouts scheduled
                 5114 pending disk I/Os blocked with no pbuf
               151149 paging space I/Os blocked with no psbuf
                 2484 filesystem I/Os blocked with no fsbuf
                41094 client filesystem I/Os blocked with no fsbuf
                 8101 external pager filesystem I/Os blocked with no fsbuf
                    0 Virtualized Partition Memory Page Faults
                 0.00 Time resolving virtualized partition memory page faults

nmon shot (memory and paging):
Code:
│ Memory ────────────────────────────────────────────────────────────────────────│
│          Physical  PageSpace |        pages/sec  In     Out | FileSystemCache  │
│% Used       62.6%     40.5%  | to Paging Space   0.0    0.0 | (numperm) 18.7%  │
│% Free       37.4%     59.5%  | to File System    0.0  207.8 | Process   21.6%  │
│MB Used   30779.8MB  7405.8MB | Page Scans        0.0        | System    22.3%  │
│MB Free   18372.1MB 10898.2MB | Page Cycles       0.0        | Free      37.4%  │
│Total(MB) 49151.9MB 18304.0MB | Page Steals       0.0        |           ------ │
│                              | Page Faults    5474.0        | Total    100.0%  │
│------------------------------------------------------------ | numclient 18.7%  │
│Min/Maxperm     4651MB(  9%)  9301MB( 19%) <--% of RAM       | maxclient 18.9%  │
│Min/Maxfree     960   1088       Total Virtual   65.9GB      | User      35.7%  │
│Min/Maxpgahead    2      8    Accessed Virtual   27.0GB 41.0%  Pinned    24.3%  │
│                                                                                │
│ Paging-Space ──────────────────────────────────────────────────────────────────│
│    Volume-Group PagingSpace-Name Type LPs  MB    Used IOpending                │
│          rootvg              hd6  LV  128  4096  60%    0    Active    Auto    │
│          rootvg         paging00  LV  126  4032  60%    0    Active    Auto    │
│          rootvg         paging01  LV  318 10176  25%    0    Active    Auto    │
│────────────────────────────────────────────────────────────────────────────────│

topas shot:
Code:
 Tue Dec 15 13:45:24 2009   Interval:  2         Cswitch   11920  Readch  3124.7K
                                                 Syscall   29015  Writech 2256.6K
 Kernel    9.4   |###                         |  Reads       703  Rawin         1
 User     82.4   |########################    |  Writes      326  Ttyout      238
 Wait      0.0   |                            |  Forks         3  Igets         0
 Idle      8.2   |###                         |  Execs         3  Namei      2381
 Physc =  2.40                     %Entc=  96.1  Runqueue    4.5  Dirblk        0
                                                 Waitqueue   0.0
 Network  KBPS   I-Pack  O-Pack   KB-In  KB-Out
 en4    6243.7   5640.5  1053.0  6179.9    63.7  PAGING           MEMORY
 lo0       0.0      0.0     0.0     0.0     0.0  Faults     3226  Real,MB   49151
                                                 Steals        0  % Comp     42.2
 Disk    Busy%     KBPS     TPS KB-Read KB-Writ  PgspIn        0  % Noncomp  18.8
 hdisk3    2.0     1.7K   28.5     0.0     1.7K  PgspOut       0  % Client   18.8
 hdisk23   1.5   512.9     4.0     0.0   512.9   PageIn        3
 hdisk14   0.5   172.3    35.1     0.0   172.3   PageOut     560  PAGING SPACE
 hdisk2    0.0     0.0     0.0     0.0     0.0   Sios        563  Size,MB   18304
                                                                  % Used     40.0
 Name            PID  CPU%  PgSp Owner           NFS (calls/sec)  % Free     60.0
 java        1208454  68.5 144.4 util            ServerV2       0
 syncd        348408   4.8   0.5 root            ClientV2       0   Press:
 java        1364098   1.1  60.0 root            ServerV3       0   "h" for help
 topas        577692   0.0   2.0 util            ClientV3       0   "q" to quit

I appreciate any feedback!

cheers!

f.
# 2  
Old 12-15-2009
Page faults occur when a page is demanded but has not yet been read into memory. So this is not a problem in itself; it is often just the usual way things get read from disk into memory, and I do not mean from paging space.

For scan rate and freeing memory I wrote something here:
Scan Rates

It seems your box is running a CPU-intensive Java application and is not really short on memory. There is nothing to worry about as long as the number of kthreads in the r column (run queue) is not bigger than the number of CPUs, and I have even seen boxes with higher numbers there where the application still ran smoothly. It seems that the current VMM settings simply make it scan memory while there is still enough available.
Severe memory problems are usually indicated when the pi/po columns of vmstat (which stand for paging space page-ins and paging space page-outs) show any numbers. Usually you just try out some VMM settings and get over this. If not, you might need more memory, which does not seem to be the case with your box. Other tools also show "pi" and "po" but mean normal page-ins and page-outs. Your topas output shows both: PageIn/PageOut and PgspIn/PgspOut. The latter two, which refer to paging space, are the problematic ones.
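
To make the pi/po versus fr/sr distinction concrete, here is a minimal Python sketch that summarizes the vmstat sample from the first post. The column positions are taken from the header line of that sample; the function name `summarize_vmstat` is made up for illustration and is not part of any AIX tool.

```python
# The vmstat sample from the first post, header line included.
VMSTAT_SAMPLE = """\
 r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec
 3 1 6892117 4855669 0 1 0 543 3695 0 3333 26670 12425 82 5 13 0 2.30 91.9
 5 0 6892111 4855614 0 0 0 439 3895 0 3245 74660 12366 77 13 10 0 2.37 94.9
 3 0 6891110 4856689 0 0 0 635 4806 0 3129 44884 12170 80 6 14 0 2.29 91.4
 4 0 6891517 4856185 0 0 0 504 4241 0 3208 41366 13178 80 6 14 0 2.29 91.5
 3 0 6891105 4856693 0 0 0 388 2059 0 3162 27696 13502 82 5 12 0 2.33 93.1
"""


def summarize_vmstat(text):
    """Total the paging-related columns of a vmstat sample (hypothetical helper)."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    # "sy" appears twice in the header, so columns are looked up by position;
    # none of the columns used here is duplicated.
    col = {name: header.index(name) for name in ("fre", "pi", "po", "fr", "sr")}

    def values(name):
        return [int(row[col[name]]) for row in rows]

    return {
        "pi_total": sum(values("pi")),   # page-ins from paging space
        "po_total": sum(values("po")),   # page-outs to paging space
        "fr_total": sum(values("fr")),   # pages freed by page replacement
        "sr_total": sum(values("sr")),   # pages scanned by page replacement
        "min_free_pages": min(values("fre")),
    }


summary = summarize_vmstat(VMSTAT_SAMPLE)
for key, value in summary.items():
    print(f"{key:>15}: {value}")
```

On the posted sample this gives a single paging-space page-in, no page-outs, and on average roughly 3700 scanned against 500 freed pages per interval line. That is the pattern described above: the LRU daemon is busy, but nothing is being pushed out to paging space.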

It seems somebody has already tuned the box a bit, since
Code:
                 20.0 maxperm percentage

no longer has its default value. Nowadays people usually don't lower this anymore; instead they set lru_file_repage=0 and leave maxperm and maxclient at 80-90%. minperm is often reduced to 5%. The LRU daemon then decides with those thresholds when to clean up. You could check whether this suits you better and reduces sr/fr.
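
As a rough cross-check of why scanning happens despite all the free memory, the `vmstat -v` figures from the first post can be put into a small Python calculation. It assumes 4 KB pages and that numperm is reported relative to lruable pages. Both assumptions agree with the posted nmon and vmstat numbers, but treat this as a sketch and not as a statement of how the VMM computes things internally.

```python
PAGE_KB = 4  # assumed page size in KB

# Values copied from the "vmstat -v" output in the first post.
memory_pages = 12582896
lruable_pages = 11905361
free_pages = 4858699
pinned_pages = 3055581
file_pages = 2379986
maxperm_pct = 20.0  # maxclient is also 20.0 on this box

total_mb = memory_pages * PAGE_KB / 1024
free_mb = free_pages * PAGE_KB / 1024
numperm_pct = 100 * file_pages / lruable_pages
pinned_pct = 100 * pinned_pages / memory_pages

# How many more file pages fit before the 20% cap is reached.
headroom_pages = maxperm_pct / 100 * lruable_pages - file_pages
headroom_mb = headroom_pages * PAGE_KB / 1024

print(f"total memory : {total_mb:10.1f} MB")
print(f"free memory  : {free_mb:10.1f} MB")
print(f"numperm      : {numperm_pct:10.2f} %")
print(f"pinned       : {pinned_pct:10.1f} %")
print(f"cap headroom : {headroom_mb:10.1f} MB")
```

With these inputs the file cache sits at about 19.99% of lruable memory, only a few MB below the 20% cap, while roughly 18.5 GB is free. That would fit the observed fr/sr values: new file pages have to displace old file pages because of the cap, not because memory is short.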

There are also some things here that might not be noticeable but can be tuned with ioo:
Code:
                 5114 pending disk I/Os blocked with no pbuf
               151149 paging space I/Os blocked with no psbuf
                 2484 filesystem I/Os blocked with no fsbuf
                41094 client filesystem I/Os blocked with no fsbuf
                 8101 external pager filesystem I/Os blocked with no fsbuf

If those values don't increase fast/in big amounts, you might not want to worry about them - best monitor them to decide if you tune them with ioo or not.
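
One way to do that monitoring is to take two `vmstat -v` snapshots some time apart and compare the blocked-I/O counters, which as far as I know are cumulative since boot. Below is a minimal Python sketch. The first snapshot is the one from the post, the second snapshot is invented purely to show the comparison, and the function names are hypothetical.

```python
import re

# Snapshot from the first post.
SNAPSHOT_BEFORE = """\
                 5114 pending disk I/Os blocked with no pbuf
               151149 paging space I/Os blocked with no psbuf
                 2484 filesystem I/Os blocked with no fsbuf
                41094 client filesystem I/Os blocked with no fsbuf
                 8101 external pager filesystem I/Os blocked with no fsbuf
"""

# Invented second snapshot, for illustration only.
SNAPSHOT_AFTER = """\
                 5114 pending disk I/Os blocked with no pbuf
               151149 paging space I/Os blocked with no psbuf
                 2490 filesystem I/Os blocked with no fsbuf
                41950 client filesystem I/Os blocked with no fsbuf
                 8101 external pager filesystem I/Os blocked with no fsbuf
"""

BLOCKED_LINE = re.compile(r"\s*(\d+)\s+(.+ blocked with no \w+)\s*$")


def blocked_counters(vmstat_v_text):
    """Return {description: count} for the 'blocked with no ...' lines."""
    counters = {}
    for line in vmstat_v_text.splitlines():
        match = BLOCKED_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_growth(before_text, after_text):
    """Difference between two snapshots for counters present in both."""
    before = blocked_counters(before_text)
    after = blocked_counters(after_text)
    return {name: after[name] - before[name] for name in before if name in after}


growth = counter_growth(SNAPSHOT_BEFORE, SNAPSHOT_AFTER)
for name, delta in growth.items():
    print(f"{delta:8d}  {name}")
```

Counters that stay flat between snapshots are probably left over from some earlier peak. Only the ones that keep growing during normal load would be candidates for tuning with ioo.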

If you decide to use vmo to try out what I suggested, you could do the following:
Code:
vmo -p -o lru_file_repage=0
vmo -p -o minperm%=5
vmo -p -o lru_poll_interval=10

With vmo -x or -L you can always check which tunables need a reboot, what the default values are and what the current values are. So maybe write them down somewhere in case you want to go back. These 3 need no reboot. It can take some minutes until lrud has settled the memory.

And here are some interesting links:

http://www.filibeto.org/unix/aix/lib...cement-vmm.pdf
Jaqui's AIX Performance and Security Blog
... as well as the Performance Tuning Redbooks from IBM.

Oh, and please let us know when you try something out and tell us how it went, thanks :)
# 3  
Old 12-15-2009
I agree with zaxxon - your system urgently needs proper tuning ...
Your stats look as if the system has recently been rebooted. You are not allowing your system to use the memory it has and, even worse, you are allowing your system to move computational pages into paging space, which is a really bad thing.

Assuming your Oracle database is 10.2.0.4 or 11, I would suggest even more drastic tuning values. Try using cio on your filesystems if the database isn't on ASM, and try the following tunables if it is an OLTP database:

Code:
vmo -p -o minperm%=3
vmo -p -o maxperm%=90
vmo -p -o maxclient%=90
vmo -p -o minfree=960
vmo -p -o maxfree=1088
vmo -p -o lru_file_repage=0
ioo -p -o pv_min_pbuf=1024
ioo -p -o j2_maxPageReadAhead=128
ioo -p -o j2_dynamicBufferPreallocation=16

You should see your free list going down, which is expected.
How many disks do you have in rootvg? If you only have one mirrored pair, drop the additional paging spaces and extend hd6 instead. If you have more disks, you can keep the additional paging spaces, but make sure they're not mirrored and that each one is on a separate disk.

Kind regards
zxmaus

Last edited by zxmaus; 12-16-2009 at 02:33 PM.. Reason: added code tags :)
# 4  
Old 12-16-2009
Hi guys,

Thanks a lot for the replies and sorry for the late response!

So yeah, I do agree that the settings are poor right now. Apart from the many page faults, there isn't a noticeable impact on performance. Actually, as zaxxon pointed out, there isn't a whole lot of I/O happening here.

As for the tuning recommendations, they seem right; in fact, they are a lot more in line with what this white paper says:

http://www.ibm.com/developerworks/forums/servlet/JiveServlet/download/747-3

(page 14)

But to be honest, I can't really touch that. I'm more on the application/dev level, so I'm trying to figure out what I can tell, from my point of view, about what could be done.
For now, I can only report what I've found to be 'normal' under the current settings, and forward the white paper as a guideline.
Besides, these flags seem to have been changed already. The guys who did it had their reasons and knowledge for that.

Nevertheless, I'll monitor any changes they make in the server and let you guys know as it gets sorted.

Still, I wish I could tell more precisely, i.e. 1+1=2, why I'm seeing so many page faults.
While I can see that a 20% maxperm would drive my LRU nuts trying to free more space in a memory-intensive environment, I can't really grasp how it causes so many page faults...

A guy from the IBM forum following the same thread said that I probably won't be able to figure that out without deep knowledge of what's running. That's somewhat... unsettling... I mean, would that mean that to fine-tune my memory manager, I have to be a specialist in everything that runs on a server plus the server itself? I'd prefer to believe in a JVM-like world, where I can monitor and test GC statistics and figure out my settings without having to know a lot about what the program does or how it does it...

cheers,

f.

EDIT - removed grumpiness :)

Last edited by flpgdt; 12-16-2009 at 09:57 AM..
# 5  
Old 12-16-2009
You can politely and carefully point your admins to this thread/forum. It's about helping each other, not about showing off, so maybe they will appreciate it :)
# 6  
Old 12-16-2009
err... sure thing!
I'm sorry if I sounded sceptical or grumpy!
It's not an awesome day here and I might be a bit grumpy, but I never intended to criticise anyone!

Apologies again!

f.
# 7  
Old 12-16-2009
Nah, I didn't mean that. I just thought that if you tell your admins what "to do", they might feel a bit offended or insulted. I did not mean that you sounded rude to us! :)