AIX memory issue

05-05-2012

@bakunin

Please check the below output.

Code:

bash-3.2# oslevel -g
6.1.0.0

bash-3.2# vmo -o
vmo: A flag requires a parameter: o
Usage:  vmo -h [tunable] | {[-F] -L [tunable]} | {[-F] -x [tunable]}
        vmo [-p|-r] (-a [-F] | {-o tunable})
        vmo [-p|-r] [-y] (-D | ({-d tunable} {-o tunable=value}))

bash-3.2# schedo -o
schedo: A flag requires a parameter: o
Usage:  schedo -h [tunable] | {[-F] -L [tunable]} | {[-F] -x [tunable]}
        schedo [-p|-r] (-a [-F] | {-o tunable})
        schedo [-p|-r] [-y] (-D | ({-d tunable} {-o tunable=value}))

bash-3.2# ioo -i
ioo: Not a recognized flag: i
Usage:  ioo -h [tunable] | {[-F] -L [tunable]} | {[-F] -x [tunable]}
        ioo [-p|-r] (-a [-F] | {-o tunable})
        ioo [-p|-r] [-y] (-D | ({-d tunable} {-o tunable=value}))

bash-3.2# vmstat 1

System configuration: lcpu=8 mem=30720MB

kthr    memory              page              faults        cpu    
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
13  0 2170427 40491   0   0   0   0    0   0  10 4514 4905  3  1 96  0
 8  0 2170427 40491   0   0   0   0    0   0   3 4355 5100  3  1 96  0
20  0 2170428 40490   0   0   0   0    0   0   4 4573 5072  3  1 96  0
 6  0 2170428 40490   0   0   0   0    0   0   1 4131 4717  3  1 96  0
25  0 2170428 40490   0   0   0   0    0   0  16 3941 4397  2  1 96  0
 7  0 2170428 40490   0   0   0   0    0   0  18 4141 4693  3  1 96  0
27  0 2170428 40490   0   0   0   0    0   0   1 4353 4821  3  1 96  0
30  0 2170428 40490   0   0   0   0    0   0   1 3978 4633  2  1 96  0
 3  0 2170772 40146   0   0   0   0    0   0  45 4338 4965  7  1 91  0
 8  0 2170772 40146   0   0   0   0    0   0   4 3963 4570  3  1 96  0
22  0 2170779 40137   0   0   0   0    0   0   7 74295 4828  8  4 88  0
 2  0 2170780 40136   0   0   0   0    0   0  10 3869 4701  3  1 96  0
 2  0 2171860 39056   0   0   0   0    0   0   5 5076 4807  6  2 92  0


bash-3.2# iostat 5 | grep -v '0.0'

System configuration: lcpu=8 drives=6 paths=5 vdisks=5

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           0.2       2.4       0.6          0        12

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk2           0.2       3.2       0.2          0        16
hdisk3           0.2       3.2       0.2          0        16
hdisk4           0.2       3.2       0.2          0        16

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk0           4.8      59.8      13.8          0       300
hdisk2           0.4      28.7       6.0          0       144

tty:      tin         tout    avg-cpu: % user % sys % idle % iowait

Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn


bash-3.2# svmon -G
               size       inuse        free         pin     virtual   mmode
memory      7864320     7824082       40238      701299     2170675     Ded
pg space    4718592        8760

               work        pers        clnt       other
pin          461763           0           0      239536
in use      2170675           0     5653407

PageSize   PoolSize       inuse        pgsp         pin     virtual
s    4 KB         -     6673266        8760      323731     1019859
m   64 KB         -       71926           0       23598       71926


bash-3.2# vmstat -v
              7864320 memory pages
              7602176 lruable pages
                46384 free pages
                    1 memory pools
               701202 pinned pages
                 80.0 maxpin percentage
                  3.0 minperm percentage
                 90.0 maxperm percentage
                 73.8 numperm percentage
              5616515 file pages
                  0.0 compressed percentage
                    0 compressed pages
                 73.8 numclient percentage
                 90.0 maxclient percentage
              5616515 client pages
                    0 remote pageouts scheduled
                   66 pending disk I/Os blocked with no pbuf
                    0 paging space I/Os blocked with no psbuf
                 2484 filesystem I/Os blocked with no fsbuf
                  313 client filesystem I/Os blocked with no fsbuf
                10368 external pager filesystem I/Os blocked with no fsbuf
                 28.0 percentage of memory used for computational pages

learnbash

05-05-2012

First off, i'd like to apologize for leaving some (unnoticed) typos in my first answer: vmo -o should have read vmo -a of course, analogous with schedo and ioo. I wanted to know the state of all your kernel tuning parameters and hence asked you to print out these values.

Still, you could have anticipated that by having a single look in the man page of one of these tools when the syntax error showed up. The fact that i am trying to help you does not exempt you from showing initiative of your own. When i give you a command with some explanation you might as well have a look in the documentation to understand what the command does so to be able to use it yourself the next time a similar problem shows up. You may even ask (here or in a separate thread) if something puzzles you and you are not able to figure it out yourself. But DO something, by all means! Not understanding something is OK, doing nothing is NOT.

OK, back to your problem. I suppose you have taken the data when the system was under load, not when the (main) application was switched off or doing nothing. If it was not under load: all i wrote below is under the presumption of the system being under load and hence nonsense. Start over again and this time with real data, not watching the system doing nothing.

The output of iostat shows that there is next to no activity at all on the disks. The "grep" part of the command i gave you filters out all disks with 0.0 activity and in your case that left practically none. The minimal activity at the end is mainly the system disks, probably (re-)loading some program or library to memory. I'd say your disks are idle.
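A side note on the filter itself: in grep -v '0.0' the dot is a regular-expression wildcard, so the pattern also matches strings such as 10.0 or 100.0, and a disk that happens to report exactly such a value would be hidden along with the idle ones. The sketch below emulates the filter in Python to show the effect. The hdisk1 and hdisk9 lines are made-up sample data, and grep_v and active_disks are hypothetical helper names, not part of any AIX tool.

```python
import re

def grep_v(pattern, lines):
    """Emulate `grep -v PATTERN`: keep the lines that do NOT match the regex."""
    rx = re.compile(pattern)
    return [ln for ln in lines if not rx.search(ln)]

def active_disks(lines):
    """Stricter alternative: keep disk lines whose % tm_act (2nd field) is > 0."""
    kept = []
    for ln in lines:
        fields = ln.split()
        if len(fields) >= 2 and fields[0].startswith("hdisk") and float(fields[1]) > 0:
            kept.append(ln)
    return kept

sample = [
    "hdisk0           4.8      59.8      13.8          0       300",  # from the thread
    "hdisk1           0.0       0.0       0.0          0         0",  # made-up idle disk
    "hdisk9          10.0     512.0      40.2       1024      1536",  # made-up busy disk
]

loose = [ln.split()[0] for ln in grep_v(r"0.0", sample)]
strict = [ln.split()[0] for ln in active_disks(sample)]
print("grep -v '0.0' keeps:", loose)
print("tm_act > 0 keeps:   ", strict)
```

For the output posted in this thread the difference does not matter, since no disk was anywhere near such a value, but it is worth keeping in mind when the same filter is used on a busy system.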

The vmstat output shows a similar picture: the first column ("r" - "run queue") shows a lot of programs running, but the system has all the means to run them: the second column ("b" - blocked processes) is constantly 0, so all the processes which could run are able to do so. The last 4 columns emphasize this. They are labeled "us", "sy", "id", "wa" and are percentages of the CPU usage. Together they add up to 100. "us" is for "user", the percentage of the time the CPU(s) spend(s) processing code of your programs. "sy" is for "system" and is the time the CPU spends executing system code, "id" is for "idle" - the CPU having nothing to do - and the last, "wa", is for "wait": the time the CPU would be able to run a process but has to wait because the process waits for I/O (usually swapping).

As you see the value for "wa" is 0 (this is how it should be), the value of "us" and "sy" are both very low and "id" is near 100. All in all your system looks like being overpowered a lot. Probably you could do with half or a quarter of the real CPUs assigned to this LPAR in the HMC profile.

The columns labeled "pi" and "po" count pages swapped in ("pi") and out ("po"). They are constantly 0, which is how it should be. At the same time this tells me that the system has no memory problem at all. This is further emphasized by the fact that "sr" (pages scanned - for possible swapping, that is) is 0 too - the system doesn't even bother to look for possibilities to make memory available, which means it has this resource in relative abundance.
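The reading of the vmstat columns described above can be sketched as a small check. This is only an illustration in Python, using the first data line posted in the thread. The helper names parse_vmstat_line and summarize are hypothetical, and the second "sy" column (system CPU) is renamed "cpu_sy" so that it does not collide with the "sy" (system calls) column in the faults group. "sr" is the column for pages scanned by the page replacement algorithm.

```python
# Column names follow the vmstat header shown in the thread; the second "sy"
# (system CPU percentage) is renamed "cpu_sy" so the dictionary keys stay unique.
VMSTAT_COLUMNS = [
    "r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
    "in", "sy", "cs", "us", "cpu_sy", "id", "wa",
]

def parse_vmstat_line(line):
    """Turn one vmstat data line into a dict keyed by column name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of vmstat columns")
    return dict(zip(VMSTAT_COLUMNS, values))

def summarize(row):
    """Apply the rules of thumb from the post (hypothetical helper)."""
    return {
        "cpu_total": row["us"] + row["cpu_sy"] + row["id"] + row["wa"],
        "paging": row["pi"] > 0 or row["po"] > 0,   # pages swapped in or out
        "scanning": row["sr"] > 0,                  # page stealer looking for memory
        "blocked": row["b"] > 0,                    # threads waiting on a resource
    }

# First data line of the "vmstat 1" output posted above.
sample = "13  0 2170427 40491   0   0   0   0    0   0  10 4514 4905  3  1 96  0"
row = parse_vmstat_line(sample)
summary = summarize(row)
print(summary)
```

For this line the four CPU percentages add up to 100 and none of the paging indicators is set, which is the picture described above. Because vmstat rounds the percentages, other lines may add up to 99 or 101.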

The same story is told by the output of svmon, which gives an aggregated account of memory consumption. Numbers there are memory pages (4k in AIX). Note the second number in the first line, "inuse", which is the memory currently in use, file cache included: 7.8 million x 4k ~ 30G, practically all of "size" (the difference between "inuse" and "size" is the "free" column). Now compare this with "virtual", the next to last value in the first line. This is the memory the programs have really needed: 2.2 million x 4k ~ 9G. This is what the system really needs to have, everything else is more "nice to have" than of real value. I would consider shrinking the system down to 12 GB (the necessary 9 plus some in contingency) if this is the typical status of the system.
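The page arithmetic can be checked with a few lines of Python. This is only a worked example using the numbers from the "svmon -G" output posted above and a 4 KB page size. The function name pages_to_gib is a hypothetical helper.

```python
PAGE_SIZE_BYTES = 4096  # the "memory" line of svmon -G counts 4 KB page frames

def pages_to_gib(pages):
    """Convert a count of 4 KB pages to GiB (2**30 bytes)."""
    return pages * PAGE_SIZE_BYTES / 2**30

# Values copied from the "memory" line of the svmon -G output in the thread.
size, inuse, free, virtual = 7864320, 7824082, 40238, 2170675

print(f"size    : {pages_to_gib(size):6.2f} GiB")
print(f"inuse   : {pages_to_gib(inuse):6.2f} GiB")
print(f"free    : {pages_to_gib(free):6.2f} GiB")
print(f"virtual : {pages_to_gib(virtual):6.2f} GiB")
```

Counted in binary units "virtual" comes to roughly 8.3 GiB, in decimal units to roughly 8.9 GB, which is the "~ 9G" mentioned above. Either way it is well under a third of the 30 GB assigned to the system.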

Conclusion: if your system is slower than expected then this definitely has nothing to do with memory or CPU. It might be the network (we haven't investigated that yet), but frankly, i doubt that, because some residues of that would show up in the "vmstat -v" output.

Another possible reason to investigate would be sloppy application programming. As i don't know anything about the application you run this is pure speculation.

There could be some arbitrary shortage of assigned memory for the application: check with some DBA the size and the layout of the SGA and some related DB tuning options. DB_BLOCK_BUFFERS, for instance, might put an arbitrary constraint on I/O operations. Similarly, if some Java processes run on top of the database they might not have enough memory assigned. This is a common problem in WebSphere application servers. The number of "pinned pages" in the output of "vmstat -v" suggests that the SGA is using only a small part of the available memory.

I hope this helps.

bakunin

05-06-2012

Below is the current status. There is no activity right now, only the applications are running, and the system still shows this status. I am sorry I did not read the man page.

Code:

load averages:  8.60,  8.62,  6.88;                                                                                  23:19:56
132 processes: 131 idle, 1 running
CPU states: 99.3% idle,  0.2% user,  0.3% kernel,  0.0% wait
Memory: 30G total, 21G buf, 139M free
Swap: 18G total, 18G free


bash-3.2# vmo -a
             ame_cpus_per_pool = n/a
               ame_maxfree_mem = n/a
           ame_min_ucpool_size = n/a
               ame_minfree_mem = n/a
               ams_loan_policy = n/a
  enhanced_affinity_affin_time = 1
enhanced_affinity_vmpool_limit = 10
           force_relalias_lite = 0
             kernel_heap_psize = 65536
                  lgpg_regions = 0
                     lgpg_size = 0
               low_ps_handling = 1
                       maxfree = 1088
                       maxperm = 6841958
                        maxpin = 6339364
                       maxpin% = 80
                 memory_frames = 7864320
                 memplace_data = 0
          memplace_mapped_file = 0
        memplace_shm_anonymous = 0
            memplace_shm_named = 0
                memplace_stack = 0
                 memplace_text = 0
        memplace_unmapped_file = 0
                       minfree = 960
                       minperm = 228065
                      minperm% = 3
                     nokilluid = 0
                       npskill = 36864
                       npswarn = 147456
                     numpsblks = 4718592
               pinnable_frames = 7161165
           relalias_percentage = 0
                         scrub = 0
                      v_pinshm = 0
              vmm_default_pspa = 0
            wlm_memlimit_nonpg = 1


bash-3.2# schedo -a
         affinity_lim = 7
        big_tick_size = 1
ded_cpu_donate_thresh = 80
     fixed_pri_global = 0
            force_grq = 0
              maxspin = 16384
             pacefork = 10
      proc_disk_stats = 1
              sched_D = 16
              sched_R = 16
        tb_balance_S0 = 2
        tb_balance_S1 = 2
         tb_threshold = 100
            timeslice = 1
      vpm_fold_policy = 1
           vpm_xvcpus = 0


bash-3.2# ioo -a
                    aio_active = 1
                   aio_maxreqs = 65536
                aio_maxservers = 30
                aio_minservers = 3
         aio_server_inactivity = 300
         j2_atimeUpdateSymlink = 0
 j2_dynamicBufferPreallocation = 16
             j2_inodeCacheSize = 400
           j2_maxPageReadAhead = 128
             j2_maxRandomWrite = 0
          j2_metadataCacheSize = 400
           j2_minPageReadAhead = 2
j2_nPagesPerWriteBehindCluster = 32
             j2_nRandomCluster = 0
              j2_syncPageCount = 0
              j2_syncPageLimit = 16
                    lvm_bufcnt = 9
                    maxpgahead = 8
                    maxrandwrt = 0
                      numclust = 1
                     numfsbufs = 196
                     pd_npages = 65536
              posix_aio_active = 0
             posix_aio_maxreqs = 65536
          posix_aio_maxservers = 30
          posix_aio_minservers = 3
   posix_aio_server_inactivity = 300

learnbash

05-06-2012

OK, we have now some of the information my colleagues and i asked from you. We still do not know:

- what the system is built of: is it an LPAR or a WPAR? If the former, which configuration do you use (capped/uncapped CPU assignments, shared/dedicated networks, which types of disks are attached and how, etc., etc.)

- what the system exactly does: which applications (and versions/releases/patch levels thereof) are running, what the expected performance should be (versus what the performance actually is - which part of the system's operation can you name that is doing slower or less than expected), how is the application used (for instance: is the database used for user interaction and therefore expected to have its maximum load during the day? Or is it batch oriented and the heavy load can be expected to start every day at the exact same time? Or ...)

- how the users connect to the system: Not at all (because only application servers connect directly and the users connect to the app-server?), via a shell session (say: telnet, rlogin, ssh, etc.), via a graphical interface (CDE, X-session, ... ?), etc.

As much as i am willing to help there are two general constraints: i will help you to help yourself but i will definitely not do the work you cannot do - it's simply not my job and if you want to have me working for you please hire me, i make a living from that. Right now i ask ten questions, you answer the two which require the least effort and i try to make as much as possible from incomplete data. Have a look at my first post and ask yourself which of the questions i posed there (and repeated here) you have in fact answered. Which of the theories i have developed did you comment on and try to confirm or disprove? How much have you written about your system and how much did i or, generally said: how much effort are you putting into this and how much do i? Sorry to be that blunt, but: this is not showing the right attitude.

The second thing is: what exactly is the problem? We still do not know what your SLA (service level agreement) is. The thing is: you (the metaphorical "you") agree to deliver a system which performs function X at the rate of Y. If the actual performance is less than Y you have a performance problem, otherwise not. "Tuning" is always a process, where some function X operating at the rate of some Y' is raised (or lessened) to this Y. If you reach this goal you again have no problem while if you don't you have one. The expected "rate of Y" is what is called SLA.

Example: your boss tells you to have the system do 100k transactions per second (your SLA). It does 80k/s right now, so you have to do some tuning. You change the system and it does 90k/s - goal not reached and you are not done. You change the system further and it does 105k/s - mission accomplished, go back to surfing the web.

I hope this helps.

bakunin

05-07-2012

@bakunin

Dear sir, thanks so much for your kind reply. The system load has decreased now. We are using an LPAR; I will provide the details about the hard disks and the rest later. Actually my main concern is why buffer is 21gb. Is it possible to decrease that, the way we can shrink the memory/buffer cache in Linux?

Anyway once again thanks, i will try to give more information soon.

---------- Post updated at 01:32 AM ---------- Previous update was at 01:30 AM ----------

Users connect via ssh only; no GUI is involved.

Sometimes the load goes to 2 and sometimes to 9. Java and Oracle are taking a lot of memory.

learnbash

05-07-2012

Quote:

Originally Posted by learnbash

Actually my main concern is why buffer is 21gb

I will try to put it simply, so the experts might find some details missing; i left them out to make it easier to understand.

It is so because you told the system (see the values of "maxperm%" and "minperm%" in the output of vmo -F -a) to preserve a certain amount of memory as free. The rest is first used for programs. When all the programs are loaded and there is still some of this memory available then this will be assigned to caching I/O-operations (this is your "buffer memory"). This assignment is temporary in nature and as soon as there is a program to be started an according amount of memory will be taken away from this buffer memory and given to the program. If some program ends, on the other hand, its memory will not be simply "free" (which means unused), but given to the buffer memory as long as it isn't needed elsewhere.
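To see where the 21 GB come from, the numbers already posted in this thread can be put together. This is only a worked example in Python and not output of any AIX tool; the variable names are made up for illustration. It takes "lruable pages" and "file pages" from the "vmstat -v" output and compares the results with the "minperm" and "maxperm" values from "vmo -a". The two outputs were taken on different days, so only a rough match with the "21G buf" line can be expected.

```python
PAGE_SIZE_BYTES = 4096  # 4 KB pages, as counted by vmstat -v

# Values copied from the "vmstat -v" output posted earlier in the thread.
lruable_pages = 7602176
file_pages = 5616515
minperm_pct = 3    # "minperm percentage"
maxperm_pct = 90   # "maxperm percentage"

# Share of lruable memory currently holding file (cache) pages.
numperm_pct = 100 * file_pages / lruable_pages

# The page counts vmo -a reports are consistent with these percentages
# applied to the lruable pages (integer division drops the fraction).
minperm_pages = lruable_pages * minperm_pct // 100
maxperm_pages = lruable_pages * maxperm_pct // 100

# Size of the file cache in GiB.
cache_gib = file_pages * PAGE_SIZE_BYTES / 2**30

print(f"numperm : {numperm_pct:.1f} %")
print(f"minperm : {minperm_pages} pages")
print(f"maxperm : {maxperm_pages} pages")
print(f"cache   : {cache_gib:.1f} GiB")
```

The file cache works out to a bit over 21 GiB, which is the "buffer" in question. It is well below the maxperm limit, so the system is simply using otherwise idle memory for caching.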

If you want to reduce buffer memory (which will probably have no adverse effects as far as i can tell) decrease the size of the assigned memory of the LPAR in the HMC profile. This will reduce the size of available memory after fulfilling all the programs' requests and therefore less memory will be assigned to I/O-buffers.

Also investigate - together with the DBA! - the possibility of enlarging the SGA. To be honest i wonder how you get away with this much free memory without having the DBA pestering you to get it. In my experience they are a memory-hogging lot.

Btw., as i said before, the system is probably overpowered with respect to the physical CPUs assigned to it too. You might want to investigate possibilities to shrink that back to a sensible amount.

I hope this helps.

bakunin
