Newsgroups: comp.unix.aix,comp.parallel.pvm
Path: ukc!uknet!pipex!howland.reston.ans.net!news.ans.net!ngate!serv4n57!aix.kingston.ibm.com!wombat
From: wombat@donald.aix.kingston.ibm.com (Peter R Badovinatz)
Subject: Re: Q:vmstat, PVM
Message-ID: <1994May3.181318@aix.kingston.ibm.com>
Sender: wombat@aix.kingston.ibm.com (Peter R Badovinatz)
Date: Tue, 3 May 1994 22:13:18 GMT
References:  <2q4s20$5b4@nuscc.nus.sg>
Organization: IBM Corporation, Kingston NY
Lines: 77
Xref: ukc comp.unix.aix:38171 comp.parallel.pvm:1703

In article <2q4s20$5b4@nuscc.nus.sg>, nsrcchk@leonis.nus.sg (Heng Kek) writes:
|> Hi citizens
|> 
|> This query has to do with PVM and the 'vmstat' report on AIX3.2.3,
|> IBM RS/6000 370.
|> 
|> procs    memory             page              faults        cpu     
|> ----- ----------- ------------------------ ------------ -----------
|>  r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa 
|>  0  0  7021 13758   0   0   0   0    0   0 115   81  27  9  1 91  0
|>             ^^^^^	
|> 
|> From the man pages for the 'vmstat' cmd, the underlined value above
|> refers to the 'size of the free list'.  I assume that this means
|> 'available free memory'(?).  I find that this value dwindles from a
   ^^^^^^^^^^^^^^^^^^^^^^^ essentially, yes.

|> value of, say 22000 (immediately after a reboot) to a value of 5000
|> after running some cpu intensive jobs.  However, after the jobs end,
|> the value of the 'free list' doesn't seem to be restored to the
|> original 22000.  Why?
Some of these pages will be taken up by other processes that are still
running on the system: system daemons, other users' processes, etc.
Also, if pvmd is still running, it will hold some of the pages as well.
You can use the 'ps' command with the 'u' or 'v' option to get information
about the amount of memory used by different processes.  See the man page
or info entry for 'ps'.  The fields to look at are "SIZE", "SZ", and
"RSS".  You should do this before starting your PVM jobs to see what is on
the machine, during the job to see how large your jobs are, and afterwards
to see what is still holding pages.  Note that 'ps u' or 'ps v' will show
only your own processes; add 'g' to see all processes on the system.
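
For a sense of scale, the free-list figures can be turned into megabytes.
The following is only a rough sketch in Python: it assumes 4 KB pages (the
page size on the RS/6000 under AIX 3.2) and the 17-column vmstat layout
quoted above.  The names parse_vmstat_line and pages_to_mb are made up for
this illustration and are not part of any AIX tool.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KB pages, as on AIX 3.2 / RS/6000

# Column names taken from the vmstat header quoted in the post.  The second
# "sy" (system CPU time) is renamed here so the dictionary keys stay unique.
VMSTAT_COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
                  "in", "sy", "cs", "us", "sy_cpu", "id", "wa"]


def parse_vmstat_line(line):
    """Map one vmstat data line onto the column names above."""
    values = [int(tok) for tok in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError("unexpected number of vmstat columns")
    return dict(zip(VMSTAT_COLUMNS, values))


def pages_to_mb(pages):
    """Convert a page count to megabytes, using the assumed page size."""
    return pages * PAGE_SIZE_BYTES / (1024 * 1024)


# The data line from the original question.
sample = " 0  0  7021 13758   0   0   0   0    0   0 115   81  27  9  1 91  0"
row = parse_vmstat_line(sample)
print("free list: %d pages = %.1f MB" % (row["fre"], pages_to_mb(row["fre"])))
```

Under the same assumption, the drop from 22000 to 5000 free pages that you
describe is roughly 86 MB down to about 20 MB.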

|> 
|> Also, if the value of the 'free list' is small (say 5000), the
|> execution of some PVM jobs seem to be affected in a rather 'bad'
|> way.  i.e. they get killed at random!  I'm not sure if this happens
|> for the 'ordinary' (non-PVM) jobs.
It is possible that you are running out of paging space, and that is why
AIX is killing processes.  Use the command 'lsps -a' to see how much
paging (or swap, if you prefer) space is available and how much of it
is in use.  You see this when there are few free pages because AIX has
to page out memory to get free pages for your jobs to run.  Use lsps during
your run to see if the paging space is filling up.  And yes, this can
happen to ordinary jobs too: AIX does not single out PVM jobs to kill, but
picks from all of the running processes.  However, if the PVM jobs are the
bulk of the work on the system, they are more likely to be chosen.
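
If you want to watch this automatically during a run, something along
these lines might do.  It is a sketch only: the sample text is invented for
illustration (it is not output from your machine), the column layout is my
assumption about what 'lsps -a' prints and may differ on your release, and
parse_lsps, nearly_full and the 80% threshold are hypothetical choices.

```python
# Invented sample in the assumed 'lsps -a' layout; not real output.
SAMPLE_LSPS = """\
Page Space  Physical Volume   Volume Group    Size   %Used  Active  Auto  Type
hd6         hdisk0            rootvg          64MB      83     yes   yes    lv
paging00    hdisk1            rootvg          32MB      12     yes   yes    lv
"""


def parse_lsps(text):
    """Return (name, size, percent_used) for each paging space listed."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or truncated lines
        rows.append((fields[0], fields[3], int(fields[4])))
    return rows


def nearly_full(rows, threshold=80):
    """Names of paging spaces at or above the (hypothetical) threshold."""
    return [name for name, _size, used in rows if used >= threshold]


rows = parse_lsps(SAMPLE_LSPS)
for name in nearly_full(rows):
    print("paging space %s is nearly full" % name)
```

On a real system the text would come from running 'lsps -a' at intervals
while the PVM jobs are active, rather than from a fixed string.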

|> 
|> Has anyone else had such an experience?
|> If so, what has been done to rectify such a situation?
To fix this, you need one or more of:
 - more memory -- costs money :-(
 - more paging space -- if you have available disk space, smit will
    allow root to easily add paging space.  If the machine has no 
    disk space, buying more disks costs money :-(
 - smaller (or fewer) jobs :-(

|> Is this a bug with Aix?
No, this seems to be normal behavior.

|> 
|> Thanks in advance for any sort of help.
|> 
|> Sincerely
|> Heng Kek
|> National Supercomputing Research Centre
|> Singapore

--
These have been the opinions of:  Peter R Badovinatz, aka "Wombat"
 The wide world:  wombat@donald.aix.kingston.ibm.com
                   or wombat@austin.ibm.com
 Inside IBM: wombat@mailserv.aix.kingston.ibm.com  or  WOMBAT at KGNVMC 
 Voice:  Tieline: 695-8030   Outside: 914-385-8030   Fax: 914-383-4239
 SnailMail:  IBM Corp., Dept. 83PA/MS 658  Kingston, NY  12401, USA
and in no way reflect official opinion or policy of IBM.


