Newsgroups: comp.parallel.pvm
From: Graham Edward Fagg <sssfagg@reading.ac.uk>
Subject: xpvm pvmd problem
Organization: University of Reading, U.K.
Date: Thu, 8 Jun 1995 19:56:26 +0100
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Message-ID: <Pine.SOL.3.91.950608194706.24209A-100000@suma3.reading.ac.uk>

Dear All,
 I have recently had real trouble tracing some of my programs using XPVM.
The reason appears to be that the code generates large numbers of trace
messages which cannot be displayed fast enough by xpvm, so the pvmd on
its machine has to store the pending messages. The daemon then runs out
of memory and crashes, taking xpvm and possibly the application with it.
As shown below:
suma2% ~ whom ; pstat -s ; mpp
  7:50pm  up 51 days,  1:16,  4 users,  load average: 3.70, 3.45, 3.22
   2 sssfagg
   1 suqstmbl
   1 operator
550232k allocated + 5976k reserved = 556208k used, 38868k available
sssfagg  16041 62.6 77.8 493656 94276 ?  R    18:51  19:25 
/home/sufs1/ru11/ss/sssfagg/pvm3/lib/SUN4/pvmd3 -s -d0 -nsuma2 1 
86e1b832:0d2e 4096 10 86e1200c:0000
sssfagg  16064 55.5  3.024700 3676 p3 R    18:53  24:33 xpvm
sssfagg  16289 11.0  0.2   84  288 ?  S    19:22   4:58 pvm3/bin/SUN4/pcss
sssfagg  16572  0.0  0.2   32  196 p4 S    19:50   0:00 grep pvm
suma2% ~ 

As can be seen, the daemon is taking 78% of the memory... 
Note this is not a slow machine: it's a Sun 4/690MP with little load 
other than myself and PVM. I'm also only tracing mcast, send and recv!

Would setting up the trace masks and dumping straight to a trace file 
via the console work better? Or do I just have to wait until the next 
version of xpvm? 
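
In case it helps, this is roughly what I had in mind by setting the
masks in the application itself, so that only those three calls
generate trace events. This is just a sketch against the
pvm_settmask()/pvmtev.h interface in PVM 3.3; I'm quoting the
TEV_*0/TEV_*1 entry/exit event names from memory, so please check
them against your own pvmtev.h:

#include <pvm3.h>
#include <pvmtev.h>     /* Pvmtmask, TEV_* events, mask macros */

int
main(void)
{
    Pvmtmask tmask;

    /* Start with an empty mask, then enable only the entry and
     * exit events for send, recv and mcast. */
    TEV_MASK_INIT(tmask);
    TEV_MASK_SET(tmask, TEV_SEND0);
    TEV_MASK_SET(tmask, TEV_SEND1);
    TEV_MASK_SET(tmask, TEV_RECV0);
    TEV_MASK_SET(tmask, TEV_RECV1);
    TEV_MASK_SET(tmask, TEV_MCAST0);
    TEV_MASK_SET(tmask, TEV_MCAST1);

    /* Apply the mask to this task and to anything it spawns. */
    pvm_settmask(PvmTaskSelf, tmask);
    pvm_settmask(PvmTaskChild, tmask);

    /* ... the rest of the application as before ... */

    pvm_exit();
    return 0;
}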

Thanks for any ideas,

Graham.
===============================================================================
Graham Edward Fagg ||| *** Cluster Computing Lab. ***  ||| e-mail me some time
 Computer Science ||  Software Engineering Subject Grp.  ||  G.E.Fagg@rdg.ac.uk
01734-875123 7626 | http://www.cs.reading.ac.uk/people/gef/ | PVM/MPI/LINDA/VMD
===============================================================================
