Newsgroups: comp.parallel.pvm
Path: ukc!uknet!pipex!howland.reston.ans.net!europa.eng.gtefsd.com!MathWorks.Com!news.kei.com!news.byu.edu!cwis.isu.edu!u.cc.utah.edu!math.utah.edu!cosmic.physics.utah.edu!thomas
From: thomas@cosmic.physics.utah.edu (Stanton Thomas)
Subject: PVM on large numbers of machines
Sender: news@math.utah.edu
Date: Wed, 16 Mar 1994 00:49:19 GMT
Message-ID: <1994Mar16.004919.22249@math.utah.edu>
Keywords: PVM, parallel, RS6K crashes
Organization: Department of Mathematics, University of Utah
Lines: 20

Some months ago I attempted to run a PVM application on over 80 machines.
The configuration consisted of approximately 40 IBM (RS6K) machines, several
DECstations, SGIs, etc.  The application crashed due to some internal PVM
software bug which only shows itself for large numbers of machines (more than
60 or so?).  The crash was soft on the DECstations and other machines, but on
the IBMs it was very hard: TCP/IP access to the machines was locked out and even
direct keyboard access was frozen.  All forty machines had to be manually rebooted
using the key on each machine.  Needless to say, the incident did not win any
friends for me or for PVM.  I have since limited my configurations to 40 or fewer hosts.
I sent the various logs etc. to the PVM authors and carried on an electronic
discussion for a couple of days, but I have not heard anything since regarding the
status of this bug.  Can anyone tell me whether this bug has been eliminated in the
most current release/patch levels of PVM?  I would very much like to extend my
host configuration if I can be certain that I will not be the cause of another
system-wide hard crash.


				Stan Thomas



