Newsgroups: comp.parallel.pvm
From: Donald Krieger <don@neuronet.pitt.edu>
Subject: Re: Hosts not joining the
Organization: University of Pittsburgh
Date: 19 Jan 1995 14:26:05 GMT
Message-ID: <3flspt$nv6@usenet.srv.cis.pitt.edu>

ismisa@wi.uni-muenster.de (Mika Saastamoinen) wrote:
>
> Hello,
> 
> I've been wondering why the hosts in my setup don't always join the 
> "PVM-machine", even when they are all up (telnetting works) and all the 
> directories pointing to pvmd and user executables are correct. It seems that 
> the machines which are not in the local LAN are the most likely NOT to join 
> the PVM. But they don't "stay" out or join in consistently, sometimes a 
> particular machine does join in, sometimes not.

When a pvmd starts on a machine, it creates a file called
/tmp/pvmd.UID.  If that file already exists when the host
is added, the pvmd startup will fail, because the existence
of that file is PVM's indicator that a pvmd is already
running on that machine.  When PVM is halted or when a pvmd
is deleted, the /tmp/pvmd.UID file is deleted.  But if the
pvmd dies ungracefully, e.g. the connection to the node is
lost or the pvmd is killed with kill -9, the /tmp/pvmd.UID
file will not be deleted, and any attempt by that user to add
that host will fail until the file is deleted.

So if you can't add a host, check for the existence of that file
on that host.  If it exists and no pvmd is actually running there,
you can delete it, but it may be worthwhile to determine why it
was not deleted by PVM.
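
For what it's worth, here is a minimal sketch of that check in Python,
to be run on the host that refuses to join.  It assumes the UID in the
file name is the numeric user id as returned by os.getuid(); the
function names pvmd_lock_path and find_leftover_lock are made up for
this example and are not part of PVM.  It only reports the file and
does not delete anything.

```python
import os


def pvmd_lock_path(uid=None, tmpdir="/tmp"):
    """Build the path of the pvmd marker file for a user.

    Assumption: the UID suffix is the numeric user id.
    """
    if uid is None:
        uid = os.getuid()
    return os.path.join(tmpdir, "pvmd.%d" % uid)


def find_leftover_lock(uid=None, tmpdir="/tmp"):
    """Return the marker file path if it exists, otherwise None."""
    path = pvmd_lock_path(uid, tmpdir)
    return path if os.path.exists(path) else None


if __name__ == "__main__":
    leftover = find_leftover_lock()
    if leftover:
        # The file alone does not tell us whether a pvmd is still alive,
        # so leave the decision to delete it to the user.
        print("Found %s; check whether a pvmd is really running "
              "before deleting it." % leftover)
    else:
        print("No pvmd marker file found for this user.")
```

If the script reports a file, confirm that no pvmd process of yours is
still running on that host before removing the file by hand.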

This stuff is documented in the pvmd man page.  Of course, this
may not be the problem.  Good luck.

			Don Krieger



