Newsgroups: comp.parallel.pvm
From: Chris Humphres <chumphre>
Subject: Re: How do you kill all those processes?
Organization: Duke University, Durham, NC, USA
Date: 21 Nov 1995 18:22:38 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <48t5de$8lm@news.duke.edu>

randy@axon.cs.byu.edu (Randy Wilson) wrote:
>  Several things can go wrong, however.
>  1. If I want to stop the whole process (such as when I find out there's
>a bug in the simulation program, or I'm out of time), I can kill the
>master, and this seems to cause all the slaves to be killed, which is
>exactly what I want.  However, the 'system' commands executed by the
>slaves are NOT killed, so I have to rlogin to every system and kill each
>process by hand.  YUCK!

Two possible solutions (one clean, one messy):

1. Have each slave start the simulation program with pvm_spawn() on its
own host instead of calling system().  PVM then knows about the program
and will terminate it along with the other tasks.

2. Catch the SIGTERM signal that the pvmd sends to the slaves at shutdown.
When the signal is caught, terminate any running system commands and then
exit the slave.  I'm not sure how you would terminate the system commands,
though.

>  2. Sometimes PVM gets left running on some hosts and not others.  When I
>try to run "pvmd hostfile &" on my local host, it gives me an error for
>any host that already has a /tmp/pvmd.279 file on it.  Again, I have to go
>in to each such host and kill the pvmd process by hand and/or delete the
>/tmp/pvm* files in order to reclaim those hosts in my virtual machine.
>

Always shut PVM down from the master (original) pvmd.  The other "slave"
pvmds will then terminate once they discover that the master no longer
responds.  If the /tmp/pvmd.xxxx files are still not removed properly,
write a script that uses rsh to remove them from all machines before
starting the new PVM session.

Good luck!

Chris


