Newsgroups: comp.parallel.pvm
From: jcargill@grilled.cs.wisc.edu (Jon Cargille)
Subject: Problem with PVM-3.3.7 and Solaris 2.4
Summary: pvm processes hanging around as zombies
Organization: U of Wisconsin CS Dept
Date: 14 Apr 1995 20:54:28 GMT
Message-ID: <3mmne4$ngd@spool.cs.wisc.edu>



I'm having trouble running PVM-3.3.7 on a two-CPU SPARC
multiprocessor running Solaris 2.4.

The problem is that the PVM daemon (pvmd) never seems to reap children
that have exited.  For example, after a PVM task has run to completion,
it is still listed by the PVM console "ps" command; in fact, the
underlying process is still hanging around as a zombie.

Has anyone else seen this behavior?  Any idea what I can do to fix it?

Thanks,

Jon

Example follows:
---------------
> 
> pvm
pvm> 
pvm> conf
1 host, 1 data format
                    HOST     DTID     ARCH   SPEED
       strip.cs.wisc.edu    40000    SUNMP    1000
pvm> 
pvm> ps
                    HOST      TID   FLAG 0x COMMAND
pvm> 
pvm> spawn -> hello
[1]
1 successful
t40003
pvm> [1:t40003] i'm t40003
[1:t40003] from t40005: hello, world from strip.cs.wisc.edu
[1:t40005] EOF
[1:t40003] EOF
[1] finished

pvm> ps
                    HOST      TID   FLAG 0x COMMAND
       strip.cs.wisc.edu    40003  16/o,c,f hello       
       strip.cs.wisc.edu    40005  16/o,c,f hello_other 
pvm>        
pvm> spawn -> hello
[2]
1 successful
t40007
pvm> [2:t40007] i'm t40007
[2:t40007] from t40009: hello, world from strip.cs.wisc.edu
[2:t40009] EOF
[2:t40007] EOF
[2] finished

pvm> 
pvm> ps
                    HOST      TID   FLAG 0x COMMAND
       strip.cs.wisc.edu    40003  16/o,c,f hello       
       strip.cs.wisc.edu    40005  16/o,c,f hello_other 
       strip.cs.wisc.edu    40007  16/o,c,f hello       
       strip.cs.wisc.edu    40009  16/o,c,f hello_other 
pvm> 
pvm> quit
pvmd still running.
> 
> ps
   PID TT       S  TIME COMMAND
  5191          Z  0:00 
  5192          Z  0:00 
  5193          Z  0:00 
  5194          Z  0:00 
  5195          Z  0:00 
  5140 pts/4    S  0:00 -tcsh
  5196 pts/4    O  0:00 ps
> 
> ps -l
 F   UID   PID  PPID CP PRI NI   SZ  RSS    WCHAN S TT        TIME COMMAND
 8  1984  5191  5190  0   0       0    0                   Z  0:00 
 8  1984  5192  5190  0   0       0    0                   Z  0:00 
 8  1984  5193  5190  0   0       0    0                   Z  0:00 
 8  1984  5194  5190  0   0       0    0                   Z  0:00 
 8  1984  5195  5190  0   0       0    0                   Z  0:00 
 8  1984  5140  5139 78  48 20 1084  876   Q_LIMS S pts/4     0:00 -tcsh
 8  1984  5198  5140 12  28 20  916  776          O pts/4     0:00 ps -l
> 



Any idea why those zombies (all children of PID 5190, presumably the
pvmd) are still hanging around?

