Newsgroups: comp.parallel.pvm
From: ericd@backus (Eric Dye)
Subject: Re: Unexpected problems with pvm_spawn and pvm_joingroup
Organization: Concurrent Engineering Research Center
Date: Tue, 25 Jul 1995 13:11:15 GMT
Message-ID: <DC9wMr.1HM@cerc.wvu.edu>

Gordon Hogenson (ghogenso@u.washington.edu) wrote:
: I'm having some trouble with a new installation of PVM 3.3.7.  I have
: it installed on two machines, a SUN4 and an SGI5. Specifically,
: the SGI is an IRIX 5.2 Indigo^2 and the SUN is SunOS 4.1.4. My program
: (see below) is run (started from the shell prompt) on the SUN4, 
: and spawns 1 process on the SGI5.

: The first problem is that pvm_spawn always returns 0 on the first call,
: (with -7 returned in the tids array), but on the second call it
: returns 1 as expected and the 'tid' of the spawned task is correct.

: Furthermore, the program calls pvm_joingroup().  The parent process
: gets a return value of 0, as expected.  But for some unknown reason,
: the spawned process on the SGI gets a return value of 11, whereas I
: would have expected it to be 1.  The documentation says that it
: counts upward and that the return value is the first unused id.  Why
: '11' then?  Is the return value supposed to be an arbitrary number
: or is this a bug?

: Here's the program (same program on both machines):


: #include <stdio.h>
: #include "pvm3.h"

: int tids[2];

: int main()
: { 
:   int i,j;
:   int info;
:   int mytid;
:   int me;

:   mytid = pvm_mytid();
:   
:   /* Join a group and if I am the first instance */
:   /* i.e., me = 0, spawn more copies of myself */
:   
:   me = pvm_joingroup("foo");
:   printf("me = %d mytid = %d\n", me, mytid);
:   tids[me] = mytid;
:   if (me == 0)
:     {
:       int numt = 0;
:       while (! numt)  /* keep trying until pvm_spawn succeeds */
: 	{
: 	  numt = pvm_spawn("tst2", (char**)0, 0, "", 1, &tids[1]);
: 	  printf("pvm_spawn returned %d\n", numt);
: 	  if (numt == 0)
: 	    {
: 	      printf("tids array tids[1] is %d\n", tids[1]);
: 	    }
: 	}
:     }

:   printf("tids[0] = %d", tids[0]);
:   printf("; tids[1] = %d\n", tids[1]);

:   /* Wait for everyone to start up before proceeding */
:   
:   printf("Waiting for everyone to start up.\n");
:   info = pvm_barrier("foo", 2);
:   printf("pvm_barrier returned %d\n", info);
:   /*-----------------------------------------------------------*/

:   return 0;
: }

: The local output is:

: me = 0 mytid = 262204
: pvm_spawn returned 0
: tids array tids[1] is -7
: pvm_spawn returned 1
: tids[0] = 262204; tids[1] = 524316
: Waiting for everyone to start up.
: pvm_barrier returned 0

: The '-7' problem (the error code translates as 'Specified executable 
: cannot be found') is unexplained.
: Calling pvm_spawn again (as in the code) always solves this problem.
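
As far as I understand pvm_spawn, when it starts fewer tasks than
requested, the trailing entries of the tids array hold error codes
rather than task ids, which matches the -7 printed above.  A minimal
sketch of checking those slots follows.  It is plain C with no PVM
calls, so it compiles on its own; the name first_spawn_error is a
hypothetical helper, not part of PVM.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PVM).
 *   tids:  array filled in by pvm_spawn
 *   ntask: number of tasks that were requested
 *   numt:  value pvm_spawn returned (tasks actually started)
 * Returns the first negative entry found in slots numt..ntask-1,
 * or 0 if there is none. */
int first_spawn_error(const int *tids, int ntask, int numt)
{
    int i;

    if (tids == NULL || ntask <= 0)
        return 0;
    if (numt < 0)
        numt = 0;   /* treat a negative return as "nothing started" */

    for (i = numt; i < ntask; i++) {
        if (tids[i] < 0)
            return tids[i];
    }
    return 0;
}
```

With the values from the output above (one task requested, 0 started,
-7 in the slot), this helper would report -7, so a caller could print
the code and stop instead of looping indefinitely.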

: The remote output, however, shows that pvm_joingroup is returning 11,
: not 1, the next consecutive number available.

: [t80040000] [t8001b] me = 11 mytid = 524315
: [t80040000] [t8001b] tids[0] = 0; tids[1] = 0
: [t80040000] [t8001b] Waiting for everyone to start up.
: [t80040000] [t8001b] pvm_barrier returned 0

: The above was found on the local machine in the /tmp/pvml.XXX file.
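
Note that with me = 11, the statement tids[me] = mytid in the program
writes past the end of the two-element tids array, which may be why
the remote output shows tids[0] = 0 and tids[1] = 0.  Whether this is
related to the other symptoms is not clear.  A minimal sketch of a
guarded store, in plain C; store_tid is a hypothetical helper, not a
PVM call.

```c
/* Hypothetical helper (not part of PVM): store tid at index inst
 * only if inst is a valid index into an array of len elements.
 * Returns 1 if the value was stored, 0 if inst was out of range. */
int store_tid(int *tids, int len, int inst, int tid)
{
    if (tids == 0 || inst < 0 || inst >= len)
        return 0;
    tids[inst] = tid;
    return 1;
}
```

In the program, the call would take the place of the bare assignment,
and a return of 0 would flag an unexpected instance number such as 11
without touching memory outside the array.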

: Expected output:
: [t80040000] [t8001b] me = 1 mytid = XXXXXXX

: Any suggestions?  Other programs, such as the spmd.c provided with
: PVM, work fine.  "hello"/"hello other" work, but usually not on the
: first try: invoking "hello" once fails, and a second time it works.

: Other possibly useful information:

: % pvm
: pvmd already running.
: pvm> conf
: 2 hosts, 1 data format
:                     HOST     DTID     ARCH   SPEED
:                      t13    40000     SUN4    1000
:                 t13graph    80000     SGI5    1000
: pvm> 

: Gordon.
: -- 
: ---------------------------------------------------------------
: Gordon J. Hogenson                       work: (505) 667-9471
: ghogenso@u.washington.edu                home: (505) 661-6753
: ---------------------------------------------------------------


I am fairly new to PVM, but I think I am having problems similar to
Gordon's.  I am running PVM on a SUN4 and a SUNMP.  Several of the
example programs work only every other time I run them.  I also
tried a quicksort.c program that I found on one of the PVM pages;
it also works only every other time.  What I have found is that if
the SUNMP is the only host in my virtual machine, the programs work
fine.  If the SUN4 is the only host, they never work (some programs
give wrong answers and others sometimes lock up).  In most cases I
use the same code for both architectures.  How can I find out what
is going wrong when running on the SUN4?

I am also having trouble printing from inside a spawned task.  What
is the easiest or best way to do this?  Thanks.

Eric Dye
Morgantown, WV
ericd@cs.wvu.edu

