Newsgroups: comp.parallel.pvm
Path: ukc!uknet!EU.net!howland.reston.ans.net!agate!ihnp4.ucsd.edu!network.ucsd.edu!sdcrsi!equalizer!timbuk.cray.com!driftwood.cray.com!par
From: par@poplar511.cray.com (Peter Rigsbee)
Subject: Re: Code without Spawn and Default group for CM-5
Message-ID: <1994May3.090108.19698@driftwood.cray.com>
Originator: par@poplar511
Lines: 75
Nntp-Posting-Host: poplar511
Organization: Cray Research, Inc.
References: <9405021623.AA29083@ea.msc.edu> <2q3pno$d3p@infomeister.osc.edu>
Date: 3 May 94 09:01:08 CDT


In article <2q3pno$d3p@infomeister.osc.edu>, vaigl-j@osc.edu (James Vaigl) writes:
> In article <9405021623.AA29083@ea.msc.edu>
> saroff@msc.edu (Stephen Saroff) writes:
> 
> >On the T3D, there is a notion of a default group (PVMALL).  Does such a
> >notion exist for the CM-5 and the Intel?
> >
> >I am asking because I would like to write a purely SPMD code (no dummy
> >header) on the CM-5, which includes a barrier.
> >
> >To do this, I was (roughly) thinking of doing the following:
> >	call pvmfmytid(mytid)
> >	call pvmfgsize(PVMALL,isize)
> >	call pvmfbarrier(PVMALL,isize,status)
> >	...
> >If there is no notion of default group, I could do the following:
> >	call pvmfmytid(mytid)
> >	call pvmfjoingroup("default",myinst)
> >
> >And then go on.  Which is right?
> 
> I can't answer your question directly, since I haven't used PVM on
> CM-5 or Intel, but there's another issue here which deserves
> attention, too.  You need more synchronization than the above at the
> start of your program.  Consider the scenario where there are two pvm
> processes started at (nearly) the same time.  The first one gets the
> size of the group -- one, since it's the first to join, then calls
> barrier.  It immediately falls through.  Then the second guy joins the
> group and gets the size, now two.  He calls barrier and blocks,
> waiting for one other process to start.  This is clearly not what you
> wanted.
> 
> You need to know a priori how many processes will be in the group,
[...]

What Stephen didn't point out is that on the T3D the pvmfgsize call returns
exactly this number (that is, the number of processors being used).  The same
information is available through compiler intrinsics on the T3D, and through
various mechanisms on other MPP systems.  So needing to know how many
processes are in the group isn't a big deal when PVM is used in a static
environment like this.

If you use that number in the pvmfbarrier call, the problem you describe
does not arise: every task passes the same count, so the barrier waits
until that many tasks have arrived.
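
To make the difference concrete, here is a toy model of a counting barrier.
It is written in Python, it is not PVM code, and it says nothing about how
any PVM implementation works internally.  The names ToyGroup, run, and
count_for are made up for illustration; the only behavior assumed is the one
described above, namely that a barrier call blocks until the given number of
tasks has arrived.  Tasks start one after another, as in the scenario James
describes.

```python
# Toy model of a counting barrier. Hypothetical names; not the PVM API.
class ToyGroup:
    def __init__(self):
        self.members = []    # tasks that have joined so far
        self.waiting = []    # tasks currently blocked in the barrier
        self.released = []   # batches of tasks released together, in order

    def join(self, task):
        self.members.append(task)
        return len(self.members) - 1   # instance number within the group

    def gsize(self):
        return len(self.members)

    def barrier(self, task, count):
        # The task arrives; once `count` tasks are waiting, release them all.
        self.waiting.append(task)
        if len(self.waiting) >= count:
            self.released.append(list(self.waiting))
            self.waiting.clear()


def run(tasks, count_for):
    """Start tasks one at a time; each joins, then calls the barrier."""
    group = ToyGroup()
    for task in tasks:
        group.join(task)
        group.barrier(task, count_for(group))
    return group.released, group.waiting


# Count read from the group at call time (a dynamic group, no default group).
dynamic = run(["t0", "t1"], lambda g: g.gsize())
# Count known in advance (what a static, machine-wide group size gives you).
static = run(["t0", "t1"], lambda g: 2)

print(dynamic)  # ([['t0']], ['t1'])
print(static)   # ([['t0', 't1']], [])
```

With the count taken from the group size at call time, the first task falls
through alone and the second is left waiting.  With the count fixed in
advance, both tasks are released together.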

We added the "PVMALL" default group to the T3D version of PVM just so
users wouldn't have to go through this setup to define this very common
and important group.  This makes sense in the static MPP world, but not
in the more dynamic network world.  I've talked about this with the PVM
developers in Tennessee, but they have been reluctant to add it for that
reason.

> On the CM-5, where you're forced to run one identical process on every
> node, then the 'default' group, if it exists, might be useful if it
> just has the number of nodes on your machine, but this code won't be
> portable.

I'd suggest that most programs that would find such a concept useful are
by definition going to be ported between MPP or SPP systems, where the
number of nodes is fixed at process startup.  Such programs will not
expect nodes (i.e., tasks) to enter and exit the virtual machine
dynamically.  So for such programs, the concept is portable.

A program that uses pvm_spawn repeatedly to create new processes isn't 
going to find this concept useful, but such programs won't run well on
most MPP systems, either.

I'd like to see PVMALL (or something comparable) added to standard PVM,
at least for its MPP implementations, but I also recognize that it
doesn't extend well outside of these static situations.

	- Peter Rigsbee
	  par@cray.com

