Newsgroups: comp.parallel.pvm,comp.parallel
From: tony@aurora.cs.msstate.edu (Tony Skjellum)
Subject: Re: MPI_Bcast problems?
Keywords: MPI
Organization: Mississippi State University
Date: Wed, 15 Jun 1994 17:55:54 GMT
Message-ID: <CrG9t7.Got@dcs.ed.ac.uk>

nupairoj@cps.msu.edu (Natawut Nupairoj) writes:

>In article <2tjlb9INNhgv@dubhe.anu.edu.au>, sits@cs.anu.edu.au (David Sitsky)
>writes:
>[stuff deleted]
>|> 
>|> Although a non blocking MPI_Bcast operation may solve this problem, I'm just
>|> curious why the MPI standard doesn't include a function MPI_Mcast which
>|> has the same semantics as pvm_mcast (ie non-collective call but message appears
>|> as an ordinary message to the destination processes).
>|> 
>|> This seems to me like an important function that isn't present in MPI.  Is there
>|> some reason why it wasn't included?  Are there any workarounds?

>MPI has a "group" concept which allows you to create your own communication
>domain.  Thus, to do multicast in MPI, you can just create a new group and then
>use MPI_Bcast.

>Natawut.
>nupairoj@cps.msu.edu
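
For what it's worth, the group-based route Natawut suggests might be
sketched as follows in MPI-1 C (error handling omitted; mcast_via_group
and its arguments are illustrative names, not standard API):

```c
#include <mpi.h>

/* Sketch: carve the multicast destinations (plus the sender) into their
   own communicator with MPI_Comm_split, then MPI_Bcast within it.
   in_group:     nonzero on processes that should take part.
   root_in_sub:  the sender's rank *within the new communicator*. */
void mcast_via_group(int in_group, void *buf, int count,
                     int root_in_sub, MPI_Comm comm)
{
    MPI_Comm subcomm;

    /* processes passing color 1 land in the new communicator; the
       rest pass MPI_UNDEFINED and get MPI_COMM_NULL back */
    MPI_Comm_split(comm, in_group ? 1 : MPI_UNDEFINED, 0, &subcomm);

    if (in_group) {
        MPI_Bcast(buf, count, MPI_BYTE, root_in_sub, subcomm);
        MPI_Comm_free(&subcomm);
    }
}
```

Note that MPI_Comm_split is itself collective over the parent
communicator (every process must call it), which is precisely the
semantic difference from pvm_mcast that started this thread.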

Implementing a multicast with performance superior to a linear series of
sends requires that every message carry a multicast bit in its envelope;
messages so marked can then be forwarded to the rest of a subtree on
arrival (since a communicator has a group, the rest of the subtree can
be computed from the receiving process, the root of the broadcast, and
the communicator).  However, that means every incoming message must be
tested for the multicast bit, so all receives slow down so that
multicast sends can work.  P4 does it this way, for instance.  This was
seen as a negative in the MPI Forum.
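
To make the subtree computation concrete: assuming a binomial tree (one
standard choice; the actual tree shape is an implementation detail),
each process can derive its forwarding targets purely from its own rank,
the root, and the group size, with no envelope information beyond the
multicast bit:

```c
#include <assert.h>

/* Children of a process in a binomial broadcast tree.  vrank is the
   rank relative to the broadcast root, (rank - root + size) % size,
   so the root is vrank 0.  Writes the children's virtual ranks to
   out[] and returns how many there are.  Illustrative only; a real
   implementation chooses its own tree. */
int tree_children(int vrank, int size, int out[])
{
    /* lowest set bit of vrank; the root "owns" the whole range */
    int lowbit = vrank ? (vrank & -vrank) : size;
    int n = 0;
    for (int m = 1; m < lowbit; m <<= 1)
        if (vrank + m < size)
            out[n++] = vrank + m;
    return n;
}
```

For example, with size 8 and root 0, vrank 0 forwards to 1, 2, and 4,
while vrank 4 forwards on to 5 and 6.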

Some small performance gain can be had by doing a combined linear send
from the sender, which avoids some of the overhead of context switching
and buffer setup/teardown.  That approach is probably competitive up to
about 4-8 sends, while the subtree approach above wins thereafter.
Thus, a good implementation would want to support a poly-algorithm,
selected as a function of the size of the group involved.
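
The dispatch itself is then trivial; the crossover point (taken here as
8, per the 4-8 range above) is an assumed, machine-dependent tuning
constant, and mcast_strategy is an invented name:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical crossover rule: up to LINEAR_MAX destinations the
   optimized linear send wins; beyond that, the subtree scheme does.
   LINEAR_MAX would be tuned per machine. */
#define LINEAR_MAX 8

const char *mcast_strategy(int group_size)
{
    return (group_size <= LINEAR_MAX) ? "linear" : "tree";
}
```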

Furthermore, what argument should one give as the source of the message
if a receive is to be matched with a broadcast/multicast?  The receiver
cannot know the "true" source, since that is a detail of the tree or
linear algorithm, as appropriate.  If we insist that the collective
operation have good semantics, and not "back-mask" with other collective
operations, including subsequent bcast/mcast calls, then we have to be
extremely careful about its implementation, and about the use of
wildcard receives, even within a group-safe communicator.
In MPI, we insisted on a reliable ability to predict when messages can
interfere with one another in terms of order, and when they cannot.  The
mcast operation is collective on the sender's side and point-to-point on
the recipient's.  A strategy to ensure that such (effectively
collective) operations don't interfere is non-obvious, unless the user
manually uses a new tag for each such mcast in a group, manages these
tags himself/herself, and does not use the same tags for other
point-to-point messages.  [NX has a call that requires this type of
restriction.]
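
One way to realize that tag discipline is a user-level allocator that
reserves a tag range for mcast and never hands out the same tag twice;
a minimal sketch (MCAST_TAG_BASE and new_mcast_tag are invented
conventions, not MPI or NX calls):

```c
#include <assert.h>

/* Hypothetical convention: tags at or above MCAST_TAG_BASE are reserved
   for mcast-style operations.  Each operation draws a fresh tag, so a
   wildcard or mismatched receive cannot back-mask a later mcast. */
#define MCAST_TAG_BASE 1000

static int next_mcast_tag = MCAST_TAG_BASE;

int new_mcast_tag(void)
{
    return next_mcast_tag++;
}
```

Ordinary point-to-point traffic would then be restricted to tags below
MCAST_TAG_BASE, keeping the two matching spaces disjoint.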

-Tony Skjellum

--
	.	.	.	.	.	.	.	.      .
"There is no lifeguard at the gene pool." - C. H. Baldwin
            -             -                       -
Anthony Skjellum, MSU/ERC, (601)325-8435; FAX: 325-8997; tony@cs.msstate.edu
