Newsgroups: comp.parallel.mpi
From: lusk@donner.mcs.anl.gov (Rusty Lusk)
Subject: Re: T3D MPI Send/Wait (was Re: PVM send buffers)
Organization: Argonne National Laboratory
Date: 8 Apr 1996 16:12:22 GMT
Message-ID: <4kbdt6$hcn@milo.mcs.anl.gov>

In article <4k18cq$ie7@murrow.corp.sgi.com>, salo@mrjones.engr.sgi.com (Eric Salo) writes:
|> > The most straightforward conversion from csend/crecv for a T3d would be
|> > to MPI_Send/MPI_Recv.  That is a match for the semantics.  You should also
|> > expect performance reasonably close to shmem_put, at least with respect to
|> > bandwidth.  You might expect higher latency since you are asking for more
|> > in the way of semantics.  - Rusty Lusk
|> 
|> Rusty, I'm not sure that I agree. Doesn't csend always return even if the
|> matching receive has not yet been posted? We've been bitten many times by
|> NX ports that used MPI_Send as a drop-in replacement for csend because we
|> don't buffer most messages by default. MPI_Bsend is a better match for
|> the semantics, I think.

I am glad you brought this up.  (I was sure someone would.)  I deliberately
said that MPI_Send/MPI_Recv was the analog of Intel's csend/crecv because
neither one of them *guarantees* the amount of buffering that is provided by
the system.  csend, like MPI_Send, returns when the buffer can be reused,
and whether this requires it to wait until the matching receive is posted
or not depends on how big the message is and how much buffering is provided by
the system.

Now it is true that Intel provided lots of buffering in NX, so users got used
to the fact that csend usually returned even before the receive was posted.
But there is always *some* size message for which the program

        process 0        process 1
        ---------        ---------

        send to 1        send to 0
        recv from 1      recv from 0

will hang, because there is no place to put the messages so that the sends
can return.  Just where that threshold lies is implementation-dependent in MPI,
as it is in NX, although the standard implementation (the one from Intel)
provided lots of buffering.

MPI has 

a) MPI_Send to match the semantics of most other message-passing systems
   (implementation-specific amount of buffering),
b) MPI_Isend to allow the program to proceed while the buffer is in use
   (solves above example problem), and 
c) MPI_Bsend to give the user the ability to supply the buffering from user
   space (attached beforehand with MPI_Buffer_attach).

Rusty Lusk
