Newsgroups: comp.parallel.mpi
From: A Gordon Smith <smith>
Subject: Re: Can you do mpi_reduce using same send and receive buffers?
Organization: Department of Computer Science, Edinburgh University
Date: Fri, 10 Nov 1995 13:35:01 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <DHtxqE.1o.0.staffin.dcs.ed.ac.uk@dcs.ed.ac.uk>

chmbt@zeus.bris.ac.uk (MB. Taylor) wrote:
>
>Is it possible to sum a number of elements on different processes
>without using a separate buffer to do so?
>
>Suppose one has a variable at a buffer buf1 on each process, and wants
>the sum over all processes of buf1 to end up on each process.
>What I'd like to be able to do is something like:
>
>   mpi_allreduce (buf1, buf1, count, datatype, mpi_sum, comm)
>
>In the MPI document I can't find anything saying whether you can use
>the same buffer for send and receive or not, but since none of the examples
>I've seen show this being done, I suppose that you can't.
>So instead I have to do something like:
>
>   mpi_allreduce (buf1, buf2, count, datatype, mpi_sum, comm)
>   buf1 = buf2
>


Hello Mark,

The MPI standard implies this restriction: the 'sendbuf' argument
('buf1' above) is classed as an "IN" argument, and "the call uses but
does not update an argument marked IN" (Section 2.2, Procedure
Specification). If recvbuf were the same buffer as sendbuf, the call
would update an IN argument. This could perhaps be stated more clearly
in the MPI standard.

Whatever the allreduce communication pattern, a separate buffer is
always needed to receive another process' contribution before combining
it with the local contribution. If sendbuf were allowed to coincide
with recvbuf, the implementation would have to detect that case and
claim a temporary buffer to stand in for one of them. On the other
hand, a user may want to preserve the original contribution separately
from the reduction result, in which case that temporary buffering would
be redundant. By imposing the restriction, MPI avoids both the check
for coinciding send and receive buffers and the potentially redundant
temporary buffering.


>which uses an extra buf's worth of memory.

Which is required anyway. The only way to reduce the amount of temporary
memory used would be to transfer processes' contributions in installments
and combine at each step (minimum installment: one element); this would
be rather slow.

>
>Does anybody know if I can in fact use the same buffer for send and receive
>in a reduction operation?  Is there any other way of doing this without 
>incurring the memory allocation penalties of the second approach?
>I'm using the EPCC implementation of MPI on a Cray T3D. 
>

With CRI/EPCC MPI for the T3D, you will get incorrect results if the
send and receive buffers are the same, because the local contribution
gets overwritten. I believe that some other MPI implementations check
for this condition and generate an error instead.


>Thanks in advance for any helpful comments.
>
>Mark
>
>-----------------------------------------------------------------------
>| Mark Beauchamp Taylor  -  physicist trapped in a chemist's body.    | 
>| mark.taylor@bris.ac.uk    http://zeus.bris.ac.uk/~chmbt/index.html  |
>| Department of Chemistry, University of Bristol, UK                  -------
>-----------------------------------------------------| ... It's the future! |
>                                                     ------------------------

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
 -=-=- A. Gordon Smith -=- Edinburgh Parallel Computing Centre -=-=-
 =-= Email <smith@epcc.ed.ac.uk> -=- Phone {+44 (0)131 650 6712} =-=
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=