Newsgroups: comp.parallel.mpi
From: ohnielse@oersted (Ole Holm Nielsen)
Reply-To: ohnielse@fysik.dtu.dk
Subject: Re: Can you do mpi_reduce using same send and receive buffers?
Organization: Physics Department, Techn. Univ. of Denmark
Date: 16 Nov 1995 11:05:28 GMT
Message-ID: <48f5to$mbk@news.uni-c.dk>

MB. Taylor (chmbt@zeus.bris.ac.uk) wrote:
> Is it possible to sum a number of elements on different processes
> without using a separate buffer to do so?
> Suppose one has a variable at a buffer buf1 on each process, and wants
> the sum over all processes of buf1 to end up on each process.
> What I'd like to be able to do is something like:
>    mpi_allreduce (buf1, buf1, count, datatype, mpi_sum, comm)
> In the MPI document I can't find anything saying whether you can use
> the same buffer for send and receive or not, but since none of the examples
> I've seen show this being done, I suppose that you can't.
> So instead I have to do something like:
>    mpi_allreduce (buf1, buf2, count, datatype, mpi_sum, comm)
>    buf1 = buf2
> which uses an extra buf's worth of memory.
> Does anybody know if I can in fact use the same buffer for send and receive
> in a reduction operation?  Is there any other way of doing this without 
> incurring the memory allocation penalties of the second approach?
> I'm using the EPCC implementation of MPI on a Cray T3D. 

I agree with the other responses that a receive buffer is required.
However, if buffer size is a problem, you could break the mpi_allreduce
into N independent calls on sub-arrays, so that the receive buffer need
only be 1/N-th the size of the full array.  If your arrays are large
enough for memory to be a concern, the messages will still be long even
at 1/N-th the original length, so performance will probably remain
satisfactory.

There are some sophisticated algorithms for doing mpi_allreduce-like
operations in O(log N) time for N processes, though I wouldn't know
whether your library implements any of them.  One paper on the subject
is:

     J. Bruck and C.T. Ho,
     ``Efficient Global Combine Operations in Multi-Port Message-Passing
     Systems'', Parallel Processing Letters, Vol. 3, No. 4,
     pp. 335-346, December 1993.

For further references, you may wish to contact Ching-Tien Ho
<ho@almaden.ibm.com>.

Ole H. Nielsen
Department of Physics, Building 307
Technical University of Denmark, DK-2800 Lyngby, Denmark
E-mail: Ole.H.Nielsen@fysik.dtu.dk
WWW URL: http://www.fysik.dtu.dk/ohnielse.html
Telephone: (+45) 45 25 31 87
Telefax:   (+45) 45 93 23 99

