Newsgroups: comp.parallel.mpi
From: keith@earth.ox.ac.uk (Keith Refson)
Subject: Problem with nonblocking send/recv
Organization: Dept of Earth Sciences, Oxford University, UK.
Date: Wed, 13 Mar 1996 11:07:23 GMT
Message-ID: <1996Mar13.110723.6144@rahman.earth.ox.ac.uk>

I'd appreciate it if you MPI gurus would take a look at this code and
tell me if I have made some silly mistake.  I am experimenting with a
function, pretty much equivalent to MPI_Allgather: it collects slices of
data from all p processes, concatenates them, and distributes the result
to every process.   I have an apparently working implementation using
single-ended push semantics (using the BSP and Cray SHMEM libraries),
and was hoping to try out an MPI implementation too.  The one twist in
the tale is that the data is contained in a 2d array divided along
the final (C order) dimension between processes.  I am using a
vector type to handle this with an upper bound to allow interleaving
of the data.

The trouble is that it seems to work and give correct results most
of the time.  But on the T3D it crashes for one particular test
in a non-deterministic fashion, anywhere from the 57th to the 500th
call of the function.  The crash is a segmentation fault on a random
process deep within one of the MPI library calls - usually in
"malloc_brk".  I can't tell whether the calls prior to the crash got the
correct result, although the occasional runs which did complete
(of 1200 calls) did get correct results.

Have I made some wrong assumption?  The send and receive areas are
disjoint from one another, although the blocks making them up are
interleaved.  Is this correct?

I am also a little unsure of the correct way to use derived types.
Is it necessary to free them on each call?  Will I run out of resources
if I don't, or simply by calling MPI_Type... etc. repeatedly?

Here's the code: ithread is set to the rank and nthreads to the
size outside this function.  The values for the failing run are
n=18, stride=576 and nblk alternates between 8 and 9.  "real" is a
typedef for "double", dalloc() is a checking wrapper around malloc,
and memcp is just a macro calling memcpy while casting its arguments.


void
par_collect_all(send, recv, n, stride, nblk)
real    *send, *recv;
int     n, nblk, stride;
{
   int  i, right, left, iblk, ibeg, ibegr;
   int  blens[2];
   MPI_Datatype vtype, block, types[2];
   MPI_Aint displs[2];
   MPI_Request req[6], *reqp;
   MPI_Status status[6];
   static real *recvbuf = 0;
   static int  recvsize = 0;
   static int icall = 0;
   icall++;
   if(nblk*stride > recvsize)
   {
      if( recvbuf )
         free(recvbuf);
      recvbuf = dalloc(nblk*stride);
      recvsize = nblk*stride;
   }

   /*
    * Use the defined datatypes of MPI to collect the whole array from
    * distributed slices across processors.  The "vtype" vector defines
    * the actual data, "nblk" blocks of "n" elements with a stride of "stride".
    */
   MPI_Type_vector(nblk, n, stride, M_REAL, &vtype);
   blens[0]  = 1;     blens[1]  = 1;
   types[0]  = vtype; types[1]  = MPI_UB;
   displs[0] = 0;     displs[1] = n*sizeof(real);
   MPI_Type_struct(2, blens, displs, types, &block);
   MPI_Type_commit(&block);
   
   /* Seed recvbuf with this process's own slice of the array */
   for(iblk = 0; iblk < nblk; iblk++)
      memcp(recvbuf+ithread*n+iblk*stride, send+iblk*stride, n*sizeof(real));

   /*
    * Recursive doubling: at step i each process sends the i blocks it
    * holds to the process i places to its right and receives i blocks
    * from the process i places to its left, wrapping mod nthreads.
    */
   for (i=1; i<nthreads; i*=2)
   {
      left  = (nthreads + ithread - i) % nthreads;
      right = (ithread + i) % nthreads;
      ibeg  = ithread + 1 - i;   /* first block index this process sends */
      ibegr = left + 1 - i;      /* first block index it will receive */
      reqp = req;

      if( ibegr >= 0 )
         MPI_Irecv(recvbuf+ibegr*n, i, block, left, 101, MPI_COMM_WORLD, reqp++);
      else
      {
         /* Incoming block range wraps past 0: split into two receives */
         MPI_Irecv(recvbuf+(ibegr+nthreads)*n, -ibegr, block, left, 102,
                   MPI_COMM_WORLD, reqp++);
         MPI_Irecv(recvbuf, left+1, block, left, 103, MPI_COMM_WORLD, reqp++);
      }
      if( ibeg >= 0 )
         MPI_Isend(recvbuf+ibeg*n, i, block, right, 101, MPI_COMM_WORLD, reqp++);
      else
      {
         /* Outgoing block range wraps past 0: split into two sends */
         MPI_Isend(recvbuf+(ibeg+nthreads)*n, -ibeg, block, right, 102,
                   MPI_COMM_WORLD, reqp++);
         MPI_Isend(recvbuf, ithread+1, block, right, 103,
                   MPI_COMM_WORLD, reqp++);
      }
      MPI_Waitall(reqp-req, req, status);
   }
   memcp(recv, recvbuf, nblk*stride*sizeof(real));
   MPI_Type_free(&vtype);
   MPI_Type_free(&block);   
}


Thanks for any responses

sincerely

Keith Refson
-- 
------------------------------------------------------------------------------
| Email   : keith@earth.ox.ac.uk    | Dr Keith Refson, Dept of Earth Sciences|
| TEL(FAX): +44 1865 272026 (272072)| Parks Road, Oxford OX1 3PR, UK         |
------------------------------------------------------------------------------

