Newsgroups: comp.parallel.mpi
From: gdburns@osc.edu (Greg Burns)
Subject: system buffers (was: Re: Problem using LAM5.2)
Organization: Ohio Supercomputer Center
Date: 4 Dec 1995 14:44:07 -0500
Message-ID: <49vj27$191@tbag.osc.edu>

In article <MARR.95Dec4174651@jura.dcs.ed.ac.uk> "Marcus Marr" <marr@dcs.ed.ac.uk> writes:
>
>> Hi, Currently, i have a piece of MPI code and it runs
>> successfully using MPICH.  However, with the same piece of
>> code, it hangs half way through the execution when using
>> LAM5.2.  May i know what could be the possible reasons?  I'm
>> running the processes on a SP2.  Thanx in advance!
>
>One possibility is that you are running out of buffer space.  Other
>implementations of MPI will allow a communication to succeed without
>using the buffer as long as the matching receive has already been
>posted, but this is not the case with LAM-MPI.
>
>The most common cause for this sort of behaviour is if one process is
>flooding another with messages (e.g. as in a pipeline) and hoping that
>an MPI_Send will block until either buffer space becomes available
>*or* a matching MPI_Recv has been posted.  Doubling the available
>buffer space will not normally help things here.  If this is the case,
>the simplest solution is to use 'synchronous' sends using MPI_Ssend().
>
>Good luck,
>
>Marcus

An implementation can handle system buffers and system resources in
just about any way it wants.  A portable implementation may actually
do something different for each port.  We have studied this topic in
great detail.  If you are interested in a justification, see our paper,
"Robust MPI Message Delivery Through Guaranteed Resources".  In it,
we propose a standard measurement called GER (Guaranteed Envelope
Resources) to address this portability issue.

For performance reasons, LAM 5.2 does not protect process-pair buffer
utilization.  For system-friendliness reasons, LAM 5.2 has a fairly
small (2 MB) default buffer size _per node_, but this can be arbitrarily
increased.  LAM 6.0, by default, guarantees that it can deliver a
certain (advertised) number of messages for each process-pair for
which the source has not blocked and the destination has not received.
Beyond this limit, non-blocking sends will cause an error.  A loop
of unreceived, non-blocking sends must eventually cause an error on
any implementation.  The purpose of GER is just to tell you _when_ it
will fail.
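
To make the per-pair guarantee concrete, here is a toy model in Python.
It is only a sketch of the idea, not LAM code: the class name, the
method names and the limit of 4 are all invented for illustration, and
a real implementation tracks far more state than one counter per pair.

```python
from collections import defaultdict


class GuaranteedEnvelopes:
    """Toy model (hypothetical, not LAM's API): each (source, destination)
    pair may hold at most `limit` sent-but-unreceived messages."""

    def __init__(self, limit):
        self.limit = limit                # advertised per-pair guarantee
        self.pending = defaultdict(int)   # (src, dst) -> unreceived count

    def isend(self, src, dst):
        # A non-blocking send beyond the guarantee is reported as an error.
        if self.pending[(src, dst)] >= self.limit:
            raise RuntimeError(
                f"pair ({src}, {dst}) exceeded {self.limit} unreceived messages"
            )
        self.pending[(src, dst)] += 1

    def recv(self, src, dst):
        # Receiving a message releases one slot for that pair.
        if self.pending[(src, dst)] == 0:
            raise RuntimeError("nothing to receive")
        self.pending[(src, dst)] -= 1


def sends_until_failure(system, src, dst, attempts):
    """Count how many consecutive unreceived sends succeed for one pair."""
    for n in range(attempts):
        try:
            system.isend(src, dst)
        except RuntimeError:
            return n
    return attempts
```

With a limit of 4, a loop of unreceived sends from rank 0 to rank 1
fails on the fifth attempt, while a different pair still has its full
quota.  One receive frees one slot for that pair again.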

-=-
Greg Burns				gdburns@tbag.osc.edu
Ohio Supercomputer Center		http://www.osc.edu/lam.html

