Newsgroups: comp.parallel.mpi
From: christal@imag.fr (Michel Christaller)
Subject: Re: Ordering of MPI messages in multi-threaded programs
Organization: IMAG
Date: 26 Aug 1996 12:13:09 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
Message-ID: <4vs4cl$a6o@imag.imag.fr>

Well, just to add some comments to this "thread":

There are several MPI + threads implementations:
the MPI-F prototype from IBM,
MPICH on top of Nexus from ANL,
Athapascan-0b (more exactly the Athapascan-Kernel) from my lab, IMAG.

As I understand them, they all schedule a new thread when one blocks
on a communication. This is done by simulating locally-blocking operations
(like MPI_Send) with immediate, non-blocking ones. So they do not introduce
hangs compared to mono-threaded MPI-1.

About the utility of threads for parallelism, one can cite quite a few
advantages:

-manage more parallelism than physically available in the machine,
 (eg. split a problem in 100 or 1000 pieces on 10 processors)

-keep the application parallelism independent of the machine's
 (eg. always split the problem in 1000 pieces, having either 1 or 
 10 or 100 processors available)

-run efficiently on SMP machines (eg. multi-processors with shared memory,
 or clusters of such machines),

-improved reactivity. A new thread can take up a new computation
 to explore as it appears, something fairly common with irregular problems,

-explore various solutions concurrently. Time-sharing the computation of
 each candidate can yield super-linear speedups in logic-like (search)
 computations,

-ease of expression: many parallel applications include a lot of (almost) 
 independent computations that can be better expressed by sequential threads
 than by an automaton managing communications as transitions to computations,

-avoid deadlocks, for example the ones that can arise in MPI-1, by letting
 other threads proceed and unblock the flow of computation,

-mask communications delays by useful computations,

-avoid using many heavy-weight processes, with heavy-weight IPC, to express
 the application, and thus allow low-cost shared-memory communication locally.

Note that the masking of communication delays, which was once seen as the major
advantage of threads, now seems deprecated (one can run more efficiently
by carefully managing asynchronous (immediate) communications with only
one thread of computation). The utility of threads for MPI now seems to lie
in the support of irregular computations (i.e. those whose behaviour cannot
be foreseen, so that no useful static schedule can be applied).

The drawbacks of multithreading lie in its somewhat non-standard aspects
(every computer has a different threads kernel, even with POSIX normalisation,
and MPI+Threads projects are not mature right now), in its cost (multithreaded
communications have, to my knowledge, a higher latency than mono-threaded ones),
and in the inherent concurrency it introduces, which is difficult to work with
for an application engineer, who is probably not ready to spend a lot of time
on computer-science-related problems.

Folks interested can find pointers to related topics in a web page I manage:
http://huron.imag.fr/threadedcomm/threadedcomm.html

Comments are welcome.


-----------------------------------------------------------------------
Michel Christaller                   Campus Universitaire
                                     LMC, Institut Fourier, BP 53X
Michel.Christaller@imag.fr           100 Rue des Mathematiques
Tel: (33) 76 51 46 31                38041 Grenoble CEDEX 9, FRANCE
Fax: (33) 76 63 12 63
#######################################################################
TRY LINUX ! A FULL UNIX SYSTEM FOR FREE ! 
thanks to all those who contributed
#######################################################################


