Newsgroups: comp.parallel.mpi
From: gdburns@osc.edu (Greg Burns)
Subject: Re: Porting MPI to an experimental machine
Organization: Ohio Supercomputer Center
Date: 20 Dec 1995 20:14:16 -0500
Message-ID: <4bacd8$kkm@tbag.osc.edu>

In article <4b2u25$fme@metro.ucc.su.OZ.AU> igor@sedal.su.oz.au (Igor Milosavlevich) writes:
>We are building an experimental parallel machine based on TMS320C40
>processors.

The C40 is a transputer-like DSP chip with six links, for those
who don't know.

>I was wondering how difficult it might be to port an
>existing MPI library such as MPICH onto such a machine.  The system is
>pretty simple, it does not have an operating system, just a library of
>subroutines.

Do you think that MPI is just a library of subroutines that you would
somehow layer on your set of primitive subroutines?  There is this
section of the standard called non-blocking communication whose
important goal is to allow communication and computation to overlap - a
tried and true optimization in parallel processing.  However, the
semantics of the MPI_I* functions along with some general properties of
MPI pt-2-pt communication exceed this goal, IMO, and strongly demand
some kind of concurrent behaviour wrt the application process.

This bit of concurrency changes the nature of implementing the MPI
"library" to something more like implementing the MPI "system".  Still,
MPI-1.1 can be done, for practical purposes, in a complicated library
with no other help.  MPI-2 will add much more functionality with
concurrency implications that are less practical to do in a library and
will need extra hardware or (in your case) interrupts to properly fake
it.  Interrupts + context switching is my characterization of the guts
of an operating system.  You always need some morsel of context
switching when you take an interrupt, a lot more if you want to switch
to other user code.  I'm not sure if any of the MPI-2 proposals include
switching user codes.
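
To make the "complicated library with no other help" idea concrete,
here is a minimal sketch in plain C, not taken from any real MPI
implementation.  All names (request_t, lib_isend, lib_test,
lib_progress, demo_polls) are hypothetical, and the "link" is simulated
by a memcpy that moves CHUNK bytes per poll.  The point is only that
pending transfers advance when, and only when, the application calls
back into the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK   4   /* bytes the simulated link moves per poll (assumption) */
#define MAX_REQ 8   /* size of the request table (assumption) */

typedef struct {
    const char *src;   /* sender's buffer */
    char       *dst;   /* stands in for the buffer on the remote node */
    size_t      len;   /* total bytes to move */
    size_t      done;  /* bytes moved so far */
    int         active;
} request_t;

static request_t reqs[MAX_REQ];

/* Advance every pending transfer by at most one chunk. */
static void lib_progress(void)
{
    for (int i = 0; i < MAX_REQ; i++) {
        request_t *r = &reqs[i];
        if (!r->active || r->done == r->len)
            continue;
        size_t n = r->len - r->done;
        if (n > CHUNK)
            n = CHUNK;
        memcpy(r->dst + r->done, r->src + r->done, n);
        r->done += n;
    }
}

/* Start a nonblocking transfer; returns a handle, or -1 if the table is full. */
static int lib_isend(const char *src, char *dst, size_t len)
{
    for (int i = 0; i < MAX_REQ; i++) {
        if (!reqs[i].active) {
            reqs[i] = (request_t){ src, dst, len, 0, 1 };
            lib_progress();          /* every entry point pushes things along */
            return i;
        }
    }
    return -1;
}

/* Nonblocking completion check; releases the handle once the transfer is done. */
static int lib_test(int h)
{
    assert(h >= 0 && h < MAX_REQ && reqs[h].active);
    lib_progress();
    if (reqs[h].done < reqs[h].len)
        return 0;
    reqs[h].active = 0;
    return 1;
}

/* Send an 11-byte message; return how many lib_test calls it took, or -1. */
int demo_polls(void)
{
    static const char msg[] = "0123456789";
    char out[sizeof msg] = { 0 };
    int h = lib_isend(msg, out, sizeof msg);
    int polls = 0;

    if (h < 0)
        return -1;
    do {
        polls++;                     /* application computation would go here */
    } while (!lib_test(h));
    return strcmp(out, msg) == 0 ? polls : -1;
}
```

If the application goes off and computes for a long time without
calling lib_test, nothing moves.  That is the weakness of the
library-only approach and the reason interrupts start to look
attractive.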

One technique used by MPICH to implement concurrency is an interval
timer interrupt.  In this case you would need to implement that on your
C40 boards (can't remember if the C40 has an on-chip timer).  I'm not
sure whether this is the portable method or sits underneath the ADI (Abstract
Device Interface).  The authors can advise you on the alternatives and
how to proceed.
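
For illustration only, here is roughly what interval-timer-driven
progress looks like on a POSIX host, using sigaction and setitimer.
This is not MPICH code, and it will not run on a bare C40 board, where
you would hook a hardware timer interrupt instead.  The names on_tick,
start_ticker, stop_ticker and demo_timer are hypothetical, and the
"communication" is just a counter of simulated chunks left to move.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for sigaction and setitimer under strict -std */
#endif
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t pending = 5;    /* simulated chunks still to move */
static volatile unsigned long work_done = 0; /* stands in for real computation */

/* Timer "interrupt": move one chunk.  Only touches a sig_atomic_t. */
static void on_tick(int sig)
{
    (void)sig;
    if (pending > 0)
        pending--;
}

/* Install the handler and start a periodic timer; returns 0 on success. */
static int start_ticker(long usec)
{
    struct sigaction sa;
    struct itimerval tv;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = usec;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Disarm the timer. */
static int stop_ticker(void)
{
    struct itimerval off;

    memset(&off, 0, sizeof off);
    return setitimer(ITIMER_REAL, &off, NULL);
}

/* Compute without ever calling the "library"; return chunks left over. */
int demo_timer(void)
{
    if (start_ticker(1000) != 0)     /* tick every millisecond */
        return -1;
    while (pending > 0)
        work_done++;                 /* pure computation, no polling */
    stop_ticker();
    return (int)pending;
}
```

The application loop never polls, yet the transfer completes because
the timer handler runs concurrently with it.  A real handler would have
to be just as careful about what state it shares with the interrupted
code.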

LAM 6.0 has an RPI (Request Progression Interface) for portability of
MPI communication, but it uses a daemon in the background for process
management and debugging - definitely requiring a typical host
operating system.  This does help to make the RPI quite narrow.  While
LAM is not applicable to your bare iron machine, I am very interested
in how MPI and future extensions impact implementors on your style of
machine.

-=-
Greg Burns				gdburns@tbag.osc.edu
Ohio Supercomputer Center		http://www.osc.edu/lam.html

