Newsgroups: comp.parallel.mpi
From: cameron@epcc.ed.ac.uk (Kenneth Cameron)
Subject: Re: MPI on T3D
Keywords: t3d,mpi,psc
Organization: Edinburgh Parallel Computing Centre
Date: Wed, 12 Oct 1994 13:14:47 GMT
Message-ID: <CxKA4n.L4D@dcs.ed.ac.uk>

In article <37buv0$5si@lll-winken.llnl.gov>, (Rob Neely) writes:
> > We at Edinburgh Parallel Computing Centre are currently working on a
> > native implementation of the full MPI interface for the Cray T3D in 
> 
> Any early indications on performance?  Do you expect to be able to
> obtain better performance than the PVM layer CRI provided?  Are you able
> to take advantage of the block transfer (SHMEM) stuff, either in point-
> to-point or group comm?

Both the point-to-point and collective communications make use of the shared
memory library (SHMEM), which is also the layer that PVM is built on. PVM uses
get operations to do data transfer; we've tried to favour put (which has twice
the bandwidth) where we can. The MPI synchronous communication modes allow us
to avoid some copying, which also helps. The end result will depend on the
kinds of communication that users do. The collective library uses a different
protocol to the point-to-point code, which cuts the latency down quite a bit
compared with building on top of point-to-point. If you want some numbers, our
group head (Lyndon Clarke) is presenting a paper at the Cray Users Group
meeting in Tours, France this week, which has graphs and tables of early
results for both point-to-point and collective that are `encouraging'. Check
out the proceedings.
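As a rough illustration of why favouring put matters, here's a small sketch
using the approximate bandwidth figures quoted further down in this post (the
numbers are rough measurements, not guarantees):

```python
# Rough model of transfer time for a put- vs get-based data path,
# using the approximate T3D SHMEM bandwidths quoted in this post.

PUT_BW = 120.0  # shmem_put bandwidth, MBytes/s (approximate)
GET_BW = 60.0   # shmem_get bandwidth, MBytes/s (approximate)

def transfer_us(nbytes, bw_mbytes_s):
    """Time in microseconds to move nbytes at the given bandwidth.
    Note 1 MByte/s == 1 byte/us, so the units cancel directly."""
    return nbytes / bw_mbytes_s

for nbytes in (1024, 65536, 1048576):
    t_put = transfer_us(nbytes, PUT_BW)
    t_get = transfer_us(nbytes, GET_BW)
    print(f"{nbytes:>8} bytes: put {t_put:8.1f} us, get {t_get:8.1f} us")
```

For large messages the put path simply halves the wire time relative to a
get-based path, which is why the implementation prefers it wherever the
protocol allows.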

> Although I realize that the SHMEM layer is inherently faster, this 
> discrepancy (especially in latencies) seemed a little extreme.  Others
> that I talked to seemed to think that the MPI port *should* be able to
> do much better than the PVM layer.

shmem_put has a bandwidth of ~120 MBytes/s; shmem_get is about half that,
~60 MBytes/s. Add in a local copy (we've measured it at ~90 MBytes/s)
and your ~30 MBytes/s is not all that surprising.

As for latency, both the PVM and MPI implementations use protocol queues
to coordinate transfer between PEs. We've measured our protocol queue
latency at ~8 us to transfer a protocol message. PVM uses larger messages
and may be a little slower. If you need two to three protocol messages
to do a send/recv, plus matching overhead etc., you can reach ~80 us
pretty fast. (Our MPI collective calls don't use the protocol queue
but dedicated slots, to cut down on latency and avoid queue contention.)
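To make that arithmetic concrete, here's a back-of-the-envelope model using
the rough figures above (all approximate measurements, not exact numbers):

```python
# Back-of-the-envelope model of T3D communication costs, using the
# rough figures quoted above (approximations, not exact measurements).

GET_BW = 60.0    # shmem_get bandwidth, MBytes/s
COPY_BW = 90.0   # local memory copy bandwidth, MBytes/s

# A get-based transfer (as in PVM) moves each byte twice in sequence:
# a remote get, then a local copy into the user buffer. The effective
# bandwidth is the harmonic combination of the two stages:
eff_bw = 1.0 / (1.0 / GET_BW + 1.0 / COPY_BW)
print(f"effective get+copy bandwidth: {eff_bw:.0f} MBytes/s")

# Latency: each protocol message costs ~8 us, and a send/recv may need
# two or three of them before any matching overhead is added.
PROTO_US = 8.0
for n_msgs in (2, 3):
    floor = n_msgs * PROTO_US
    print(f"{n_msgs} protocol messages: >= {floor:.0f} us "
          "before matching overhead")
```

The ~36 MBytes/s the model gives is in the right ballpark for the observed
~30, and the 16-24 us protocol floor shows how quickly matching and call
overheads push a send/recv towards ~80 us.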

-- 
e||) Kenneth Cameron (kenneth@ed)     Edinburgh Parallel Computing Centre e||)
c||c Applications Scientist, KTG.       University of Edinburgh, Scotland c||c
"Do not write obscure code. When you ignore this rule, try to make clear ... "
                                           - From a coding standards document.

