Newsgroups: comp.parallel.mpi
From: sanders@ira.uka.de (Peter Sanders)
Subject: MPI-Benchmarking
Organization: Universitaet Karlsruhe, Germany
Date: 27 Oct 1996 14:57:25 +0100
Message-ID: <t1rk9sc4iii.fsf@i90s25.ira.uka.de>

Hello,

we are about to start a mini-project on MPI-benchmarking.  In order to
make the results as widely useful as possible, we would like to ask for
some feedback regarding our approach.

What we want to do: 
- Make timing measurements for "important" MPI functions, including
  point-to-point, collective, and other operations, using "typical"
  communication patterns. The resulting suite is intended to have the
  following properties:
  * yield accurate, reliable timings
  * be portable across a wide range of machines, C compilers, and
    MPI implementations
  * be easy to extend (e.g., for MPI-2)
- prepare and evaluate the data in tabular, graphical and
  mathematical form (by fitting curves)
- collect data for different machines and implementations 
- compare machines and implementations
- make the results and the code available on the Internet
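The post does not say which curves will be fitted. A common choice for
point-to-point timings, assumed here, is the linear model
t(n) = alpha + beta * n, where n is the message size, alpha
approximates the start-up latency and 1/beta the asymptotic bandwidth.
The sketch below takes the median of repeated timings per message size
(robust against an occasional repetition disturbed by the operating
system) and fits the model by ordinary least squares. The names
median_time and fit_linear are hypothetical; the raw timings would come
from a measurement loop (e.g. around MPI_Wtime) that is not shown.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison callback for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of k repeated timings. Sorts a private copy, so the caller's
   array is left untouched. */
double median_time(const double *t, size_t k)
{
    assert(k > 0);
    double *s = malloc(k * sizeof *s);
    if (s == NULL)
        abort();
    memcpy(s, t, k * sizeof *s);
    qsort(s, k, sizeof *s, cmp_double);
    double m = (k % 2) ? s[k / 2] : 0.5 * (s[k / 2 - 1] + s[k / 2]);
    free(s);
    return m;
}

/* Ordinary least-squares fit of t(n) = alpha + beta * n to the m points
   (n[i], t[i]), e.g. message sizes in bytes and median times. */
void fit_linear(const double *n, const double *t, size_t m,
                double *alpha, double *beta)
{
    assert(m >= 2);
    double sn = 0.0, st = 0.0, snn = 0.0, snt = 0.0;
    for (size_t i = 0; i < m; i++) {
        sn += n[i];
        st += t[i];
        snn += n[i] * n[i];
        snt += n[i] * t[i];
    }
    double denom = (double)m * snn - sn * sn;
    assert(denom != 0.0); /* needs at least two distinct message sizes */
    *beta = ((double)m * snt - sn * st) / denom;
    *alpha = (st - *beta * sn) / (double)m;
}
```

A single straight line can hide protocol switches (e.g. between eager
and rendezvous sends), so in practice one may have to fit piecewise over
message-size ranges; the sketch only shows the basic step.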


Why we think this is useful:
- Programmers would like to be able to estimate the consequences of
  implementation decisions before they actually run the code, because
  * exploring a blind alley is expensive;
  * not all the targeted machines may be available during development;
  * development runs using production-size inputs and full
    machine-configurations are often difficult or expensive.
- Profiling is useful, but limited for the above reasons.
- A publicly available benchmark might be an additional incentive for
  implementors and hardware vendors to optimize for all important functions.
- We are aware of a number of evaluations of communication
  performance. But so far, few of them use MPI, most measure only a
  small number of functions, and the measurement code is usually not
  publicly available.
- Application-oriented benchmarks like LINPACK or NAS are certainly
  necessary, but they do not directly tell programmers what individual
  communication operations cost.

What we (currently) do not want to do:
- Inhomogeneous systems
- Application benchmarking (except perhaps for very simple kernels)
- Cover all functions fully. In particular, we will leave out
  intercommunicators, groups (except those implied by MPI_Comm_split),
  attributes, error handling, `homemade' topologies, and Pack/Unpack.
- Collapse all results into a single "performance index",
  because this might be misleading.

======================================================================
What we would like to know from you:
- What measurements would be useful for you?
  Can you explain why? For which applications are they useful?
- Can you give us any pointers to previous work? (We have done a WWW
  search and a bibliography search.) If there are other people with
  similar aims, we would be interested in coordinating or pooling our
  resources.
- Who would be willing to run our benchmark?
- What are "important" MPI functions?
- What are "typical" communication patterns?
- Are there any clever tricks for avoiding mismeasurements due to
  operating system interference, cache effects, etc. (beyond taking
  averages/medians)?
- Can you warn us of any portability or accuracy pitfalls before we
  fall into them?
- How should latency hiding capabilities be measured?
- What usage of noncontiguous data types will best differentiate
  between optimized and nonoptimized implementations?
- How important are measurements of various operations on
  subnetworks? (e.g. reduce on rows of virtual meshes) Which
  combinations should we try?
- Can you warn us of any legal problems?
  (E.g. vendors who do not wish to see certain results published?)
- Now, speak out, what is the fundamental flaw in our ideas? ;-)

Regards,

Lutz Prechelt, Ralf Reussner, Peter Sanders
University of Karlsruhe
Department of Computer Science

P.S. We will summarize.

