Newsgroups: comp.parallel
From: P.H.Welch@ukc.ac.uk
Subject: Crisis in HPC Workshop - Conclusions (Summary)
Organization: University of Kent at Canterbury, UK.
Date: 5 Oct 1995 14:22:17 GMT
Message-ID: <450pmp$rbo@usenet.srv.cis.pitt.edu>


CONCLUSIONS: Crisis in HPC Workshop (11/9/95, UCL)
__________________________________________________


Registration: 76
  Attendance: 70 (==> 6 either didn't show or showed up too late for
                  our registration desk)

__________________________________________________


Structure of workshop:

  o Aims: set of questions presented at the start

  o Presentations by:
    - current users of MPP facilities
    - architects of parallel h/w and s/w
    - tool builders for parallel applications

  o Four separate workgroups for small group discussion, loosely
    themed around:
    - parallel h/w (Chair: Dennis Parkinson, QMW)
    - parallel s/w (Chair: Chris Clack, UCL)
    - models of parallelism (Chair: Chris Wadsworth, RAL)
    - parallel applications (Chair: Chris Jones, BAe)

  o Plenary report session to hammer out conclusions.

__________________________________________________


Summary of workshop conclusions:

[Status: this summary has been drafted by the Workshop Chair (Peter
 Welch, University of Kent) and has been ratified by the speakers at the
 workshop and the workgroup chairs.  A fuller report of the proceedings
 will be published in due course - please check the URL:
 
      http://www.hensa.ac.uk/parallel/groups/selhpc/crisis/

 for further details.]

These conclusions are summarized through the answers reached in the
final plenary session to the questions posed at the start of the
workshop.

  o Are users disappointed by their performance with current HPC?

    Yes.  In some quarters this was severe enough to cause serious
    economic embarrassment, and to prompt a recommendation to think
    hard before moving into HPC (especially for non-traditional users -
    like geographers).  Some felt that this disappointment partly
    results from over-selling on the part of vendors, funders and
    local enthusiasts (e.g. "When the 40 Gflops (40 billion arithmetic
    operations per second), 256-processor Cray T3D is commissioned
    this Spring by the University's Computing Services it will be
    amongst the ten fastest computers in the world ...", from the
    "Supercomputer procurement - press release (3/2/1994)"), which
    led to grossly raised expectations.

  o Is there a problem with the efficiency levels obtained by users
    of MPPs for real applications?

    Yes.  See below.

  o Does efficiency matter - given that absolute performance levels
    and value-for-money are better than the previous generation (vector)
    HPC machines?

    Yes.  Supercomputers are a scarce resource (about 2-and-a-half in
    the UK now available to academics) and user queues are inevitable.
    Lower efficiency levels mean longer waiting times to get your job
    turned around - this is the real killer for users, over and above
    the actual execution time achieved.

    Comparison against efficiency levels for PCs is irrelevant.  PCs
    sit on your desk.  We can afford to over-resource them so we have
    access to them immediately (a classic real-time constraint).  If
    supercomputers were similarly available, no one would worry about
    efficiencies ... (except those wanting interactive response).
    There is one other difference.  If you have a problem that is too
    large for a workstation on your desk, you can use a supercomputer.
    If you have a problem that is too large for a supercomputer, you
    can either wait for the next generation of supercomputer or improve
    your program's efficiency.

    Efficiency also seems to be an easily measured benchmark against
    which funding agencies (EPSRC/industry) are judging projects.

    Having said this, there certainly exists a range of problems that
    can only be solved on the current MPP machines - for them, there
    is no choice but to live with the long turn-arounds and accept
    low efficiencies.

  o (If there are any "yes" answers to the above) where do the problems lie?

    Hardware architecture for MPPs.  Real blame here.  Quoting from
    David May's opening presentation: "Almost all the current parallel
    computers are based on commodity microprocessors without hardware
    assistance for communications or context switching.  With the
    resulting imbalance it is not possible to context switch during
    communications delays and efficiency is severely compromised".
    Parallel architecture had not picked up strongly enough on known
    problems (and their solutions) from the 1980s and had concentrated
    on headline MFLOP/s and MBYTE/s figures that applications cannot
    attain in practice.  Considerable
    frustration was expressed at those wasted opportunities - we should
    be doing much better than we are.  This has to change.
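
    [A rough way to quantify that imbalance - the model is standard,
     but the numbers below are ours, not figures from the talks.  The
     time to transfer an n-byte message is commonly modelled as

          t(n) = t0 + n/B

     where t0 is the communication start-up latency and B the
     asymptotic bandwidth.  Half of the advertised bandwidth is reached
     only for messages of size n = t0*B: for example, t0 = 50
     microseconds and B = 20 MBYTE/s give n = 1000 bytes.  Until
     messages run to kilobytes, the quoted MBYTE/s mean little - which
     is what forces the large-grain style criticized below.]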

    Software architecture for MPPs.  Largely clobbered by the long
    latencies for communication startups and context switches.  Forced
    to rely on large-grain parallelism (for computation and remote
    fetches/deposits) that is very difficult to balance (and only
    applicable to large problem sizes).  Also suffers from the pressure
    to preserve serial paradigms for system design and implementation
    (so as to be able to exploit large existing codes).  Too much
    machine-specific knowledge needed for good performance.  Mainstream
    parallel languages (e.g. HPF) or libraries (e.g. PVM/MPI) do
    little to hide this.  Some astonishment was expressed that the
    shared-memory libraries for the T3D are so widely used in preference
    to the BSP libraries, which are known to scale well, are as
    efficient -- and are portable.  Need to look much more closely at
    BSP (whose cost model is sketched below) and occam (which is being
    worked on by the TDF/ANDF group and within the EPSRC PSTPA
    programme).  Need to improve our understanding of disciplines for
    the parallel sharing of resources (e.g. combining operators).  Need
    to appreciate particular problems for parallelism inherited from
    serial languages (e.g. C and Fortran) -- for example, the
    difficulties imposed on parallelization tools by undetectable
    aliasing of names to objects (illustrated below).  Need to bind
    concepts of parallelism into languages and/or design tools at a
    very high level of abstraction.  Not much of this is happening.
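
    [For reference, a sketch of the BSP cost model referred to above -
     our summary, not a quotation from the talks.  A BSP program runs
     in supersteps; a superstep in which each processor computes for at
     most w operations and sends or receives at most h words costs
     roughly

          w + h*g + l

     where g (cost per word communicated) and l (cost of the barrier
     synchronization) are measured constants of the machine.  Because
     two constants characterize any machine, the cost of a program on a
     new machine can be predicted by plugging in that machine's g and
     l - the basis of the scalability and portability claims above.]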
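
    [To illustrate the aliasing point, a minimal C sketch - the
     function names are ours, not from the talks.  Given only this
     translation unit, a parallelizing tool cannot prove that a and b
     refer to disjoint storage, so it must keep the loop serial:

         /* Parallelizable only if a and b do not overlap - which
            cannot be determined from this translation unit alone. */
         void add_one(double *a, double *b, int n)
         {
             int i;
             for (i = 0; i < n; i++)
                 a[i] = b[i] + 1.0;
         }

         /* A caller is free to create exactly the overlap feared: */
         void caller(double *x, int n)
         {
             /* With a = x+1 and b = x, the loop body becomes
                x[i+1] = x[i] + 1.0 - a genuinely serial recurrence. */
             add_one(x + 1, x, n - 1);
         }

     Fortran makes such aliasing of dummy arguments illegal (though
     COMMON and EQUIVALENCE open other holes), but a parallelizing tool
     must still trust, or check, that callers obey the rule.]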

    Users' ability to work with the technology effectively.  We should
    always strive to improve ourselves - education and training are
    vital.  But this must not be confined to training on differing
    varieties of Fortran and/or message-passing libraries!  Efficiencies
    on the T3D seem to range from 1% through 17%, depending on the
    problem and who was solving it.  Even the best groups only achieved
    5% on some problems (e.g. conjugate gradients).  Blame not laid here.

    Note that these conclusions are the reverse of the usual received
    wisdom (which says that parallel hardware is brilliant, but the
    parallel software infrastructure to support it lags behind and
    users' abilities to exploit what is on offer are weak).  This
    workshop suggests that users have a natural affinity for the
    parallelism inherent in their applications, that sound and
    scalable models for expressing that parallelism exist, but that
    current parallel hardware lacks some crucial technical parameters
    that are necessary for the execution of those expressions at a
    worthwhile level of efficiency.

  o Can we do better?

    Current generation HPC.  Yes - but not immediately.  Users must
    be made aware of the impact of long comms-startup latencies,
    context switching, non-overlapped communications/computation
    and cache incoherency on their applications, and trained in how best
    to live with them (one standard coping technique is sketched below).
    Research/development of tools/libraries/languages that minimize
    these problems should be a priority.
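
    [One such coping technique, as a minimal C sketch using the MPI-1
     non-blocking calls - the ring/halo framing and all names are ours,
     not from the talks.  Post the transfers early, compute on
     independent data while they are (we hope) in flight, and only
     then wait:

         #include <mpi.h>

         #define N 1024

         int main(int argc, char *argv[])
         {
             double halo_out[N], halo_in[N], interior[N];
             int rank, size, left, right, i;
             MPI_Request reqs[2];
             MPI_Status stats[2];

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
             MPI_Comm_size(MPI_COMM_WORLD, &size);
             left  = (rank + size - 1) % size;    /* ring neighbours */
             right = (rank + 1) % size;

             for (i = 0; i < N; i++) {            /* dummy data */
                 halo_out[i] = (double) rank;
                 interior[i] = (double) i;
             }

             /* 1. Start the exchange - pay the start-up latency as
                   early as possible. */
             MPI_Irecv(halo_in,  N, MPI_DOUBLE, left,  0,
                       MPI_COMM_WORLD, &reqs[0]);
             MPI_Isend(halo_out, N, MPI_DOUBLE, right, 0,
                       MPI_COMM_WORLD, &reqs[1]);

             /* 2. Compute on data independent of the transfer. */
             for (i = 0; i < N; i++)
                 interior[i] = interior[i] * 2.0 + 1.0;

             /* 3. Only now block on the communication. */
             MPI_Waitall(2, reqs, stats);

             /* 4. Use the received halo data. */
             for (i = 0; i < N; i++)
                 interior[i] += halo_in[i];

             MPI_Finalize();
             return 0;
         }

     The workshop's own caveat applies: without hardware (or at least
     interrupt-driven) assistance for communication, many libraries
     make no progress on the transfer until MPI_Waitall is reached,
     and the "overlap" is overlap in name only.]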

    Next generation HPC.  The announcement (by President Clinton)
    of the 1.8 TFLOP/s supercomputer comprising some 9000 Intel P6s
    for late 1996 was greeted with some caution.  Nothing in the
    public announcement addressed any of the concerns that were
    being debated in this workshop.  Without further information,
    the workshop wondered what MTBF rates and what efficiency levels
    would be achieved for an application running across the full system.
    It was pointed out that even a sustained 1% efficiency would yield
    18 GFLOP/s (1% of 1.8 TFLOP/s) for user applications ... and that
    this performance gain over most current facilities could be labeled
    a "success".
    It's too late to influence this next generation -- hence, same
    answer as above.

    Next-but-one generation HPC.  There may not be one.  Many MPP
    vendors have gone out of business and more may find the market
    unprofitable.  The remaining supplier(s) might exploit a monopoly
    situation so that prices would rise again to the detriment of
    end users.  The remaining supplier(s) would be derived from
    the commercially viable PC/games/embedded market place (which
    alone can afford the development budget) and may not be attuned
    to the technical needs of MPP systems and their users.

    Nevertheless, it may be possible to influence such architecture
    (hardware and software) - in particular, latency and context
    switch times *must* move in line with computational performance
    and communications bandwidth.  Machines designed from scratch
    - using the fastest commodity micro-cores for processor, memory,
    link and routing components and maintaining the correct balance
    as an overriding design constraint - would be easier to program,
    extract high efficiencies from, and be closer to a general-purpose
    machine.  Such machines would obtain huge leverage from well-behaved
    models for parallelism -- not least through the automatic control
    of cache coherency, without the need for hardware or software
    run-time checks and remedies.  It will be necessary to re-cast
    our application software to conform to those disciplines -- for
    some, it will be necessary to re-write them.  Failure to make
    such changes will result in HPC becoming increasingly ineffective,
    which will be serious since the need for HPC looks set to increase.
    Much of the technical knowledge needed to avoid this already exists
    and can be developed considerably -- if it is used, the future
    looks exciting and we can be optimistic.

  o Is there a crisis in HPC?

    On the political front, there is no immediate crisis but there
    is disappointment.  Access to HPC facilities still exists, although
    on a smaller scale than many had hoped.

    At the engineering level, there is a crisis.  There has been little
    or no progress in MPP architecture over the past 5 years as
    manufacturers and their clients have pursued obvious goals
    (MFLOP/s and MBYTE/s) and not emphasized the twiddly bits (low
    startup latencies and context switches, portable and scalable
    models of parallelism, prevention of cache incoherency, ...)
    that are necessary to make them work properly.

    The result is real difficulty even for experienced users, the scale
    of whose applications gives them no choice but to accept the machines
    and live with the long turn-arounds.  New users are discouraged
    from entering the fray, especially if they come from non-traditional
    HPC fields of application.

    Herein lies the basis for a real political crisis that may be upon us
    soon.  If the engineering problems are not resolved in the near future,
    pressure that may be difficult to resist will build to close down
    (or, at least, not upgrade) existing HPC facilities.  Such pressure
    is already being felt in the USA, and it is not a comfortable feeling.

  o Actions?

    Educate/research/develop/publish/influence.

    Teach high-level models of parallelism, independent of target
    architecture.  Teach and research *good* models that scale, are
    efficient and can extract much more of the parallelism in the
    users' applications.  Priorities are: correctness, efficiency and
    scalability, portability, reliability and clarity of expression.
    Maintenance of existing vector/serial codes is not relevant for the
    long term.

    Be fundamental - don't be afraid to question the existing consensus,
    whether this be HPF, MPI, FP, CSP, BSP or whatever.  Do not set up
    a single `centre of excellence' for the provision and dissemination
    of training and education in HPC.

    Listen (and get manufacturers and funding organizations to listen)
    to *real* users.  Don't go for raw performance (e.g. 1.8 TFLOP/s).
    Demand to know what will really be sustained and publish the
    answers and the reasons behind them.  Do some real computer science
    on the performance and programmability of `grand challenge'
    machines (which may be difficult in the UK as the funding bodies
    for HPC and computer science seem to be entirely separate).

    Don't necessarily expect to provide efficient HPC solutions for
    *all* problems that need them - some badly behaved ones may need to
    wait (these need to be characterized).  Look to the embedded/consumer
    market for the base technologies of the future (e.g. video-on-demand
    servers and their supporting communications and switching) - influence
    and modify them to the special needs of HPC applications.

    Don't just accept what is on offer from today's HPC - the hardware
    may have to be accepted, but software access to it could bear
    considerable improvement.

    Don't do nothing!

    Review progress in 12 months - another workshop?  Meanwhile,
    work through suitable Internet newsgroups.  Disseminate results
    and concerns through newsgroups and archives (e.g. the SEL-HPC
    Parallel archive at http://www.hensa.ac.uk/parallel/).  Move the
    discussion beyond the UK.

__________________________________________________


For information, the timetable that ran on the day was:

  09:50  Introduction to the Day
         (Professor Peter Welch, University of Kent)
  10:00  High performance compute + interconnect is not enough
         (Professor David May, University of Bristol)
  10:40  Experiences with the Cray T3D/PowerGC/...
         (Chris Jones, British Aerospace, Warton)
  11:05  More experiences with the Cray T3D/...
         (Ian Turton, Centre for Computational Geography, University of Leeds)

  11:30  Coffee

  11:50  Experiences with the Meiko-CS2/...
         (Chris Booth, Parallel Processing Section, DRA Malvern)
  12:15  Problems of Parallelization - why the pain?
         (Dr. Steve Johnson, University of Greenwich)

  13:00  Working Lunch (provided) [Separate discussion groups]

  14:20  Language Problems and High Performance Computing
         (Nick Maclaren, University of Cambridge Computer Laboratory)
  14:50  Parallel software and parallel hardware - bridging the gap
         (Professor Peter Welch, University of Kent)

  15:30  Work sessions and Tea [Separate discussion groups]

  16:30  Plenary discussion session
  16:55  Summary

  17:00  Close

__________________________________________________

