Newsgroups: comp.parallel
From: Becker@informatik.uni-stuttgart.de (Wolfgang Becker)
Subject: Questionnaire European Computing on Broadband Net
Organization: IPVR, University of Stuttgart, Germany
Date: Wed, 12 Apr 1995 07:25:04 GMT
Message-ID: <3mje2b$ss3@usenet.srv.cis.pitt.edu>

                                 222
  EEEEE          M     M   CCCC     2             ***  Questionnaire  ***
  E       ====   MMM MMM  C       2
  EEEEE          M MMM M  C      2222             European Meta Computing
  E       ====   M  M  M  C                       Utilizing Integrated
  EEEEE          M     M   CCCC                   Broadband  Communications


This questionnaire is important feedback for the E=mc2 project.
Please mark the boxes you agree with [+] or disagree [-] and add comments
anywhere. Return it soon to (email preferred):

   Wolfgang Becker, Dr. Walter Strommer                         E=mc2 project
   Institute of Parallel and Distributed High-Performance Systems (IPVR)
   University of Stuttgart, Breitwiesenstr. 20-22, D-70565 Stuttgart, Germany
   Phone  +49 711 7816 433           Fax  +49 711 7816 424
   Email  Wolfgang.Becker@informatik.uni-stuttgart.de
   WWW    http://www.informatik.uni-stuttgart.de/ipvr/ipvr.html

! Your answers and comments will not be published or reused directly in       !
! connection with your name or company. Instead, ONLY AN AGGREGATED EVALUATION!
! of all returned forms WILL BE PUBLISHED in the newsgroup comp.parallel and  !
! in an E=mc2 project deliverable for the European commission.                !

1. TEN-IBC and E=mc2 Project Ideas

  TEN-IBC (Trans European Networks - Integrated Broadband Communication) is an
  EU project that investigates and evaluates different application types for a
  European high speed network. The results will be disseminated to related
  interest groups and user communities.
  Application domains investigated within TEN-IBC are cooperative work,
  distributed information services and parallel / distributed computing
  (E=mc2). The focus is on application level rather than on network level.

  The E=mc2 (European Meta Computing Utilizing Integrated Broadband
  Communications) project within TEN-IBC couples several European high
  performance computing centers to exploit the aggregated computing power
  consisting of supercomputers, parallel systems and workstation clusters.
  It performs various parallel and distributed processing scenarios of large,
  real world applications.

  E=mc2 project partners are:
  Cerfacs Toulouse       (computing center & application engineer)
  GMD     Bonn           (computing center)
  IPVR    Stuttgart      (computing center & application engineer)
  Octacon Middlesbrough  (network service provider)
  Queens  Belfast        (computing center & application engineer)
  RAL     ???            (computing center)
  Telmat  Soultz         (network evaluation)

  E=mc2 evaluation criteria are mainly:
  [ ] technical feasibility of wide area distribution across national boundaries
  [ ] software support requirements for proper distribution
  [ ] bandwidth requirements / bandwidth utilization for different parallel
      application classes and for multiuser load distribution
  [ ] potential benefits from trans European high speed networks
  [ ] detailed observation of application behavior and network utilization
  [ ] obtain user requirements and user satisfaction to evaluate marketability

  Motivation and importance of the E=mc2 project: investigation of another,
  economically very important and promising domain of applications for
  European high speed networks, with completely different profiles and
  challenges than the more common and better understood domain of video/voice
  transmission and cooperative work.

  Wide area distributed computing by coupling high performance computing
  centers
  [ ] is of important commercial benefit in the near future
  [ ] will be profitable mainly for the following markets:
      [ ] scientific, simulations
      [ ] database processing
      [ ] unimportant compared to video conferencing, video/voice transmission
      [ ] unimportant compared to cooperative work
  [ ] is a waste of time and money

  Major challenges are
  [ ] network availability and reliability  [ ] network management
  [ ] network bandwidth  [ ] network latency  [ ] network costs
  [ ] operating system support for addressing/communication within applications
  [ ] operating system support for automatic remote execution / load balancing
  [ ] parallelization / restructuring / tuning of relevant applications
  [ ] portability / flexibility of relevant applications
  
  The main user community will be
  [ ] HPC centers themselves
  [ ] universities and research centers
  [ ] companies that need huge computing resources
  [ ] single end users for private applications



2. E=mc2 Trials

  The E=mc2 project takes existing applications and adapts them as necessary
  for wide area distribution, heterogeneous network capacities and processing
  nodes. It uses IP (internet protocol) over ATM rather than native ATM
  protocols, because all existing parallel / distributed applications are based
  on proprietary or on IP mechanisms.

  [ ] heterogeneous computer architectures are realistic
  [ ] IP over ATM is realistic
  [ ] IP over ATM is the only major way to use ATM for HPC in the next years

  The trials use the HPC centers PPC at Queens University Belfast, RUS at the
  University of Stuttgart and the HPC at CERFACS Toulouse. These centers are
  currently connected by narrowband Ethernet and will be connected by high
  speed ATM during the broadband trials.

  [ ] network and HPC topology realistic and interesting

  As representatives of relevant and typical application classes, the following
  applications have been selected: European weather forecast simulation coupled
  with oceanographic simulation, parallel grid based numerical simulations,
  workstation cluster load distribution services with different sample
  application loads, client-server structured parallel applications (database
  operations, image processing, numerical simulation) with automatic load
  balancing support.

  The challenges of these trials, demanding high bandwidth networks, are the
  following: not just decoupled, isolated computations but large, parallelized,
  data communication intensive applications are performed, including heavily
  cooperating tasks and remote data access. Proper exploitation of the European
  computing resources requires not just keeping all processors busy, but also
  considering the imposed network load. Hence, a suitable task distribution
  granularity and clustering of tasks is necessary and must be chosen as
  dynamically and automatically as possible.

  [ ] application types relevant
  [ ] application types cover major HPC patterns
  [ ] challenges are existent and important
  [ ] automatic load balancing should be investigated as a base application
  [ ] network monitoring relevant to characterize the application behavior
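  The granularity and clustering requirement stated above can be illustrated
  with a toy placement heuristic (a hypothetical Python sketch, not project
  software; all names and weights are invented): heavily communicating tasks
  should land on the same site, weighed against keeping all sites busy.

```python
# Toy clustering heuristic: place each task on the site that minimizes
# remote communication volume plus a load-balance penalty. Purely
# illustrative; site names, traffic figures and weights are assumptions.

def place(tasks, comm, sites, load_weight=1.0):
    """tasks: list of task ids; comm[(a, b)]: bytes exchanged between a and b;
    sites: list of site names. Returns {task: site}."""
    placement = {}
    load = {s: 0 for s in sites}
    for t in tasks:
        def cost(site):
            # Remote traffic: bytes exchanged with partners on other sites.
            # Not-yet-placed partners are optimistically assumed co-located.
            remote = sum(v for (a, b), v in comm.items()
                         if t in (a, b)
                         and placement.get(b if a == t else a, site) != site)
            return remote + load_weight * load[site]
        best = min(sites, key=cost)
        placement[t] = best
        load[best] += 1
    return placement

# Two chatty task pairs: (0,1) and (2,3) exchange a lot, cross-pairs little.
comm = {(0, 1): 100, (2, 3): 100, (1, 2): 1}
placement = place([0, 1, 2, 3], comm, ["stuttgart", "belfast"])
```

  In this toy run each chatty pair ends up co-located, and the two pairs land
  on different sites, keeping the heavy traffic off the wide area network.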



3. Applications used in the load balancing trial (trial 2)

  Load balancing approach:
  [ ] Dynamic task assignment, central & local task queues.
  [ ] No preemptive migration due to node heterogeneity.
  [ ] Centralized per cluster, decentralized between clusters.
  [ ] Decision cost model: Minimum task response time, estimated by
      wait time in local server queue + compute time on node +
      delays for data communication.
  [ ] Decisions are based on measurements of node load, data distribution and
      data exchange cost (including network bandwidth and latency),
  [ ] and on pre-estimations of task size and data access patterns.

  [ ] it is important that load balancing explicitly considers communications
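  A minimal sketch of the decision cost model listed above (illustrative
  Python; all node figures are invented assumptions, not project measurements):
  estimated task response time = wait time in the server queue + compute time
  on the node + delays for data communication, and the task is assigned to the
  node with the minimum estimate.

```python
# Sketch of a cost model in the spirit of trial 2: pick the node with the
# minimum estimated task response time. All figures are illustrative.

def estimated_response_time(node, task):
    """Estimate the response time of `task` on `node` in seconds."""
    # Delay for moving the task's data: latency plus transfer time.
    comm_delay = node["latency_s"] + task["data_bytes"] / node["bandwidth_Bps"]
    # Compute time scales with the node's measured load (busier => slower).
    compute = task["work_s"] * (1.0 + node["load"])
    return node["queue_wait_s"] + compute + comm_delay

def assign(task, nodes):
    """Assign `task` to the node minimizing the estimated response time."""
    return min(nodes, key=lambda n: estimated_response_time(n, task))

nodes = [
    {"name": "stuttgart", "queue_wait_s": 2.0, "load": 0.5,
     "latency_s": 0.001, "bandwidth_Bps": 1e6},   # local, busy, fast link
    {"name": "toulouse",  "queue_wait_s": 0.5, "load": 0.1,
     "latency_s": 0.200, "bandwidth_Bps": 10e3},  # idle, slow remote link
]
task = {"work_s": 10.0, "data_bytes": 500_000}
```

  In this example the communication intensive task stays on the well connected
  site even though the remote queue is shorter, which is exactly the trade-off
  the trial investigates.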

  Applications observed under load balancing:
    [ ] Task parallelized, client - server structured
    [ ] Cooperation by accessing virtual shared data (runtime system support)
    Parallel finite element analysis:
      Example is a block under stress, linear approach.
      Element calculation is parallelized by element number ranges; the global
      stiffness matrix is established in a storage format for sparse matrices.
      Conjugate gradient solver contains three parallelized sections per
      iteration (matrix*vector, vector+vector etc.). Regular task parallelism,
      stable task sizes and data reference patterns. [ ] realistic [ ] typical
    Parallel image recognition:
      Segmentation of raster images into regions. 1. blockwise fine grain
      parallelized split and merge of image squares. Irregular, high degree of
      parallelism depending on image structure. 2. Merge of arbitrary polygons
      and 3. Boundary extraction, initially blockwise coarse grained
      parallelized, but task sizes and data access patterns depend on image
      structure.                                   [ ] realistic    [ ] typical
    Parallel complex database operations:
      Complex query operator graph including scans, joins, inserts, and
      projections. Functional decomposition and data parallelism by key range.
      Task sizes, data access patterns foreseeable.  [ ] realistic  [ ] typical

    [ ] It can be profitable to distribute fine granular communicating tasks
        within applications across the network.
    [ ] Client-server task decomposition is a suitable structure for large
        parallel computations in general.
        [ ] for image processing   [ ] numerical simulations  [ ] db processing
    [ ] Communication via access to virtually shared data is a suitable concept
        for these kinds of computations.
        [ ] for image processing   [ ] numerical simulations  [ ] db processing
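  The three parallelized sections per conjugate gradient iteration mentioned
  above (matrix*vector, vector+vector updates, dot product reductions) can be
  sketched as follows; this is a generic textbook CG in plain Python, not the
  project's finite element code, and it runs the sections sequentially for
  illustration only.

```python
# Plain conjugate gradient for a symmetric positive definite system A x = b.
# The three sections marked below are the ones trial 2 parallelizes by
# element/row or index ranges.

def cg(A, b, iterations=50):
    n = len(b)
    x = [0.0] * n
    r = b[:]              # residual, r = b - A x with x = 0
    p = r[:]              # search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(iterations):
        # Section 1: matrix * vector (parallelizable by row ranges)
        q = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(pi * qi for pi, qi in zip(p, q))
        # Section 2: vector + vector updates (parallelizable by index ranges)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        # Section 3: dot product / reduction (parallel partial sums)
        rs_new = sum(ri * ri for ri in r)
        if rs_new < 1e-20:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # small SPD example system
b = [1.0, 2.0]
x = cg(A, b)                   # exact solution is (1/11, 7/11)
```

  In a distributed setting, sections 1 and 2 partition work by index ranges
  while section 3 requires a global reduction, which is where network latency
  enters every iteration.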



4. E=mc2 measurement results so far   (project still running and ATM unstable)

  Characterization of existing networks:
  Process to process roundtrip message latency for average sized packets
  (1 kilobyte) on existing networks is in the range of 200...5000 milliseconds
  between Belfast, Stuttgart and Toulouse. With the help of extensive ping and
  tcpblast measurements the following numbers could be obtained:

                           throughput    standard deviation    packet loss rate
  Ethernet at night        ~ 75 KBit/sec     ~ 90 KBit/sec            ~ 5 %
  Ethernet during workdays ~ 25 KBit/sec     ~ 30 KBit/sec           ~ 20 %

  [ ] corresponds to common experiences   [ ] other characteristics important
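  For illustration, figures like those in the table are derived from raw
  samples roughly as follows (generic Python with invented sample values, not
  the project's measurement tooling):

```python
# Compute mean throughput, standard deviation and packet loss rate from a
# series of hypothetical tcpblast/ping-style samples (throughput in KBit/sec,
# None marking a lost packet). All sample values are invented.
from statistics import mean, stdev

samples = [90, 10, 210, 40, None, 5, 150, 30, None, 70]

delivered = [s for s in samples if s is not None]
throughput = mean(delivered)
spread = stdev(delivered)       # large spread, as the table shows
loss_rate = (len(samples) - len(delivered)) / len(samples)

print(f"throughput ~ {throughput:.0f} KBit/sec, "
      f"std deviation ~ {spread:.0f} KBit/sec, loss ~ {loss_rate:.0%}")
```

  A standard deviation of the same magnitude as the mean, as in the table,
  indicates highly bursty shared-network conditions rather than measurement
  error.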

  Comparative measurements of the current network at night versus during
  working days were performed to show the strong sensitivity of application
  performance and load balancing achievements to the network capacity.
  Workstation clusters in Belfast, Stuttgart and Toulouse were used; 5 parallel
  applications per type were executing concurrently. Results:

         appl. type:  parallel finite    parallel           parallel complex
  network:         \  element analysis   image recognition  database operations
  Ethernet at night           1680 sec          255 sec           665 sec
  Ethernet during workday     2210 sec          255 sec           830 sec

  Load balancing decided to use mostly the Stuttgart machines for the finite
  element tasks; at night the Toulouse and Belfast clusters were utilized
  slightly more.
  For the rather tightly coupled, fine grained and communication intensive
  image processing tasks, load balancing decided to use Stuttgart only. When
  forced artificially to use the others as well, about 640 sec elapsed at night.
  For database processing load balancing fully utilized all clusters.

  [ ] scenarios suitable   [ ] results plausible
  [ ] Similar comparative measurements of these scenarios using the ATM pilot
      will substantiate and prove the indications observed so far.

  The Interop experiment:
  During the Interop fair '94 in Paris an ATM link was established between
  Paris and Stuttgart and a parallel finite element analysis was run in
  parallel across Europe using 3 workstations. Maximum ATM access speed was
  only 10 MBit/sec and message latency was 120 milliseconds due to several
  routers and IP level converters along the way. This scenario was compared to
  a similar scenario based on the existing Ethernet between Stuttgart and
  Toulouse, at night and during workdays. In all scenarios we forced load
  balancing to distribute the computation over all machines. The application
  elapsed times were:
                              element    one iteration      available network
                             calculation   solving step    throughput    latency
                               phase
  Ethernet during workdays       850 sec   365 sec      ~ 25 KBit/sec  400 msec
  Ethernet at night              455 sec   300 sec      ~ 75 KBit/sec  200 msec
  ATM broadband, Ethernet access 290 sec   160 sec    ~ 6000 KBit/sec   65 msec

  These first, small broadband measurements indicated that parallel and
  distributed high performance computation depends on the network capability,
  which is reflected directly in the sustained application level performance,
  and that it is really profitable to couple distant HPC centers for improved
  exploitation of the European computing resources.
  They further showed that distributed HPC depends on low latency, especially
  as the throughput increases. The full throughput cannot be used all the time
  by one parallel application, but it can be exploited better with several
  concurrent parallel applications.

  [ ] scenario suitable     [ ] results plausible    [ ] conclusions derivable
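  The latency observation above can be illustrated with the simple first order
  model transfer time = latency + message size / bandwidth (Python sketch with
  numbers loosely taken from the tables above; the exact figures are
  illustrative assumptions):

```python
# First-order model of a single message transfer: time = latency + size/bw.
# Shows why latency becomes the limiting factor as bandwidth grows:
# for small messages, raising the bandwidth alone barely helps.

def transfer_time(size_bytes, bandwidth_bps, latency_s):
    """Time in seconds to move one message over the given link."""
    return latency_s + 8 * size_bytes / bandwidth_bps

# 1 KByte message at a fixed 65 ms latency (as in the ATM row above):
slow = transfer_time(1024, 75e3, 0.065)   # ~75 KBit/sec Ethernet-like path
fast = transfer_time(1024, 6e6, 0.065)    # ~6 MBit/sec ATM-like path
```

  With these numbers an 80x bandwidth increase yields well under a 3x
  improvement per small message, because the fixed latency term dominates;
  this matches the observation that low latency matters more as throughput
  increases.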



5. E=mc2 General Experiences

  [ ] Large, important existing applications are mostly inflexible and not
      ready to be distributed in a flexible way across the network. Suitable
      porting and tuning of the application is necessary and requires
      considerable effort.
  [ ] While internetworking configuration and addressing is still very
      difficult and requires detailed experience, wide area ATM is even more
      complicated. [ ] High speed networks are new technology; availability
      / stability are prototype like and cannot be relied on commercially yet.
  [ ] Operating system support has to be enhanced in terms of more transparent
      network management and addressing and in terms of automatic load
      balancing support.
  [ ] On the current low bandwidth networks within Europe interactive remote
      work and coupled computing are absolutely infeasible. [ ] Coarse grained
      load distribution of large, isolated jobs is already possible.
  [ ] Wide area coupled HPC is severely affected by concurrent multi-user
      network traffic.
      [ ] This can be solved suitably by ATM bandwidth reservation facilities.
  [ ] Trans European high speed networks can be utilized profitably and enable
      trans European cooperative computations and suitable workload balancing,
      i.e. the exploitation of the distributed computing resources like one
      large meta-computer.
  [ ] Message latency is an important limiting factor for parallel and
      distributed computing when fine granular data communication and
      synchronisation within parallel applications is inherent to the problem.
      [ ] A major part of the important algorithms is bound to these closely
      coupled, communication intensive execution patterns.

  European meta computing is [ ] technically and [ ] commercially feasible
  with the approach investigated by the E=mc2 project. [ ] It is another main
  reason for establishing a trans European broadband network.



6. For more detailed E=mc2 information we refer to the ftp site at Octacon, UK:

  ftp.octacon.co.uk    user ftp    password [email address]
  cd pub/proj/emc2

  Global Market Assessment	                        definition/oct002a3.rtf
    Describes the global market for HPC equipment. Industrial policies
    regarding HPC that are implemented in the U.S. and Europe are described,
    as well as national policies for the deployment of HPC centers.

  Technically & Economically Possible Scenarios		definition/oct005a3.rtf
    Considers the relations between HPC Centres, the evolution of the service
    to end users, and the scenarios that will require telecommunications. Each
    scenario is assessed with regard to technical and economic feasibility.

  Telecommunications Options				definition/oct006a3.rtf
    Focussed on the networking aspects of connecting together widely dispersed
    High Performance Computing Centres.

  Telecomms Services Requirements Definition		definition/oct007a3.rtf
    Describes HPC application types which absolutely demand high bandwidth
    international connections, and the requirements that they are likely to
    impose on network services.  The report forms the baseline for defining the
    network connectivity and services necessary to support these applications,
    and the specifications for actual distributed HPC trials which form the
    basis for the operational trial phase of E=MC2.

  Network Specifications				definition/oct008a2.rtf
    Examines the HPC Centre requirements and those of their user groups on
    network capabilities, in order to identify the most probable broadband
    networking scenarios for HPC centres over the medium term future.

  Trial Specification					definition/oct009a1.rtf
    Describes the project's proposals for implementing broadband trials of
    distributed high performance computing (HPC) applications between a number
    of HPC centres in different European countries.

  Magazine Article about E=MC2 (ULCC & SARA)		definition/oct011a1.rtf
    The European Commission's programme on cross-border broadband
    communications TEN-IBC has finished its definition phase and plans are well
    advanced for running real broadband experiments.  The High Performance
    Computing flag is being flown by the E=MC2 project, which is planning a
    range of remote access and distributed application experiments involving
    centres in the UK, France, and Germany.

  Brief Summary						definition/oct012a1.rtf
    The objective of the E=MC2 project is to evaluate the impact of Europe-
    wide broadband network availability on the use of supercomputers by
    research agencies and commercial users.

  HPC Centre Requirements				definition/qub001a4.rtf
    Summarises the findings of the survey of HPC Centres throughout Europe.
    The key issues that will determine the future developments of HPC Centres
    within Europe are discussed.

  Trials & Applications Descriptions			    trials/oct201b1.rtf
    Comprehensive description of the 3 planned trials (5 applications) and the
    initial approach to monitoring network traffic during each trial. The 3
    trials cover: Collaborative working, distributed computing with wide area
    workstation clusters, load sharing & load balancing.

  Evaluation Plan					    trials/oct202a4.rtf
    Describes the proposals of the E=MC2 project for evaluating the results of
    its experiments in distributed and remotely executed high performance
    computing (HPC) applications.

  Network Implementation/Initial Evaluation -Interim Part   trials/qub204a3.rtf
    Interim report of realization and results obtained so far, Queens trial.

  Network Implementation/Initial Evaluation -Interim Part   trials/ipv204a1.rtf
    Interim report of realization and results obtained so far, IPVR trial.



7. For more detailed load balancing information we refer to the ftp site at
   Stuttgart University, Germany (several papers from the HiCon project):

  ftp.informatik.uni-stuttgart.de    user ftp    password [email address]
  cd pub/ipvrpub
  In German:  Bericht1993_1.ps.Z  Bericht1994_4.ps.Z  IFE94.ps.Z  BI95.ps.Z
  In English: Bericht1994_9.ps.Z  HPDC94.ps.Z  PARS94.ps.Z  APS95.ps.Z

 

8. Invitation to HPCN'95

  The E=mc2 project offers a free workshop at HPCN'95, taking place in Milan
  on 2nd May 1995, presenting results of our trials and experiences with ATM.
  We look forward to seeing you in Milan. Information and registration is
  available via WWW:    http://www.sara.nl/HPCN-Europe


