Newsgroups: comp.parallel
From: Robert van de Geijn <rvdg@cs.utexas.edu>
Subject: Collective Communication
Organization: CS Dept, University of Texas at Austin
Date: 17 Mar 1995 22:39:55 -0600

In article 11965, Roger Butenuth and Peter Sanders write:

   In article <3jkn8n$i05@usenet.srv.cis.pitt.edu> rvdg@cs.utexas.edu
   (Robert van de Geijn) writes:
   
   |>                   vector        NX        InterCom      ratio
   |>     Operation     length       (sec)       (sec)     (NX/InterCom) 
   |>   -----------------------------------------------------------------
   |>
   |>     Broadcast     8 bytes      0.0017      0.0014        1.21
   |>                 64K bytes      0.0356      0.0069        5.18
   |>                  1M bytes      0.5788      0.0493       11.75
   |>
   |>   Global Sum      8 bytes      0.0032      0.0029        1.10
   |>     to all      64K bytes      0.3780      0.0195       19.35 
   |>                  1M bytes      5.9353      0.1791       33.15
   
   We were surprised about this performance data, because the operating
   system Cosy (Concurrent Operating SYstem) with the Library PIGSeL is
   about as fast as the Paragon, considering it is running on Transputers
   (T805, 30MHz), with only 30 MIPS / 2 MFLOPS / 1.7 MB/s bandwidth on its
   links.
   
   The performance on the Paragon should be at least an order of
   magnitude better than on Transputers, an architecture more than
   five years old.
   
   We did our measurements on the GCel1024 in Paderborn with 16 x 32
   processors in a grid topology (half of the machine) and want to thank
   the PC^2 for giving us this opportunity.
   
                       vector        NX        InterCom      Cosy
         Operation     length       (sec)       (sec)        (sec)
       -------------------------------------------------------------
    
       Global Sum      8 bytes      0.0032      0.0029      0.0062
         to all                                             ======
   
   
Notice that this is a performance number for a SMALL message length, for
which bandwidth is meaningless.  All this indicates is that the
transputer network has a communication startup cost twice as high as
the Paragon's.  I always thought that one of the advantages of
transputer networks was that communication startup was relatively
small....  Could the authors kindly provide performance numbers for 64K
bytes and 1M bytes?
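
To illustrate why an 8 byte timing says nothing about bandwidth, here is
a rough sketch of the usual linear cost model, time = startup + bytes /
bandwidth, for a single message on a single link.  The 1.7 MB/s figure
is the T805 link bandwidth quoted above (taking 1 MB as 1e6 bytes, which
is an assumption); the startup value is purely hypothetical and is NOT a
measurement of Cosy, NX or InterCom.  A real global sum involves many
hops and a combine step, so this shows only the trend, not the numbers
in the tables.

```python
# Linear cost model for one message: time = startup + bytes / bandwidth.
# STARTUP_S is a made-up illustrative value, not a measurement.

LINK_BANDWIDTH = 1.7e6   # bytes/s, T805 link figure quoted above (1 MB = 1e6 assumed)
STARTUP_S = 1.0e-4       # hypothetical per-message startup cost in seconds


def transfer_time(n_bytes, startup_s=STARTUP_S, bandwidth=LINK_BANDWIDTH):
    """Modeled time in seconds to send n_bytes over one link."""
    return startup_s + n_bytes / bandwidth


def startup_fraction(n_bytes, startup_s=STARTUP_S, bandwidth=LINK_BANDWIDTH):
    """Share of the modeled time that is startup rather than data transfer."""
    return startup_s / transfer_time(n_bytes, startup_s, bandwidth)


if __name__ == "__main__":
    sizes = (("8 bytes", 8), ("64K bytes", 64 * 1024), ("1M bytes", 1024 * 1024))
    for label, n in sizes:
        print(f"{label:>10}: {transfer_time(n):.6f} s, "
              f"startup share {startup_fraction(n):.1%}")
```

Under these assumed values the 8 byte case is almost entirely startup,
while the 1M byte case is almost entirely transfer time, which is why
the longer vector lengths are the interesting comparison.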

Robert van de Geijn

