Newsgroups: comp.parallel,comp.parallel.pvm
From: wrankin@ee.duke.edu (William T. Rankin)
Subject: Re: Distributed Proccessing - How to measure speedups?
Organization: Duke University EE Dept.; Durham, NC
Date: Mon, 1 Aug 1994 15:47:33 GMT
Message-ID: <Ctv579.96J@dcs.ed.ac.uk>

In article <CtLn35.7zG@dcs.ed.ac.uk>, chasman@chem.columbia.edu (David Chasman) writes:

|> In article <Ct8JBz.KDJ@dcs.ed.ac.uk> nfotis@theseas.ntua.gr (Nick C. Fotis) writes:

|> >- How can / should I measure the efficient execution of programs in a
|> >  heterogeneous network?
|> >
|> >We don't know what to measure anymore - the CPU seconds spent in each CPU
|> >are rather irrelevant, as we may have CPUs from 30 SPECfp to 300 SPECfp each
|> >- and the network delays aren't the same on each machine.
|> >
|> >We cannot isolate the network, since it's not our own, and the wall-clock
|> >time is not adequate metric, since he does research on efficient parallel
|> >algorithms (till now on homogeneous, shared memory machines)

Thanks for bringing up this question, Nick.  This is a significant
issue for me since I am involved in some load balancing issues for
my doctoral research.  

First off, elapsed (wall-clock) time *is* a valid metric for distributed
systems.  It can be used to indirectly measure transmission latencies
and other delays that cannot be directly measured from the CPU.  For
scientific computing, your baseline metric is simply: "How long did it
take to finish after I hit the return key?"
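(As a minimal sketch of that baseline metric -- in Python rather than the
C/Fortran a PVM code would actually use -- the measurement is nothing more
than a wall-clock timestamp on either side of the run:)

```python
import time

def timed(fn, *args):
    """Return a function's result plus its elapsed (wall-clock) time."""
    start = time.time()
    result = fn(*args)
    elapsed = time.time() - start
    return result, elapsed

# example: time a dummy workload
result, secs = timed(sum, range(1_000_000))
print(f"elapsed: {secs:.3f} s")
```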


|> 	The relevant resource on any machine is the number of
|> 	megaflops delivered to the job; the unit used
|> 	here is "megaflop seconds" :

Only if the code is floating point intensive.  An important point
here is that the FP performance of a processor may be vastly different
than the integer performance.  In addition, the FP/INT performance
ratio may vary widely for different processors.

I.e., just because two processors have the same FP performance does not
mean that their integer performance is going to be similar.

|>	MFS = SPECfp * ( user_time / elapsed_time )
|> 	So, the efficiency of any code is:
|> 	E_serial = Time_of_execution / MFS
|> 
|> 	-------------------------------------
|> 
|> 	In a parallel or distributed environment - 
|> 	MFS = sum_i ( SPECfp_i * ( user_time_i / elapsed_time_i ) ) 
|> 	the index i runs over the processors used.
|> 	One thing that you should take a look at is :
|> 	E_parallel = Time_of_execution / MFS
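(For concreteness, here is a sketch of David's quoted formulas as they read,
with hypothetical SPECfp ratings and timings plugged in -- the numbers are
made up, only the arithmetic follows the quoted definitions:)

```python
def mfs_parallel(specfp, user_time, elapsed_time):
    """MFS = sum_i ( SPECfp_i * (user_time_i / elapsed_time_i) )"""
    return sum(s * (u / e)
               for s, u, e in zip(specfp, user_time, elapsed_time))

def efficiency(time_of_execution, mfs):
    """E_parallel = Time_of_execution / MFS"""
    return time_of_execution / mfs

# hypothetical numbers for two processors (the 30- and 300-SPECfp
# extremes Nick mentioned)
specfp  = [30.0, 300.0]   # SPECfp rating of each node
user    = [40.0, 35.0]    # CPU seconds consumed on each node
elapsed = [50.0, 50.0]    # wall-clock seconds of the run

mfs = mfs_parallel(specfp, user, elapsed)
print(mfs, efficiency(50.0, mfs))
```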


Perhaps a better approach would be to run the serial application on each
of the different platforms and record the user_time for each platform.
Use this number to create a basic performance index that replaces the
SPECfp(i) parameter in the above equations.  By doing this, you scale
your performance measurement for the specific workload presented by your
application.
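(One way to build that index -- a sketch only, with made-up machine names
and timings: take the serial user_time on each platform and normalize it
against a reference machine, so a machine that runs your code twice as
fast gets an index twice as large.  That ratio then stands in for
SPECfp_i above:)

```python
def perf_index(serial_user_times, reference):
    """Application-specific performance index: ratio of the reference
    machine's serial user_time to each machine's serial user_time.
    Faster machines (smaller user_time) get a larger index."""
    ref_t = serial_user_times[reference]
    return {m: ref_t / t for m, t in serial_user_times.items()}

# hypothetical serial timings (user seconds) on three workstations
times = {"alpha": 120.0, "sparc": 240.0, "rs6000": 160.0}
print(perf_index(times, "alpha"))
```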

In "Computer Architecture: A Quantitative Approach", Hennessy and Patterson
give a good argument for this approach, and against the use of synthetic
benchmarks.

One thing I would like to see is a basic discussion of performance
measurement for distributed systems.  For example: how can communications
overhead, bandwidth and latency be modeled and measured?  How can dynamic
external workloads be taken into account?
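(To start the ball rolling on the first question: a common first-order
model -- not anything from this thread, just the standard linear one --
treats message time as a fixed latency plus a bandwidth term, T(n) =
latency + n/bandwidth, and both parameters can be fit from two ping-pong
measurements.  The timings below are hypothetical:)

```python
def message_time(nbytes, latency, bandwidth):
    """Linear latency/bandwidth model: T(n) = latency + n / bandwidth."""
    return latency + nbytes / bandwidth

def fit(n1, t1, n2, t2):
    """Fit latency and bandwidth from two (size, time) ping-pong points."""
    bandwidth = (n2 - n1) / (t2 - t1)   # bytes per second
    latency = t1 - n1 / bandwidth       # seconds
    return latency, bandwidth

# hypothetical ping-pong measurements: 1 KB in 1.1 ms, 100 KB in 11 ms
lat, bw = fit(1024, 1.1e-3, 102400, 11.0e-3)
print(lat, bw, message_time(51200, lat, bw))
```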

Anyone want to take the ball from here?

-- 
----                                /       __/    /    /
bill rankin                        /              /    /
wrankin@ee.duke.edu               ___  /    /    /    /
philosopher/coffee-drinker       /    /    /    /    /
                                /    /    /    /    /
                             _______/  __/  __/  __/
