Newsgroups: comp.parallel
From: slater@nuc.berkeley.edu (Steve Slater)
Subject: Help - explain superlinear speedup?
Organization: University of California
Date: Thu, 12 Jan 1995 18:43:54 GMT
Message-ID: <3f23ht$ec9@agate.berkeley.edu>


I have a program which shows superlinear speedup and I
can't explain it. Does anyone have any ideas? Here is
the summary.

I am using a code which passes messages using p4,
on 4 Sparc 2's running SunOS 4.1.3. The code solves
coupled matrix equations, much like a heat equation.
The processors are each assigned a geometrical region 
like:

------------------
|        |       |
|   A    |   B   |
|________|_______|
|        |       |
|   C    |   D   |
|________|_______|

Each job/process analyzes only one region of A through D.

What happens in the code (not really important to
my problem though) is that a matrix is solved for each
of A-D, then the boundary conditions are passed between
neighboring regions (outgoing heat current = incoming for
each neighbor) and the matrix equations are solved
locally again. The process repeats until the solution
converges.
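
The solve/exchange/repeat loop can be sketched roughly as follows.
This is only an illustration under loud assumptions, not the actual
code: it is a 1-D steady heat problem with the four regions laid out
in a row instead of the 2x2 layout, it exchanges boundary temperatures
rather than heat currents (a plain block-Jacobi iteration), and it
runs serially in one process with no p4 calls. The names solve_region
and solve_by_regions are hypothetical.

```python
def solve_region(left, right, source):
    """Solve -u[i-1] + 2*u[i] - u[i+1] = source[i] inside one region, with the
    neighboring values `left` and `right` held fixed (Thomas algorithm)."""
    n = len(source)
    rhs = list(source)
    rhs[0] += left                      # known neighbor values move to the RHS
    rhs[-1] += right
    diag = [2.0] * n
    for i in range(1, n):               # forward elimination
        diag[i] -= 1.0 / diag[i - 1]
        rhs[i] += rhs[i - 1] / diag[i - 1]
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):      # back substitution
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u


def solve_by_regions(n_per_region=8, n_regions=4, tol=1e-10, max_sweeps=100000):
    """Assumed setup: zero heat source, outer ends held at 0.0 and 1.0."""
    n = n_per_region * n_regions
    u = [0.0] * n
    source = [0.0] * n
    left_end, right_end = 0.0, 1.0
    for sweep in range(1, max_sweeps + 1):
        new_u = []
        for r in range(n_regions):
            lo, hi = r * n_per_region, (r + 1) * n_per_region
            # "Message passing": read the neighbors' edge values from the
            # previous sweep (or the fixed outer boundary values).
            left = left_end if r == 0 else u[lo - 1]
            right = right_end if r == n_regions - 1 else u[hi]
            new_u.extend(solve_region(left, right, source[lo:hi]))
        change = max(abs(a - b) for a, b in zip(new_u, u))
        u = new_u
        if change < tol:                # solution stopped changing: converged
            return u, sweep
    raise RuntimeError("did not converge")


if __name__ == "__main__":
    u, sweeps = solve_by_regions()
    print("converged after", sweeps, "sweeps; u[0] =", u[0], "u[-1] =", u[-1])
```

With a zero source and the ends held at 0 and 1, the converged
profile should be a straight line between the two end values, which
makes the sketch easy to check.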

With p4, I first run 4 processes (4 regions) on only 1 machine.
The messages are passed through sockets. Then I run
on 2 machines, each having 2 processes (2 regions), and
finally on 4 machines, each having 1 process (region).

You would expect less than linear speedup, since with
only one machine no messages are sent over the Ethernet;
they are just communicated via sockets. But I get very
superlinear speedup:

1 proc:  556 sec               4 unique processes on 1 machine
2 proc:  204 sec               2 processes on each of 2 machines
4 proc:   38 sec               1 process on each machine
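
To put numbers on it, here is a small sketch that turns these
timings into speedup and parallel efficiency. The baseline is the
1-machine run, which is already 4 processes sharing one CPU, not a
true single-process serial run. The names timings_sec and
speedup_table are just for illustration.

```python
# Wall-clock times from the table above, keyed by number of machines.
timings_sec = {1: 556.0, 2: 204.0, 4: 38.0}


def speedup_table(timings):
    """Return (machines, speedup, efficiency) rows relative to the 1-machine run."""
    base = timings[1]
    rows = []
    for machines in sorted(timings):
        speedup = base / timings[machines]
        efficiency = speedup / machines   # > 1.0 means superlinear
        rows.append((machines, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for machines, speedup, efficiency in speedup_table(timings_sec):
        print(f"{machines} machine(s): speedup {speedup:5.2f}, "
              f"efficiency {efficiency:4.2f}")
```

That works out to a speedup of about 2.7 on 2 machines and about
14.6 on 4 machines, where linear speedup would be 2 and 4.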

There was NO memory swapping at any point during
execution; I checked periodically with ps.

Does anyone have any thoughts?

Thanks,

Steve Slater
slater@nuc.berkeley.edu



