Newsgroups: comp.parallel.mpi
From: Mike Yukish <may106@psu.edu>
Subject: Why shouldn't this code fragment work?
Organization: Applied Research Lab
Date: 5 Dec 1995 15:22:09 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Message-ID: <4a1o31$r8f@hearst.cac.psu.edu>

Hi Folks,

I am running MPI on a two-processor SPARC 20,
but I am running four processes. I wanted to use
a torus configuration for matrix multiplication.

When I check the efficiency of the processes
with mpstat, I find that there is endless page
swapping and context switching going on. One
solution, which I have implemented, is to rewrite
the code to run as two processes. The code now
runs at about 180% of the speed of the serial
code, so that's good. But I thought I would try
something to make the 4-proc version run faster,
and here is my attempt:

Basically, I am multiplying two matrices
together. So the code looks like:

C = Mult(A,B)    // lots of context switches, sys faults

Swap(A,B)

C = Mult(A,B)
.
.
.

What I thought I would do is put in some
barriers as follows:

if (member of right column)
    MPI_Barrier();

C = Mult(A,B);

if (member of left column)
    MPI_Barrier();

Swap(A,B);

[ more mult. ]

It seems like this would sequence my processes
through the multiplication. I have the processes
in a cartesian grid, so two of them are in the
right column, so to speak. Those two would hold
at the barrier until the two in the left column
get through their multiplications, letting the
left-column pair have the two processors to
themselves during the math-intensive calculation.
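
To make the intended ordering concrete, here is a
small stand-alone sketch in C. It makes no MPI
calls at all; it only tabulates which rank would
reach the barrier before or after its Mult. The
2x2 row-major layout (rank = row * 2 + col) and
all of the helper names are assumptions made up
for illustration, not taken from my real code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical layout for illustration: a 2x2 grid
   with row-major ranks, rank = row * 2 + col. */
enum { GRID_COLS = 2, NPROCS = 4 };

/* Column of a rank in the assumed layout. */
int grid_col(int rank)
{
    assert(rank >= 0 && rank < NPROCS);
    return rank % GRID_COLS;
}

/* 1 if the rank enters the barrier BEFORE Mult
   (the right column), else 0. */
int barrier_before_mult(int rank)
{
    return grid_col(rank) == GRID_COLS - 1;
}

/* 1 if the rank enters the barrier AFTER Mult
   (the left column), else 0. */
int barrier_after_mult(int rank)
{
    return grid_col(rank) == 0;
}

/* Number of barrier calls one rank makes in a
   single Mult/Swap step of the fragment. */
int barrier_calls_per_step(int rank)
{
    return barrier_before_mult(rank) + barrier_after_mult(rank);
}

/* Print the intended plan for every rank. */
void print_plan(void)
{
    for (int rank = 0; rank < NPROCS; ++rank)
        printf("rank %d (col %d): %s\n", rank, grid_col(rank),
               barrier_before_mult(rank) ? "Barrier, then Mult"
                                         : "Mult, then Barrier");
}
```

Under these assumptions every rank makes exactly
one barrier call per step: the left column after
its Mult, the right column before its Mult. The
sketch only shows the ordering I was hoping for;
it says nothing about how the MPI library itself
behaves while a process waits in the barrier.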

Unfortunately it doesn't work. I've looked and
looked at my code, and this isn't the first MPI
code I've written. Is there anything
fundamentally wrong with what I've done here?



Mike Yukish
Applied Research Lab
may106@psu.edu
http://quark.arl.psu.edu/staff/yuke-home.html

