Newsgroups: comp.parallel.mpi
From: salo@mrjones.engr.sgi.com (Eric Salo)
Subject: Re: Performance problems using mpi on the SGI Power Challenge
Organization: Silicon Graphics, Inc.  Mountain View, CA
Date: 2 Oct 1996 22:36:45 GMT
Message-ID: <52uqpt$f0h@murrow.corp.sgi.com>

> Of course that's what you would say publicly.  However, I believe the
> reason is that the vendor (including SGI, Digital, ...) cares far more
> about latency/bandwidth benchmark numbers than how the machine might
> actually be used.  The former sells machines, while the latter is far less
> relevant (to the vendor, that is) once the machine is sold.

Well, I'm sorry that you feel that way. Real world performance is *always*
the bottom line for me, and as anyone who has been following the recent
MPI-2 discussions could attest, I don't have a tactful bone in my body;
there is very, very little difference between what I'll say in public and
what I'll say in private. Perhaps you should get to know me a bit better
before jumping to such conclusions.

It does us no good at all if we sell a system based on the strength of some
brainless ping-pong test but can't deliver any real performance to the
customer. If we did that then they would look elsewhere next time and we
would deserve it.

Our MPI implementation is far from perfect and I'm the first to admit it.
Our latency is still higher than I would like it to be. Our collective
ops need work. We need to improve the reliability of our HIPPI bypass
messages. Our non-contiguous messages are too slow. Our MPI_Cancel()
is completely unimplemented and our MPI_Bsend() is still broken. And so on.
But there are also a lot of very *nice* things about our implementation,
which were made possible because (for example) we made the decision that it
was more urgent to crank up our host-to-host performance than it was to
worry about what happens when someone tries to run 15 processes on a 10 CPU
system. We'll get there eventually, but there are only so many hours in a
day...

> How about also providing a MPI_MAX_SPINWAIT environment variable, or
> something similar? Then the user has control over the "cost" of
> "performance".

Sure, we could do that. In fact I can think of a whole bunch of possibly
useful strategies:

	1) Always spin
	2) Never spin
	3) Spin for some user-tunable period of time, then sleep
	4) Test once, go to sleep for a fixed period of time, repeat
	5) Test and sleep with some sort of linear backoff
	6) Test and sleep with some sort of exponential backoff
	7) Test and sleep, and let the user provide their own function
	   for determining the next backoff interval
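
To make the test-and-sleep variants concrete, here is a rough sketch in
plain C. It is illustrative only and is not taken from any MPI library:
every name in it (backoff_fn, wait_with_backoff, countdown_ready, and so
on) is hypothetical, and it assumes a POSIX system for nanosleep(). A
bounded spin count stands in for strategy 3, the three interval functions
for strategies 4 through 6, and any user function of type backoff_fn for
strategy 7. Strategies 1 and 2 are roughly the limiting cases of a very
large spin count and a spin count of zero.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical: returns the next sleep interval in microseconds, given
   how many sleeps have already happened (strategy 7 plugs in here). */
typedef long (*backoff_fn)(unsigned attempt, long base_us, long cap_us);

/* Hypothetical: returns true once the awaited event has completed. In an
   MPI program this could wrap MPI_Test() on a pending request. */
typedef bool (*test_fn)(void *ctx);

/* Strategy 4: the same interval every time. */
long backoff_fixed(unsigned attempt, long base_us, long cap_us)
{
    (void)attempt;
    return base_us < cap_us ? base_us : cap_us;
}

/* Strategy 5: base, 2*base, 3*base, ... clamped to cap_us. */
long backoff_linear(unsigned attempt, long base_us, long cap_us)
{
    long long d = (long long)base_us * ((long long)attempt + 1);
    return d < cap_us ? (long)d : cap_us;
}

/* Strategy 6: base, 2*base, 4*base, ... clamped to cap_us. */
long backoff_exponential(unsigned attempt, long base_us, long cap_us)
{
    long d = base_us;
    while (attempt-- > 0) {
        if (d > cap_us / 2)       /* doubling would exceed the cap */
            return cap_us;
        d *= 2;
    }
    return d < cap_us ? d : cap_us;
}

/* Strategy 3 plus backoff: test up to spin_tests times without sleeping,
   then alternate test and sleep. Returns the number of sleeps taken. */
unsigned wait_with_backoff(test_fn test, void *ctx, unsigned spin_tests,
                           backoff_fn next, long base_us, long cap_us)
{
    assert(test != NULL && next != NULL);

    for (unsigned i = 0; i < spin_tests; i++)
        if (test(ctx))
            return 0;

    unsigned sleeps = 0;
    while (!test(ctx)) {
        long us = next(sleeps, base_us, cap_us);
        struct timespec ts = { us / 1000000L, (us % 1000000L) * 1000L };
        nanosleep(&ts, NULL);     /* give the CPU to someone else */
        sleeps++;
    }
    return sleeps;
}

/* Stand-in test for experimenting: reports "not done" for the first *ctx
   calls, then "done" on every call after that. */
bool countdown_ready(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

A caller would pick the policy by passing one of the interval functions,
for example wait_with_backoff(countdown_ready, &n, 0, backoff_exponential,
100, 10000) to sleep 100, 200, 400, ... microseconds between tests, never
more than 10 ms at a time. The spin count and the two interval parameters
are the sort of thing a tunable environment variable could feed.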

Some of these would require a substantial amount of work, others would
be of only dubious usefulness. And as I mentioned before, we do intend
to address this problem in the future. We simply have not yet had
sufficient interest from our customers to make it an immediate priority.

Eric Salo         Silicon Graphics Inc.             "Do you know what the
(415)933-2998     2011 N. Shoreline Blvd, 8U-802     last Xon said, just
salo@sgi.com      Mountain View, CA   94043-1389     before he died?"

