Newsgroups: comp.parallel
From: mjr@netcom.com (Mark Rosenbaum)
Subject: Re: SMP vs. MPP
Keywords: MPP, SMP, PVP
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
Date: Mon, 2 Jan 1995 18:50:03 GMT
Message-ID: <mjrD1n3BL.KuL@netcom.com>

Wow! This is great. A lively discussion in comp.parallel. It sure
beats the typical request for tridiagonal solvers.

In article <9412271747.AA24335@hikimi.cray.com>,
Roger Glover <glover@tngstar.cray.com> wrote:
>In the following, I use end notes ([n]) for "asides" that are not
>strictly on the topic.  The notes are after my signature.
>
>In article <9412202130.AA18009@idaho.SSD.intel.com>, Timothy G. Mattson
><tmg@SSD.intel.com> wrote:
>|> 
>|> SMP requires (as the name implies) coherent shared memory of
>|> some type.  This typically means that the processors must
>|> share some type of memory bus.  

Symmetric Multi-Processing (SMP) implies only that any processor can
run either user or supervisor code. Shared memory is just the most
common way of having the different processors communicate. While
this is an obvious solution, I don't think it is the only one.

Let's assume that shared memory is what you really want. A bus is
only the most common way of implementing it; it is not required.
Non-bus-based memory could scale much better. In fact, I think there
are already some switch-based systems coming out (IBM & Masscomp).

>
>|> There have been some attempts to merge SMP and MPP.  KSR is
>|> the most recnet example.
>
>I thought KSR's memory was only "logically" shared.  IMHO, physically-
>distributed-logically-shared memory is not very SMP-like at all.  

Most of the popular SMP machines are physically distributed, logically
shared memory. The per-processor caches are, in effect, the distributed
memory. If you do not believe this (and most folks don't on first
hearing it), disable the caches on any of these machines and watch
what happens. The per-processor performance will be much slower than
that of a uniprocessor built from a similar processor with its cache
disabled.

>|> They were unable to pull it off,
>|> however, and have consequently gone out of business.

Much has been said of the business problems of KSR, but I would
be interested in hearing from KSR users and ex-employees about how
well the system was doing technically. My understanding is that
there was a reliability problem that was either hardware or
software related.

>|> 
>|> MPP systems (by which I mean distributed memory systems) are
>|> potentially scalable to hundreds if not thousands of
>|> processors (both Intel and Ncube have deployed systems with
>|> over 1000 processors and I have heard of a 512 node systems
>|> from Cray and IBM).

Building a system this large does not guarantee that it is useful
for many applications. There will always be a few perfectly parallel
apps, though.

Most of this discussion so far has ignored some very important points.

Macros can turn a message-passing system into a global-memory system,
and the other way around.
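
A minimal sketch of the first direction, assuming a single "home"
thread that owns the memory and a queue standing in for the
interconnect. Python has no macros, so the hypothetical wrappers
gm_read and gm_write play that role here; none of these names come
from any real system.

```python
import queue
import threading

# Hypothetical sketch: one "home" thread owns the memory, and everyone
# else reaches it only by sending messages. gm_read/gm_write hide the
# message traffic, so the caller sees plain loads and stores.

requests = queue.Queue()          # the only channel to the home node


def home_node(size):
    memory = [0] * size           # private to this thread
    while True:
        op, addr, value, reply = requests.get()
        if op == "stop":
            break
        if op == "write":
            memory[addr] = value
            reply.put(None)       # acknowledge the store
        else:                     # "read"
            reply.put(memory[addr])


def gm_write(addr, value):
    reply = queue.Queue(maxsize=1)
    requests.put(("write", addr, value, reply))
    reply.get()                   # wait for the acknowledgement


def gm_read(addr):
    reply = queue.Queue(maxsize=1)
    requests.put(("read", addr, None, reply))
    return reply.get()


def demo():
    home = threading.Thread(target=home_node, args=(16,))
    home.start()
    gm_write(3, 42)
    value = gm_read(3)
    requests.put(("stop", None, None, None))
    home.join()
    return value
```

Calling demo() stores 42 at address 3 and reads it back. Every access
costs a round trip of two messages, which is where the next point
comes in.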

The real battle is with latency and bandwidth, not with how to
describe the semantics.
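
To put rough numbers on that, here is a sketch of the usual
first-order cost model: time per message is a fixed latency plus size
divided by bandwidth. The constants are made up for illustration and
are not measurements of any machine discussed in this thread.

```python
# Illustrative numbers only, not measurements of any real machine.
LATENCY_S = 50e-6            # assumed fixed cost per message, in seconds
BANDWIDTH_BPS = 10e6         # assumed sustained bandwidth, in bytes/second


def transfer_time(nbytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Time to move nbytes in one message: fixed latency plus size/bandwidth."""
    return latency + nbytes / bandwidth


def half_performance_size(latency=LATENCY_S, bandwidth=BANDWIDTH_BPS):
    """Message size at which latency and transmission time are equal."""
    return latency * bandwidth
```

With these assumed values, a thousand 8-byte messages cost roughly
51 ms in total, while one 8000-byte message costs about 0.85 ms. The
same holds whether the transfer is written as a send or as a remote
load.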

I cannot think of any useful algorithms that don't try to partition
the data among processors, if for no other reason than to avoid
memory conflict.
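
A small sketch of that partitioning pattern, using a block
decomposition of a sum. The function name partitioned_sum is
hypothetical. Each worker reads only its own block and writes only
its own result slot, so no lock is needed. Python threads are used
just to show the access pattern; they will not give a real speedup
here.

```python
import threading


def partitioned_sum(data, nworkers):
    """Sum data with each worker touching only its own block and its own slot."""
    n = len(data)
    partial = [0] * nworkers              # one private result slot per worker

    def worker(rank):
        lo = rank * n // nworkers         # block boundaries for this rank
        hi = (rank + 1) * n // nworkers
        total = 0
        for x in data[lo:hi]:
            total += x
        partial[rank] = total             # the only write, to this worker's slot

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)                   # combine after all workers finish
```

The only shared writable state is the partial array, and no two
workers ever write the same element of it.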

Finally, the $64K question. Does anyone have an algorithm that

a) actually uses writable shared memory for anything other than locks, and

b) is faster than other algorithms that don't?

Boy, this could be fun.


Mark Rosenbaum				Director of Engineering
mjr@netcom.com				Otey-Rosenbaum & Frazier
(703) 536-9464				Consultants in High Performance 
Washington D. C. 			and Scalable Computing 
metro area				and Applications



