Newsgroups: comp.parallel
From: Peter Brewer <brewer@hamlet.umd.edu>
Subject: Re: SMP vs. MPP
Keywords: MPP, SMP, PVP
Organization: University of Maryland, College Park
Date: Mon, 9 Jan 1995 14:23:34 GMT
Message-ID: <3envq3$nr@hamlet.umd.edu>

In article <mjrD1n3BL.KuL@netcom.com>, Mark Rosenbaum <mjr@netcom.com> wrote:
>Wow! This is great. A lively discussion in comp.parallel. It sure
>beats the typical request for tridiagonal solvers.
>
>In article <9412271747.AA24335@hikimi.cray.com>,
>Roger Glover <glover@tngstar.cray.com> wrote:
>>In the following, I use end notes ([n]) for "asides" that are not
>>strictly on the topic.  The notes are after my signature.
>>
>>In article <9412202130.AA18009@idaho.SSD.intel.com>, Timothy G. Mattson
>><tmg@SSD.intel.com> wrote:
>>|> 
>>|> SMP requires (as the name implies) coherent shared memory of
>>|> some type.  This typically means that the processors must
>>|> share some type of memory bus.  
>
>Symmetric Multi-Processors (SMP) implies that any processor can
>be either user or supervisor. Shared memory is just the most
>common way of having the different processors communicate. While
>this is an obvious solution, I don't think it is the only one.
>
>Let's assume that shared memory is what you really want. Buses
>are only the most common way of implementing it. It is not
>required. Non-bus-based memory could scale much better. In fact
>there are already some switch-based systems coming out (IBM & Masscomp,
>I think).
>

A wire network is a form of a bus, so what he said is true. Basically, memory
is connected via a bus whether it is a typical I/O bus or a memory 'bus'.
An Ethernet cable is a bus. Switches allow one to switch between various
'busses'. Some are fast and others are slow.

>>
>Most of the popular SMP machines are physically distributed, logically
>shared memory. If you do not believe this (and most folks don't on
>first hearing it), disable the caches on any of the machines and watch
>what happens. The per-processor performance will be much slower than
>that of uniprocessors with similar processors and cache disabled.
>
>
>>|> 
>>|> MPP systems (by which I mean distributed memory systems) are
>>|> potentially scalable to hundreds if not thousands of
>>|> processors (both Intel and Ncube have deployed systems with
>>|> over 1000 processors and I have heard of a 512 node systems
>>|> from Cray and IBM).
>
>Building a system this large does not guarantee that it is useful
>for many applications. There will always be a few perfectly parallel
>apps though.
>

An MPP does not necessarily have to have its memory distributed. A system
could be designed which uses its memory as its 'switching network'. Such
a system would have an awful lot of potentially long wires (busses
again), and could experience timing problems a la the Denelcor HEP.
However, it is possible to build such a system, making the above statement
untrue in almost all cases. I am sure there are other architectures
and examples.

>Most of this discussion so far has ignored some very important points.
>
>Macros can turn a message passing system into a global memory system,
>and the other way.

What do you mean by 'macros' here?

>
>The real battle is with latency and bandwidth, not how to describe
>the semantics.
>

You mean memory bandwidth and network latency... i.e., busses again.

>I cannot think of any useful algorithms that don't try to partition
>the data among processors, if for no other reason than to avoid
>memory conflicts.
>

Even large vector machines with a few fast processors must use gather/scatter
and other hardware to avoid memory bank conflicts. Even a single-processor
machine would be wise to use these techniques when dealing with large data
sets which consume most of the system's memory. But algorithms do not have
to use these techniques to be useful with small data sets.

>Finally the $64K question. Does anyone have an algorithm that 
>
>a) actually uses writable shared memory for anything other than locks.

Gee, well, there are recent versions of Unix spoolers which use FIFOs. That is
a form of shared memory, is it not? How about message queues, although they
are slower than FIFOs in general... (lately they've been improved). You
should probably spend some time reading Stevens's books on Unix programming.
So, I guess there's UUCP, printd, etc., etc., etc.

>
>b) is faster than other algorithms that don't.
>
>Boy this could be fun.

>Mark Rosenbaum				Director of Engineering
>mjr@netcom.com				Otey-Rosenbaum & Frazier
>(703) 536-9464				Consultants in High Performance 

You sure you aren't one of Clinton's recent 'consultants'?

-- Peter Brewer              Lite Gatto

      An early anagram used to show the location of the Ark of the Covenant.

	                    S  A  T  O  R
                            A  R  E  P  O
                            T  E  N  E  T
                            O  P  E  R  A
                            R  O  T  A  S


