Newsgroups: comp.parallel,comp.lang.misc
From: jdonham@us.oracle.com (Jake Donham)
Subject: Re: Massively parallel programming for the Internet
Organization: Oracle Corporation. Redwood Shores, CA
Date: Fri, 6 Jan 1995 03:20:23 GMT
Message-ID: <JDONHAM.95Jan3113942@hadron.us.oracle.com>

"thinman" == Technically Sweet <thinman@netcom.com> asserts:

    thinman> Hmmm... I'll have to look at Linda again.

    thinman> I had crossed it off because it doesn't let go of the
    thinman> classic computer science obsession with computational
    thinman> efficiency.  All of the projects I've scanned make the
    thinman> same mistake.  We're talking about a system with 10,000
    thinman> computers randomly executing parts of one program while
    thinman> very slowly communicating with one another.  The CPU time
    thinman> is essentially free, it's the messages that are
    thinman> expensive.

    thinman> Given this, building special compilers that carefully
    thinman> optimize your compiled C or Fortran (which Gelernter did)
    thinman> is a complete waste of time, because it ignores the
    thinman> relative costs mentioned above.  When you factor in the
    thinman> cost of delivering N different binaries to your 10,000
    thinman> computers, C-Linda is (massively) silly.  Not to mention

Except that the whole point of the C-Linda compiler is to optimize
communication. It divides the tuple-space operations in your program
into groups based on the number of fields in the tuple and their
types, and then subdivides each group with a hash function. Each
group/hash bin is assigned to some node on the network. So most of the
work of locating a tuple is done at compile time, and a running
program doesn't have to query lots of nodes (i.e. incur lots of
communication cost) to find it.

The balance between computation and coordination can be altered by
changing the granularity of your parallelization; it is possible to
write "tunable" programs that have a knob for granularity and can be
optimized for (on one end of the scale) a shared-memory multiprocessor
machine or (on the other end) the Internet. Let me recommend (again)
Carriero and Gelernter's _How_to_Write_Parallel_Programs_.

The real problem with C-Linda is that it assumes a static program and
execution environment. An all-Internet programming environment would
be subject to all manner of machine and network failures, and Linda is
not designed to deal with them. Westbrook and Zuck (professors at Yale
who work with Gelernter) are working on a fault-tolerant
generalization of Linda called PASO. Moreover, if you allow the
program to be updated as the computation runs, all of C-Linda's
global, compile-time optimization goes out the window. PASO attempts
to address this issue as well, providing for an adaptive organization
of tuples; i.e. the groups and the nodes which serve them are
determined at run time, as data is stored into tuple space.

    thinman> the security problems of allowing some wanker on the
    thinman> Internet run a binary on your precious workstation.  You
    thinman> need to put a very simple verifiable interpreter on those
    thinman> machines which implements a nice dense program
    thinman> representation.  If I'm going to run some Internet daemon
    thinman> on my workstation, it's going to be something where I can
    thinman> examine the code and be damn sure it can't wipe out any
    thinman> of my data.

There would be plenty of room for participants in the Net computer who
don't actually run the computation: coordination on that scale
requires a large amount of support. Servers would be needed to store
pieces of the distributed memory (assuming that your model includes
one), and some system would be needed just to keep track of who is
participating in the computation. These pieces won't be running random
code.

    thinman> A second problem is that these projects fixate on
    thinman> designing one language for the problem.  This is hubris.
    thinman> You're not going to do a good language/control system the
    thinman> first time, so just build the infrastructure and let
    thinman> anyone submit jobs in the low-level interpreter.  Then
    thinman> you and they can independently do research in languages
    thinman> and interactive debuggers.

The Linda tuple-space model doesn't require that the program be
written in a uniform language--the tuple space acts as an abstraction
barrier, and it is possible for two languages which embed the
tuple-space operations to intercommunicate. I wrote an implementation
of the tuple-space operations embedded in Scheme which interfaced to
the PASO implementation, and I wrote systems composed of cooperating C
and Scheme programs which communicated through tuple space.

    thinman> Lance Norskog thinman@netcom.com
    thinman> Artisputtingtogether. Art s th ow n aw y.

This is a really interesting discussion!

Jake


