Newsgroups: comp.parallel.pvm
Path: ukc!uknet!pipex!howland.reston.ans.net!europa.eng.gtefsd.com!gatech!swrinde!elroy.jpl.nasa.gov!ames!cnn.nas.nasa.gov!wk81.nas.nasa.gov!dinucci
From: dinucci@wk81.nas.nasa.gov (David C. DiNucci)
Subject: Re: Msg passing test failures
Message-ID: <CLovw9.Hsu@nas.nasa.gov>
Sender: news@nas.nasa.gov (News Administrator)
Nntp-Posting-Host: wk81.nas.nasa.gov
Organization: NAS - NASA Ames Research Center, Moffett Field, CA
References: <2keluj$qf0@maxwell2.ee>
Date: Wed, 23 Feb 1994 18:12:08 GMT
Lines: 42

In article <2keluj$qf0@maxwell2.ee> bukoshy@mtu.edu (BOB U. KOSHY) writes:
>...
>	- The chances of the message being successfully received
>	  go down as: 1) the length of the message increases and
>	  2) the number of nodes to which the message is sent
>	  increases. For a single receiver, messages of up to a
>	  million doubles length get across successfully, but for
>	  around ten receivers, this length drops to c. 100,000
>	  doubles. I'm using pvm_mcast for the send, and the
>	  blocking pvm_recv for the receive. I have tried both
>	  TCP socket based and UDP, daemon mediated communication
>	  (by setting pvm_advise()) with more or less the same
>	  results.
>...
>
>I don't believe that I'm running out of memory - at least for the
>smaller messages. I assume that whatever buffers are used for
>sending a message are freed after the msg is sent, so that the
>"memory cost" of sending a msg once is the same as that of sending
>it, say, a 100 times.

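I think the problem may be in this assumption.  To ensure reliable
communication, something somewhere must hang on to each outgoing message until
an acknowledgment (ACK) comes back from the receiver, since a missing ACK must
trigger a re-transmission of the message.  With UDP, this responsibility falls
on the application (i.e. the PVM daemon in this case), so the buffers are not
necessarily freed as soon as the message has been sent.  If a separate copy is
kept for each destination, a multicast to ten receivers would tie up roughly
ten times the memory, which is at least consistent with the tenfold drop in
message length you report.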
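
To put rough numbers on that guess, here is a toy model in Python.  It is only
a sketch of the bookkeeping I am imagining, not PVM code: the class
RetransmitBuffer, its methods, and the one-copy-per-destination policy are all
assumptions of mine, and I have not checked them against the daemon source.

```python
DOUBLE_BYTES = 8  # size of one double, assuming IEEE 754 doubles


class RetransmitBuffer:
    """Hypothetical sender that keeps one copy per destination until ACKed."""

    def __init__(self):
        # (msg_id, dest) -> bytes retained for possible re-transmission
        self.pending = {}

    def send(self, msg_id, dests, n_doubles):
        # Assumed policy: a separate retained copy for every destination.
        for dest in dests:
            self.pending[(msg_id, dest)] = n_doubles * DOUBLE_BYTES

    def ack(self, msg_id, dest):
        # An ACK is the only event that releases a retained copy.
        self.pending.pop((msg_id, dest), None)

    def bytes_held(self):
        return sum(self.pending.values())


# One receiver, a million doubles.
single = RetransmitBuffer()
single.send(1, ["r0"], 1_000_000)

# Ten receivers, 100,000 doubles each.
multi = RetransmitBuffer()
multi.send(1, ["r%d" % i for i in range(10)], 100_000)

print(single.bytes_held(), multi.bytes_held())  # prints "8000000 8000000"
```

Under these assumptions the two cases from the original article retain the
same number of bytes, which is what one would expect if both were bumping into
the same memory limit.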

If this is correct, it could theoretically be averted by adding complexity to
the protocol, e.g. by blocking senders until there is buffer space (i.e. until
ACKs for previous messages have been received) or, in the case of multicast, by
saving only one copy of the outgoing message for possible re-transmission.
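
Both ideas can be sketched together in a few lines of Python.  Again this is
an illustration under my own assumptions, not a description of PVM: the class
SharedCopyBuffer, the capacity_bytes limit, and the try_send/ack names are
hypothetical.

```python
DOUBLE_BYTES = 8  # size of one double, assuming IEEE 754 doubles


class SharedCopyBuffer:
    """Hypothetical sender: one retained copy per message, bounded memory."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        # msg_id -> (size in bytes, set of destinations still owing an ACK)
        self.messages = {}

    def bytes_held(self):
        return sum(size for size, _ in self.messages.values())

    def try_send(self, msg_id, dests, n_doubles):
        """Return False when the sender would have to block for buffer space."""
        size = n_doubles * DOUBLE_BYTES
        if self.bytes_held() + size > self.capacity:
            return False  # caller waits for ACKs and tries again
        if dests:
            # A single shared copy, however many destinations there are.
            self.messages[msg_id] = (size, set(dests))
        return True

    def ack(self, msg_id, dest):
        entry = self.messages.get(msg_id)
        if entry is None:
            return
        entry[1].discard(dest)
        if not entry[1]:
            # The last ACK has arrived, so the shared copy can be released.
            del self.messages[msg_id]


receivers = ["r%d" % i for i in range(10)]
buf = SharedCopyBuffer(capacity_bytes=16_000_000)
buf.try_send(1, receivers, 1_000_000)
print(buf.bytes_held())  # prints "8000000": one copy for ten receivers
```

The price of the shared copy is that it cannot be released until the slowest
receiver has ACKed, so one slow node can hold up every sender that is waiting
for buffer space.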

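The retained-copy theory does not necessarily explain the TCP/IP case, where
re-transmission is handled by the TCP implementation rather than by the daemon.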

(Disclaimer:  I have not done any significant hacking on PVM.)

Dave
-- 
===============================================================================
Dave DiNucci              NASA Ames Research Center
dinucci@nas.nasa.gov      M/S 258-6
(415)604-4430             Moffett Field, CA 94035

