Newsgroups: comp.parallel.pvm
From: adilger@enel.ucalgary.ca (Andreas Dilger)
Subject: Re: Messages get lost
Organization: ECE Department, U. of Calgary, Calgary, Alberta, Canada
Date: 15 Apr 1996 19:19:20 GMT
Message-ID: <4ku7fo$188k@ds2.acs.ucalgary.ca>

In article <316EF0D7.41C67EA6@brc.uconn.edu>,
Chet Vora  <chet@brc.uconn.edu> wrote:
>... all of a sudden, messages seem to get lost and all tasks hang
>( it seems as if they are waiting for the recv which is never going to 
>occur).
>
>Is this normal ??

Probably what is happening is that you have a deadlock.  Each slave is
waiting for a message to arrive from another slave before it sends its
own next message, but that message never arrives because the slave that
should send it is itself blocked waiting for an incoming message.
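
To make that concrete, here is a rough sketch of the pattern.  It is not
PVM code: it models each task's incoming message queue with Python's
queue.Queue, and the names run_ring, send_first and inbox are made up for
the illustration.  It also assumes that a send is buffered and returns
without waiting for the receiver, and it uses a timeout where a real
blocking receive would simply hang.

```python
import queue
import threading


def run_ring(n_tasks, send_first, timeout=0.5):
    """Run n_tasks in a ring; return the tasks that received a message."""
    # One queue per task stands in for that task's incoming messages.
    inbox = [queue.Queue() for _ in range(n_tasks)]
    done = [False] * n_tasks

    def task(me):
        right = (me + 1) % n_tasks
        if send_first:
            # Buffered send: returns at once, so nobody blocks here.
            inbox[right].put(("data", me))
        try:
            # A real blocking receive would wait forever; the timeout
            # only exists so this demonstration can finish.
            inbox[me].get(timeout=timeout)
            done[me] = True
        except queue.Empty:
            return
        if not send_first:
            # Never reached when every task receives first.
            inbox[right].put(("data", me))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [i for i in range(n_tasks) if done[i]]


if __name__ == "__main__":
    print("receive first:", run_ring(3, send_first=False))
    print("send first:   ", run_ring(3, send_first=True))
```

With every task receiving first, nobody ever gets to its send, so no
task completes.  Letting each task send before it receives (or letting
just one task start with a send) breaks the cycle.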

>The symptoms for CONFIG 3 are that some tasks seem to receive the
>correct data, but after a while, they get junk data ( or all 0s)
>and hang.

I don't know about your setup, but how do the slaves know what size the
message should be?  Usually you want to check the size of the incoming
buffer and/or put the message length at the start of the message, so
that the slave knows how much space to allocate before unpacking it.
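
As a rough sketch of the length-prefix idea (again not PVM code, just
Python's struct module, with the made-up names pack_message and
unpack_message and an assumed payload of doubles): the sender writes the
element count first, and the receiver reads the count, checks it against
the number of bytes that actually arrived, and only then decodes the
payload.

```python
import struct

HEADER = "!I"                      # 4-byte unsigned element count
HEADER_SIZE = struct.calcsize(HEADER)
ITEM_SIZE = struct.calcsize("!d")  # 8-byte double per element


def pack_message(values):
    """Prefix the payload with the number of doubles that follow."""
    header = struct.pack(HEADER, len(values))
    body = struct.pack("!%dd" % len(values), *values)
    return header + body


def unpack_message(buf):
    """Read the count, check it against the buffer size, then decode."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer too short to hold a length header")
    (count,) = struct.unpack_from(HEADER, buf, 0)
    expected = HEADER_SIZE + ITEM_SIZE * count
    if len(buf) != expected:
        # Catching a size mismatch here is better than decoding junk.
        raise ValueError("expected %d bytes, got %d" % (expected, len(buf)))
    return list(struct.unpack_from("!%dd" % count, buf, HEADER_SIZE))


if __name__ == "__main__":
    msg = pack_message([1.0, 2.5, -3.0])
    print(unpack_message(msg))
```

In PVM itself the corresponding steps would be to pack an integer count
ahead of the data on the sending side and unpack it first on the
receiving side; pvm_bufinfo() can also report the byte length of a
received buffer.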

Cheers, Andreas.
-- 
Andreas Dilger   University of Calgary  \"If a man ate a pound of pasta and
(403) 220-8792   Micronet Research Group \ a pound of antipasto, would they
Dept of Electrical & Computer Engineering \   cancel out, leaving him still
http://www-mddsp.enel.ucalgary.ca/People/adilger/       hungry?" -- Dogbert

