Newsgroups: comp.parallel.pvm
From: shao@bohemia.cs.colorado.edu (Chung-Shang Shao)
Subject: Re: Experience with pvm+shm?
Organization: University of Colorado, Boulder
Date: 16 Apr 1995 05:05:58 GMT
Message-ID: <3mq8jm$k2t@csnews.cs.colorado.edu>

In article <3mjb8h$86a@soleil.uvsq.fr>,
Stephane WOILLEZ <wos@prism.uvsq.fr> wrote:
>
>I have the same problem as you. I use PVM 3.3.7 on Sun SPARC workstations
>(architecture SUN4). My program is split into several nodes. Each node is
>made of 2 processes, one for the computation and the other for PVM I/O.
>Communication between the 2 processes in a node is done with shared memory
>and 2 signals. During the computation, each node exchanges some of its data
>with the other nodes. It seems that, at the beginning of the computation,
>one PVM message is simply lost. I check every PVM call and no error is
>returned, which (normally) means that everything is fine. The problem is
>that one of my messages is lost and this locks up the whole algorithm. Every
>process that communicates has its own tag. If what you say is correct, the
>solution may lie in never using pvm_recv with -1 in the node (tid) field,
>which implies that we have to develop an algorithm that checks every
>processor's queue using pvm_nrecv.
>
>I must also say that sometimes the PVM daemon of one of my computation nodes
>reports an error such as an empty message field or an incomplete message.
>
>The problem with this deadlock in my program is that I am not sure whether
>it is my fault or PVM's :-)
>
>If somebody knows something about this problem, or has the solution, please
>post it or mail it to me. Any comments are also welcome.
>
>  Thanks,
>
>    Stephane.

I am using ALPHAMP, PVM 3.3.7, and libpvm3s.a instead of libpvm3.a.  I did
not encounter the problem (probably because I did not use the shared-memory
library).  So I think one solution is to link with libpvm3s.a.  What you
lose is efficiency.

Another possible, dirty solution is to send 5 initial messages from the
master to the slaves, or to each other.  The key is that the sender has to
send 5 messages and the receiver only has to receive one of the five.  Then
it should not matter whether the first one is lost or not. :)
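
A rough sketch of that warm-up trick, written as a small Python simulation
rather than real PVM code so it can be run anywhere.  Everything in it is an
assumption made for illustration: the LossyChannel class, the tag values,
and the idea that only the very first message gets dropped.  In a real
program the warm-up copies would go out with pvm_send under their own
message tag, so the leftover copies cannot be mistaken for real data.

```python
from collections import deque

WARMUP_TAG = 99     # hypothetical tag reserved for throwaway messages
DATA_TAG = 1        # hypothetical tag for the real payload
WARMUP_COPIES = 5   # the count suggested above


class LossyChannel:
    """Toy stand-in for a link that silently drops its first message(s)."""

    def __init__(self, drop_first=1):
        self._queue = deque()
        self._to_drop = drop_first

    def send(self, tag, payload):
        if self._to_drop > 0:
            self._to_drop -= 1
            return  # lost, and no error is reported to the sender
        self._queue.append((tag, payload))

    def nrecv(self):
        """Non-blocking receive: return (tag, payload), or None if empty."""
        return self._queue.popleft() if self._queue else None


def sender(chan, payload):
    # Send the throwaway copies first, then the message that matters.
    for i in range(WARMUP_COPIES):
        chan.send(WARMUP_TAG, i)
    chan.send(DATA_TAG, payload)


def receiver(chan):
    """Skip warm-up copies; return (first real payload, warm-ups seen)."""
    warmups = 0
    while True:
        msg = chan.nrecv()
        if msg is None:
            return None, warmups  # nothing (more) arrived
        tag, payload = msg
        if tag == WARMUP_TAG:
            warmups += 1  # discard, it carries no data
            continue
        return payload, warmups
```

With one message dropped, the receiver sees 4 of the 5 warm-up copies and
still gets the real payload.  Without the warm-up copies, the real message
would be the one that disappears.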

cs shao.


