Newsgroups: comp.parallel.pvm
From: pikus@sbphy.physics.ucsb.edu (Fedor G. Pikus)
Subject: Re: PVM hangups - possible solution
Keywords: hang
Organization: University of California, Santa Barbara
Date: 6 Jan 95 20:20:31 GMT
Message-ID: <pikus.789423631@sbphy.physics.ucsb.edu>

I've experienced this problem: if you have only a few slaves, or they
send only a few messages, you are OK. If lots of slaves send lots of
messages, some get lost or something; anyway, you hang in a receive call.
The quick way to see if this is the problem is to put a call to sleep
in each slave after every send (in the group of sends which cause the
problem). This is not a good fix, but if it works with sleeps, then the
problem is most likely insufficient memory for all the messages which
are being sent at the same time. In this case my solution was flow
control: the master sends a slave a short Clear To Send (CTS) message,
and the slave responds by sending a few messages, few enough that they
will definitely not cause a memory problem (you have to experiment to
find what this number of messages is). Then the slave waits for another
CTS message. The master executes all the receives for the first batch
of messages, then sends a new CTS. This is somewhat slow, so it helps
if the slaves can use a non-blocking receive and have some work to do
while the master processes the messages, but it is a safe option.
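
To make the bookkeeping concrete, here is a toy model of the CTS scheme
in plain C. It is only a sketch and makes no PVM calls: in a real
program the CTS and the data messages would go through the usual
pvm_send/pvm_recv pairs. The names cts_simulate and struct cts_result
are made up for this illustration, and the model assumes every slave
has the same number of messages and that the master serves one slave
at a time.

```c
/* cts_sim.c: toy model of Clear To Send flow control (no PVM calls). */
/* All names here are hypothetical, chosen for illustration only.     */
#include <assert.h>

struct cts_result {
    long delivered;   /* messages the master received in total        */
    long peak_queued; /* most messages ever waiting at the master     */
    long cts_sent;    /* number of CTS messages the master sent       */
};

/*
 * Each of nslaves slaves has msgs_per_slave messages to send.
 * The master visits the slaves in turn: it sends one CTS, the slave
 * answers with at most `batch` messages, and the master receives all
 * of them before it issues the next CTS.
 */
struct cts_result cts_simulate(int nslaves, long msgs_per_slave, long batch)
{
    struct cts_result r = {0, 0, 0};
    long remaining = msgs_per_slave; /* same for every slave here */

    assert(nslaves > 0 && msgs_per_slave >= 0 && batch > 0);

    while (remaining > 0) {
        long n = remaining < batch ? remaining : batch;
        int s;
        for (s = 0; s < nslaves; s++) {
            long queued;
            r.cts_sent++;           /* master -> slave s: CTS          */
            queued = n;             /* slave s -> master: n messages   */
            if (queued > r.peak_queued)
                r.peak_queued = queued;
            r.delivered += queued;  /* master receives the whole batch */
        }
        remaining -= n;
    }
    return r;
}
```

In this model the master never has more than `batch` messages waiting,
however many slaves there are. If the master instead sends a CTS to
every slave before it starts receiving, the bound becomes the number
of slaves times the batch size, which is the number you would tune by
experiment.
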
Just a reminder: if all your nodes are of the same type, use
PVMFINITSEND(PVMRAW, .. )
If you don't modify the data until the send is complete (or use a
blocking send), use PVMINPLACE instead of PVMRAW. This speeds up
decoding of messages and may help the master process more messages and
avoid lockup.
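Lastly, here is a FORTRAN interface to the sleep function in case you
want to try the method above and forgot how to interface Fortran and C:
/*  fsleep.c */
/* Fortran interface to sleep function. Suspends process for */
/* requested number of seconds.                              */
/* Fortran passes arguments by reference, and most Unix      */
/* Fortran compilers append an underscore to external names. */
/* A default Fortran INTEGER normally matches a C int.       */
#include <unistd.h>
void fsleep_ (int *Seconds)
{
  /* Ignore zero or negative requests instead of wrapping around. */
  if (*Seconds > 0)
    (void) sleep((unsigned int) *Seconds);
}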

Usage:
	Call FSleep(5) ! gives 5 second sleep

Hope this helps somebody.

Fedor Pikus

