Newsgroups: comp.parallel.pvm
From: mjfrazer@dictator.uwaterloo.ca (Mark Frazer)
Subject: Re: Solaris 2.5 machines and PvmRouteDirect
Organization: University of Waterloo
Date: Wed, 22 May 1996 13:35:28 GMT
Message-ID: <Drt734.8r@novice.uwaterloo.ca>

Just to follow up on this:

Scott Townsend of NYMA Inc. at the NASA Lewis Research Center kindly
pointed out to me that the problem was due to a deadlock caused when
two processes attempt to establish direct communications with each
other at the same time.

Version 3.3.11 of PVM was supposed to have fixed this.  However, I
still found a hung application and "Connection refused" messages in the
PVM log when using 3.3.11.

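The fix is to have each process exchange a dummy message with every
process it will later communicate with, at the start of your program,
in an order that cannot deadlock.  For each pair, the process with the
lower tid sends and the process with the higher tid receives, so the
two never try to set up the direct connection at the same time.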

For all communicating process pairs (P1,P2)
begin
    if(myTid == P1 || myTid == P2)
    begin
        sender = min(P1,P2)
        receiver = max(P1,P2)
        if(myTid == sender)
            send to receiver
        else
            receive from sender
    end
end
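
As a rough sketch in C, the per-pair decision above might look like the
following.  The names handshake_role and role_for_pair are made up for
illustration and are not part of PVM; the actual dummy message would
still be sent and received with the usual pvm_initsend()/pvm_send() and
pvm_recv() calls, which are left out so the fragment stands on its own.

```c
#include <assert.h>

/* What a task has to do for one communicating pair. */
enum handshake_role { ROLE_NONE, ROLE_SEND, ROLE_RECV };

/*
 * Hypothetical helper (not a PVM call): decide the role of my_tid for
 * the pair (p1, p2).  The lower tid sends, the higher tid receives,
 * and a task outside the pair does nothing.  On ROLE_SEND or ROLE_RECV
 * the other task's tid is stored in *peer; on ROLE_NONE *peer is left
 * untouched.
 */
enum handshake_role role_for_pair(int my_tid, int p1, int p2, int *peer)
{
    assert(p1 != p2);   /* a task does not handshake with itself */

    int sender   = (p1 < p2) ? p1 : p2;   /* min(P1,P2) */
    int receiver = (p1 < p2) ? p2 : p1;   /* max(P1,P2) */

    if (my_tid == sender) {
        *peer = receiver;
        return ROLE_SEND;    /* here: send the dummy message to *peer */
    }
    if (my_tid == receiver) {
        *peer = sender;
        return ROLE_RECV;    /* here: receive the dummy message from *peer */
    }
    return ROLE_NONE;
}
```

Each task would walk the same list of pairs, sending on ROLE_SEND and
blocking in a receive on ROLE_RECV.  Walking the list in the same order
on every task should keep a receiver from waiting on a sender that is
itself stuck behind a later pair.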

The original article is below:

In article <DrIK9J.G3v@watserv3.uwaterloo.ca>,
Mark Frazer  <mjfrazer@dictator.uwaterloo.ca> wrote:
>Hi.
>
>I have a PVM application which runs fine.  I've tried to add the following
>call near the top of my program:
>pvm_setopt(PvmRoute, PvmRouteDirect).
>
>This returns 2, indicating that the previous mode was PvmAllowDirect on 
>all nodes of my virtual machine.  Note that my virtual machine contains
>only 6 connections and that each node only talks to 3 others (left and right
>nodes in the ring and the parent node.)  so I'm not running out of file
>descriptors.
>
>My pvm log file contains:
>[t80040000] ready  3.3.10   Thu May 16 12:42:05 1996
>[t80040000] [tc0001] libpvm [tc0001]: pvmmctl() connect: Connection refused
>[t80040000] [t180001] libpvm [t180001]: pvmmctl() connect: Connection refused
>[t80040000] [t80001] libpvm [t80001]: pvmmctl() connect: Connection refused
>[t80040000] [t140001] libpvm [t140001]: pvmmctl() connect: Connection refused
>[t80040000] [t100001] libpvm [t100001]: pvmmctl() connect: Connection refused
>[t80040000] [t40003] libpvm [t40003]: pvmmctl() connect: Connection refused
>
>This program runs fine on my SunOS 4.1.3 machines.  I'm trying to run it
>on a network of Solaris 2.5 machines and have built pvm using gcc.  My
>application is built with the SparcCompiler CC.
>
>Thanks for any help,
>mark
-- 
Mark Frazer    Electrical and Computer Engineering    University of Waterloo
MJFrazer@Dictator.UWaterloo.Ca       http://www.pads.uwaterloo.ca/~mjfrazer/

