Newsgroups: comp.parallel.pvm
From: tony@aurora.cs.msstate.edu (Tony Skjellum)
Subject: Re: pvm_joingroup error message
Organization: Mississippi State University
Date: 3 Jul 94 18:04:35 GMT
Message-ID: <tony.773258675@aurora.cs.msstate.edu>

chuang@athena.mit.edu (Weihaw Chuang) writes:

>I've been getting this error message trying to use pvm_joingroup...

>wech@mulsanne>barrier
>libpvm [t40013]: gs_getgstid: Error 0
> 
>What does this mean?  Also has anyone created a synchronization
>mechanism without using pvm_barrier?  or suggestions for making one?
>Thanx

>-Wei

Wei,

There are race conditions inherent in dynamically joining a group while
barriers are in progress, so I'll describe a reasonable barrier that
assumes the group is not changing size.

In MPI, MPI_BARRIER(comm) works over the group specified in the communicator.
This can be emulated in PVM by constructing, in each of the processes you intend
to synchronize, a static list of the TIDs that participate.  One option you have
is to read how the MPI or UNIFY source code (see earlier postings this week)
implements barrier or MPI_Allreduce.  However, the following generic pseudocode
can easily be written in straight PVM...

Call this list of TIDS "grp".  It has a well defined order, and is the same
in each process to be synchronized.  Each process in "grp" has a unique index
position in the list; we will call that the rank of the process.   We will
assume that there are N processes in grp.  Of course, whenever you do a PVM
send or receive, you will have to look up the TID corresponding to the rank
by referring to the array "grp", but the following algorithms are best
explained in terms of ranks...
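
As a minimal sketch of the rank/TID bookkeeping just described (Python is used
here only as runnable pseudocode; the TID values and the helper names rank_of
and tid_of are made up for illustration, and in a real program the TIDs would
come from calls such as pvm_mytid() and pvm_spawn()):

```python
# Hypothetical TIDs, listed in the same order in every process.
grp = [0x40013, 0x40014, 0x80007, 0x80008, 0xC0002]


def rank_of(tid, grp):
    """Return the rank (index position) of a TID in the shared list."""
    return grp.index(tid)


def tid_of(rank, grp):
    """Return the TID to use for a send or a specific-source receive."""
    return grp[rank]


N = len(grp)
my_rank = rank_of(0x80007, grp)          # pretend this process has TID 0x80007
partner_tid = tid_of(my_rank ^ 1, grp)   # TID of the rank that differs in bit 0
```

Everything below is phrased in ranks; the only place TIDs appear is at the
actual send or receive call.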

Two approaches

1) Implement the combine operation over "grp" using a simple algorithm.

   Let N' be the largest power of 2 less than or equal to N.
   Let my_rank be the rank of the current process in the group of TIDs grp.
	(0 <= my_rank < N)

   If N' != N then

	Each process with rank >= N' sends a message to its corresponding process, whose
		rank is N' less than its own.  There are (N-N') such sends.

	Processes 0 ... (N-N')-1 wait for a message from the processes with ranks N' ... N-1
		(rank r waits for rank r+N').

	[The message must be received from a specific TID, obtained from the grp list.
		Do not use a wildcard receive.  The message doesn't need to have any data in it;
		what's important is the receive with a specific source.]
   end

   Since N' is a power of 2, let nsteps = log2(N')

   Processes from 0 ... N'-1 do the following:

   	let spread  = 1
   	for steps = 1, nsteps
	{

		If my_rank xor spread < my_rank
	        {
			Send a message to process with my_rank - spread
			Receive a message from process with my_rank - spread
		}
		else
		{
			Receive a message from process with my_rank + spread
			Send a message to process with my_rank + spread
		}
		endif
		[Ordering assures that no buffering is needed for this procedure to work!]

		spread = spread * 2
	}

   If N' != N then 
   {
      Processes with ranks from 0...(N-N')-1 send a message to processes with ranks N'...N-1
      Processes with ranks N'...N-1 wait for their message from processes with ranks 0...(N-N')-1
   }
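
The steps above can be tried out without PVM.  The following is only a sketch
under stated assumptions: Python threads stand in for PVM tasks, one queue per
(source, destination) pair stands in for a specific-source receive, and the
names barrier, run, send and recv are hypothetical.  In real PVM code, send and
recv would wrap pvm_send() and pvm_recv() with the TID looked up in grp.

```python
import queue
import threading


def largest_pow2_le(n):
    """Largest power of 2 that is <= n (N' in the text)."""
    p = 1
    while p * 2 <= n:
        p *= 2
    return p


def barrier(my_rank, n, send, recv):
    """Pairwise-exchange barrier; send(dst) and recv(src) come from the caller."""
    n2 = largest_pow2_le(n)
    if my_rank >= n2:                 # extra rank: check in, then wait for release
        send(my_rank - n2)
        recv(my_rank - n2)
        return
    if my_rank < n - n2:              # absorb the check-in from rank my_rank + N'
        recv(my_rank + n2)
    spread = 1
    while spread < n2:                # log2(N') exchange steps
        partner = my_rank ^ spread
        if partner < my_rank:         # upper partner: send first, then receive
            send(partner)
            recv(partner)
        else:                         # lower partner: receive first, then send
            recv(partner)
            send(partner)
        spread *= 2
    if my_rank < n - n2:              # release rank my_rank + N'
        send(my_rank + n2)


def run(n):
    """Run the barrier on n threads; report how many had arrived when each left."""
    box = {(s, d): queue.Queue() for s in range(n) for d in range(n)}
    arrived = []
    seen = [0] * n

    def worker(r):
        arrived.append(r)             # record arrival before entering the barrier
        barrier(r, n,
                lambda dst: box[(r, dst)].put(None),       # empty message
                lambda src: box[(src, r)].get(timeout=5))  # specific source only
        seen[r] = len(arrived)        # sampled after leaving the barrier

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

If the barrier works, no thread leaves before all n have arrived, so run(5)
should return a list of five 5s.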

The above implements a "combine" over a general number of processes,
but it uses N'(log2(N')) + 2(N-N') send/receive pairs to accomplish
the algorithm (each of the N' processes sends once in each of the
log2(N') steps); furthermore, it uses up to N' sends at a given time,
putting a possible strain on the bisection bandwidth of the network in
use.  This algorithm can be improved incrementally, but the following
algorithm is better...

2) Implement a fanin/fanout-style combine over "grp"

   Let N'' be the smallest power of 2 greater than or equal to N.

   let nsteps = log2(N'')

   -- fanin step --
   spread = N''/2
   for steps = 1, nsteps
   {
        if my_rank >= spread 
	{
		send to process with rank := my_rank - spread
		break out of the loop
	}

	if my_rank xor spread < N
		Receive from rank := my_rank + spread

	spread = spread / 2
   }

   [The message can be empty; what is important is that messages are selected based on
	their source.  Do not use wildcards.]
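
To see which messages the fanin loop generates, here is a small sketch that
lists the (source, destination) rank pairs step by step.  The function names
are hypothetical, and the code only computes the schedule; it sends nothing.

```python
def smallest_pow2_ge(n):
    """Smallest power of 2 that is >= n (N'' in the text)."""
    p = 1
    while p < n:
        p *= 2
    return p


def fanin_schedule(n):
    """Return one list of (src, dst) rank pairs per fanin step."""
    steps = []
    spread = smallest_pow2_ge(n) // 2
    active = list(range(n))            # ranks that have not yet sent and left
    while spread >= 1:
        # ranks at or above spread send down by spread and leave the loop
        steps.append([(r, r - spread) for r in active if r >= spread])
        active = [r for r in active if r < spread]
        spread //= 2
    return steps
```

For N = 5 this gives three steps: rank 4 sends to 0; then ranks 2 and 3 send
to 0 and 1; then rank 1 sends to 0.  That is N-1 = 4 messages, all ending at
rank 0.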

   -- fanout step --
   if(my_rank == 0) then received_flag = true else received_flag = false
   spread = 1
   for steps = 1, nsteps
   {
	if received_flag == true
	{
		let tmp := my_rank + spread
		if(tmp < N)
		    send to process with rank := tmp
	}
  	else
	{
	    [A process receives in the step where spread is the highest bit set
	     in my_rank, that is, when spread <= my_rank < 2*spread.]
	    if(my_rank >= spread and my_rank < 2*spread)
	    {
		receive from process with rank := my_rank xor spread
		received_flag = true
	    }
	}
   
	spread = spread * 2
    }
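
The fanout can be listed in the same way.  This sketch assumes that every rank
r > 0 receives exactly once, from rank r - spread, in the step where
spread <= r < 2*spread; that rule is my reading of the intent of the loop
above, and the function names are hypothetical.

```python
def smallest_pow2_ge(n):
    """Smallest power of 2 that is >= n (N'' in the text)."""
    p = 1
    while p < n:
        p *= 2
    return p


def fanout_schedule(n):
    """Return one list of (src, dst) rank pairs per fanout step."""
    steps = []
    top = smallest_pow2_ge(n)
    spread = 1
    while spread < top:
        # ranks below spread already hold the release; each passes it up by spread
        steps.append([(r, r + spread) for r in range(spread) if r + spread < n])
        spread *= 2
    return steps
```

For N = 5 the steps are: 0 sends to 1; then 0 and 1 send to 2 and 3; then 0
sends to 4.  Every rank from 1 to N-1 appears as a destination exactly once.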
		

This operation works in two phases.  The first is a fanin step that does a total of
N-1 send/receives.  The second is a fanout step that also does a total of N-1 send/receives.
Thus a total of 2(N-1) = 2N-2 send/receives occur (fewer than in the algorithm above).
Only during two of the iterations does this algorithm require bisection bandwidth on the
order of N', whereas the previous algorithm uses bisection bandwidth of N' sends at each iteration.
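
To make the per-iteration load concrete, here is a rough sketch that counts how
many sends are issued in each step of the two approaches.  The function names
are hypothetical, and the extra pre/post steps that the first approach needs
when N is not a power of 2 are left out of its count.

```python
def pairwise_sends_per_step(n):
    """Sends per exchange step in approach 1 (pre/post steps not included)."""
    n2 = 1
    while n2 * 2 <= n:
        n2 *= 2                        # N': largest power of 2 <= N
    nsteps = n2.bit_length() - 1       # log2(N')
    return [n2] * nsteps               # each of the N' ranks sends once per step


def tree_sends_per_step(n):
    """Sends per step in approach 2: fanin steps followed by fanout steps."""
    top = 1
    while top < n:
        top *= 2                       # N'': smallest power of 2 >= N
    spreads = []
    s = top // 2
    while s >= 1:
        spreads.append(s)
        s //= 2
    # in the fanin step with a given spread, the senders have spread <= rank < 2*spread
    fanin = [max(0, min(2 * s, n) - s) for s in spreads]
    fanout = fanin[::-1]               # same edges, opposite direction and order
    return fanin + fanout
```

For N = 8 the first approach issues 8 sends in each of its 3 steps, while the
fanin/fanout issues 4, 2, 1, 1, 2, 4, for a total of 14 = 2(N-1).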

-Tony Skjellum


--
	.	.	.	.	.	.	.	.      .
"There is no lifeguard at the gene pool." - C. H. Baldwin
            -             -                       -
Anthony Skjellum, MSU/ERC, (601)325-8435; FAX: 325-8997; tony@cs.msstate.edu

