Newsgroups: comp.parallel.pvm
From: Stuart D Blackburn <sdblackb@uncc.edu>
Subject: PvmDataInPlace takes longer than PvmDataRaw???
Organization: University of North Carolina at Charlotte
Date: 3 Aug 1995 18:51:15 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <3vr5r3$4jj@news.uncc.edu>

I am writing PVM and MPI programs as part of a Graduate Project.
In particular, I am doing a 3D matrix decomposition. I have a master
process that initializes the outside edge of the matrix to permanent values,
and sends a single 2D array (Width X Height) to each slave process. At this
point, the slaves find their front and back partners (other slaves) and
swap data.
	They then loop, computing new values for every interior point in their
own array (by taking the average of its neighboring points). When a sweep is
complete, each slave sends a copy of this new data to each of its partners,
and waits for their data in return. This is repeated until no entry in the
slave's 2D array changes value by more than a set tolerance. A data status
value of STABLE or UNSTABLE is sent in the message with the 2D array to let
the partners know whether or not more data will be sent from this partner.
When stability is reached, the slave also sends a copy of its data to the
master and then exits.
	This is straightforward enough, but I thought that since I was
operating on a homogeneous network of Sun workstations, I could speed things
up by using PvmDataInPlace as the "encoding scheme" of my send buffers.
These slaves always send exactly the same data to each other, so why go
to the trouble of packing it into a buffer every time?
	I am also timing (wall time) each slave to see how long each
executes before finally sending its final results to the master. The
surprising thing is that for a 10X10X10 matrix it takes the slaves using
the one-time PvmDataInPlace packing ~17 seconds to do their 90-98 iterations,
while having the slaves use PvmDataRaw every iteration took only ~7 seconds.

	The code was:
		pvm_initsend(PvmDataInPlace);
		pvm_pkint (&me, 1, 1);
		pvm_pkfloat( &(my_data[0][0]), (Height+2)*(Width+2), 1);
		for( msgtype = 1; ; msgtype++) {
			.  .  .
			if (me > 1) pvm_send(front_tid, msgtype);
			if (me < N) pvm_send(back_tid, msgtype);
			.  .  .
		}

				vs.

		for (msgtype = 1; ; msgtype++) {
			.  .  .
			pvm_initsend(PvmDataRaw);
			pvm_pkint(&me, 1, 1);
			pvm_pkint( &(data_status[1]), 1, 1);
			pvm_pkfloat( &(my_data[0][0]), (Height+2)*(Width+2), 1);
			if (me > 1) pvm_send(front_tid, msgtype);
			if (me < N) pvm_send(back_tid, msgtype);
			.  .  .
		}
	Width and Height are #defined to be 10 in this case, so the
message was 2 integers and 144 floats.

	I ran each version several times, after making sure that there
were no other differences, and the results were consistently around these
times. For some reason, the in-place method took more than 2X longer.
	I should say that I was running all processes on a single Sparc2
machine. This may have some impact, but I ran both versions on the same
machine within minutes of each other, and with no other user processes 
running at the same time. 
	Does anyone have a guess as to why this might be the case?
Why should it take longer to send 2 integers and 144 floats directly
from memory than to pack them into a buffer prior to sending them?
 
       _/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/
      _/   Stuart D. Blackburn, Computer Science Graduate Student   _/
     _/          University of North Carolina at Charlotte         _/
    _/    Graduate Teaching Assistant (CSCI 1201 and 1202 Labs)   _/
   _/      225 E. North St. (PO Box 1012), Albemarle, NC 28002   _/
  _/         Home: (704) 982 0763          Office: 547-4574     _/
 _/E-mail:sdblackb@uncc.edu (http://www.coe.uncc.edu/~sdblackb)_/
_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/


