Newsgroups: comp.parallel.pvm
From: orenl@sunset.ma.huji.ac.il (Oren Laden)
Subject: Communication speed ? !
Date: 4 Apr 1995 15:55:37 GMT
Message-ID: <3lrq5p$anj@shum.cc.huji.ac.il>

Hello,

I am trying to test PVM's overhead in communication time, and I discovered
a very strange phenomenon.
I use 4-16 machines (Pentiums), and spawn one process per machine. The
processes are connected in a cyclic way, such that each has two neighbours.
Each process was executed on a separate machine.
They all repeat the following for several iterations:
  - if I am an even process:
        send to the right, send to the left, receive from the left,
        receive from the right.
    if I am an odd process:
        receive from the left, receive from the right, send to the right,
        send to the left.
  - do CPU work of several seconds.
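
To make the exchange order concrete, here is a minimal sketch of it in
Python. The function names are made up for illustration, and the sketch
assumes an even number of processes, so that even and odd ranks alternate
around the ring. It only lists the steps and does no real communication.

```python
# Sketch of the per-iteration exchange order; names are illustrative only.

def neighbours(rank, nprocs):
    """Return (left, right) ranks of `rank` on a ring of `nprocs` processes."""
    return (rank - 1) % nprocs, (rank + 1) % nprocs


def exchange_order(rank, nprocs):
    """List the (operation, peer) steps one process performs per iteration."""
    left, right = neighbours(rank, nprocs)
    if rank % 2 == 0:
        # Even processes send in both directions first, then receive.
        return [("send", right), ("send", left),
                ("recv", left), ("recv", right)]
    # Odd processes receive from both neighbours first, then send.
    return [("recv", left), ("recv", right),
            ("send", right), ("send", left)]


if __name__ == "__main__":
    for rank in range(4):
        print(rank, exchange_order(rank, 4))
```

With an even process count, every send by an even process is matched by a
receive already posted (or about to be posted) by an odd neighbour.
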
I repeated the tests with a variable number of processes and variable
message sizes. I implemented two versions: one uses PVM and one uses 'rsh'
and regular TCP (INET) sockets. I measured the overall communication time,
computation time and the overall execution time.
I set the buffer size of the sockets (in the second version) to the maximum
and set the TCP_NODELAY flag. In PVM I set the PvmRoute option to
PvmRouteDirect.
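
For reference, the socket setup in the second version amounts to something
like the sketch below. It is written in Python rather than C for brevity.
The helper name and the 64K default are assumptions for illustration; the
actual maximum is system dependent, and the kernel may round or cap the
requested size.

```python
import socket


def configure_socket(sock, bufsize=65536):
    """Request larger buffers and disable Nagle's algorithm on `sock`.

    `bufsize` is an assumed value; the real maximum depends on the system.
    Returns the settings the kernel actually reports after the calls.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
        bool(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)),
    )


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(configure_socket(s))
    s.close()
```

Reading the values back with getsockopt() is worthwhile, since the size
granted can differ from the size requested.
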
My prediction was that PVM would be a bit slower for small message sizes
(1K ~ 16K) but would perform much worse for larger message sizes
(I intentionally used pvm_send() and not pvm_psend()).
The results are very surprising to me: for a small number of processes (4-8)
PVM was better (!) for some strange reason, for message sizes up to 256K or
so. For a large number of processes the results were as I predicted.
Now, the thing is that I looked at the PVM sources and saw that it uses a
32K buffer for the sockets (less than mine) and the TCP_NODELAY option.
Also, it has many more things to do while trying to send (get control
messages from the daemon, check for incoming messages etc...), thus
incurring a certain overhead.
But, in spite of that, for 4 processes, 4K, 8K and 16K message size the
unix version kept giving me a total of 5 seconds (for 30 iterations) whereas
PVM's times were climbing up gradually from about 1.5-2 to 5 seconds.
A careful analysis of the results showed that the first 'receive' in each
iteration took ~160 msecs in the unix version, and only about a third of
that in the PVM version. This behaviour was pretty consistent.
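
A rough consistency check on these figures, assuming the 5-second total is
communication time summed over the 30 iterations: divided per iteration,
the unix total comes to about 167 msecs, close to the ~160 msecs of the
first 'receive', and PVM's 1.5-2 seconds come to 50-67 msecs, close to a
third of that. The helper name below is made up for illustration.

```python
ITERATIONS = 30  # number of iterations in the runs quoted above


def per_iteration_ms(total_seconds, iterations=ITERATIONS):
    """Convert a total time in seconds to milliseconds per iteration."""
    return total_seconds * 1000.0 / iterations


if __name__ == "__main__":
    print("unix: %.1f ms" % per_iteration_ms(5.0))
    print("PVM:  %.1f - %.1f ms" % (per_iteration_ms(1.5),
                                   per_iteration_ms(2.0)))
```

If that assumption holds, nearly all of the unix version's communication
time is spent waiting in that first 'receive'.
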
At first I thought it had something to do with sending and receiving twice
in each process, so I converted the program to:
repeat n times:
  - if I am an even process:
        send to the right, receive from the left.
    if I am an odd process:
        receive from the left, send to the right.
  - do CPU work of several seconds.
BUT - that didn't help at all. In fact I was now getting two unexplained
results: first, PVM was still better, this time by a larger factor.
Second, the total communication time in the unix version was larger than
before!
Again, there was a considerable difference between the times of the
'receive' operation.
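
The per-'receive' times can be taken with something like the sketch below
(Python for brevity; the function names are made up for illustration, and
a local socketpair stands in for the real TCP connection). The loop is
needed because a single recv() on a stream socket may return fewer bytes
than requested.

```python
import socket
import time


def recv_exact(sock, nbytes):
    """Read exactly `nbytes` from `sock`, looping over partial reads."""
    chunks = []
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def timed_recv(sock, nbytes):
    """Return (data, elapsed_seconds) for one complete receive."""
    start = time.perf_counter()
    data = recv_exact(sock, nbytes)
    return data, time.perf_counter() - start


if __name__ == "__main__":
    a, b = socket.socketpair()       # stand-in for a real TCP connection
    a.sendall(b"x" * 4096)
    data, elapsed = timed_recv(b, 4096)
    print(len(data), "bytes in %.3f ms" % (elapsed * 1e3))
    a.close()
    b.close()
```

Timing each receive separately, rather than only the total, is what shows
which operation in the iteration accounts for the delay.
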
I know I missed something, and probably something big. However, I don't
have a clue what it is (also, as I have been trying to solve this for some
time, I keep checking the same things...).

Does anyone have any idea that explains this?  How can I improve my naive
implementation of IPC in the unix version so that it works better?
I will be grateful for any help, as this is quite urgent and vital for my
work.

Thanks a lot,

Oren.

                                                    ___
**************************************** |\/\/\/|  |   \
** Oren Laden   (orenl@cs.huji.ac.il) ** |      |  |___/   /\     ___
** ---------------------------------- ** |      |  |   \  /  \   |   \ _____
** Distributed Operating Systems lab. ** | (0)(0)  |___/ /----\  |___/   |
** The Hebrew University of Jerusalem ** c      _)      /      \ |  \    |
**      Jerusalem,  Israel 91904      **  | ,___|                |   \   |
****************************************  |   /                          |
                                         /____\     S  I  M  P  S  O  N
                                        /      \








