Newsgroups: comp.parallel.pvm
Path: ukc!uknet!bnr.co.uk!pipex!howland.reston.ans.net!gatech!news-feed-1.peachnet.edu!umn.edu!kilo!mjlin
From: mjlin@kilo.cs.umn.edu (Mengjou Lin)
Subject: Re: Timing Anomaly
Message-ID: <CJnyJx.7w3@news.cis.umn.edu>
Sender: news@news.cis.umn.edu (Usenet News Administration)
Nntp-Posting-Host: kilo.cs.umn.edu
Organization: University of Minnesota, Minneapolis, CSci dept.
Date: Sat, 15 Jan 1994 09:00:56 GMT
Lines: 56


In Article: 1228 of comp.parallel.pvm From Kell S|nnichsen:
> I've had the same problem with 2 SUN Solaris machines connected with an
> ethernet.  With a ping-pong of N bytes between them there were two plateaus
> situated at 4073-5512 bytes and 8145-9584 bytes.  They were very distinct
> with times ~10 times the surrounding values of N.

This problem has come up several times and is fixed in pvm3.2.5.

In Article: 1057 of comp.parallel.pvm From Joerg-Thomas Pfenning:
> As I stated some time ago, this is a longstanding bug in the PVM
> TCP transmission mode. I posted a patch to some people and hope
> it will get incorporated into a future release.

The trick is to enable TCP_NODELAY after setting up a direct TCP connection
(PvmRouteDirect) between the two computers.
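For readers who have not set this option before, here is a minimal sketch of
what "enable TCP_NODELAY" means at the socket level.  It is written in Python
for brevity and is not the PVM patch itself; the helper name enable_nodelay is
hypothetical.  In practice the option would be set on the connected socket
right after connect() or accept(); the demo uses a fresh, unconnected socket
only so that it runs without a peer.

```python
import socket


def enable_nodelay(sock):
    """Turn off the Nagle algorithm on a TCP socket (hypothetical helper).

    Returns True if the option reads back as enabled.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # Read the option back; any nonzero value means it is set.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("TCP_NODELAY enabled:", enable_nodelay(s))
    finally:
        s.close()
```

The equivalent call in C is setsockopt() with level IPPROTO_TCP and option
TCP_NODELAY.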

In Article: 1055 of comp.parallel.pvm From Harro Kremer:
> This is a well-known phenomemon of TCP. It is caused by the part of the
>congestion control mechanisms of TCP that is called the Nagle algoritm (see RFC
> 813).
> This algorithm will try to avoid sending small fragments of TCP packets in
> ethernet packets. Hence the occurrence of the phenomemon around the ethernet
> maximum packet size.
> The phenomenon is extensively discussed in the following article
> Jon Crowcroft and Ian Wakeman and Zheng Wang and Dejan Sirovica
> Layering Harmful
> IEEE Network Magazine, january 1992, pages 20-24

In their paper, they tried enabling TCP_NODELAY, but the anomaly still
occurred.  They then fixed the problem by setting the low-water mark of the
sending socket buffer, and they attributed the effect to a mismatch between
protocol layers.  However, the TCP in SunOS 4.1.3 already uses the low-water
mark, and there the solution is to enable TCP_NODELAY instead.
The anomaly occurs on SunOS 4.1.3, but not on SGI Iris.

I will try to explain the situation as follows:
	1) The sender chops the message into smaller packets (Nagle
	algorithm),
	2) The sender sends out the first small packet and waits for the
	receiver to send an acknowledgement back,
	3) The receiver gets the small packet, expects more packets, and at
	the same time starts its delayed-acknowledgement timer,
	4) The receiver's acknowledgement timer expires after 200
	milliseconds, and the receiver sends the acknowledgement back to
	the sender.
Once we enable TCP_NODELAY on the sender, it no longer waits in step 2, and
the timing anomaly disappears.
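
To make the reasoning in steps 1-4 concrete, here is a toy model in Python.
It is only a sketch of my explanation, not a TCP implementation: it assumes
that with Nagle enabled a small segment (shorter than the MSS) is held while
earlier data is unacknowledged, and that every such wait costs the full 200 ms
delayed-acknowledgement timeout.  It ignores wire time, window sizes, and
receivers that acknowledge every second full segment.  The function name and
the segment sizes in the demo are made up for illustration.

```python
DELAYED_ACK_TIMEOUT = 0.200  # seconds; the 200 ms figure from step 4


def estimated_stall(segment_sizes, mss, nodelay):
    """Estimate seconds the sender spends waiting for delayed ACKs (toy model)."""
    if nodelay:
        # TCP_NODELAY: small segments go out at once, so step 2 never blocks.
        return 0.0
    stall = 0.0
    unacked = False  # is previously sent data still unacknowledged?
    for size in segment_sizes:
        if size < mss and unacked:
            # Nagle holds the small segment until the ACK arrives, and the
            # receiver only sends that ACK when its delayed-ACK timer fires.
            stall += DELAYED_ACK_TIMEOUT
            unacked = False
        # After this segment is sent, it is itself outstanding.
        unacked = True
    return stall


if __name__ == "__main__":
    segments = [1460, 1460, 1176, 50]  # made-up split of one message
    print("Nagle on :", estimated_stall(segments, 1460, nodelay=False))
    print("Nagle off:", estimated_stall(segments, 1460, nodelay=True))
```

If this model is roughly right, the stall depends on whether the message ends
in a small trailing segment, which would fit plateaus that are about one
Ethernet packet wide.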

Is there any flaw here?  I would very much appreciate your comments on this.  Thanks.

Mengjou Lin
University of Minnesota
--

mengjou

