Newsgroups: comp.parallel.pvm
From: pkohli@cc.gatech.edu (Prince Kohli)
Subject: What does this error mean?
Organization: College of Computing, Georgia Tech.
Date: 28 Jun 1994 16:23:30 -0400
Message-ID: <2uq0s2$mqi@forge.cc.gatech.edu>

I have an application that runs on top of PVM 3.2.2. The problem is
that at random times (sometimes very soon after the program starts,
sometimes much later) the host console prints this error:

[t80040000] netoutput() timed out sending to <machine_name> after 23, 194.241408
[t80040000]  hd_dump() ref 1 t100000 n <machine_name> ar "SUN4" lo ""
[t80040000]            sa 130.207.114.58:3211 mtu 4096 f 0x0 e 0 txq 2
[t80040000]            tx 65537 rx 0 rtt 0.003648

130.207.114.58 is the address of <machine_name>.

After this, although the pvm daemon is still running on <machine_name>,
the master host thinks it is dead and removes it from the
configuration. All later packets from it are then marked as bogus
packets, which of course screws up my application.

Does anyone have any idea what is going on? Has anyone seen this error
before, or know what it means? BTW, this is on Sun SPARCs. ANY sort of
hint will be appreciated. The failure is very reproducible: the timing
varies, but it is all but certain to happen on any given run.

Thanks,

-Prince


