Newsgroups: comp.parallel.pvm
From: orenl@mangal.cs.huji.ac.il (Oren Laden)
Subject: FAILURE - large amount of communication
Organization: The Hebrew U. of Jerusalem, Computer Science Dept.
Date: 23 Jan 1995 11:54:00 GMT
Message-ID: <3g05cp$fk4@pretzel.cs.huji.ac.il>

Hi,

I've got a strange problem with PVM, when running a simulation involving
intensive communications.
The program is split into several processes. During each iteration, each
task performs some local calculations and exchanges data about its
borders with the neighbouring tasks. Once in a while, a special task
collects the data from all tasks and presents a global view of the
simulation.
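
The communication pattern described above can be sketched roughly as
follows. This is a single-process illustration, not the actual program:
a 1-D chain of tasks is assumed, the names NTASKS, CELLS, Task,
init_tasks, local_step, exchange_borders, collect_global and
run_simulation are all hypothetical, and plain array copies stand in for
the messages that the real tasks would pass with pvm_send() and
pvm_recv().

```c
#include <assert.h>
#include <string.h>

#define NTASKS 4   /* hypothetical number of worker tasks */
#define CELLS  8   /* hypothetical number of interior cells per task */

/* Each task owns CELLS interior cells plus one ghost cell on each side:
   cell[0] and cell[CELLS + 1] mirror the neighbours' border cells. */
typedef struct {
    double cell[CELLS + 2];
} Task;

/* Zero everything, then fill task i's interior with the value i. */
static void init_tasks(Task t[], int n)
{
    for (int i = 0; i < n; i++) {
        memset(&t[i], 0, sizeof t[i]);
        for (int j = 1; j <= CELLS; j++)
            t[i].cell[j] = (double)i;
    }
}

/* Stand-in for the local calculation: a three-point average. */
static void local_step(Task *t)
{
    double next[CELLS + 2];
    for (int j = 1; j <= CELLS; j++)
        next[j] = (t->cell[j - 1] + t->cell[j] + t->cell[j + 1]) / 3.0;
    for (int j = 1; j <= CELLS; j++)
        t->cell[j] = next[j];
}

/* Border exchange: every task copies its edge cells into the ghost
   cells of its neighbours. Under PVM each copy would be one message. */
static void exchange_borders(Task t[], int n)
{
    for (int i = 0; i < n; i++) {
        if (i > 0)
            t[i - 1].cell[CELLS + 1] = t[i].cell[1];
        if (i + 1 < n)
            t[i + 1].cell[0] = t[i].cell[CELLS];
    }
}

/* The "special task": gather all interior cells into one global view. */
static void collect_global(const Task t[], int n, double global[])
{
    for (int i = 0; i < n; i++)
        for (int j = 1; j <= CELLS; j++)
            global[i * CELLS + (j - 1)] = t[i].cell[j];
}

/* Main loop: compute, exchange borders, and collect every
   collect_every iterations. Returns the number of collections. */
static int run_simulation(Task t[], int n, int iterations,
                          int collect_every, double global[])
{
    int collections = 0;
    assert(collect_every > 0);
    for (int iter = 1; iter <= iterations; iter++) {
        for (int i = 0; i < n; i++)
            local_step(&t[i]);
        exchange_borders(t, n);
        if (iter % collect_every == 0) {
            collect_global(t, n, global);
            collections++;
        }
    }
    return collections;
}
```

For example, after init_tasks(t, NTASKS), a call such as
run_simulation(t, NTASKS, 10, 5, global) performs ten iterations and
collects the global view twice.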
The program runs all right for a minute or so, and then PVM collapses, the
daemons are killed and the simulation is, of course, terminated.
Also, I get the following messages on screen, from PVM:

libpvm t40001: mxfer() EOF on pvmd sock
libpvm t40001: pvm_nrecv(): Can't contact local daemon

Does anyone have an explanation for this behaviour?
Has anyone had similar problems?

Recently I read some articles in this newsgroup discussing the inability
of PVM to recover from a lack of memory resources in the daemon. I think
there is some correlation between those cases and mine.

The program ran on a Pentium with 64MB RAM, under BSDI Unix.


Thanks for any help,

Oren.



