Newsgroups: comp.parallel.mpi
From: Chris Walshaw <C.Walshaw@gre.ac.uk>
Subject: Help - error messages in MPICH
Organization: School of Maths, University of Greenwich, U.K.
Date: 9 Nov 1995 14:36:56 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <47t3m8$gsh@jupiter.gre.ac.uk>

I am running a parallel code using MPICH (ch_p4) Version 1.0.10 on a Sun
running SunOS 5.4. I run between 3 and 9 processes, all on the same
workstation (for testing purposes). Mostly it works perfectly, but
occasionally (and frequently enough to be irritating) it crashes, almost
always early on in the run. Typical error messages are:

p4_23302:  p4_error: net_recv read, errno = : 0
rm_l_821416_23303:  p4_error: interrupt SIGINT: 2

or

p0_2753:  p4_error: interrupt SIGINT: 2
bm_list_2754:  p4_error: interrupt SIGINT: 2

or

p1_18412:  p4_error: net_recv read, errno = : 0
rm_l_2046_18413:  p4_error: interrupt SIGINT: 2

Do these error messages mean enough to anyone to suggest what might be
causing the crashes?

	Thanks in advance

	Chris Walshaw
	C.Walshaw@gre.ac.uk


