Newsgroups: comp.parallel.mpi
From: "Joe Beda" <jbeda@hmc.edu>
Subject: Installation: MPI on SGI Indigo^2?
Organization: The Claremont Colleges
Date: 20 Sep 1996 01:36:25 GMT
Message-ID: <01bba694$ae6ed3d0$792cad86@macel>

I cannot, for the life of me, get MPI to run on a cluster of SGI Indigo^2's
with ch_p4.  Has anyone else had any problems?  I've tried various
configure options, along with using the secure server vs. rsh.

Here is a sample of the error that I'm getting:
===================================
0:wayland:mpitest > mpirun -np 2 pi
rm_0_5869: (0.019143) process not in process table; my_unix_id = 5869
my_host=eakins
p0_15961:  p4_error: net_recv recv:  EOF on socket: 882
bm_list_15962:  p4_error: interrupt SIGINT: 2
rm_0_5869: (0.019500) Probable cause:  local slave on uniprocessor without
shared memory
rm_0_5869: (0.019671) Probable fix:  ensure only one process on eakins
rm_0_5869: (0.019766) (on master process this means 'local 0' in the
procgroup file)
rm_0_5869: (0.019857) You can also remake p4 with SYSV_IPC set in the
OPTIONS file
rm_0_5869:  p4_error: p4_get_my_id_from_proc: 0
rm_0_5869:  p4_error: interrupt SIGSEGV: 11
rm_l_0_5870:  p4_error: interrupt SIGINT: 2
Killed
===================================

I'm not running more than one process per machine.  In fact, I've tried this
with a custom p4pg file and still encounter the same problem.  I've also
tried recompiling with SYSV_IPC in the OPTIONS file under the p4 tree.
Here are my configure options:

./configure -arch=sgi -device=ch_p4 -c++ -nompe -nompedbg -nodevdebug
-nof77 -cc=gcc

I've also tried -comm=shared, but this causes p4 to be compiled with
P4ARCH set to SGI_MP, which brings in some funky compiler settings (-cckr).
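
For reference, a p4pg (procgroup) file for this two-machine case would look
roughly like the sketch below.  The path to pi is a placeholder, not the
actual path on these machines.  'local 0' means no additional processes on
the master (wayland) beyond the master process itself, and the second line
asks for one process on eakins:
===================================
local 0
eakins 1 /path/to/mpitest/pi
===================================
It would be run from wayland with something like
"mpirun -p4pg <procgroup file> pi".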

Here is some system information:
===================================
0:wayland:mpich > uname -a 
IRIX wayland 5.3 11091812 IP22 mips
0:wayland:mpich > hinv
Iris Audio Processor: version A2 revision 1.1.0
1 150 MHZ IP22 Processor
FPU: MIPS R4010 Floating Point Chip Revision: 0.0
CPU: MIPS R4400 Processor Chip Revision: 5.0
On-board serial ports: 2
On-board bi-directional parallel port
Data cache size: 16 Kbytes
Instruction cache size: 16 Kbytes
Secondary unified instruction/data cache size: 1 Mbyte
Main memory size: 64 Mbytes
EISA bus: adapter 0
Integral Ethernet: ec0, version 1
Integral SCSI controller 1: Version WD33C93B, revision D
Integral SCSI controller 0: Version WD33C93B, revision D
Disk drive: unit 1 on SCSI controller 0
Graphics board: GU1-Extreme
===================================


Any help that could be offered would be much appreciated!

Joe Beda
Harvey Mudd College
jbeda@hmc.edu

