Newsgroups: comp.parallel.pvm
From: pbultink@allserv.rug.ac.be (Patrick Bultinck)
Subject: Linux parallel trouble
Organization: University of Ghent, Belgium
Date: 21 Nov 1996 14:19:03 GMT
Message-ID: <571ocn$cg7@infoserv.rug.ac.be>

Dear all,

I am trying to get a Linux box to work in parallel with my
IBM RS/6000 workstations to test parallel GAMESS
(an ab initio program) runs.
GAMESS uses TCGMSG as its parallel tool, but I think the problem is
general, so a posting here may be appropriate.

TCGMSG comes with a very small test program that prints 'Hello from node X'.

I had to set up a .rhosts file. I think it works, since from the Linux
machine, named banana, I can do rsh apple -l me ls
and get the results of ls. Apple is a SUN machine; banana is Linux.

I can run the program with a SUN or IBM machine as the kick-off host.
The small program then also gets started on Linux, and it returns
sensible results. In short, typing 'parallel hello' on apple causes
the program to run on apple AND on banana:

Hello from node 1
Hello from node 2
Hello from node 3
Hello from node 4
Hello from node 5

When I do the VERY same from the Linux machine (banana), starting processes
on the SUN or IBM machines, I run into big trouble.

My .rhosts file on banana (the one on apple is the same) looks like this:
banana.rug.ac.be moi
apple.rug.ac.be me

The error I keep getting is:

 Creating: host=banana, user=moi,
           file=/moi/tcgmsg/ipcv4.0/hello.x, port=1193
 Creating: host=apple, user=me,
           file=/staff/fwet/me/tcgmsg/ipcv4.0/hello.x, port=1195
10: RemoteCreate: in child after execv -1 (0xffffffff).
system error message: No such file or directory
 Creating: host=banana, user=moi,
           file=/moi/tcgmsg/ipcv4.0/hello.x, port=1193
 Creating: host=apple, user=me,
           file=/staff/fwet/me/tcgmsg/ipcv4.0/hello.x, port=1195
10: interrupt
 0: interrupt

The same .rhosts files exist on apple and banana, and the .p files,
which tell each machine where all the files are, are also the same,
except for the order of the entries on banana and apple. The files
/moi/tcgmsg... and /staff/fwet... also exist on the respective machines,
and they are executable.

When I go to banana and instruct it to run 5 processes on banana alone,
it also works fine!

When building the TCGMSG programs and linking the parallel code, I had
to link against libipc on banana.
From the FAQs I learned that it is part of libc (and 'ar t' on libc
confirmed this).

Could someone help me a little on my way?


Patrick Bultinck
Quantum Chemistry
Ghent, Belgium

