Newsgroups: comp.parallel.pvm
From: vetter@I_should_put_my_domain_in_etc_NNTP_INEWS_DOMAIN (Philippe Vetter)
Subject: Pvmd selects the wrong architecture type under XPVM & the pvm console
Organization: Laboratoire ERM/PHASE, ENSPS, ULP Strasbourg.
Date: 16 Aug 1995 18:42:40 GMT
Message-ID: <40te71$h76@apopi.u-strasbg.fr>

Dear PVM users,

I'm using PVM 3.3.7 and XPVM 1.1.1 on a network of PCs running Linux and RS6000s
running AIX.

My virtual machine consists of 2 different workstations: 1 Linux PC & 1 RS6000.
pvm> conf
2 hosts, 2 data formats
                    HOST     DTID     ARCH   SPEED
                    erm1    40000    LINUX    1000
                     doc    80000     RS6K    1000

When I launch the "fork" program (1 master process which spawns 3 slave
processes with the PvmTaskDefault option) from XPVM or from the console with the
command "spawn -> fork", PVM runs the master process and 2 slave processes on
one workstation (either the LINUX or the RS6K one). When PVM then tries to run
the third slave process, the pvm_spawn function returns an error code of -7,
which means: "Specified executable cannot be found".
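
To make the failure pattern concrete, here is a minimal sketch of how the
master could inspect the task-id array that pvm_spawn fills in: entries for
tasks that could not be started hold a negative error code instead of a task
id. The helper names and the SKETCH_PVM_NO_FILE macro are hypothetical and
exist only for this illustration. In a real program the constant would come
from pvm3.h (where -7 is named PvmNoFile) and the array from the pvm_spawn
call itself.

```c
#include <stddef.h>

/* -7 is the code reported in this post; pvm3.h calls it PvmNoFile.
   It is redefined here (assumption) so the sketch builds without PVM. */
#define SKETCH_PVM_NO_FILE (-7)

/* Hypothetical helper: count the entries of a tids array that hold a
   negative error code rather than a task id. */
int count_failed_spawns(const int *tids, int ntask)
{
    int failed = 0;
    for (int i = 0; i < ntask; i++) {
        if (tids[i] < 0)
            failed++;
    }
    return failed;
}

/* Hypothetical helper: turn one tids entry into a short description. */
const char *describe_spawn_result(int tid)
{
    if (tid > 0)
        return "started";
    if (tid == SKETCH_PVM_NO_FILE)
        return "Specified executable cannot be found";
    return "other PVM error";
}
```

With the values printed by the master in the console output (80002, 80003 and
-7), count_failed_spawns reports one failure and describe_spawn_result gives
the "cannot be found" text for the third entry.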

pvm> spawn -> fork
[1]
1 successful
t80001
pvm> [1:t80002] libpvm [t80002]: pvm_upkint(): End of buffer
[1:t80003] libpvm [t80003]: pvm_upkint(): End of buffer
[1:t80002] libpvm [t80002]: pvm_upkint(): End of buffer
[1:t80002] Je suis le fils : 80002 . Je viens d'envoyer un message a 80001 , mon pere. == Demon : 80000
[1:t80003] libpvm [t80003]: pvm_upkint(): End of buffer
[1:t80003] Je suis le fils : 80003 . Je viens d'envoyer un message a 80001 , mon pere. == Demon : 80000
[1:t80002] EOF
[1:t80001] je suis le pere 80001 
[1:t80001]  Task_ID: 80002      Task_ID: 80003   -7 <------ ***** Error *****
[1:t80001] Length 4, Tag 11, Message recu de  t80002
[1:t80001] Length 4, Tag 11, Message recu de  t80003
[1:t80003] EOF
[1:t80001] EOF
[1] finished

To debug this problem I started PVM with the debug mask "Task management".
Looking at the /tmp/pvml.509 file, it seems that PVM wants to start
the last slave process on the second workstation without changing the PVM_ARCH
environment variable.

For example, when PVM has first started the master process and
2 slave processes on the Linux workstation with PVM_ARCH set to LINUX (that is,
PVM looks for the executable file in ~/pvm3/bin/LINUX), it then wants to start
the third slave process on the RS6000 workstation, but it looks for the
executable in the ~/pvm3/bin/LINUX directory rather than in the ~/pvm3/bin/RS6K
directory.
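
The behaviour I would expect can be sketched as follows: the search path
template "pvm3/bin/$PVM_ARCH" (the first component of the ep field in the
trace further down) should be expanded with the architecture of the host that
actually runs the task. This is only an illustration of the expected
substitution, not PVM's own code; expand_arch is a hypothetical helper.

```c
#include <string.h>

/* Hypothetical helper: copy tmpl into out, replacing every "$PVM_ARCH"
   with arch. Returns 0 on success, -1 if out is too small. */
int expand_arch(const char *tmpl, const char *arch, char *out, size_t outsz)
{
    static const char var[] = "$PVM_ARCH";
    const size_t varlen = sizeof var - 1;
    const size_t alen = strlen(arch);
    size_t used = 0;

    if (outsz == 0)
        return -1;

    while (*tmpl != '\0') {
        if (strncmp(tmpl, var, varlen) == 0) {
            /* Substitute the architecture name for the variable. */
            if (used + alen >= outsz)
                return -1;
            memcpy(out + used, arch, alen);
            used += alen;
            tmpl += varlen;
        } else {
            /* Copy ordinary characters unchanged. */
            if (used + 1 >= outsz)
                return -1;
            out[used++] = *tmpl++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

Expanding the template with "LINUX" gives pvm3/bin/LINUX and with "RS6K" gives
pvm3/bin/RS6K, so each host should end up searching its own directory.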

There seems to be confusion between the 2 different PVM_ARCH values (LINUX & RS6K)
of the 2 workstations.


If you have any idea how to solve this problem, it would be very helpful to me.

Thank you in advance,

philippe.



Here is a trace from the /tmp/pvml.509 file:

[t80040000] ready  3.3.7   Wed Aug 16 17:08:13 1995
[t80040000] tm_tickle() # 6 4
[t80040000] tm_tickle() debugmask is 4 (tsk)
[t80040000] tm_tickle() # 1
[t80040000] ht_dump() ser 2 last 3 cnt 2 master 1 cons 1 local 1 narch 2
[t80040000]  hd_dump() ref 1 t0 n "pvmd'" a "" ar "LINUX"
[t80040000]            lo "" so "" dx "" ep "" bx "" wd "" sp 1000
[t80040000]            sa 130.79.74.61:1475 mtu 4096 f 0x0 e 0 txq 0
[t80040000]            tx 1 rx 3 rtt 1.000000
[t80040000]  hd_dump() ref 1 t40000 n "erm1" a "" ar "LINUX"
[t80040000]            lo "" so "" dx "" ep "pvm3/bin/$PVM_ARCH:$PVM_ROOT/bin/$PVM_ARCH" bx "$PVM_ROOT/lib/debugger" wd "/home/vetter
[t80040000]            sa 130.79.74.61:1474 mtu 4096 f 0x0 e 0 txq 0
[t80040000]            tx 1 rx 1 rtt 1.000000
[t80040000]  hd_dump() ref 1 t80000 n "doc" a "" ar "RS6K"
[t80040000]            lo "" so "" dx "" ep "" bx "" wd "" sp 1000
[t80040000]            sa 130.79.68.70:4294 mtu 4096 f 0x0 e 0 txq 0
[t80040000]            tx 4 rx 2 rtt 0.442140
[t80040000] tm_tickle() # 2
[t80040000] task_dump()
[t80040000]      tid     ptid flag    pid soc out     wait   outtid   trctid    sched   es
[t80040000]    40001        0    4  19815   5  -1        0        0        0        0    0
[t80040000]  txq:pkt      src      dst flag    len    ofs
[t80040000] 
[t80040000] tm_spawn() host set:
[t80040000] ht_dump() ser 0 last 3 cnt 2 master 0 cons 0 local 0 narch 2
[t80040000]  hd_dump() ref 2 t40000 n "erm1" a "" ar "LINUX"
[t80040000]            lo "" so "" dx "" ep "pvm3/bin/$PVM_ARCH:$PVM_ROOT/bin/$PVM_ARCH" bx "$PVM_ROOT/lib/debugger" wd "/home/vetter
[t80040000]            sa 130.79.74.61:1474 mtu 4096 f 0x0 e 0 txq 0
[t80040000]            tx 1 rx 1 rtt 1.000000
[t80040000]  hd_dump() ref 2 t80000 n "doc" a "" ar "RS6K"
[t80040000]            lo "" so "" dx "" ep "" bx "" wd "" sp 1000
[t80040000]            sa 130.79.68.70:4294 mtu 4096 f 0x0 e 0 txq 0
[t80040000]            tx 4 rx 2 rtt 0.442140
[t80040000] forkexec() new task t40002 pid 19860 pfd=10
[t80040000] loclconn() accept from 127.0.0.1:1502 sock 11
[t80040000] tm_conn2() reconnect task t40002
[t80040000] work() error reading from t0, marking dead
[t80040000] task_free() t0
[t80040000] forkexec() stat failed <pvm3/bin/RS6K/fork>
[t80040000] forkexec() didn't find <pvm3/bin/RS6K/fork>
[t80040000] task_free() t40003
[t80040000] dm_halt() from (erm1), halting...
[t80040000] work() pvmd halting
[t80040000] pvmbailout(0)
[t80040000] sending FIN|ACK to all pvmds
______________________________________________________________________________

Philippe VETTER                 e-mail : vetter@erm1.u-strasbg.fr

Universite Louis Pasteur
Ecole Nationale Superieure de Physique de Strasbourg
Laboratoire ERM/PHASE

Boulevard Sebastien Brant
67400  Illkirch Graffenstaden

FRANCE
______________________________________________________________________________

