Newsgroups: comp.parallel.pvm
From: ladanyi@cs.cornell.edu (La'szlo' Lada'nyi)
Subject: Re: [HELP}: Bus Error doing PVM_SPAWN()
Organization: Cornell University Computer Science Department
Date: 21 Nov 1995 15:24:24 -0500
Message-ID: <48tcho$4c6@munin.cs.cornell.edu>

pit@veilchen.informatik.rwth-aachen.de (Peter Chertman) writes:

>Antonio Scotti <scotti@centauro.upc.es> writes:

>> Hi,

>  Hello Antonio,

>> does anyone know what may cause a program to crash while executing
>> pvm_spawn()?
>> I checked all of pvm_spawn() parameters but everything looks ok
>> here is the piece of code under investigation:

>>    info = pvm_spawn("pxgj", (char**)0, 0, "", NTASKS-1, &tids[1]); 
>>    if(info != NTASKS-1 || info < 0){
>>      pvm_perror("pxgj");
>>      pvm_exit();
>>      cout<<"pvm_spawn failed: info = "<<info<<" n = "<<n<<endl;
>>      return -1;
>>    }  

>  I don't imply you're an idiot and I don't want to tell you how to create a
>C-program, but the only source of error can be the "&tids[1]" -part. The way
>you wrote your code, "tids" must be of type

>    int* tids[2];
>    tids[1] = new int [NTASKS];

>  or

>    int  tids[2][NTASKS];

>  If you don't need a two dimensional array, try:

>    int tids[NTASKS];
>    pvm_spawn ("pxgj", (char **) 0, 0, "", NTASKS -1, &tids);

>> The other thing is that it doesn't even return an error message (apart from
>> the "Bus Error" system message); it just dies before a value for 'info'
>> can be produced.

>  It is likely that the spawn call fails when filling a non-existing array.


>  Hope that helped, pit

I don't know what the problem is, but the previous answer is incorrect.
&tids[1] is equivalent to tids+1 and has the same pointer type as tids
itself. Thus the original code looks ok from this point of view (assuming
tids is an int* that points to at least NTASKS ints).

If I had to guess where the error is, I'd say here:
info = pvm_spawn("pxgj", (char**)0, 0, "", NTASKS-1, &tids[1]); 
                                       ^^
If you have no preference about which machine the slaves are started on,
you should pass (char *)NULL instead of "". I can imagine that "" is
taken as a host name of length 0, so that pvm_spawn tries to rsh to the
machine "". It is also possible that rsh reacts to an empty machine name
with a bus error instead of a normal error message.

As I said, this is just a guess.


Hope this helps,
   Laci
-- 
----------------------------------------------------------------------
| Laci Ladanyi           | God made one mistake when he created man: |
| ladanyi@cs.cornell.edu |     He wrote self-modifying code ...      |
----------------------------------------------------------------------

