Newsgroups: comp.parallel.pvm
From: glass@tavosf.iso.dec.com (Yossi Glass)
Subject: libpvm: pvmmctl() connect: Connection refused
Organization: Digital Israel
Date: 11 Jul 1995 07:42:08 GMT
Message-ID: <3tta0g$ai8@mrnews.mro.dec.com>

I am running a PVM application on a single system and am having some
problems:

The application runs fine with one parent and 2 children. When I try
to run it with 5 children, the parent starts and spawns all 5 children.
However, after a few seconds, all children become idle.
 
Looking into the /tmp/pvml.xx file, I see the following messages (for the 
5-children run):

[t80040000] [t40004] libpvm [t40004]: pvmmctl() connect: Connection refused
[t80040000] [t40006] libpvm [t40006]: pvmmctl() connect: Connection refused

[When I start the application with 10 children, I get this message for 6 of
 the children.]

When I tried to find out where this error happens (by printing from the
children's code), it seems to happen when 2 of the children make
the following PVM call:
          call pvmfsend(tids(n),msgtag,info)
  
These two children never return from this call (at least that is what I
think, because they never execute the following two statements: one which
tests for a PVM error, and the other which prints a message to standard
output).

I am attaching the piece of code where this problem happens (or at least I
think that it happens there).

Have you seen anything like this before?

Thanks,
Yossi.

--------------------------------------------------------------------------
Yossi Glass                           phone (972) 9-593254  
Digital Equipment (DEC) Ltd. Israel   fax   (972) 9-542530
email: glass@tavosf.iso.dec.com
--------------------------------------------------------------------------

This is the code (with additional write statements) where the error seems to
happen. The children for which the error message is printed in the
/tmp/pvml file never return from the pvmfsend call, nor do they print
the '**ERROR...' message.

		-----------------------
      do 10 n=1,Nhosts
        if (n.eq.myHostNo) goto 20
         call pvmfinitsend(pvmraw,bufid)
          do k=1,kcut
           do i=1,icut
            call pvmfpack(real8,buf,jmax,imax,info)
           enddo
          enddo
         write (6,*) "BEFORE pvmsend(tids(n), n;tids = "
         write (6,*) "                     ",n,tids(n)
         call pvmfsend(tids(n),msgtag,info)
         IF( info .LT. 0 ) CALL PVMFPERROR( '**ERROR pvmsend', info )
         write (6,*) "AFTER pvmsend(tids(n), n;tids = "
         write (6,*) "                    ",n,tids(n)

		-----------------------





