Newsgroups: comp.parallel.pvm
From: denham@wg.waii.com (Scott Denham)
Subject: PVM hangups
Summary: New user problem with PVM3.3.3
Keywords: hang
Organization: Western Geophysical, Div. of Western Atlas Int'l, Houston, TX
Date: 5 Jan 1995 21:40:25 GMT
Message-ID: <3ehp09$mvb@airgun.wg.waii.com>


     We are new PVM users, running PVM 3.3.3.  In our application,
a master task running on one node of an SP-2 spawns multiple slave
tasks, each running on a separate node of the SP-2 (one of the
slaves runs on the same node as the master).  The master sends
data to each slave via pvmfinitsend/pvmfsend, then waits for a
message from any slave indicating that its work (which may take
hours or days) is done.  The master then sends a message requesting
that the slave send back its results, which are contained in hundreds
of messages, each with a unique message tag.  The slave then executes

      DO I=1,N
         CALL PVMFINITSEND(PvmDataInPlace)
         CALL PVMFSEND(...MSGTAG(I)...)
      ENDDO

while the master simultaneously executes

      DO I=1,N
         CALL PVMFRECV(...MSGTAG(I)...)
      ENDDO

When all the data from one slave has been received, the master waits
for a message from the next slave whose work is complete, which may
take hours.
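
     For readers who want to see the exchange pattern without PVM,
here is a minimal sketch in Python.  It is not our Fortran code: an
in-memory queue stands in for PVM message delivery, and the names
slave_send_results, master_recv_results and run_exchange are
hypothetical.  The master takes messages strictly in tag order and
holds any message that arrives with a different tag until its turn.

```python
import queue
import threading


def slave_send_results(channel, tags, payloads):
    """Send one message per tag, in order (models the send loop)."""
    for tag, data in zip(tags, payloads):
        channel.put((tag, data))


def master_recv_results(channel, tags, timeout=5.0):
    """Receive one message per tag, in tag order (models the receive loop)."""
    pending = {}  # messages that arrived before their tag was requested
    results = []
    for tag in tags:
        while tag not in pending:
            # Assumption for this sketch only: give up with queue.Empty
            # after `timeout` seconds rather than blocking forever.
            got_tag, data = channel.get(timeout=timeout)
            pending[got_tag] = data
        results.append(pending.pop(tag))
    return results


def run_exchange(n):
    """Run one slave-to-master transfer of n tagged messages."""
    tags = [1000 + i for i in range(n)]   # unique tag per message
    payloads = [i * i for i in range(n)]  # placeholder result data
    channel = queue.Queue()
    sender = threading.Thread(
        target=slave_send_results, args=(channel, tags, payloads)
    )
    sender.start()
    results = master_recv_results(channel, tags)
    sender.join()
    return results
```

In this model run_exchange(100) returns the hundred payloads in tag
order.  The queue here is unbounded, so the sketch shows only the
intended ordering and says nothing about buffer limits in PVM itself.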

     In some cases, when running on dedicated or lightly loaded nodes,
the application completes successfully.  When running in a normal
production job mix, however, the application "hangs" during execution
of the loops shown above when data is being sent back from one of the
slave tasks.  In a typical case, all the data from one slave task is
sent and received successfully, then the hang occurs during the data
transmission for the next slave task.  The slave appears hung in
send J (for example, J=65), while the master is hung in receive K,
where K.LT.J (for example, K=32).  None of the PVM calls preceding
the ones that hang returns an error indication.

     The PVM User's Guide indicates that this kind of hang might
be caused by a memory shortage.  If so, would PVM or the application
be the likely culprit?  Is there a way to increase the memory
available to PVM?  Do you have any suggestions on how to confirm
whether there is a memory problem, or on the best method to debug
this type of problem? 

     Thanks.

          Stan Goldberg (stan.goldberg@wg.waii.com)
          Scott Denham (scott.denham@waii.com)
          Western Geophysical Co.
          Houston, TX


