Newsgroups: comp.parallel.mpi
From: Marcus Dormanns <marcus>
Subject: Problems with non-blocking send/probe
Organization: RWTH -Aachen / Rechnerbetrieb Informatik
Date: 28 Aug 1995 11:28:29 GMT
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <41s98t$au9@news.rwth-aachen.de>

Hi everybody,

I have some problems with non-blocking send/probe operations. Maybe someone
can help me.


platform:
---------

SUN Classic Workstations, running Solaris 2.4, connected by Ethernet.
As far as my system administrator can tell me, everything is configured
absolutely normally.
MPI 1.0.10 (before I had 1.0.8)

problem:
--------
I wrote a small test program that contains only what is needed to reproduce the
problem:



#include <mpi.h>
#include <stdlib.h>
#include <stdio.h>
#include <math.h>


int main(int argc, char* argv[])
{
  int nproc,myproc;
  int data[4000];
  int buffer[4000];
  int i;
  MPI_Request request0, request1;
  MPI_Status status;
  int flag;
  long int send = 0;
  long int received = 0;
  double a,b;
  int tag = 0;
     
    
  
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &nproc);
  MPI_Comm_rank(MPI_COMM_WORLD, &myproc);

  for(i=0;i<4000;i++)
     data[i] = i;
  
  request0=MPI_REQUEST_NULL;
  request1=MPI_REQUEST_NULL;


  
  do
  {
    printf("proc=%i Next round\n",myproc);

    /**************** send messages to the neighbors ******************/
    if (myproc!=0)
    {
      MPI_Wait(&request0,&status);
      printf("proc=%i Before send to the left\n",myproc);
      MPI_Isend(data,4000,MPI_INT,myproc-1,tag,MPI_COMM_WORLD,&request0);
      send++;
    }
    if (myproc!=nproc-1)
    {
      MPI_Wait(&request1,&status);
      printf("proc=%i Before send to the right\n",myproc);
      MPI_Isend(data,4000,MPI_INT,myproc+1,tag,MPI_COMM_WORLD,&request1);
      send++;
    }


    /************** do some dummy work ********************/
    b = 123.456;
    for (i=0;i<10000;i++)
       a = sqrt(b)+sin(b)/b;

    /************* wait, if the neighbors are too far behind ************/
    if (received<send-4)
    {
      printf("proc=%i I am waiting\n",myproc);
      MPI_Probe(MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&status);
    }


    /**************** receive everything available *******************/
    printf("proc=%i I probe nonblocking for a message\n",myproc);
    MPI_Iprobe(MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&flag,&status);
    while(flag!=0)
    {
      printf("proc=%i I receive one message\n",myproc);
      MPI_Recv((void*)buffer,4000,MPI_INT,MPI_ANY_SOURCE,tag,
	       MPI_COMM_WORLD,&status);
      received++;
      printf("proc=%i I probe nonblocking for a message once more\n",myproc);
      MPI_Iprobe(MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&flag,&status);
    }

    
  } while(1==1);
 
}


It is just a chain of processes, each sending non-blocking messages to its
neighbors. A process blocks only if a preceding send has not finished or if it
is too far ahead of its neighbors (more than four messages sent beyond those
received). Each process probes for incoming messages and receives all messages
available. The behavior of this program is as follows:
Eventually (although not on every run, and sometimes only after many
iterations) one of the processes enters one of the non-blocking MPI_Isend or
MPI_Iprobe operations and never returns from it. Consequently, the other
processes eventually block as well, because they no longer get any messages.
The process that causes the problem is still busy (consuming the full CPU
power of the workstation). Attaching the debugger (gdb) to it (once it is
clear that the problem has occurred), the stack looks like this:

#0  0xef62a678 in _ioctl ()
#1  0xef7a72cc in msgpeek ()
#2  0xef7a75c0 in _s_soreceivexx ()
#3  0xef7aa8c8 in recv ()
#4  0x56e04 in sock_msg_avail_on_fd () at profile.c:104
#5  0x56c50 in socket_msgs_available () at profile.c:104
#6  0x4bfa4 in p4_messages_available () at profile.c:104
#7  0x41f3c in MPID_P4_check_incoming () at profile.c:104
#8  0x44788 in MPID_P4_post_send () at profile.c:104
#9  0x25758 in MPI_Start () at profile.c:104
#10 0x2538c in MPI_Isend () at profile.c:104
#11 ... somewhere in my program

So it is busy in _ioctl and never returns from it! I tested mainly with
3 machines. The bug never occurs if I place all three processes on one
machine.
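
For reference, my reading of frames #1 to #4 is that the library is only
asking "is there data waiting on this socket?" by peeking at it, which should
return at once either way. The following is a minimal sketch of such a check
under that assumption. It is not taken from the p4 source: the name
msg_avail_on_fd is hypothetical, and the real routine may well be implemented
differently (for example without select).

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, NOT the p4 implementation.
 * Returns 1 if at least one byte can be read from fd without blocking,
 * 0 if nothing is pending (or the peer has closed), -1 on error. */
int msg_avail_on_fd(int fd)
{
    fd_set readfds;
    struct timeval zero;
    char byte;
    int n;

    assert(fd >= 0 && fd < FD_SETSIZE);

    /* Zero timeout: select() polls and returns immediately. */
    zero.tv_sec = 0;
    zero.tv_usec = 0;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    n = select(fd + 1, &readfds, NULL, NULL, &zero);
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;               /* nothing readable right now */

    /* The socket is readable, so this peek cannot block.
     * MSG_PEEK leaves the data in the socket buffer. */
    n = (int) recv(fd, &byte, 1, MSG_PEEK);
    if (n < 0)
        return -1;
    return n > 0 ? 1 : 0;       /* n == 0 means end of stream */
}
```

Called on an idle connected socket, this returns 0 immediately, and calling it
twice after data has arrived returns 1 both times because the peek does not
consume anything. That is the behavior I would have expected at the bottom of
the backtrace, instead of a call that never comes back.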

I would be very glad if you could help me or give me a hint where to search
for the bug (MPI / operating system / my program). Thanks in advance.

Regards,

Marcus.

----------------------------------------------------------------------------
|  _      :  Marcus Dormanns, Chair for Operating Systems
|_|_`__   :  RWTH Aachen, Kopernikusstr. 16, D-52056 Aachen
  | |__)  :  Tel.   : +49-241/80-7617  |  Fax : +49-241/8888-339
    |__)  :  e-Mail :  marcus@lfbs.rwth-aachen.de
          :  http://www.lfbs.rwth-aachen.de/~marcus/marcus.html


