Newsgroups: comp.parallel.mpi
From: fazilah@scs.leeds.ac.uk (F Haron)
Subject: Never received what was sent.
Organization: The University of Leeds, School of Computer Studies
Date: Tue, 4 Jun 1996 12:49:26 +0100 (BST)
Message-ID: <1996Jun4.114926.16056@leeds.ac.uk>

Hi,

I ported my divide-and-conquer kernel (written in C and MPI) from the
SGI Power Challenge to the T3D.  The test application is a mergesort
that sorts an array of integers.  The program works fine on the
shared-memory SGI but fails on the T3D.

According to the debugger, the processes stop at the following
communication points:

        a) Process 0 - sends the root problem to process 2 and waits
           for the final result.  It is behaving as it should.

        b) Process 1 - keeps the ids of processes that need more work.
           It sends the id of an idle process to any worker process
           that makes a request.  This also behaves as it should.

        c) Process 2 - the first worker process.  It received the root
           problem and keeps splitting it until it is overloaded, then
           requests from process 1 the id of an underloaded process.
           It sent the request successfully (i.e. process 1 received
           it) BUT never gets the reply, even though process 1 has
           sent it.  The communication between processes 1 and 2 at
           this point uses blocking send/recv.

        d) Process 3 - waiting to receive work. Behaves as it should.
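
To make the exchange in (b) and (c) concrete, here is a minimal sketch of
its shape.  It is not the actual kernel code: the tag values, the MANAGER
macro, and the placeholder rule for picking an idle process are all
made-up names for illustration.  It assumes at least 3 processes and uses
only blocking MPI_Send/MPI_Recv, as described above.

```c
/* Sketch only: tags, names and the idle-process policy are illustrative
 * assumptions, not taken from the real kernel.  Run with >= 3 processes. */
#include <mpi.h>
#include <stdio.h>

#define MANAGER     1   /* rank that tracks idle workers (process 1 above) */
#define TAG_REQUEST 10  /* hypothetical tag: "give me the id of an idle process" */
#define TAG_REPLY   11  /* hypothetical tag: reply carrying that id */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 3) {
        if (rank == 0)
            fprintf(stderr, "need at least 3 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == MANAGER) {
        int dummy, idle;
        MPI_Status status;

        /* Blocking receive of a request from any worker. */
        MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                 MPI_COMM_WORLD, &status);

        /* Placeholder policy: report the last rank as idle, or -1 if
         * there is no spare worker. */
        idle = (size > 3) ? size - 1 : -1;

        /* Blocking send of the reply to whichever rank asked. */
        MPI_Send(&idle, 1, MPI_INT, status.MPI_SOURCE, TAG_REPLY,
                 MPI_COMM_WORLD);
    } else if (rank == 2) {
        int dummy = 0, idle;
        MPI_Status status;

        /* Overloaded worker asks the manager for an underloaded process. */
        MPI_Send(&dummy, 1, MPI_INT, MANAGER, TAG_REQUEST, MPI_COMM_WORLD);

        /* This corresponds to the receive that never completes on the T3D. */
        MPI_Recv(&idle, 1, MPI_INT, MANAGER, TAG_REPLY, MPI_COMM_WORLD,
                 &status);
        printf("worker %d was told that process %d is idle\n", rank, idle);
    }

    MPI_Finalize();
    return 0;
}
```

In the sketch the reply is matched on both source (MANAGER) and tag
(TAG_REPLY), which is the pairing I mean when I say the tags and
destinations are right.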

What are the possible causes of the failed communication? The tags and
destinations are right.  The system worked fine on the SGI Power
Challenge but seems to have a communication problem on the T3D.  The
processes are not dead - the program just seems to hang forever.


Thanks.


Fazilah

P.S. Please email your answer directly to me.

