Newsgroups: comp.parallel.mpi
From: lusk@donner.mcs.anl.gov (Rusty Lusk)
Subject: Re: causality violation
Organization: Argonne National Laboratory
Date: 18 Dec 1996 16:47:49 GMT
Message-ID: <59977l$4si@milo.mcs.anl.gov>

In article <596j00$ubg@www.univie.ac.at>, hejc@apap2.pap.univie.ac.at writes:
|> Hi,
|> 
|>  I often use 'nupshot' or 'VAMPIR' to investigate my
|> MPI programs. Two-point communications are displayed with
|> arrows, pointing from the sender process to the receiver
|> process. 
|>  In some of my programs, the receiving of the message occurs
|> before the sending, and the arrows are pointing to the past, which
|> should be impossible.
|> 
|>  Can somebody explain to me what's the origin of this 'causality
|> violation'?
|>  Is it a failure of the tracefile generation, or a failure
|> of the MPI program, and what conclusions can be drawn from the
|> occurrence of these time-reversed messages?

At least for nupshot, all it means is that the clock adjustments attempted
at the end of the run, when the separate log files from the various
processes are merged, didn't come out exactly right.  This is not a trivial
thing to do, and although mpich makes a stab at it, it can be thwarted by
various combinations of conditions.  It is an area that we are always trying
to improve.  But it does not indicate any sort of deep error.  Try mentally
shifting the time lines in your nupshot view.  A future version might allow
you to actually slide them around on the screen, but not yet.
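
To see how a small error in the clock adjustment can produce a backward
arrow, here is a minimal sketch in Python.  It is not mpich's merge code;
the function names, the single constant offset per process, and all of the
numbers are assumptions made up for illustration.

```python
# Hypothetical sketch: names and numbers are illustrative, not mpich's merge code.

def to_reference_time(local_time, estimated_offset):
    """Map a local timestamp onto the reference clock by subtracting the estimated offset."""
    return local_time - estimated_offset


def time_reversed(send_times, recv_times):
    """Return ids of messages whose adjusted receive time precedes the send time."""
    return sorted(m for m in send_times
                  if m in recv_times and recv_times[m] < send_times[m])


# Assume process 0 is the reference clock and process 1's clock runs 0.500 s ahead.
true_offset = 0.500
send_times = {"msg0": 10.000}                 # stamped on process 0
recv_local = {"msg0": 10.002 + true_offset}   # arrives 2 ms later, stamped on process 1

# Compare an exact offset estimate with one that is 5 ms too large.
for estimate in (0.500, 0.505):
    recv_times = {m: to_reference_time(t, estimate) for m, t in recv_local.items()}
    print(estimate, time_reversed(send_times, recv_times))
```

With the exact estimate no message is flagged.  With the estimate that is
5 ms too large, the 2 ms message appears to arrive before it was sent, even
though the program itself behaved correctly.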

- Rusty

