Systems that must implement MPI_Allreduce with point-to-point communication
will probably show a cost that grows as log(p), where p is the number of
processes.  Systems that can use either a special network or
shared memory may have faster reductions with different scaling.
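<P>
The log(p) behavior can be illustrated with a rough cost model.  The sketch
below assumes a recursive-doubling style exchange, in which the number of
processes that have contributed to each partial result doubles every round;
this is one common point-to-point approach, not necessarily what a given MPI
implementation does.  The function names and the parameters
<code>latency</code>, <code>per_byte</code>, and <code>nbytes</code> are
hypothetical and are not part of MPI.

```c
#include <assert.h>

/* Rounds needed by a recursive-doubling style allreduce over p processes:
   the smallest k such that 2^k >= p.  (Assumed model, not an MPI routine.) */
int allreduce_rounds(int p)
{
    int rounds = 0;
    long reach = 1;   /* processes whose data is combined after 'rounds' rounds */

    assert(p >= 1);
    while (reach < p) {
        reach *= 2;
        rounds++;
    }
    return rounds;
}

/* Hypothetical cost estimate: every round sends one message of nbytes bytes,
   costing latency + nbytes * per_byte.  Real implementations may need extra
   steps when p is not a power of two, and may ignore this model entirely. */
double allreduce_cost(int p, double latency, double per_byte, double nbytes)
{
    return allreduce_rounds(p) * (latency + nbytes * per_byte);
}
```

Under this model, doubling the number of processes adds one round, so going
from 8 to 1024 processes raises the estimate from 3 rounds to 10 rather than
by a factor of 128.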
<P>
Some of these optimizations (in particular, special networks) apply only to
MPI_COMM_WORLD or a communicator that contains the same processes as
MPI_COMM_WORLD.  
