In some MPI implementations, the blocking operations have lower latency than
the nonblocking operations.  This is due to the additional cost of allocating,
initializing, and freeing an MPI_Request.
