Use <CODE>MPI_Wtime</CODE> to benchmark the performance of the
<CODE>memcpy</CODE> routine on your system.  Generate a table for
1, 2, 4, 8, ..., 524288 integers showing the number of bytes, the time to
copy, and the rate in megabytes per second.
Use unaligned data items; that is, make sure that the low-order bits of the
source and destination addresses are different.  Also, ensure that the 
source and destination are "well separated" in memory.
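<P>
One possible way to obtain such buffers is sketched below.  It is only an
illustration, not the sample solution: the names <CODE>copy_buffers</CODE>,
<CODE>alloc_copy_buffers</CODE>, and <CODE>GAP_BYTES</CODE> are hypothetical,
and the 16-byte boundary and 1 MB gap are assumed values for "unaligned" and
"well separated".  Both buffers are carved out of a single allocation so that
their relative placement is under the program's control.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define GAP_BYTES ((size_t)1 << 20)  /* assumed separation between buffers */

typedef struct {
    char *base;  /* pointer returned by malloc; pass this to free() */
    char *src;   /* source buffer, 1 byte past a 16-byte boundary */
    char *dst;   /* destination buffer, 3 bytes past a 16-byte boundary */
} copy_buffers;

/* Carve a source and a destination of max_bytes each out of one block.
 * The two addresses differ in their low-order bits and are at least
 * max_bytes + GAP_BYTES apart.  Returns 0 on success, -1 on failure. */
int alloc_copy_buffers(copy_buffers *b, size_t max_bytes)
{
    assert(b != NULL);

    /* Room for both buffers, the gap, and the alignment adjustments. */
    size_t total = 2 * max_bytes + GAP_BYTES + 64;
    b->base = malloc(total);
    if (b->base == NULL)
        return -1;

    /* First 16-byte boundary at or after the start of the block. */
    uintptr_t start = (uintptr_t)b->base;
    size_t pad = (size_t)(((start + 15) & ~(uintptr_t)15) - start);
    char *aligned = b->base + pad;

    /* Offset of the next 16-byte boundary past the source and the gap. */
    size_t dst_off = (1 + max_bytes + GAP_BYTES + 15) & ~(size_t)15;

    b->src = aligned + 1;            /* address ends in ...0001 */
    b->dst = aligned + dst_off + 3;  /* address ends in ...0011 */
    return 0;
}
```

For this exercise <CODE>max_bytes</CODE> would be
<CODE>524288 * sizeof(int)</CODE>; call <CODE>free(b.base)</CODE> when the
measurements are done.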
<P>
You should perform enough memcpy operations to take a good fraction of a
second; the sample solution does <CODE>100000/size</CODE> iterations for
<CODE>size</CODE> integers.  It also repeats the test 10 times and reports the
best time.
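<P>
The timing scheme described above might be sketched as follows.  This is not
the sample solution itself: the helper names <CODE>iterations_for</CODE>,
<CODE>best_copy_time</CODE>, and <CODE>rate_mb_per_s</CODE> are hypothetical,
one megabyte is taken to be 10^6 bytes, and clamping the iteration count to
at least 1 is an assumption made here because <CODE>100000/size</CODE> is zero
in integer arithmetic once <CODE>size</CODE> exceeds 100000.  The timer is
passed in as a function pointer so that <CODE>MPI_Wtime</CODE>, which takes no
arguments and returns a <CODE>double</CODE>, can be supplied by the caller.

```c
#include <stddef.h>
#include <string.h>

/* Any function with the shape of MPI_Wtime: no arguments, returns seconds. */
typedef double (*timer_fn)(void);

/* 100000/size iterations, clamped so that large sizes still run once.
 * Assumes n_ints >= 1. */
int iterations_for(size_t n_ints)
{
    size_t iters = 100000 / n_ints;
    return iters > 0 ? (int)iters : 1;
}

/* Time iters back-to-back copies, repeat the measurement reps times,
 * and return the smallest average time per copy, in seconds. */
double best_copy_time(void *dst, const void *src, size_t nbytes,
                      int iters, int reps, timer_fn now)
{
    double best = -1.0;  /* negative means "no measurement yet" */

    for (int r = 0; r < reps; r++) {
        double t0 = now();
        for (int i = 0; i < iters; i++)
            memcpy(dst, src, nbytes);
        double t = (now() - t0) / iters;

        if (best < 0.0 || t < best)
            best = t;
    }
    return best;
}

/* Transfer rate in megabytes per second (1 MB = 10^6 bytes). */
double rate_mb_per_s(size_t nbytes, double seconds)
{
    return (double)nbytes / seconds * 1.0e-6;
}
```

Between <CODE>MPI_Init</CODE> and <CODE>MPI_Finalize</CODE>, a loop over
<CODE>n</CODE> = 1, 2, 4, ..., 524288 could then call
<CODE>best_copy_time(dst, src, n * sizeof(int), iterations_for(n), 10, MPI_Wtime)</CODE>
and print one table row per size.  An optimizing compiler may remove repeated
copies whose results are never read, so it is worth checking that the
reported times grow with the size.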
