In many cases, it is impossible to divide a parallel data structure so that
each process has exactly the same amount of data. An equal division may not
even be desirable if the amount of work to be done varies from one part of
the data to another. Modify your code so that each process can have a
different number of rows of the distributed mesh.
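
One possible starting point, shown here only as a sketch, is a small helper that decides which rows each process owns when the row count does not divide evenly. The function name `decompose_rows` and its parameters are hypothetical and do not come from the exercise code. It assumes a simple block distribution in which the first `n % nprocs` ranks each receive one extra row. It makes no MPI calls, so `rank` and `nprocs` would normally come from `MPI_Comm_rank` and `MPI_Comm_size`.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the original exercise code).
 *
 * Splits n mesh rows among nprocs processes as evenly as possible.
 * Every rank gets n / nprocs rows, and the first n % nprocs ranks
 * get one additional row each.
 *
 * On return:
 *   *first is the 0-based global index of the first row owned by rank
 *   *count is the number of rows owned by rank (may differ by rank)
 */
void decompose_rows(int n, int nprocs, int rank, int *first, int *count)
{
    int base  = n / nprocs;   /* rows that every rank receives        */
    int extra = n % nprocs;   /* leftover rows, one each to low ranks */

    *count = base + (rank < extra ? 1 : 0);

    /* Ranks below 'extra' are each preceded by 'rank' larger blocks;
       all other ranks are preceded by exactly 'extra' larger blocks. */
    *first = rank * base + (rank < extra ? rank : extra);
}
```

With 10 rows on 4 processes, this assigns 3, 3, 2, and 2 rows, starting at global rows 0, 3, 6, and 8. If the work per row is uneven, the same interface could return a weighted split instead. Any loop bounds and ghost-row exchanges in your code would then use the per-process `count` rather than a single compile-time constant.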
