Newsgroups: comp.parallel.pvm
From: llewins@msmail4.hac.com (Lloyd J Lewins)
Subject: BUG: MPI Rev 2.0 - Sun Host with more than four Node processes
Organization: Hughes Aerospace Electronics Co.
Date: Fri, 20 Jan 1995 17:23:00 -0800
Message-ID: <llewins-2001951723000001@x-147-16-96-30.es.hac.com>

Problem Name: 
   Host remote physical memory SMB mapping limitation.

Problem Ref. Number: 
   10002

Affected MPI Versions: 
   2.0

Planned Fix:
   Will be fixed in Version 2.1

Affected Applications: 
   Any program which has an MPI host, and uses more than four MPI nodes.

Problem Symptoms:
   The following error messages may be caused by this problem:

   a) "[16]Error in mcshrd_obj.c at 207 - SMR_ALLOC_ERROR:
            Unable to allocate additional page registers(smb_map)"

   b) "[4]Error in mcrecv.c at 715 - Invalid Transport"

   c) "[4]Error in mcqueue.c at 94 - Queues mingled"

   d) "[4]Error in mcqueue.c at 113 - Queues mingled"

   e) Unexpected termination of the host program, and a "core" dump.

Workarounds:
    Don't include an MPI host in any program executed on more than four
MPI nodes.

Problem Description:
    On the host, SMBs, even those mapped as "smb_group", are mapped into
virtual memory independently. To support direct memory copy, MPI version
2.0 creates and maps an SMB overlaying the entire physical RAM of every
other processor executing an MPI process. For example, if each processor
contains 8 MBytes of RAM, and an MPI program is running on sixteen nodes,
then sixteen 8 MByte SMBs will be mapped into the virtual memory of the
host, consuming 128 MBytes of virtual and VME address space. This appears
to exceed (sometimes silently) the resources available on the host.
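
    The arithmetic above can be checked with a short sketch. It only
illustrates the scaling; the function name and the node counts other than
sixteen are assumptions for illustration, and the actual resource limit on
the host is not stated in this report.

```python
# Hypothetical illustration of the address-space arithmetic described above.
# Nothing here is part of MPI 2.0; the names are made up for this sketch.
MBYTE = 1024 * 1024


def host_mapping_bytes(node_count, ram_per_node_bytes):
    """Address space the host needs if it maps one SMB per node,
    each SMB overlaying that node's entire physical RAM."""
    return node_count * ram_per_node_bytes


for nodes in (4, 8, 16):
    total = host_mapping_bytes(nodes, 8 * MBYTE)
    print(f"{nodes:2d} nodes x 8 MBytes = {total // MBYTE} MBytes mapped on the host")
```

    The mapped total grows linearly with the node count, which is
consistent with the problem appearing only in larger configurations.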

    The proposed solution is to no longer map the physical memory SMB of
each node MPI process into the host. Thus all messages between the host
and the nodes will use "double copy", and will never use the faster
"direct copy" protocol.
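
    As a rough conceptual sketch of the trade-off (an assumption about what
the two terms mean in general, not a description of the MPI 2.0 internals):
a direct copy writes the message once, straight into memory the receiver
owns, while a double copy goes through an intermediate staging buffer. All
names below are hypothetical.

```python
# Conceptual sketch only. These functions are hypothetical and do not
# reflect the actual MPI 2.0 implementation.

def direct_copy(src, dst_mapped):
    """One copy: the sender writes straight into the receiver's memory,
    which requires that memory to be mapped into the sender."""
    dst_mapped[:len(src)] = src
    return 1  # number of copies performed


def double_copy(src, staging, dst):
    """Two copies: sender to a shared staging buffer, then staging
    buffer to the receiver. No mapping of the receiver's RAM is needed."""
    staging[:len(src)] = src               # copy 1, done by the sender
    dst[:len(src)] = staging[:len(src)]    # copy 2, done by the receiver
    return 2


message = b"hello"
node_ram = bytearray(8)
print(direct_copy(message, node_ram), bytes(node_ram[:len(message)]))

shared = bytearray(8)
node_buf = bytearray(8)
print(double_copy(message, shared, node_buf), bytes(node_buf[:len(message)]))
```

    Under this reading, the fix trades one extra copy per host message for
not having to map every node's RAM into the host.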

--------------------------------------------------------------------------
Lloyd J Lewins                                  Mail Stop: RE/R1/B507
Hughes Aerospace and Electronics Co.            P.O. Box 92426
                                                Los Angeles, CA 90009-2426
Email: llewins@msmail4.hac.com                  USA
Tel: 1 (310) 334-1145
Any opinions are not necessarily mine, let alone my employer's!!

