Newsgroups: comp.parallel.pvm
From: raja@osc.edu (Raja Daoud)
Subject: MPI on LAM (scratch-n-sniff)
Keywords: LAM, MPI, cluster computing
Organization: Ohio Supercomputer Center
Date: 15 Jun 1994 12:56:24 -0400
Message-ID: <2tnbro$kvu@hawkeye.osc.edu>


Getting Started with MPI on LAM


LAM is a simple yet powerful environment for running and monitoring
MPI applications on clusters.  The essential steps of a LAM session
are covered below.


Booting LAM

The user creates a file listing the participating machines in the cluster.

% cat lamhosts
# a 2-node LAM
tbag.osc.edu
alex.osc.edu

Each machine is assigned a node identifier (nodeid): 0 for the first
machine listed, 1 for the second, and so on.

The recon tool verifies that the cluster is bootable.

% recon -v lamhosts
recon: testing n0 (tbag.osc.edu)
recon: testing n1 (alex.osc.edu)

The lamboot tool actually starts LAM on the specified cluster.

% lamboot -v lamhosts
LAM - Ohio Supercomputer Center
hboot n0 (tbag.osc.edu)...
hboot n1 (alex.osc.edu)...

lamboot returns to the UNIX shell prompt; LAM does not impose a canned
environment or a "LAM shell".  The tping command confirms that the
cluster and LAM are up and responding.

% tping -c1 N
  1 byte from 2 nodes: 0.009 secs


Compiling MPI Programs

Refer to "MPI: It's Easy to Get Started" for a simple MPI program.
hcc (hf77) is a wrapper for the local C (F77) compiler that links the
LAM libraries.  The MPI library itself must be linked explicitly.

% hcc -o foo foo.c -lmpi
% hf77 -o foo foo.f -lmpi
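
A minimal foo.c might look like the following.  This is only an
illustrative sketch, not the program from the referenced article: each
process reports its rank and the total number of processes.

```c
/* foo.c -- a minimal MPI program (illustrative sketch).
 * Each process prints its rank within MPI_COMM_WORLD. */
#include <stdio.h>
#include <mpi.h>

int
main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```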


Executing MPI Programs

An MPI application is started by a single invocation of the mpirun
command.  An SPMD application can be started directly on the mpirun
command line.

% mpirun -v n0-1 foo
2445 foo running on n0 (o)
361 foo running on n1

An application with multiple programs must be described in an application
schema, a file that lists each program and its target node(s).

% cat appfile
# 1 master, 2 slaves
master n0
slave n0-1

% mpirun -v appfile
3292 master running on n0 (o)
3296 slave running on n0 (o)
412 slave running on n1


Monitoring MPI Applications

The full MPI synchronization status of all processes and messages can be
displayed at any time.  This includes the source and destination ranks,
the message tag, the communicator, and the function invoked.

% mpitask
NODE     PID     RANK        STATE                                 PROGRAM
n0 (o)   2445    0 {0}       MPI_Recv(ANY, ANY, 0)                 foo
n1       361     1 {1}       R                                     foo

Process rank 0 is blocked receiving a message from any source rank and
any message tag, using the MPI_COMM_WORLD communicator (COMM ID 0).
Process rank 1 is currently executing.

% mpimsg
NODE      SRC RANK    MSG TAG     DEST RANK   COMM ID     LENGTH
n1        0 {0}       45          1 {1}       0           4      

A message sent by process rank 0, to process rank 1, is buffered and
waiting to be received.  Its tag is 45.  It is 4 bytes long, and was
sent using the MPI_COMM_WORLD communicator (COMM ID 0).
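
The state shown above could be produced by code along these lines.
This is a hypothetical sketch, not the actual foo source: rank 0 sends
a 4-byte integer with tag 45 and then blocks in a wildcard receive,
while rank 1 is still computing and has not yet posted its receive.

```c
/* Sketch of code matching the mpitask/mpimsg output above
 * (hypothetical; not the actual foo program). */
#include <mpi.h>

int
main(int argc, char *argv[])
{
    int rank, value = 42;
    volatile long i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Send a 4-byte message, tag 45, to rank 1; it is buffered
         * until rank 1 receives it (seen by mpimsg).  Then block in
         * a wildcard receive -- shown as MPI_Recv(ANY, ANY, 0). */
        MPI_Send(&value, 1, MPI_INT, 1, 45, MPI_COMM_WORLD);
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE,
                 MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    } else {
        /* Rank 1 is still computing -- shown in state R. */
        for (i = 0; i < 100000000; ++i)
            ;
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE,
                 MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```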


Cleaning LAM

All user processes and messages can be removed without rebooting LAM.

% lamclean -v
cleaning nodes 0...1...


Terminating LAM

The wipe tool removes all traces of the LAM session on the network.

% wipe -v lamhosts
tkill n0 (tbag.osc.edu)...
tkill n1 (alex.osc.edu)...

-=-
Raja Daoud				Ohio Supercomputer Center
raja@osc.edu				Trollius Project

