Newsgroups: comp.parallel.mpi
From: jareed@gamera.syr.edu (Judith Ann Reed)
Subject: SP2 specific problem - can't allocate nodes
Organization: Syracuse University, Syracuse
Date: 2 Nov 1995 21:29:13 GMT
Message-ID: <47bd79$4eq@newstand.syr.edu>

Greetings. We recently installed mpich-1.0.11 on our 12-node SP2 and
configured it to use IBM's MPL as the underlying message-passing layer.
The problem is that it works intermittently, and I can see no pattern
to when it fails.
I've appended output from:
	1. jm_status -n control_ws_name -P
	2. jm_status -n control_ws_name -j
which show that the pools are defined and that no jobs are active in
them.
Even so, I believe the error we are getting:

$ /usr/npac/bin/mpirun -np 2 hello
ERROR: 0031-124  Couldn't allocate nodes for parallel execution.  Exiting ...
ERROR: 0031-603  Resource Manager allocation for task: 0, node: merlin1.npac.syr.edu, rc = JM_PARTIONCREATIONFAILURE
ERROR: 0031-635  Non-zero status -1 returned from pm_mgr_init

indicates that it can't access any pool; at least, that's what the
"Parallel Environment ... Diagnosis" handbook seems to say:

0031-124 - The requested nodes were not available from the Resource manager.

I've tried stopping and starting the resource manager, and even rebooting 
the control workstation, all to no avail.

Can anyone suggest what might cause this problem, and what to do about it?
This all worked fine late yesterday, and as far as I know nothing has
changed since then.
Thanks in advance for any advice you can send!

Judith Reed
Northeast Parallel Architecture Center
judith@npac.syr.edu
jareed@syr.edu
 
---------------------------------- 
1. 
----------------------------------
Pool 0:    us_pool_0
  Subpool: GENERAL
    Node:  merlin1.npac.syr.edu
    Node:  merlin2.npac.syr.edu
    Node:  merlin3.npac.syr.edu
    Node:  merlin4.npac.syr.edu
Pool 1:    us_pool_1
  Subpool: GENERAL
    Node:  merlin5.npac.syr.edu
    Node:  merlin6.npac.syr.edu
    Node:  merlin7.npac.syr.edu
    Node:  merlin8.npac.syr.edu
Pool 2:    us_pool_2
  Subpool: GENERAL
    Node:  merlin10.npac.syr.edu
    Node:  merlin11.npac.syr.edu
    Node:  merlin12.npac.syr.edu
    Node:  merlin9.npac.syr.edu
----------------------------------
2.
----------------------------------
No job data found
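
For anyone checking a similar setup: the pool listing in (1) can be
sanity-checked mechanically. What follows is only a minimal Python
sketch, assuming the "jm_status -P" output keeps exactly the layout
shown above ("Pool N:", "Subpool:" and "Node:" lines); parse_pools is a
hypothetical helper, not part of the Parallel Environment tools. It
confirms that all 12 nodes are listed exactly once across the three
pools, which at least rules out a missing or doubly-assigned node in
the pool definitions.

```python
from collections import Counter

# Copied from the "jm_status -P" output above; the layout is assumed stable.
LISTING = """\
Pool 0:    us_pool_0
  Subpool: GENERAL
    Node:  merlin1.npac.syr.edu
    Node:  merlin2.npac.syr.edu
    Node:  merlin3.npac.syr.edu
    Node:  merlin4.npac.syr.edu
Pool 1:    us_pool_1
  Subpool: GENERAL
    Node:  merlin5.npac.syr.edu
    Node:  merlin6.npac.syr.edu
    Node:  merlin7.npac.syr.edu
    Node:  merlin8.npac.syr.edu
Pool 2:    us_pool_2
  Subpool: GENERAL
    Node:  merlin10.npac.syr.edu
    Node:  merlin11.npac.syr.edu
    Node:  merlin12.npac.syr.edu
    Node:  merlin9.npac.syr.edu
"""


def parse_pools(text):
    """Hypothetical helper: map each pool name to the node hostnames under it."""
    pools = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Pool" and len(fields) >= 3:
            # e.g. ["Pool", "0:", "us_pool_0"] -> pool name is the third field
            current = fields[2]
            pools[current] = []
        elif fields[0] == "Node:" and current is not None:
            pools[current].append(fields[1])
        # "Subpool:" lines are ignored in this sketch
    return pools


pools = parse_pools(LISTING)
counts = Counter(node for nodes in pools.values() for node in nodes)
duplicates = [node for node, n in counts.items() if n > 1]

for name, nodes in pools.items():
    print(f"{name}: {len(nodes)} nodes")
print(f"total: {sum(counts.values())}, duplicates: {duplicates}")
```

Fed the listing above, it should report four nodes in each of
us_pool_0, us_pool_1 and us_pool_2, 12 in total, with no duplicates.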
-- 
 Judith Reed - sysmgr - Northeast Parallel Architecture Center
 jareed@syr.edu
 judith@npac.syr.edu
 "Old enough to be amazed at the technologies I encounter daily"

