Newsgroups: comp.parallel
From: jmartin@kaiwan.com (Jay M Martin)
Subject: Process Checkpointing/Migration Software for Sun Solaris
Organization: KAIWAN Internet (310-527-4279,818-756-0180,909-785-9712,714-638-4133,805-294-9338)
Date: 18 Mar 1995 17:35:45 -0800

Does anyone know of good software or a method that can checkpoint,
migrate, or stop sequential jobs?  The relevance of this to parallel
processing is that parallel machines are usually shared.  Other users,
not doing parallel processing, run long-running sequential
simulations that they don't want to checkpoint (too busy/lazy).  The
parallel processing users need to benchmark their parallel programs (to
find maximum speedup), and we can't really do this with a bunch of
sequential jobs slowing down the nodes and doing I/O.  So what we need
is a way to checkpoint these jobs, maybe bring down the system, run
the parallel jobs, and then restart the sequential simulations.  I
have looked into Condor, but it currently doesn't support Sun Solaris.
It would also be nice to be able to stop sequential and parallel
jobs on the IBM SP2 (also for parallel program benchmarking).

Thanks, Jay Martin.

