Newsgroups: comp.parallel
From: golden@cs.uno.edu (Golden G. Richard III)
Subject: Summary of responses on recoverable processes in PVM
Organization: University of New Orleans (Computer Science)
Date: 29 May 1995 15:16:22 GMT
Message-ID: <3qcog6$21a@usenet.srv.cis.pitt.edu>

As promised, here is a (slightly trimmed) summary of the replies I
received concerning recoverable processes under PVM.  Thanks to all
who responded.


Cheers,

--Golden


-----------------------------------------------------------------------
From: L Silva [TRACS] <luis@epcc.ed.ac.uk>

Hi,

If you are looking for an implementation of checkpointing
for PVM, then take a look at Fail-Safe PVM (FSPVM) that
was implemented by Juan Leon, from Carnegie Mellon Univ.
You can try to email: Juan.Leon@cs.cmu.edu  or
Juan.Leon@SHELLEY.SP.CS.CMU.EDU

Cheers,
  Luis
-----------------------------------------------------------------------
From: suresh@lanl.gov (Suresh Damodaran-Kamal)

You may get useful info by emailing jx@cs.brown.edu.
He has implemented some algos.

-Suresh
-----------------------------------------------------------------------
From: prouty@cse.ogi.edu (Robert Prouty)

Greetings!

In a posting to comp.parallel.pvm, you expressed interest in recoverable
processes under PVM.

We are currently developing an enhanced version of PVM which we call MIST.
The system currently performs global scheduling and transparent process
migration.  We are in the process of extending the system to perform
transparent checkpointing.  We have only a preliminary prototype of the
checkpointing version working, and have not formally written anything up
yet.  We hope to present a paper at this year's PVM Users' Group Meeting
which will introduce everything, including checkpointing.

Anyway, even though we have no papers on our work yet, our web site at
http://www.cse.ogi.edu/DISC/projects/mist/ may be of use to you, as we have
pointers to many other systems like Juan Leon's Fail-Safe PVM.  Here's a
"tree of links" from our main page that may be useful:

    Related Research Projects - PVM    - Fail-Safe PVM
                              - Papers
                              - other  - Checkpointing info

 - Rob
-----------------------------------------------------------------------

From: Laurentios Servissoglou <servissoglou@zdv.uni-tuebingen.de>

Hello Golden

I am currently working on a tool for recoverable processes under PVM based
on independent checkpointing and adaptive message logging. It will form my
thesis in computational physics.

Feel free to mail for more information.


Bye,

Laurentios S.
-- 
===============================================================================
Laurentios Servissoglou (Tools for parallel programming, computational physics)

Zentrum fuer Datenverarbeitung
Universitaet Tuebingen
Brunnenstr. 27                  E-Mail:
D-72074 Tuebingen               servissoglou@zdv.uni-tuebingen.de
Fed. Rep. of Germany            Tel.: (49) 7071 / 29-5915
===============================================================================
-------------------------------------------------------------------------

From: Georg Stellner <stellner@informatik.tu-muenchen.de>

Currently there is quite a lot of work going on in that direction. I
personally know of
	* FailSafe PVM (Juan Leon) which focuses on fault tolerance
	* MPVM from Steve Otto et al. which focuses on migrating processes
	* DynamicPVM by Peter Sloot et al. which also focuses on migration
	* and finally, there is my own work on the topic

You might check my WWW homepage for a paper I presented at last year's
First European PVM User Group Meeting.

Cheers, Georg

[The page mentioned is http://wwwbode.informatik.tu-muenchen.de/~stellner]
-------------------------------------------------------------------------

From: Jian Xu <jx@cs.brown.edu>

Hi Golden,

People at HPL (Palo Alto, CA) have implemented a checkpointing and
restart mechanism on top of PVM. Their system also allows program
execution to be replayed (for debugging). You may want to contact
Milon Mackey (mackey@hpl.hp.com) for details. They have also published
a paper on this at the PVM user meeting, but I don't have the reference
handy.

Good luck!

Jian
-------------------------------------------------------------------------

From: jeong@SHASHA.CS.NYU.EDU (Karpjoo Jeong)

Hi,

A fault-tolerant PVM version called 'Fail-safe PVM' has been developed at CMU.
The system is based on coordinated checkpointing.
If you are interested, then you may send email to Juan_Leon@cs.cmu.edu,
or write to:
	Juan Leon
	School of Computer Science
	Carnegie Mellon University
	5000 Forbes Ave
	Pittsburgh, PA 15213-3891.

A technical report on Fail-safe PVM is also available via ftp.
The number is CMU-CS-93-124.

I hope that this will help.

-Karp Jeong
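
As a rough illustration of what "coordinated checkpointing" means here, the following is a minimal Python simulation of the general idea under stated assumptions; it is not Fail-safe PVM's actual mechanism. The names run and do_work, the dict-per-process state, and the lock-step rounds are all hypothetical. The rounds stand in for the global synchronization a real system needs so that no messages are in flight when every process saves its state. When a failure occurs, every process rolls back to the last coordinated checkpoint, and the run still ends with the same result as a failure-free run.

```python
import copy

NPROCS = 4  # hypothetical number of cooperating processes


def do_work(state, p):
    """One unit of hypothetical work for process p."""
    state[p]["step"] += 1
    state[p]["sum"] += (p + 1) * state[p]["step"]


def run(rounds, interval, fail_round=None):
    """Run lock-step rounds with a coordinated checkpoint every `interval` rounds.

    If fail_round is given, simulate one crash at that round. Recovery rolls
    ALL processes back to the last coordinated checkpoint, not just the one
    that failed.
    """
    live = [{"step": 0, "sum": 0} for _ in range(NPROCS)]
    # Initial checkpoint: consistent because every process saves at the
    # same barrier with no messages in flight.
    saved, saved_round = copy.deepcopy(live), 0
    failed = False
    r = 0
    while r < rounds:
        r += 1
        for p in range(NPROCS):
            do_work(live, p)
        if not failed and r == fail_round:
            failed = True
            live = copy.deepcopy(saved)  # global rollback of every process
            r = saved_round              # resume from the checkpointed round
            continue
        if r % interval == 0:
            # Coordinated checkpoint: all processes save together.
            saved, saved_round = copy.deepcopy(live), r
    return sum(s["sum"] for s in live)


if __name__ == "__main__":
    print("failure-free total:", run(10, 3))
    print("crash at round 7:  ", run(10, 3, fail_round=7))
```

Rolling back every process, rather than only the failed one, is what keeps the restored global state consistent; the price is that all processes lose the work done since the last checkpoint.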
-------------------------------------------------------------------------

From: crispin@cse.ogi.edu (Crispin Cowan)

I built my thesis (HOPE:  A Programming Model for Optimism) on top of
PVM.  HOPE does both process checkpointing and message logging to
enable consistent rollback of distributed applications, and can be used
to build a recoverable system.  You can read about it in the papers on
my WWW page at:

http://www.cse.ogi.edu/~crispin/

I presented a talk on my PVM work at the PVM User's Group meeting in
1994.
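
The combination of checkpointing and message logging mentioned above can be illustrated with a minimal Python sketch of the generic recovery step: restore the last checkpoint, then replay the logged messages in their original delivery order. This is not HOPE's mechanism (HOPE is an optimistic model; see the papers on the page above). The class LoggedProcess, its methods, and the in-memory list standing in for a log on stable storage are hypothetical, and the sketch assumes the message handler is deterministic, so replaying the same messages in the same order reproduces the same state.

```python
import copy


class LoggedProcess:
    """A hypothetical deterministic process that checkpoints its state and
    logs every message delivered after the latest checkpoint."""

    def __init__(self):
        self.state = {"count": 0, "total": 0}
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []  # stands in for a log kept on stable storage

    def deliver(self, msg):
        self.log.append(msg)  # log before processing (pessimistic style)
        self._apply(msg)

    def _apply(self, msg):
        # Deterministic handler: same messages in same order -> same state.
        self.state["count"] += 1
        self.state["total"] = self.state["total"] * 31 + msg

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()  # messages before the checkpoint are no longer needed

    def crash(self):
        self.state = None  # volatile state is lost

    def recover(self):
        self.state = copy.deepcopy(self.checkpoint)
        for msg in self.log:  # replay in original delivery order
            self._apply(msg)


if __name__ == "__main__":
    p = LoggedProcess()
    for m in (3, 1, 4):
        p.deliver(m)
    p.take_checkpoint()
    for m in (1, 5, 9):
        p.deliver(m)
    before = copy.deepcopy(p.state)
    p.crash()
    p.recover()
    print("recovered state matches:", before == p.state)
```

Logging each message before it is processed is the pessimistic choice; optimistic schemes log asynchronously and accept that a failure may force other processes that depend on unlogged messages to roll back as well.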

Juan Leon (Juan_Leon@SHELLEY.SP.CS.CMU.EDU) is working on a pessimistic
recovery mechanism for PVM based on globally synchronized checkpoints.
His report appeared at the PVM User's Group meeting in 1994.

The MIST project here at OGI is working on a general-purpose recovery
mechanism for the next generation of PVM.  I don't know whether the WWW
page has details on the recovery mechanism, but you can read about MIST in
general on their WWW page at:

http://www.cse.ogi.edu/DISC/projects/mist/

Crispin
-------------------------------------------------------------------------

From: Georg Stellner <stellner@informatik.tu-muenchen.de>

Some work is currently being done in that direction, and I am also
working in that area myself. As a starting point, you might have a look
at my WWW homepage.

Cheers, Georg

[The page mentioned is http://wwwbode.informatik.tu-muenchen.de/~stellner]
-------------------------------------------------------------------------


--
Golden G. Richard III                Asst. Professor, Dept. of Computer Science
golden@cs.uno.edu                    University of New Orleans (504-286-6045)
finger: golden@cs.cs.uno.edu         WWW: http://www.cs.uno.edu/~golden
  My sick and twisted opinions are my own, of course.  It's more fun that way.

