Newsgroups: comp.parallel
From: jcar@mercurio.uc.pt (Joao Carreira)
Subject: Re: Fault tolerance in parallel computing
Organization: Lab. de Informatica e Sistemas
Date: 25 Sep 1995 14:41:07 GMT
Message-ID: <446f23$2bl@usenet.srv.cis.pitt.edu>

In article <43k04t$27j@usenet.srv.cis.pitt.edu> ml7@irz301.inf.tu-dresden.de (Morris Lindenkreuz) writes:
>From: ml7@irz301.inf.tu-dresden.de (Morris Lindenkreuz)
>Newsgroups: comp.parallel
>Subject: Fault tolerance in parallel computing
>Date: 18 Sep 1995 14:36:13 GMT
>Organization: Dept. of Computer Science, TU Dresden, Germany


>I would like some information about fault tolerance in the most
>recent parallel computers.  Who has experience with
>checkpoint/restart, memory errors, reconfiguration of the
>communication network, etc. in implemented parallel architectures?

>Please tell me about it.

>Morris Lindenkreuz
>e-mail: ml7@irz.inf.tu-dresden.de


Hi Morris,

You can get some papers on fault-tolerance issues in parallel
computing from our Web server:

http://pandora.uc.pt

They deal mainly with error detection and validation.
You should also check the home page of the FTMPS project (Fault
Tolerant Massively Parallel Systems):

http://www.esat.kuleuven.ac.be/~vounckx/ftmps.html


Regards,
Joao

