Coordinated checkpoint from message payload in pessimistic sender-based message logging

Mehdi Aminian, Mohammad K. Akbari, Bahman Javadi

Research output: Chapter in Book / Conference Paper › Conference Paper › peer-review

2 Citations (Scopus)

Abstract

Execution of MPI applications on clusters and Grid deployments suffers from node and network failures, which motivates the use of fault-tolerant MPI implementations. Two categories of techniques have been introduced to make these systems fault-tolerant: checkpoint-based techniques and log-based recovery protocols. Sender-based pessimistic logging, which falls into the second category, suffers from the large volume of message payloads that must be kept in volatile memory. In this paper we present a Coordinated Checkpoint from Message Payload (CCMP) to reduce this overhead. The proposed method was evaluated using MPICH-V2, a public-domain platform that implements pessimistic logging with uncoordinated checkpointing. Experimental results showed reduced run times for the NAS Parallel Benchmarks (NPB) in both fault-free and faulty environments.
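The abstract does not describe CCMP's internals. As a rough illustration of the problem it targets, the following is a minimal, hypothetical Python sketch of a sender-side payload log: each outgoing payload is kept in volatile memory so it can be replayed to a failed receiver, and a checkpoint step moves logged payloads to stable storage (simulated here by a dict) so the in-memory log stays bounded. The class `SenderLog` and all of its methods are illustrative assumptions, not the CCMP algorithm or an MPICH-V2 API.

```python
from dataclasses import dataclass, field


@dataclass
class SenderLog:
    """Hypothetical sender-side payload log (illustration only, not CCMP or MPICH-V2)."""
    volatile: dict = field(default_factory=dict)  # (dest, seq) -> payload held in RAM
    stable: dict = field(default_factory=dict)    # stand-in for stable storage (e.g. disk)
    next_seq: dict = field(default_factory=dict)  # dest -> next sequence number

    def send(self, dest: int, payload: bytes) -> int:
        # Sender-based logging: keep a copy of every payload before it leaves,
        # so it can be replayed if the receiver later fails.
        seq = self.next_seq.get(dest, 0)
        self.next_seq[dest] = seq + 1
        self.volatile[(dest, seq)] = payload
        return seq

    def volatile_bytes(self) -> int:
        # Memory currently consumed by logged payloads.
        return sum(len(p) for p in self.volatile.values())

    def checkpoint_payloads(self) -> None:
        # Assumed checkpoint step: move logged payloads to stable storage
        # and free the volatile memory they occupied.
        self.stable.update(self.volatile)
        self.volatile.clear()

    def replay(self, dest: int, from_seq: int) -> list:
        # On receiver recovery, return every logged payload for `dest`
        # with sequence number >= from_seq, in sequence order.
        logged = {**self.stable, **self.volatile}
        return [logged[(d, s)] for (d, s) in sorted(logged)
                if d == dest and s >= from_seq]

    def release(self, dest: int, up_to_seq: int) -> None:
        # Once the receiver has checkpointed past up_to_seq, those payloads are
        # no longer needed for replay and can be dropped from both logs.
        for log in (self.volatile, self.stable):
            for key in [k for k in log if k[0] == dest and k[1] <= up_to_seq]:
                del log[key]
```

For example, after two sends to process 1 and one to process 2, `checkpoint_payloads()` drops `volatile_bytes()` to zero while `replay(1, 1)` can still recover the second message from stable storage. The sketch deliberately omits cross-process coordination of the checkpoint and the logging of receive-order information (determinants) that a pessimistic protocol also requires.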

Original language: English
Title of host publication: 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
Publisher: IEEE Computer Society
ISBN (Print): 1424400546, 9781424400546
Publication status: Published - 2006
Externally published: Yes
Event: 20th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2006 - Rhodes Island, Greece
Duration: 25 Apr 2006 to 29 Apr 2006

Publication series

Name: 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
Volume: 2006

Conference

Conference: 20th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2006
Country/Territory: Greece
City: Rhodes Island
Period: 25/04/06 to 29/04/06
