The occurrence of faults in multicomputers with hundreds or thousands of nodes is a likely event that can be dealt with through hardware or software fault-tolerant approaches. This paper presents a unifying model that describes software reconfiguration strategies for parallel applications with a regular computational pattern. We show that most existing strategies can be obtained as instances of the proposed threshold-based reconfiguration meta-algorithm. Moreover, this approach is useful for discovering several as yet unexplored strategies, among which we consider the class of adaptive threshold-based strategies. The performance optimization analysis demonstrates that these strategies, applied to data-parallel regular computations, give optimal results for worst-case fault patterns. A wide spectrum of simulations, in which the system parameters have been set to those of actual multicomputers, confirms that adaptive threshold-based strategies yield the most stable performance for a variety of workloads, independently of the number and pattern of faults.
Threshold-based reconfiguration strategies for gracefully degradable parallel computations / Colajanni, Michele; Grassi, V.; Angelaccio, M. - In: JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING. - ISSN 0743-7315. - Print. - 55(1) (1998), pp. 138-151.
Threshold-based reconfiguration strategies for gracefully degradable parallel computations
COLAJANNI, Michele;
1998