Error prevention and recovery mechanisms in the ESIP platform

Year: 2010
Type of Publication: In Proceedings
Keywords: Grid based execution, Grid workflows, Error recovery
Authors: Bâcu, Victor; Rodilă, Denisa; Mihon, Dănuț; Ştefănuţ, Teodor; Gorgan, Dorian
The execution of complex processes defined as workflows in the Grid environment is often exposed to failures. Problems that can lead to an execution failure include system heterogeneity, overloaded resources such as those at the WMS level, inaccessible services or components, and problems with the communication links among Grid nodes. Workflow management and execution systems must be able to identify errors and resolve them in a manner transparent to users. Even when failures occur, the system should offer a high success rate. This paper highlights and experiments with different error prevention mechanisms supported by the ESIP platform.