An efficient pattern-based approach for workflow supporting large-scale science: The DagOnStar experience

Di Luccio D. (Data Curation); Montella R. (Conceptualization)
2021

Abstract

Workflow engines are commonly used to orchestrate large-scale scientific computations in domains including, but not limited to, weather, climate, natural disasters, food safety, and territorial management. However, implementing, managing, and executing real-world scientific applications as workflows on multiple infrastructures (servers, clusters, cloud) remains a challenge. In this paper, we present DagOnStar (Directed Acyclic Graph OnAnything), a lightweight Python library implementing a workflow paradigm based on parallel patterns that can be executed on any combination of local machines, on-premise high-performance computing clusters, containers, and cloud-based virtual infrastructures. DagOnStar is designed to minimize data movement in order to reduce the application storage footprint. A case study based on a real-world application illustrates the use of this novel workflow engine: a containerized weather data collection application deployed on multiple infrastructures. An experimental comparison with other state-of-the-art workflow engines shows that DagOnStar can run workflows on multiple types of infrastructure, with a 50.19% improvement in run time when using a parallel pattern with eight task-level workers.
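
To make the idea of a workflow expressed as a directed acyclic graph and run by a pool of task-level workers more concrete, the following is a minimal sketch using only the Python standard library. It is not DagOnStar's API: the function name run_dag, its parameters, and the toy task names are assumptions made for illustration, and the tasks here are plain local callables rather than jobs dispatched to clusters, containers, or cloud resources.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(tasks, deps, max_workers=8):
    """Run callables in dependency order; independent tasks run in parallel.

    tasks: dict mapping a task name to a zero-argument callable (hypothetical).
    deps:  dict mapping a task name to the set of task names it depends on.
    Returns a dict mapping each task name to its result.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose predecessors have all finished.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task completes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()  # re-raises task exceptions
                sorter.done(name)  # unlocks tasks that depended on this one
    return results


if __name__ == "__main__":
    # Toy workflow (assumed): two independent collection tasks feed one merge.
    tasks = {
        "station_a": lambda: [1.0, 2.0],
        "station_b": lambda: [3.0],
        "merge": lambda: "merged",
    }
    deps = {"merge": {"station_a", "station_b"}}
    print(run_dag(tasks, deps))
```

In this sketch the two station tasks can run concurrently because neither depends on the other, while the merge task is submitted only after both have completed. The default of eight workers mirrors the worker count mentioned in the abstract but is otherwise arbitrary.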

Use this identifier to cite or link to this document: http://hdl.handle.net/11367/101116