Extending a Scientific Workflow Engine with Streaming I/O Capabilities: DAGonStar and CAPIO
Perrotta S.; Giuseppe De Vita; Mellone G.; Salvi G.; Lapegna M.; Ciaramella A.
2025-01-01
Abstract
The increasing complexity and scale of data-intensive scientific workflows necessitate advancements in workflow engines (WFEs) to handle real-time data streams and reduce input/output (I/O) bottlenecks. This paper introduces an innovative approach to enhancing the DAGonStar scientific workflow engine by integrating CAPIO, a middleware capable of injecting I/O streaming capabilities into traditional scientific workflows and optimized for high-speed data access and low latency. By combining DAGonStar’s robust task orchestration and dependency management with CAPIO, we aim to significantly improve scientific workflows’ performance and scalability. We present the design and implementation of this integration, detailing the architectural modifications required to enable seamless interaction between DAGonStar and CAPIO. The paper includes comprehensive benchmarks and performance evaluations demonstrating the impact of CAPIO on workflow execution times and data handling efficiency. Our findings indicate that the enhanced DAGonStar, equipped with CAPIO, offers a powerful solution for managing and processing large-scale, real-time data streams, thereby advancing the capabilities of scientific computing infrastructure.