Journal – Paper J5
Abstract
Scientific experiments performed in the eScience domain require special tooling, software, and workflows that allow researchers to link, transform, visualise and interpret data. Recent studies report that such experiments often cannot be replicated due to differences in the underlying infrastructure. Provenance collection mechanisms have been built into workflow engines to increase research replicability. However, the resulting traces do not capture the execution context, that is, the software, hardware and external services used to produce the result, all of which may change between executions.
The problem thus remains of how to identify such context and how to store these data. To address this challenge, we propose a context model that integrates ontologies describing a workflow and its environment. It includes not only a high-level description of workflow steps and services but also low-level technical details of the infrastructure, including hardware, software, and files. In this paper we discuss which of the ontologies that compose the context model must be instantiated to enable verification of a workflow re-execution. We use a tool that monitors workflow execution and automatically instantiates the context model. We also authored the VPlan ontology, which enables the modelling of validation requirements and contains a controlled vocabulary of metrics that can be used to quantify them. We evaluate the proposed ontologies on five Taverna workflows that differ in the degree to which they depend on additional software and services.
The results show that the proposed ontologies are necessary and can be used to verify and validate re-executions of scientific workflows in different environments, without requiring simultaneous access to the original environment. Scientists can thus determine whether a scientific experiment is replicable.
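To make the kind of data involved concrete, below is a minimal sketch, using Python's rdflib, of how a small fragment of such a context model could be instantiated as RDF. The namespace, class names, and property names are hypothetical placeholders chosen for illustration only; the paper's actual ontologies (including VPlan) define their own terms.

```python
# Illustrative sketch only: all URIs and terms below are hypothetical,
# not the vocabulary defined by the paper's context model or VPlan.
from rdflib import Graph, Namespace, Literal, RDF

CTX = Namespace("http://example.org/context#")  # hypothetical namespace

g = Graph()
g.bind("ctx", CTX)

# High-level description: a workflow step and the external service it calls
step = CTX["step/sequence-search"]
g.add((step, RDF.type, CTX.WorkflowStep))
g.add((step, CTX.callsService, CTX["service/external-search-service"]))

# Low-level execution context: software, hardware, and files observed at run time
g.add((step, CTX.usesSoftware, CTX["software/search-tool"]))
g.add((CTX["software/search-tool"], CTX.version, Literal("2.2.26")))
g.add((step, CTX.runsOnHost, CTX["hardware/node-01"]))
g.add((step, CTX.readsFile, CTX["file/input.fasta"]))

print(g.serialize(format="turtle"))
```

Captured in this form, the execution context of the original run can later be compared triple by triple against the context captured during a re-execution in a different environment.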