A virtual infrastructure defined through the infrastructure-as-code paradigm. The infrastructure comprises distributed storage, a processing cluster, an authentication service, a container repository, and a set of template applications for accessing and processing the data. It is described in Ansible and RADL (Resource and Application Description Language) and can be instantiated with the Infrastructure Manager (www.grycap.upv.es/im) on a wide range of cloud backends.
This result has been implemented and demonstrated as the backend that supports the storage and processing of the medical imaging and clinical data of the CHAIMELEON Repository.
The virtual infrastructure provides dataset traceability through a proxy service that annotates datasets, machine learning models, and accesses. The traceability solution covers the whole cycle, from dataset creation to the publication of models, and serves both data owners and data scientists.
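
To make the traceability idea concrete, the following is a minimal in-memory sketch of how annotated events could link datasets, accesses, and published models. It is not the CHAIMELEON proxy service: the class names (`TraceEvent`, `TraceLog`), the action labels, and the identifiers are all hypothetical, and a real deployment would presumably persist these records behind the proxy rather than keep them in a Python list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class TraceEvent:
    """One annotated action in the data lifecycle (hypothetical schema)."""
    actor: str                         # e.g. a data owner or a data scientist
    action: str                        # e.g. "create_dataset", "publish_model"
    subject: str                       # identifier of the dataset or model
    derived_from: Tuple[str, ...] = () # identifiers of the inputs, if any
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class TraceLog:
    """Append-only list of events, standing in for the proxy's record store."""

    def __init__(self) -> None:
        self._events: List[TraceEvent] = []

    def record(self, actor: str, action: str, subject: str,
               derived_from=()) -> TraceEvent:
        event = TraceEvent(actor, action, subject, tuple(derived_from))
        self._events.append(event)
        return event

    def history(self, subject: str) -> List[TraceEvent]:
        """Events that act on the subject or use it as an input."""
        return [e for e in self._events
                if e.subject == subject or subject in e.derived_from]

    def lineage(self, subject: str) -> List[str]:
        """Identifiers the subject was derived from, followed transitively."""
        seen: List[str] = []
        stack = [subject]
        while stack:
            current = stack.pop()
            for e in self._events:
                if e.subject != current:
                    continue
                for parent in e.derived_from:
                    if parent not in seen:
                        seen.append(parent)
                        stack.append(parent)
        return seen


# Illustrative usage with made-up identifiers.
log = TraceLog()
log.record("owner-1", "create_dataset", "dataset-A")
log.record("scientist-7", "access_dataset", "dataset-A")
log.record("scientist-7", "publish_model", "model-X", derived_from=["dataset-A"])
```

In this sketch a data owner could call `log.history("dataset-A")` to see who created, accessed, and built on the dataset, while a data scientist could call `log.lineage("model-X")` to list the datasets a published model depends on.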


Cancer