PoS - Proceedings of Science
Volume 415 - International Symposium on Grids & Clouds 2022 (ISGC2022) - Data Management & Big Data Session
Open-source and cloud-native solutions for managing and analyzing heterogeneous and sensitive clinical Data
D. Spiga*, D. Ciangottini, A. Costantini, S. Cutini, C. Duma, J. Gasparetto, P. Lubrano, B. Martelli, E. Ronchieri, D. Salomoni, G. Sergi, L. Storchi and M. Tracolli
Published on: September 28, 2022
Abstract
The need for effective handling and management of heterogeneous and possibly confidential data is continuously increasing across multiple scientific domains.
PLANET (Pollution Lake ANalysis for Effective Therapy) is an INFN-funded research initiative aiming to implement an observational study to assess a possible statistical association between environmental pollution and COVID-19 infection, symptoms and course. PLANET is built on a data-centric approach that takes into account clinical components and environmental and pollution conditions, complementing the primary data with many possible confounding factors such as population density, commuter density, socio-economic metrics and more. Besides the scientific challenge, the main technical challenge of the project is collecting, indexing, storing and managing many types of datasets while guaranteeing FAIRness as well as adherence to the prescribed regulatory frameworks, such as the General Data Protection Regulation (GDPR).
In this contribution we describe the developed open-source DataLake platform, detailing its key features: the event-based storage system provided by MinIO, which allows automatic metadata processing; the data-ingestion pipeline implemented via Argo Workflows; the GraphQL interface to query object metadata; and, finally, the seamless integration of the platform within a multi-user compute environment. We show how all these frameworks are integrated in the Enhanced PrIvacy and Compliance (EPIC) Cloud partition of the INFN Cloud federation.
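As an illustration of the event-based metadata processing mentioned above, the following is a minimal sketch, not the platform's actual code. It assumes the storage layer emits S3-style bucket notifications (a `Records` list with `s3.bucket` and `s3.object` fields), and it flattens each object-creation event into a record that a downstream ingestion step could index. The function name `extract_object_metadata`, the output field names, the bucket name `planet-raw` and the sample object keys are all hypothetical.

```python
from urllib.parse import unquote_plus


def extract_object_metadata(event):
    """Flatten one S3-style bucket notification into metadata records.

    Assumption: the payload follows the S3 notification layout
    (Records[].s3.bucket.name, Records[].s3.object.key, ...).
    """
    records = []
    for record in event.get("Records", []):
        # Only object-creation events bring new data to index.
        if not record.get("eventName", "").startswith("s3:ObjectCreated:"):
            continue
        s3 = record["s3"]
        obj = s3["object"]
        records.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in notification payloads.
            "key": unquote_plus(obj["key"]),
            "size_bytes": obj.get("size"),
            "etag": obj.get("eTag"),
            "content_type": obj.get("contentType"),
            "event_time": record.get("eventTime"),
        })
    return records


# Hypothetical payload: one upload and one deletion in the same notification.
sample_event = {
    "Records": [
        {
            "eventName": "s3:ObjectCreated:Put",
            "eventTime": "2022-03-21T10:15:00Z",
            "s3": {
                "bucket": {"name": "planet-raw"},
                "object": {
                    "key": "air-quality/pm10+2021.csv",
                    "size": 2048,
                    "eTag": "abc123",
                    "contentType": "text/csv",
                },
            },
        },
        {
            "eventName": "s3:ObjectRemoved:Delete",
            "s3": {
                "bucket": {"name": "planet-raw"},
                "object": {"key": "air-quality/old.csv"},
            },
        },
    ]
}

indexed = extract_object_metadata(sample_event)
```

In such a design the resulting records would be handed to the ingestion workflow and later exposed through the metadata query interface; how the real platform wires these steps together is not specified in this abstract.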
DOI: https://doi.org/10.22323/1.415.0022

Open Access
Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.