EOSCpilot Science Demonstrator: EGA - FAIR Genomic datasets

SCIENTIFIC OBJECTIVES OF THE DEMONSTRATOR

The European Genome-phenome Archive (EGA) is a repository that facilitates access to, and management of, bio-molecular data for long-term archival. The main objectives of this Science Demonstrator (SD) are to enhance the reproducibility of data analysis and to explore new added-value services by leveraging EOSC resources. Applying the FAIR principles (Findability, Accessibility, Interoperability and Reusability) to our data sets and their associated information is a mission we have accepted from the community.

  • A set of results data has been reproduced using a portable version of the pipeline.
  • The same result set has been updated by re-analyzing it with a current version.
  • FAIRified metadata on both result sets is available on a test EGA server and/or in an appropriate repository.

MAIN ACHIEVEMENTS

  • A set of results data has been reproduced using a portable version of the pipeline.
  • The same result set has been updated by re-analyzing it with a current version of the pipeline and the reference data.
  • FAIRified metadata on both result sets is available on a test EGA server and/or in an appropriate repository.

IMPACT

This pilot will have a pragmatic impact by demonstrating how to make analyses portable (tools and workflows), how to increase findability by using persistent identifiers, how to leverage security technologies for sensitive data, how to deploy the workflow in a cloud, and how to make data FAIR. It will also have a long-term impact by increasing the usability of EGA-hosted data, assuring potential users that up-to-date versions of assured quality are available to download.

The success of the project will be monitored using well-defined use cases and by ensuring their reproducibility across sites and platforms. This monitoring will occur through space (i.e. across sites) and time (i.e. reproduction and updating of existing results).
The potential scientific and socio-economic impact is extremely significant at a time when in silico analysis is being routinely deployed in a medical context, an approach expected to dominate so-called precision medicine in the next decade.

RECOMMENDATIONS FOR THE IMPLEMENTATION

Possessing huge amounts of data is a first step, but it is not enough to achieve the main goal: fostering research. The bio-molecular data that repositories currently store needs to be made more useful. The EOSC project is a unique framework for adding this necessary layer of standardization and interoperability while unifying the files and associated pipelines and making them discoverable.

Learn more about this Science Demonstrator on the EOSCpilot website.


The EOSC portal has been jointly developed and maintained by the eInfraCentral, EOSC-hub, EOSCpilot and OpenAIRE-Advance projects, funded by the European Union’s Horizon 2020 research and innovation programme with the contribution of the European Commission.