To understand the current state of marine systems and the pressures on them, comprehensive and accessible marine biodiversity data are essential. Protecting and restoring biodiversity is one of three objectives of the Horizon Europe Mission to Restore Our Ocean and Waters by 2030, enabling the EU to reach its Green Deal and Biodiversity 2030 targets. Within this context, the EU aims to build a digital environment, the EU Digital Twin Ocean (EU DTO), in which a digital replica of ocean processes can be created to improve our understanding of them, predict their response to changes in the system, and simulate alternative scenarios, ultimately leading to better-informed decisions.
Despite previous efforts from the EU, large amounts of biodiversity data do not find their way into repositories, for a variety of reasons. Some are technical: systems may be unable to accommodate the biodiversity data being produced because of the way they were originally designed. Various instruments and new techniques, such as DNA-based observations, plankton imaging observations, passive acoustics, and biologging, can produce vast amounts of data and help to optimise the resources needed to collect and analyse biodiversity time series data. Nonetheless, the flow of these data into data repositories or integrators is at different stages of development, and for some data types the guidance is still a work in progress, subject to frequent changes as more information becomes available.
DTO-BioFlow is establishing data flows for a number of new biodiversity data types that are produced using different techniques and instruments and that do not yet have established data flows to make them available in data repositories in the long term.
They include genomics observation networks; plankton imaging observation networks; fish, mammal and bird biologging networks; cetacean passive acoustic observation networks; and other relevant biodiversity data sources.
The DTO-BioFlow Blueprint document describes how we envision setting up the general framework for data flows towards the EU DTO for each of the data types. (LINK?)
DTO-BioFlow connection to EMODnet Biology
The European Marine Observation and Data Network (EMODnet) Biology is the European Union's service providing free and open access to in situ marine biodiversity data and data products. It is a data integrator that adheres to INSPIRE and Open Geospatial Consortium (OGC) standards for metadata and data, and complies with the FAIR principles. EMODnet Biology uses the EurOBIS (European Ocean Biodiversity Information System) data infrastructure for data management. The two initiatives are therefore intrinsically connected, and allow data to reach a wider network of stakeholders through a common data-sharing infrastructure with OBIS and GBIF, the Global Biodiversity Information Facility.
Within DTO-BioFlow, data pipelines for each of the data types addressed in the project will be established to maintain a sustained flow of biodiversity data towards EMODnet Biology, and ultimately into the EU DTO infrastructure.
The status of the data flows towards EMODnet Biology varies across the types of data addressed by DTO-BioFlow. In some cases, no data flow has been implemented; in others, a data flow is already in place and will be improved during the project’s lifetime.
EU Digital Twin of the Ocean (EU DTO) infrastructure
The main objective of the EDITO-Infra project (EU Public Infrastructure for the European Digital Twin Ocean) is to build the EU public infrastructure backbone for the European Digital Twin of the Ocean (EU DTO). This will be done by upgrading, combining, and integrating key service components of the existing EU ocean observing, monitoring and data programmes, namely the Copernicus Marine Service (CMEMS) and EMODnet, into a single digital framework. In this context, EDITO-Infra is both the name of the project and the name of the platform that is being developed. As such, EDITO-Infra will provide the foundation for further developments of the EU DTO, hosting the deployment of multiple DTO applications from ongoing and future digital twin projects.
What type of data will become available?
Genomics
Current status: there is no established genomics data flow to EMODnet Biology.
EMODnet Data and Data Products Processing Level: L3
Data types produced: occurrence data of biological taxa in delimiter-separated tabular format, which will be annotated with literature information for species of interest, and environmental metadata, which can be retrieved from ENA BioSamples or from other repositories (e.g. PANGAEA).
Related Data Project: XXXXX
“Genomic data” is an umbrella term used in this document to cover any kind of nucleotide sequence data, irrespective of whether it includes whole genomes, whole transcriptomes, or only specific loci. It may therefore refer to the nucleotide sequence information of individual organisms (e.g. for genetic barcoding), as well as to that of multiple organisms assessed simultaneously, in which case the prefix “meta-” is applied. Whole genome sequence data of e.g. microbial community samples will consequently be called “metagenomic” data. In cases where multiple organisms are assessed simultaneously but the focus is on individual loci (most commonly marker loci used for taxonomic identification of species), the term “metabarcoding” is used.
Genomic data may include taxonomic and/or functional annotations.
In the scope of this project, we mainly focus on taxonomically annotated nucleotide sequence data derived from environmental DNA (eDNA) samples subjected to either metabarcoding or metagenomic sequencing or both.
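As an illustration of the "occurrence data of biological taxa in delimiter-separated tabular format" mentioned above, the following minimal Python sketch reads a small tab-separated table and applies basic checks to each record. The column names are borrowed from Darwin Core terms as an assumption, since the actual schema of the genomics data flow is not specified here; the sample rows, the function names read_occurrences and is_valid, and the checks themselves are hypothetical and purely illustrative.

```python
import csv
import io

# Hypothetical sample table. Column names follow Darwin Core terms as an
# assumption; the values are invented for illustration only.
SAMPLE_TSV = (
    "occurrenceID\tscientificName\teventDate\tdecimalLatitude\tdecimalLongitude\n"
    "occ-1\tCalanus finmarchicus\t2023-05-01\t51.35\t2.70\n"
    "occ-2\tPhaeocystis globosa\t2023-05-01\t51.35\t2.70\n"
    "occ-3\tCalanus finmarchicus\t2023-06-12\t91.00\t2.70\n"  # latitude out of range
)


def read_occurrences(text, delimiter="\t"):
    """Parse delimiter-separated text into a list of dicts, one per record."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


def is_valid(record):
    """Check that a record has a taxon name and plausible coordinates."""
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        # Missing column, missing value, or non-numeric coordinate.
        return False
    has_name = bool(record.get("scientificName"))
    return has_name and -90 <= lat <= 90 and -180 <= lon <= 180


records = read_occurrences(SAMPLE_TSV)
valid = [r for r in records if is_valid(r)]

for r in valid:
    print(r["occurrenceID"], r["scientificName"], r["eventDate"])
```

In this sketch the third row is rejected because its latitude lies outside the range -90 to 90. A real pipeline would presumably apply far more extensive quality control, including taxonomic name matching, which is outside the scope of this example.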
Plankton imaging
Biologging
Passive Acoustic
Other networks