To understand the current state of marine systems and the pressures they face, it is essential to have comprehensive and accessible marine biodiversity data. Protecting and restoring biodiversity is one of three objectives of the Horizon Europe Mission to Restore Our Ocean and Waters by 2030, enabling the EU to reach its Green Deal and Biodiversity 2030 targets.
The European Union aims to build a digital environment, the EU Digital Twin Ocean (EU DTO), in which a digital replica of ocean processes can be created to improve our understanding of them, predict their response to changes in the system, and simulate alternative scenarios, ultimately leading to better-informed decisions.
Despite previous efforts from the EU, large amounts of biodiversity data do not find their way into repositories for a variety of reasons, including technical barriers. Many systems were not originally designed to handle the complexity of the biodiversity data being generated today. New methods and instruments, such as DNA-based observations, plankton imaging, passive acoustics, and biologging, can produce vast amounts of data and improve the collection and analysis of biodiversity time series. Nonetheless, the flow of these data into data repositories or integrators is at different stages of development, and for some data types the guidance is still a work in progress, subject to frequent changes as more information becomes available.
A data integrator is a facility that stores data and provides tools for analyzing them. The terms integrator and aggregator are used interchangeably throughout the document. In contrast, a data repository is a facility that stores data but does not provide functionality for data analysis within the platform.
DTO-BioFlow is establishing data flows for several new biodiversity data types, produced using different techniques and instruments, that do not yet have established data flows making them available in long-term data repositories.
These include data from genomics observations, plankton imaging observations, fish, mammal and bird biologging, cetacean passive acoustic observations, and other relevant biodiversity data sources.
The European Marine Observation and Data Network (EMODnet) Biology is the European Union's service providing free and open access to in situ marine biodiversity data and data products. It is a data integrator that adheres to INSPIRE and Open Geospatial Consortium (OGC) standards for metadata and data, and complies with the FAIR principles. EMODnet Biology uses the EurOBIS (European Ocean Biodiversity Information System) data infrastructure for data management. The two initiatives are therefore intrinsically connected and allow data to reach a wider network of stakeholders using a common metadata sharing infrastructure with OBIS (Ocean Biodiversity Information System) and GBIF (Global Biodiversity Information Facility). Within DTO-BioFlow, data pipelines for each of the data types addressed in the project will be established to maintain a sustained flow of biodiversity data towards EMODnet Biology, and ultimately into the EU DTO infrastructure. The status of the data flows towards EMODnet Biology varies according to the types of data addressed by DTO-BioFlow. In some cases no data flow has been implemented; in other cases a data flow is already in place and will be subject to improvements during the project’s lifetime[1].
Genomic data is an umbrella term used to cover any kind of nucleotide sequence data, irrespective of whether it includes whole genomes, whole transcriptomes, or only specific loci. It may therefore refer to the nucleotide sequence information of individual organisms (e.g. for genetic barcoding) or, when the prefix “meta-” is applied, of multiple organisms simultaneously. Whole-genome sequence data from, for example, microbial community samples are consequently called “metagenomic” data. In cases where multiple organisms are assessed simultaneously but the focus is on individual loci (most commonly marker loci used for taxonomic identification of species), the term “metabarcoding” is used.
Genomic data may include taxonomic and/or functional annotations.
Within the scope of this project, we mainly focus on taxonomically annotated nucleotide sequence data derived from environmental DNA (eDNA) samples subjected to metabarcoding, metagenomic sequencing, or both.
Plankton imaging data covers all data generated by quantitative imaging instruments, that is, instruments that provide large numbers of images in a consistent manner so that quantitative information, such as concentrations and/or biovolumes, can be reliably extracted. These observations of planktonic organisms allow us to better quantify their role as key trophic and functional links in open ocean ecosystems. In addition, in situ instruments, or samples processed shortly after collection, can provide information on so-called marine snow, which constitutes 80% to 90% of the particles in seawater. This marine snow plays a crucial role in marine ecosystems by facilitating the transfer of carbon to the deep ocean through the biological pump mechanism.
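As an illustration of the quantitative information mentioned above, the following minimal Python sketch derives a concentration and a biovolume concentration from a set of imaged objects. It is not the workflow of any particular instrument or of DTO-BioFlow: the function names, the sample values, and the choice to approximate each object as a sphere of a given equivalent spherical diameter (ESD) are all assumptions made for the example.

```python
import math


def concentration_per_m3(count: int, sampled_volume_l: float) -> float:
    """Objects per cubic metre, from a count and the imaged volume in litres."""
    if sampled_volume_l <= 0:
        raise ValueError("sampled volume must be positive")
    return count / (sampled_volume_l / 1000.0)


def sphere_biovolume_mm3(esd_mm: float) -> float:
    """Volume (mm^3) of a sphere with the given equivalent spherical diameter (mm)."""
    return math.pi / 6.0 * esd_mm ** 3


# Hypothetical sample: three objects imaged in 50 L of seawater.
esd_mm = [0.5, 1.0, 2.0]   # assumed ESD of each imaged object, in mm
sampled_volume_l = 50.0    # assumed imaged volume, in litres

conc = concentration_per_m3(len(esd_mm), sampled_volume_l)
total_biovolume = sum(sphere_biovolume_mm3(d) for d in esd_mm)
biovolume_conc = total_biovolume / (sampled_volume_l / 1000.0)

print(f"concentration: {conc:.1f} objects per m^3")
print(f"biovolume concentration: {biovolume_conc:.1f} mm^3 per m^3")
```

In practice the sampled volume and the size measurement depend on the instrument, so both inputs would come from the instrument's own metadata rather than from fixed values as here.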
A wide variety of instruments can collect imaging data in situ or in the laboratory, each one with its own workflow. It is expected that data from plankton and particle imaging will flow into the EU DTO mostly via data integrators/repositories that are deployed across several in situ observatories and networks, notably including those overseen by DTO-BioFlow partners. The improved flows of data from plankton imaging expected from DTO-BioFlow will enhance biomonitoring efforts and global carbon flux estimations.
Biologging data are records of animal positions or presences obtained from animal-borne electronic devices. For this project, we consider biologging data from animals tagged in Europe, with mainly marine positions. They can be divided into three types:
Passive Acoustic Monitoring (PAM) of sounds made by marine animals is an important method for estimating the distribution, density, and abundance of species that vocalise. It is particularly useful for animals that frequently produce species-specific vocalisations. Harbour porpoises (Phocoena phocoena) are excellent candidates, as seldom does a minute go by without a porpoise producing species-specific echolocation clicks. The biological interpretation of acoustic data requires detecting the signals produced by the animals, and the development and evaluation of detectors for classification is an active area of research. We propose that the primary passive acoustic data flow from ETN into EMODnet Biology.
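To make the step from detected signals to a biological summary concrete, the sketch below reduces a list of detection timestamps to the minutes in which at least one detection occurred, a summary often referred to as detection-positive minutes. This metric is an assumption chosen for illustration and is not stated here as the project's method; the function name and the timestamps are hypothetical, and the detections are assumed to have already been produced and classified by a detector.

```python
from datetime import datetime


def detection_positive_minutes(click_times):
    """Return the sorted, unique minutes that contain at least one detection."""
    return sorted({t.replace(second=0, microsecond=0) for t in click_times})


# Hypothetical detector output: timestamps of classified porpoise clicks.
clicks = [
    datetime(2024, 5, 1, 12, 0, 5),
    datetime(2024, 5, 1, 12, 0, 42),
    datetime(2024, 5, 1, 12, 2, 10),
]

dpm = detection_positive_minutes(clicks)
for minute in dpm:
    print(minute.isoformat())
```

Summarising per minute rather than per click reduces the influence of how many clicks a single animal produces, at the cost of discarding finer temporal detail.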
DTO-BioFlow is also considering other networks and data sources. These include species occurrence data from global platforms not yet integrated into EMODnet Biology, gridded species distributions from various projects, reporting data relevant to EU Directives, as well as data from industry, citizen science, and literature.