by Florian D. Schneider, 16 December 2019
A couple of the OTN members have been there and some are certainly more involved in the initiatives for global data standards than I am. Please read this as a rendering of my personal notes and ideas, which should be taken as very raw suggestions. The sole purpose of this draft is to spark a discussion and draft a vision for an OTN web of trait ontologies.
During the conference, a workshop on ‘Vocabularies for Values’ organised by Paula Zermoglio and the TDWG Interest Group on Biodiversity Data Quality made me think about one of the tasks addressed within the OTN: facilitating the linkage of trait data to globally accessible definitions. The workshop was mostly concerned with developing controlled vocabularies of Darwin Core terms, some of which are part of the ETS. The task group on Vocabularies for Values arose from the need to make it easier to evaluate the quality of entries in DwC data, which is currently only possible after a tedious harmonization of values. But discussions also turned to the need of many participants to define vocabularies for their own project contexts.
The most generic term for a list of standardised definitions of concepts is a ‘terminology’. A ‘controlled vocabulary’ is a fixed list of terms that can be used for entries in a field or as column labels.
A ‘thesaurus’ is enriched by defining the relationships between terms, i.e. classes of terms or synonyms, broader or narrower terms. An ‘ontology’ includes a machine-readable specification of the terms, concepts and relationships. A ‘standard’ is a description of a best practice, primarily in a human-readable way, but if it aims for integration in data management schemes, it would also include a machine-readable specification. TDWG has formulated requirements for standards in the TDWG Standards Documentation Specification (TDWG SDS). Standards do not have to include any form of controlled vocabulary (the best example is the TDWG SDS itself), but may include one or several vocabularies, thesauri or ontologies.
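To make the difference between a controlled vocabulary and a thesaurus a bit more tangible, here is a minimal Python sketch. It is only an illustration: the feeding guild terms, the synonyms and the function names are invented for this example and do not come from any existing vocabulary.

```python
from dataclasses import dataclass, field
from typing import Optional

# A controlled vocabulary: nothing more than a fixed set of permitted terms.
# The terms are hypothetical and only serve as an illustration.
FEEDING_GUILD_VOCABULARY = {"herbivore", "carnivore", "omnivore", "detritivore"}


def is_valid_entry(value: str, vocabulary: set) -> bool:
    """Return True if the (trimmed, lower-cased) value is a permitted term."""
    return value.strip().lower() in vocabulary


@dataclass
class ThesaurusTerm:
    """A thesaurus entry adds relationships to the bare term."""
    label: str
    synonyms: set = field(default_factory=set)
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)


def resolve(value: str, thesaurus: dict) -> Optional[str]:
    """Map a raw value to its preferred label, via the label or a synonym."""
    key = value.strip().lower()
    for term in thesaurus.values():
        if key == term.label or key in term.synonyms:
            return term.label
    return None


# Hypothetical thesaurus entries for two of the vocabulary terms.
THESAURUS = {
    "herbivore": ThesaurusTerm(
        "herbivore", synonyms={"phytophage", "plant feeder"}, broader={"consumer"}
    ),
    "carnivore": ThesaurusTerm(
        "carnivore", synonyms={"zoophage"}, broader={"consumer"}
    ),
}
```

With the plain vocabulary, an entry like "phytophage" is simply rejected. With the thesaurus, the same entry can be resolved to the preferred term "herbivore", which is what takes the tedium out of harmonising values.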
The TDWG policy seems to be to speak of a ‘standard’ only if the document underwent a TDWG ratification process. The TDWG SDS suggests that standards should differentiate between ‘normative’ elements and documents and ‘non-normative’ elements. The former undergo a ratification process for every single edit. The latter can be edited more informally, to minimize the organisational procedures for alterations that have no operational effect.
1. Defining the goal for OTN
In my opinion, it is illusory to aim for a single global trait ontology, mostly for social reasons of how scientific communities work with trait data. The existing trait-based fields are communities of practice, revolving around a certain taxonomic group or functional component of ecosystems, like the soil macrofauna, grassland vegetation, or bird morphology. Also, some of these communities have already started to develop vocabularies, thesauri or more advanced semantic ontologies, like those for agricultural plant traits or vertebrate morphology, which address particular problem-centered challenges. Vocabularies could also be defined to organise data within an existing data pool, as TOP does by compiling trait definitions for plants based on the enormously variable input for the TRY database. Other examples are initiatives that aim to systematically extract quantitative information on leaf dimensions from photographs. Rather than unifying those initiatives, the OTN should work towards a network of complementary but compatible controlled vocabularies, thesauri and ontologies. They can be allowed to evolve in parallel and pragmatically on the topics identified by the research communities. The main approach here should be to work within a community of practice to develop sufficiently sharp and practical vocabularies, but with the overarching idea of trait data within the semantic web. These various terminologies would ideally evolve successively into a generally applicable scheme of sufficiently different, field-specific ontologies that provide terms for the particular use cases, but also link across the domains of knowledge.
2. A standard for developing standards
The Open Traits Network initiative for ontologies could start by defining and publishing a standard for how to define and publish trait terminologies for particular communities of practice. The aim of these field-specific terminologies would be to define unambiguous trait concepts for the given research context and problem at hand. Further, the vocabularies should link, for each trait concept, the measurement and collection practice (methods of practice) with the stored values (controlled vocabularies for each trait). The OTN standard should provide sufficient instructions to create a solid resource. It should also suggest practices for versioning individual terms and the entire vocabulary, and for cross-referencing with other ontologies.
To achieve that, the standard might follow the TDWG SDS and also suggest or require the vocabularies to be constructed according to the TDWG SDS. Among other things, this would define a minimum quality for human-readable and machine-readable resources. The latter are typically written in RDF using the SKOS schema and DwC or Dublin Core terms. The OTN standard could include a suggested structure for these documents.
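As a rough idea of what such a machine-readable resource could look like, the following Python sketch writes a single trait concept as RDF in Turtle syntax using SKOS properties. It is not a proposed OTN structure: the `https://example.org/traits/` URIs, the trait and its definition are placeholders, and the helper functions are hypothetical. The sketch builds the text by hand to stay free of dependencies; a real project would more likely use an RDF library.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def turtle_literal(text, lang="en"):
    """Quote a string as a language-tagged Turtle literal."""
    escaped = (
        text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def concept_to_turtle(uri, pref_label, definition, alt_labels=(), broader=()):
    """Serialise one trait concept as a skos:Concept in Turtle."""
    lines = [f"<{uri}> a skos:Concept ;"]
    lines.append(f"    skos:prefLabel {turtle_literal(pref_label)} ;")
    for alt in alt_labels:
        lines.append(f"    skos:altLabel {turtle_literal(alt)} ;")
    for broader_uri in broader:
        lines.append(f"    skos:broader <{broader_uri}> ;")
    lines.append(f"    skos:definition {turtle_literal(definition)} .")
    return "\n".join(lines)


def vocabulary_to_turtle(concept_blocks):
    """Join serialised concepts under a shared SKOS prefix declaration."""
    header = f"@prefix skos: <{SKOS_NS}> ."
    return "\n\n".join([header, *concept_blocks]) + "\n"


# Placeholder concept; the URIs and wording are assumptions for illustration.
EXAMPLE = vocabulary_to_turtle([
    concept_to_turtle(
        uri="https://example.org/traits/body_length",
        pref_label="body length",
        definition="Length of the body from the tip of the head to the end of the abdomen.",
        alt_labels=["total length"],
        broader=["https://example.org/traits/body_dimension"],
    )
])
```

Printing `EXAMPLE` gives a small Turtle document with the preferred label, one synonym, one broader term and the definition, which is roughly the level of detail a simple trait thesaurus would need.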
The OTN Standard for controlled vocabularies should also classify different levels of detail. Detail may include synonyms, similar or related terms, and hierarchical relationships (broader, narrower) in the same or other ontologies. The definitions of individual terms should make use of broader ontologies, such as morphological ontologies (e.g. for terms like leaf, plant, fruit, wing, egg) or units ontologies.
Finally, the Standard should provide recommendations on how to publish these trait vocabularies (these recommendations would probably depend on the purpose and generality of the vocabulary; see below).
3. Quality control and ratification of vocabularies
Ratification by the TDWG for trait ontologies is unlikely, as the debated vocabularies are not of global geographic or taxonomic concern. As already discussed during the initial OTN Workshop, the OTN could implement a scheme of ratifying trait vocabularies, thesauri or ontologies. We may differentiate between 1) simple controlled vocabularies, 2) thesauri, and 3) ontologies; between application levels ranging from i) project-specific and ii) taxon-specific to iii) global validity; and also define a level of maturity: a) draft stage, b) under evaluation/testing, and c) ratified.
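The three axes suggested above (type, application level, maturity) could be recorded for each vocabulary in a very simple way. The Python sketch below is only one possible rendering: the class names, the badge format and the `promote` helper are my own assumptions, not an agreed OTN scheme.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Kind(Enum):
    CONTROLLED_VOCABULARY = "1"
    THESAURUS = "2"
    ONTOLOGY = "3"


class Scope(Enum):
    PROJECT = "i"
    TAXON = "ii"
    GLOBAL = "iii"


class Maturity(Enum):
    DRAFT = "a"
    UNDER_EVALUATION = "b"
    RATIFIED = "c"


@dataclass(frozen=True)
class VocabularyRecord:
    """Registry entry describing one vocabulary along the three axes."""
    name: str
    kind: Kind
    scope: Scope
    maturity: Maturity

    def badge(self) -> str:
        """Compact label such as '2-ii-a' (thesaurus, taxon-specific, draft)."""
        return f"{self.kind.value}-{self.scope.value}-{self.maturity.value}"


def promote(record: VocabularyRecord) -> VocabularyRecord:
    """Return a copy of the record moved to the next maturity level."""
    order = list(Maturity)  # Enum members iterate in definition order
    position = order.index(record.maturity)
    if position == len(order) - 1:
        raise ValueError(f"{record.name} is already ratified")
    return replace(record, maturity=order[position + 1])


# Hypothetical registry entry.
draft = VocabularyRecord(
    "soil macrofauna traits", Kind.THESAURUS, Scope.TAXON, Maturity.DRAFT
)
in_testing = promote(draft)
```

Here `draft.badge()` gives "2-ii-a" and `in_testing.badge()` gives "2-ii-b". Whether maturity should advance strictly one step at a time is a policy question that the sketch simply assumes.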
The OTN could organise a database or repository to publish vocabularies (thereby also considering the use of OBO Foundry), as well as a mode to assess and validate (with badges or labels), and continuously revise these ontologies (this should be minimally prescriptive about the frequency, involvement and openness of a community). Here, on the overarching level, we would also work on strengthening the cross-referencing of ontologies during the ratification process, e.g. by suggesting synonyms or related terms, and paths of trait inference (e.g. a statement on digits allows one to infer the presence of limbs and an assignment to Vertebrata).
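The idea of inference paths can be sketched as a small rule table that is followed transitively. In the Python example below, the two rules for digits come from the example in the text; the step from Vertebrata to Chordata is added only to show that inferences can chain. The statement labels and the `infer` function are hypothetical.

```python
from collections import deque

# Each statement maps to the statements that can be inferred from it.
INFERENCE_RULES = {
    "has digits": {"has limbs", "member of Vertebrata"},
    "member of Vertebrata": {"member of Chordata"},  # added to show chaining
}


def infer(statements, rules=INFERENCE_RULES):
    """Return the given statements plus everything reachable via the rules."""
    known = set(statements)
    queue = deque(known)
    while queue:
        current = queue.popleft()
        for consequence in rules.get(current, ()):
            if consequence not in known:
                known.add(consequence)
                queue.append(consequence)
    return known
```

Calling `infer({"has digits"})` returns the original statement together with "has limbs", "member of Vertebrata" and "member of Chordata". In a real ontology such rules would be expressed as relationships between terms rather than as a Python dictionary.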
OTN could also think about the practicality of ontologies: When is writing a new ontology justified, when does it fill a niche of its own, and when is it redundant because it widely overlaps with or is included in another ontology? Also, the co-existence of ontologies might be problematic if they contradict each other in their definitions of relationships between apparently compatible terms.
4. Meta-Evaluation and utilization
At the level of OTN, we could continuously evaluate the coverage of the trait ontologies for all domains of life and for fields of research, and quantify their overlap, redundancy or possible conflicts. We could also keep a record of their application in published data.
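One simple way to put a number on overlap, offered here only as a sketch, is the Jaccard index between the sets of concepts that two vocabularies define. The example assumes that terms have already been mapped to shared identifiers, which is of course the hard part; the vocabulary names and terms are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared concepts divided by all concepts found in either vocabulary."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(vocabularies):
    """Jaccard index for every pair of vocabularies, keyed by their names."""
    return {
        (name_a, name_b): jaccard(vocabularies[name_a], vocabularies[name_b])
        for name_a, name_b in combinations(sorted(vocabularies), 2)
    }


# Hypothetical vocabularies, reduced to sets of harmonised concept labels.
VOCABULARIES = {
    "arthropods": {"body length", "wing length", "body mass"},
    "plants": {"leaf area", "body mass"},
}

overlap = pairwise_overlap(VOCABULARIES)
```

For these two invented vocabularies, one of four distinct concepts is shared, so the index is 0.25. A value close to 1 would point to redundancy, while a low value says little about conflicts, which would need a comparison of definitions rather than labels.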
With the web technology that already exists in initiatives of the biodiversity informatics community, the network of ontologies can be patched into a meta-ontology that enables the reading of all trait datasets that have been created with one of the ontologies. However, this would require dealing with the coexistence of conflicting or incompletely overlapping terms.
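A very reduced picture of such a meta-ontology is a crosswalk table that maps each term of each ontology to a shared concept. The Python sketch below shows how a dataset row could be read through such a table and how labels that mean different things in different ontologies could be flagged. All identifiers, including the `meta:` concepts, are invented for illustration.

```python
from collections import defaultdict

# (ontology, term label) -> shared meta-concept; all names are hypothetical.
CROSSWALK = {
    ("plants", "height"): "meta:plant_height",
    ("plants", "leaf_area"): "meta:leaf_area",
    ("birds", "height"): "meta:standing_height",
}


def harmonise(row, ontology, crosswalk=CROSSWALK):
    """Rename the columns of one data row to meta-concepts.

    Returns the mapped values and the list of columns without a mapping.
    """
    mapped, unmapped = {}, []
    for column, value in row.items():
        target = crosswalk.get((ontology, column))
        if target is None:
            unmapped.append(column)
        else:
            mapped[target] = value
    return mapped, unmapped


def find_conflicts(crosswalk=CROSSWALK):
    """Labels that map to different meta-concepts in different ontologies."""
    targets_by_label = defaultdict(set)
    for (_ontology, label), target in crosswalk.items():
        targets_by_label[label].add(target)
    return {
        label: targets
        for label, targets in targets_by_label.items()
        if len(targets) > 1
    }
```

With this table, a plant dataset row `{"height": 12.5, "colour": "green"}` is read as `{"meta:plant_height": 12.5}` with "colour" reported as unmapped, and `find_conflicts()` flags "height" as a label that cannot be merged blindly across ontologies.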
This toolchain would then potentially feed into the Essential Biodiversity Variables (EBV) initiative, which aims to link more or less systematic field survey data with the indicators and reports required on the policy side through an expansive system of data management. A suite of functional traits has been defined in the EBV Framework that promises key insights into the state and functioning of ecosystems (Kissling et al. 2018, Nature Ecology & Evolution 2: 1531–1540). With an ontology network for traits, the trait-data cosmos would be made available to these analyses, and functional indicators could be more easily derived from field data on community assemblages and systematic monitoring data.
Strategies for the development of community-based trait-ontologies by Florian D. Schneider is licensed under a Creative Commons Attribution 4.0 International License.
All of these ideas should be taken as suggestions. The sole purpose of this draft is to provide a first vision and spark a discussion for the OTN strategy towards trait ontologies.
Florian D. Schneider is working at ISOE Institute for Social-ecological Research, Frankfurt am Main, Germany. He investigates knowledge integration in cross-disciplinary cooperations, particularly on topics of biodiversity and insect decline due to human land use.
Researchgate: Florian Schneider
all photographs are public domain released by Naturalis Biodiversity Center