A workflow to create trait databases from collections of textual taxonomic descriptions

David Coleman, Rachael V. Gallagher, Daniel Falster, Herve Sauquet, Elizabeth Wenk

Research output: Contribution to journal › Article › peer-review

Abstract

There is a wealth of information about the characteristics (traits) of organisms within collections of taxonomic descriptions of plants and animals, called a ‘Flora’ or ‘Fauna’ of a region. However, such knowledge is usually encoded as text paragraphs and is thus unavailable for immediate analysis. To make use of the knowledge embedded in taxon descriptions, the text must be organised into standardised, queryable datasets. Despite the recent development of natural language processing (NLP) tools that analyse taxonomic descriptions and extract trait values, the complexity and specificity of these methods currently limit their broad application. Accessible and flexible methods for extracting traits across large numbers of taxonomic descriptions are therefore needed. Here we present such an R-based workflow, which can be adapted for use on any organismal group and is written in a language familiar to researchers in the biological sciences. We document a way to (1) assemble tens of thousands of taxonomic descriptions into a standardised format, (2) split the taxon descriptions into different topics, (3) extract trait values as defined by the user, and (4) assign traits described at the genus and family level to lower-level taxa to maximise trait coverage. As a case study, we apply the workflow to a collection of taxonomic descriptions drawn from Australia's state and national floras and describe useful techniques for building such workflows and, through them, research-grade trait datasets. Using this method, we extracted 615,812 trait values for 38 different plant traits. Trait data collated using this method are freely available as part of the AusTraits trait database and have already contributed to analyses in several scientific publications.
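Steps (2) to (4) of the workflow described above can be illustrated with a minimal sketch. The published workflow is written in R; the Python below is not the authors' code. It assumes step (1) has already produced each description as a plain string, and the topic keywords, colour vocabulary, regular expression, function names (`split_topics`, `extract_traits`, `fill_from_higher_taxa`) and taxon names are all hypothetical, chosen only to show the general idea: file sentences under organ-based topics, pull a numeric range and a categorical value out by pattern matching, then fill gaps in a species' record with values recorded for its genus or family.

```python
import re

# Hypothetical topic keywords: a sentence is filed under the first topic
# whose keyword appears in it as a whole word.
TOPICS = {
    "leaf": {"leaf", "leaves", "lamina"},
    "flower": {"flower", "flowers", "petal", "petals"},
    "fruit": {"fruit", "fruits", "capsule"},
}

# Hypothetical controlled vocabulary for a categorical trait.
COLOURS = ("white", "cream", "yellow", "orange", "red", "pink", "purple", "blue")

# A numeric range followed by a unit and "long", e.g. "20-45 mm long".
# Accepts a hyphen or an en dash (U+2013) between the two numbers.
LENGTH_RANGE = re.compile(r"(\d+(?:\.\d+)?)\s*[-\u2013]\s*(\d+(?:\.\d+)?)\s*(mm|cm)\s+long")


def split_topics(description):
    """Step 2: split a description into sentences and group them by topic."""
    topics = {}
    for sentence in re.split(r"(?<=\.)\s+", description.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for topic, keywords in TOPICS.items():
            if words & keywords:
                topics.setdefault(topic, []).append(sentence)
                break
    return topics


def extract_traits(topics):
    """Step 3: pull trait values out of the topic-specific text."""
    traits = {}
    leaf_text = " ".join(topics.get("leaf", []))
    match = LENGTH_RANGE.search(leaf_text)
    if match:
        scale = 10.0 if match.group(3) == "cm" else 1.0  # store lengths in mm
        traits["leaf_length_mm"] = (float(match.group(1)) * scale,
                                    float(match.group(2)) * scale)
    flower_text = " ".join(topics.get("flower", [])).lower()
    colours = [c for c in COLOURS if re.search(rf"\b{c}\b", flower_text)]
    if colours:
        traits["flower_colour"] = " ".join(colours)
    return traits


def fill_from_higher_taxa(species_traits, genus_of, family_of,
                          genus_traits, family_traits):
    """Step 4: copy genus- and family-level values to species lacking them.

    Precedence is species > genus > family: a value recorded for the species
    is never overwritten by one inherited from a higher taxon.
    """
    filled = {}
    for species, traits in species_traits.items():
        merged = dict(family_traits.get(family_of.get(species), {}))
        merged.update(genus_traits.get(genus_of.get(species), {}))
        merged.update(traits)
        filled[species] = merged
    return filled
```

For example, `extract_traits(split_topics("Shrub to 2 m high. Leaves linear, 20-45 mm long, glabrous. Flowers yellow, solitary."))` yields a leaf length range of (20.0, 45.0) mm and the flower colour "yellow". Giving species-level values precedence is one reasonable design choice; a real workflow would also need to confirm that a genus- or family-level statement applies to every member taxon before copying it down.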
Original language: English
Article number: 102312
Number of pages: 9
Journal: Ecological Informatics
Volume: 78
Publication status: Published, Dec 2023
