Backbone 14th September 2021

In this 2021-09-14 edition of the GBIF Backbone we have addressed various issues, in both content and development work, leading to a better overall taxonomy. Further details can be seen in the 2021 September column of our GitHub project board. Here we want to point out some major achievements:

Data source changes

The August 2021 version of the Catalogue of Life (COL) Checklist has been used as the basis and for most of the hierarchy. In addition to the prokaryotic kingdoms Bacteria and Archaea and the plant family Fabaceae, this Backbone also blocks the entire insect order Diptera from COL and instead uses Systema Dipterorum directly. This was done because the version of Systema Dipterorum integrated into COL suffered from serious problems. We see this only as a temporary solution that we will likely roll back in the next Backbone edition.

Three Colombian national lists (birds, plants, vertebrates) and the United Kingdom Species Inventory (UKSI) are new additions to our source list.

Many other sources have seen considerable updates. In particular, the Palaeobiology Database and Index Fungorum were updated to their latest versions, something we had not done for several years. As always, Plazi has been adding many new publications containing many new species descriptions.

Name matching

We improved the name matching code. Rank differences are now scored better, and when a fuzzy match would yield a binomial with a slightly different genus, the match now snaps to the genus instead. See portal-feedback#2930.
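The snapping rule can be illustrated with a minimal sketch. It is not the actual matching code: the `Match` class, the `resolve_fuzzy` function, the placeholder names, and the choice to snap to the genus of the queried name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Match:
    """Hypothetical result record; the labels are illustrative only."""
    name: str
    rank: str
    match_type: str


def genus_of(binomial: str) -> str:
    """Return the first word of a binomial, i.e. its genus part."""
    return binomial.split()[0]


def resolve_fuzzy(query: str, fuzzy_candidate: str,
                  known_genera: Set[str]) -> Optional[Match]:
    """Accept a fuzzy species match only if the genus is unchanged.

    Assumption: if the fuzzy candidate sits in a different genus, we fall
    back to the genus of the queried name, provided that genus is known.
    """
    query_genus = genus_of(query)
    if genus_of(fuzzy_candidate) == query_genus:
        # Only the epithet differs slightly: keep the fuzzy species match.
        return Match(fuzzy_candidate, "SPECIES", "FUZZY")
    if query_genus in known_genera:
        # The genus differs: snap to the genus instead of a wrong species.
        return Match(query_genus, "GENUS", "HIGHERRANK")
    return None
```

With the placeholder names "Aus bus" (query) and "Auus bus" (fuzzy candidate), the sketch returns the genus "Aus" rather than a species in the similarly spelled genus "Auus".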

The matching of GTDB OTU names was improved. Previously, it resulted in no match in many cases, even if the exact name was given. The matching of those names is now also case insensitive.
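As a rough illustration of case-insensitive exact matching, the sketch below indexes names under a case-folded key. The `CaseInsensitiveIndex` class and the placeholder OTU-style name are hypothetical and do not reflect how the matching service is implemented.

```python
from typing import Dict, Iterable, Optional


class CaseInsensitiveIndex:
    """Exact-name lookup that ignores case and repeated whitespace."""

    def __init__(self, names: Iterable[str]) -> None:
        # Map the normalised key back to the name as originally spelled.
        self._by_key: Dict[str, str] = {self._key(n): n for n in names}

    @staticmethod
    def _key(name: str) -> str:
        # Collapse whitespace runs, then fold case for comparison.
        return " ".join(name.split()).casefold()

    def lookup(self, query: str) -> Optional[str]:
        """Return the stored spelling of the name, or None if absent."""
        return self._by_key.get(self._key(query))


# Placeholder name, made up for this example.
index = CaseInsensitiveIndex(["Abc-123 sp000000001"])
```

Here `index.lookup("ABC-123 SP000000001")` finds the stored name despite the different casing.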

Name parser

The name parser library was updated to better parse various authorships that were previously reported to fail. This also directly improves our name matching.

Algorithm improvements

The Backbone building algorithm now allows an optional taxonomic scope to be configured for each source. This makes it easier to snap names in those sources to their expected part of the tree.
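To show why a per-source scope helps, here is a minimal sketch using the genus name Oenanthe, which exists both as a bird genus and as a plant genus. The source keys, the `SOURCE_SCOPES` mapping, the `pick_placement` function, and the fallback to the first candidate are assumptions for illustration, not the actual build configuration or algorithm.

```python
from typing import Dict, List, Optional

# Hypothetical per-source configuration: None means no scope is set.
SOURCE_SCOPES: Dict[str, Optional[str]] = {
    "hypothetical-national-bird-list": "Aves",
    "hypothetical-general-list": None,
}


def pick_placement(source: str,
                   candidates: List[List[str]]) -> Optional[List[str]]:
    """Choose a classification path (root first) for a name from a source.

    If the source has a scope, prefer the first candidate whose path
    contains that taxon. Assumption: with no scope, or no candidate in
    scope, fall back to the first candidate.
    """
    if not candidates:
        return None
    scope = SOURCE_SCOPES.get(source)
    if scope is not None:
        for path in candidates:
            if scope in path:
                return path
    return candidates[0]


# Oenanthe is a homonym: a plant genus (Apiaceae) and a bird genus.
PLANT = ["Plantae", "Tracheophyta", "Magnoliopsida", "Apiales",
         "Apiaceae", "Oenanthe"]
BIRD = ["Animalia", "Chordata", "Aves", "Passeriformes",
        "Muscicapidae", "Oenanthe"]
```

Calling `pick_placement("hypothetical-national-bird-list", [PLANT, BIRD])` selects the bird path, while the unscoped source simply gets the first candidate.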