GBIF Backbone Archive
Every released version of the GBIF backbone taxonomy since 2009 can be downloaded. More recent versions are provided in two formats:
1. A complete Darwin Core Archive including the core taxonomy, descriptions, species profiles and vernacular names
2. A simple, single tab-delimited text file
Darwin Core Archive
A Darwin Core Archive is essentially a set of CSV files with a simple descriptor (meta.xml) to give semantics to data files and their columns.
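The following minimal Python sketch shows how meta.xml gives meaning to the data files. It reads a descriptor and reports, for each core or extension file, its row type, its field delimiter and the Darwin Core term carried by each column index. It assumes the element and attribute names of the Darwin Core text guide (`core`, `extension`, `files/location`, `field` with `index` and `term`). The function name `describe_archive` is illustrative and not part of any GBIF tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core text (meta.xml) descriptors.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def describe_archive(meta_xml: str) -> dict:
    """Map each data file named in meta.xml to its row type, delimiter and column terms."""
    root = ET.fromstring(meta_xml)
    files = {}
    for section in ("core", "extension"):
        for elem in root.findall(DWC_TEXT_NS + section):
            location = elem.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
            columns = {}
            for field in elem.findall(DWC_TEXT_NS + "field"):
                index = field.get("index")
                # A field with only a default value has no column in the data file.
                if index is not None:
                    columns[int(index)] = field.get("term")
            # meta.xml writes a tab delimiter as the two characters backslash + t.
            delimiter = elem.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
            files[location] = {
                "section": section,
                "rowType": elem.get("rowType"),
                "delimiter": delimiter,
                "columns": columns,
            }
    return files
```

Calling `describe_archive(open("meta.xml", encoding="utf-8").read())` on an extracted archive would return one entry per core or extension file, keyed by the file name given in the descriptor.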
The latest backbone archive uses a Taxon core with the following extensions:
For every name record in the archive, dwc:datasetID points to the dataset in the GBIF registry that was used as the primary source reference when the backbone was generated.
The dataset subfolder contains an Ecological Markup Language (EML) file for each of these constituent datasets.
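Because every record carries a dwc:datasetID, the records can be tallied by their primary source dataset. The Python sketch below assumes a tab-delimited Taxon core file with one header line and takes the datasetID column index as a parameter; in a real archive that index should be read from meta.xml rather than hard-coded. The function name `count_by_dataset` is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_dataset(lines: Iterable[str], dataset_col: int,
                     delimiter: str = "\t", skip_header: bool = True) -> Counter:
    """Count name records per dwc:datasetID value in a Taxon core data file."""
    # QUOTE_NONE: quote characters inside names are kept literally.
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    if skip_header:
        next(reader, None)
    counts = Counter()
    for row in reader:
        # Skip short rows and records without a datasetID.
        if len(row) > dataset_col and row[dataset_col]:
            counts[row[dataset_col]] += 1
    return counts
```

For example, `count_by_dataset(open("Taxon.tsv", encoding="utf-8", newline=""), dataset_col=1).most_common(5)` would list the five largest sources, assuming the core file is named Taxon.tsv and datasetID sits in column 1. Each returned key identifies a registry dataset whose EML file can then be looked up in the dataset subfolder.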
Simple Tab Delimited File
The gzipped, tab-delimited file is generated by PostgreSQL and is well suited for database imports.
Its columns match the table defined in this PostgreSQL DDL file, so once the file has been extracted and the table created, it can be imported from within psql like this:
\copy backbone from 'backbone-current-simple.txt'
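The same import can also be done from Python without extracting the file first, by streaming the gzipped data straight into COPY. This is a sketch assuming the psycopg2 driver (its `cursor.copy_expert` method) and an already created `backbone` table; the function name `load_backbone` is hypothetical. `COPY ... FROM STDIN` without options expects PostgreSQL's default text format, the same format the `\copy` command above uses.

```python
import gzip


def load_backbone(cursor, path: str, table: str = "backbone") -> None:
    """Stream a gzipped backbone file into an existing table via COPY FROM STDIN."""
    # `cursor` is assumed to be a psycopg2 cursor. `table` is interpolated into
    # the SQL unquoted, so it must be a trusted name, never user input.
    # Default COPY text format: tab-separated fields, \N for NULL.
    with gzip.open(path, "rb") as fh:
        cursor.copy_expert(f"COPY {table} FROM STDIN", fh)
```

With psycopg2 a call could look like `with conn, conn.cursor() as cur: load_backbone(cur, "backbone-current-simple.txt.gz")`, where the `.gz` file name is an assumption about the downloaded file. Using the connection as a context manager commits the transaction when the block succeeds.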
The file was generated by executing the following query:
\copy (SELECT u.id, u.parent_fk, u.basionym_fk, u.is_synonym, u.status, u.rank, u.nom_status, u.constituent_key, u.origin, u.source_taxon_key, u.kingdom_fk, u.phylum_fk, u.class_fk, u.order_fk, u.family_fk, u.genus_fk, u.species_fk, n.id as name_id, n.scientific_name, n.canonical_name, n.genus_or_above, n.specific_epithet, n.infra_specific_epithet, n.notho_type, n.authorship, n.year, n.bracket_authorship, n.bracket_year, cpi.citation as name_published_in, u.issues FROM name_usage u JOIN name n ON u.name_fk=n.id LEFT JOIN citation cpi ON u.name_published_in_fk=cpi.id WHERE u.dataset_key=nubKey() and u.deleted IS NULL ) to 'backbone-simple.txt'