GBIF Backbone Archive

The GBIF backbone taxonomy can be downloaded for all released versions since 2009. The more recent versions are provided in two different formats:

  1. A complete Darwin Core Archive including the core taxonomy, descriptions, species profiles and vernacular names
  2. A simple, single tab delimited text file

Darwin Core Archive

A Darwin Core Archive is essentially a set of CSV files together with a simple descriptor (meta.xml) that gives semantics to the data files and their columns.
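
To illustrate, a stripped-down descriptor for a Taxon core might look like the sketch below. The element and attribute names follow the Darwin Core text guidelines, but the file name, column indexes and term list shown here are placeholders, not the actual backbone descriptor:

    <archive xmlns="http://rs.tdwg.org/dwc/text/">
      <core rowType="http://rs.tdwg.org/dwc/terms/Taxon" encoding="UTF-8"
            fieldsTerminatedBy="\t" linesTerminatedBy="\n" ignoreHeaderLines="1">
        <files>
          <location>Taxon.tsv</location> <!-- placeholder file name -->
        </files>
        <id index="0"/>
        <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
        <field index="2" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
        <!-- further columns are mapped the same way, one field element per column -->
      </core>
    </archive>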

The latest backbone archive uses a Taxon core with several extensions, including the descriptions, species profiles and vernacular names mentioned above; the complete list is declared in the archive's meta.xml.

For every name record in the archive, dwc:datasetID points to the dataset in the GBIF registry that was used as the primary source reference when the backbone was generated.

The dataset subfolder contains an Ecological Metadata Language (EML) file for each of those constituent datasets.

Simple Tab Delimited File

The gzipped, tab delimited file is generated by PostgreSQL and is well suited for database imports.

Its columns match the table defined in this PostgreSQL DDL file, so once the file is extracted and the table has been created, it can be imported like this from within psql:

    \copy backbone FROM 'simple.txt'
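
For reference, a table along the following lines would accept the dump. This is a sketch inferred from the column list of the export query below, not the authoritative DDL file linked above, and the column types are assumptions:

    -- Hedged sketch of the backbone table; types are assumptions,
    -- see the linked DDL file for the authoritative definition.
    CREATE TABLE backbone (
      id                     INTEGER PRIMARY KEY,
      parent_fk              INTEGER,
      basionym_fk            INTEGER,
      is_synonym             BOOLEAN,
      status                 TEXT,
      rank                   TEXT,
      nom_status             TEXT,
      constituent_key        TEXT,
      origin                 TEXT,
      source_taxon_key       INTEGER,
      kingdom_fk             INTEGER,
      phylum_fk              INTEGER,
      class_fk               INTEGER,
      order_fk               INTEGER,
      family_fk              INTEGER,
      genus_fk               INTEGER,
      species_fk             INTEGER,
      name_id                INTEGER,
      scientific_name        TEXT,
      canonical_name         TEXT,
      genus_or_above         TEXT,
      specific_epithet       TEXT,
      infra_specific_epithet TEXT,
      notho_type             TEXT,
      authorship             TEXT,
      year                   TEXT,
      bracket_authorship     TEXT,
      bracket_year           TEXT,
      name_published_in      TEXT,
      issues                 TEXT
    );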

The file was generated by executing the following query:

    \copy (SELECT u.id, u.parent_fk, u.basionym_fk, u.is_synonym, u.status, u.rank, u.nom_status,
     u.constituent_key, u.origin, u.source_taxon_key, u.kingdom_fk, u.phylum_fk, u.class_fk,
     u.order_fk, u.family_fk, u.genus_fk, u.species_fk, n.id AS name_id, n.scientific_name,
     n.canonical_name, n.genus_or_above, n.specific_epithet, n.infra_specific_epithet, n.notho_type,
     n.authorship, n.year, n.bracket_authorship, n.bracket_year, cpi.citation AS name_published_in, u.issues
    FROM name_usage u
     JOIN name n ON u.name_fk = n.id
     LEFT JOIN citation cpi ON u.name_published_in_fk = cpi.id
    WHERE u.dataset_key = nubKey() AND u.deleted IS NULL
    ) TO 'backbone-simple.txt'
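
Once imported, the dump can be queried like any other table. As a quick sanity check, the following hedged example counts accepted species per kingdom using the table sketched above; it assumes rank values are stored as uppercase enum labels such as 'SPECIES':

    -- Count accepted (non-synonym) species per kingdom by self-joining
    -- on the denormalised kingdom_fk pointer. Assumes rank values are
    -- uppercase enum labels such as 'SPECIES'.
    SELECT k.canonical_name AS kingdom, count(*) AS species
    FROM backbone b
    JOIN backbone k ON b.kingdom_fk = k.id
    WHERE b.rank = 'SPECIES' AND NOT b.is_synonym
    GROUP BY k.canonical_name
    ORDER BY species DESC;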

All records that existed historically but have since been deleted are also available in the same simple format: simple-deleted.txt.gz. Using this dump requires the currently existing records to be present, as the deleted records point to them. The file was generated with:

    \copy (SELECT u.id, u.parent_fk, u.basionym_fk, u.is_synonym, u.status, u.rank, u.nom_status,
     u.constituent_key, u.origin, u.source_taxon_key, u.kingdom_fk, u.phylum_fk, u.class_fk,
     u.order_fk, u.family_fk, u.genus_fk, u.species_fk, n.id AS name_id, n.scientific_name,
     n.canonical_name, n.genus_or_above, n.specific_epithet, n.infra_specific_epithet, n.notho_type,
     n.authorship, n.year, n.bracket_authorship, n.bracket_year, cpi.citation AS name_published_in,
     u.issues
    FROM name_usage u
     JOIN name n ON u.name_fk = n.id
     LEFT JOIN citation cpi ON u.name_published_in_fk = cpi.id
    WHERE u.dataset_key = nubKey() AND u.deleted IS NOT NULL)
    TO 'simple-deleted.txt'
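
Since the deleted records share the column layout of the current ones, one option is to load them into the same table after the current records, so that the foreign keys they point to resolve. This is a sketch under the assumption that deleted record ids do not clash with current ones; if in doubt, load them into a separate table with the same layout:

    \copy backbone FROM 'simple-deleted.txt'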