Skip to contents

Overview

taxdiv is an R package for computing taxonomic diversity indices from ecological community data. It provides a unified framework that brings together classical diversity measures, Clarke & Warwick’s taxonomic distinctness family, and the Deng entropy-based approach of Ozkan (2018) — all in a single package.

Traditional diversity indices such as Shannon and Simpson treat all species as equally distinct. However, a community of 10 species from 10 different families is taxonomically more diverse than 10 species from the same genus. taxdiv accounts for this taxonomic structure using two complementary frameworks:

  • Clarke & Warwick (1995, 1998, 2001) — path-length-based taxonomic distinctness measures (Delta, Delta*, AvTD, VarTD) with simulation-based significance testing
  • Ozkan (2018) — Deng entropy-based taxonomic diversity (pTO) using Dempster-Shafer evidence theory and a slicing procedure, producing 8 complementary indices

Why taxdiv?

Feature vegan ape taxdiv
Shannon / Simpson yes yes
Clarke & Warwick full suite (Delta, Delta*, AvTD, VarTD) partial yes
Ozkan pTO (8 indices, Run 1+2+3) yes
Simulation-based significance testing (funnel plots) yes
Taxonomic rarefaction with bootstrap CI yes
Stochastic resampling + sensitivity analysis yes
Excel-to-results in one command yes
Bias-corrected Shannon (Miller-Madow, Grassberger, Chao-Shen) yes

Installation

# install.packages("devtools")
devtools::install_github("mgorgoz/taxonomic-diversity-r")

Quick Start

1. From R vectors

library(taxdiv)

# Species abundances
community <- c(
  Quercus_robur       = 15,
  Pinus_nigra         = 8,
  Fagus_orientalis    = 12,
  Abies_nordmanniana  = 5,
  Juniperus_excelsa   = 3
)

# Taxonomic hierarchy
tax_tree <- build_tax_tree(
  species = names(community),
  Genus   = c("Quercus", "Pinus", "Fagus", "Abies", "Juniperus"),
  Family  = c("Fagaceae", "Pinaceae", "Fagaceae", "Pinaceae", "Cupressaceae"),
  Order   = c("Fagales", "Pinales", "Fagales", "Pinales", "Pinales")
)

# All 14 indices at once
compare_indices(community, tax_tree)

# Ozkan pTO — 8 values matching Excel macro output (Run 1+2+3)
ozkan_pto(community, tax_tree)

2. From Excel — one command

library(taxdiv)
library(readxl)

# Read your Excel file
data <- as.data.frame(read_excel("my_data.xlsx"))

# Compute all indices for all sites — automatic column detection
batch_analysis(data)

Output (16 columns):

  Site  N_Species Shannon Simpson  Delta Delta_star  AvTD  VarTD   uTO    TO uTO_plus TO_plus uTO_max TO_max uTO_plus_max TO_plus_max
  A1        6    1.494   0.757  1.622      2.138 2.333  0.667  2.14  3.49    3.891   5.244    2.14   3.49        3.891       5.244
  A2        5    1.577   0.784  1.719      2.243 2.500  0.500  1.98  3.21    3.456   4.872    1.98   3.21        3.456       4.872

A ready-to-use Excel template is included: inst/templates/taxdiv_template.xlsx

Features

Diversity Indices (26 exported functions)

Category Functions Description
Classical shannon(), simpson() Shannon-Wiener H’ (with 3 bias corrections), Gini-Simpson
Clarke & Warwick delta(), delta_star(), avtd(), vartd() Taxonomic diversity (Delta), taxonomic distinctness (Delta*), average taxonomic distinctness (AvTD), variation in taxonomic distinctness (VarTD)
Ozkan pTO ozkan_pto(), pto_components() 8 Deng entropy-based indices: uTO, TO, uTO+, TO+ (all levels) and their max-informative-level variants
Ozkan Pipeline ozkan_pto_resample(), ozkan_pto_sensitivity(), ozkan_pto_jackknife(), ozkan_pto_full() Stochastic resampling (Run 2), sensitivity analysis (Run 3), jackknife leave-one-out, full pipeline
Batch batch_analysis(), compare_indices() Multi-site analysis from data frame, multi-community comparison
Simulation simulate_td() Random subsampling from species pool for significance testing
Rarefaction rarefaction_taxonomic() Bootstrap rarefaction curves for 8 different indices
Distance tax_distance_matrix(), build_tax_tree(), deng_entropy_level() Taxonomic distance matrices, tree construction, per-level Deng entropy

Visualization (7 plot types)

Function Plot Type
plot_funnel() Funnel plot for AvTD/VarTD significance testing
plot_rarefaction() Rarefaction curves with bootstrap confidence intervals
plot_iteration() Stochastic resampling iteration trajectories
plot_radar() Radar/spider chart for multi-community comparison
plot_heatmap() Taxonomic similarity heatmap
plot_bubble() Bubble plot of community composition
plot_taxonomic_tree() Dendrogram of taxonomic hierarchy

S3 Class System

All main output objects have dedicated print() and summary() methods:

result <- batch_analysis(data)
print(result)    # Clean formatted output
summary(result)  # Min/max/mean/SD per index across sites

result <- ozkan_pto(community, tax_tree)
print(result)    # All 8 pTO values + Deng entropy by level

Example Datasets

Dataset Description
anatolian_trees Anatolian tree species with full taxonomic hierarchy
gazi_comm Community abundance data from Gazi University campus
gazi_gytk Taxonomic classification for Gazi campus species

Excel Macro Equivalence

taxdiv produces the same 8 Ozkan pTO values as the TD_OMD Excel macro:

Excel Macro (TD_OMD)       R function output
────────────────────────────────────────────
Run 1:  uT0+           ->  uTO_plus
        T0+            ->  TO_plus
Run 2:  uT0            ->  uTO
        T0             ->  TO
Run 3:  uT0+max        ->  uTO_plus_max
        T0+max         ->  TO_plus_max
        uT0max         ->  uTO_max
        T0max          ->  TO_max

The “max” variants use only informative taxonomic levels where Deng entropy > 0, matching the Excel macro’s Run 3 behavior.

Theoretical Background

The Problem: Why Species Counts Are Not Enough

Consider two forest plots, each containing 10 species. In the first plot, all 10 species belong to the same genus. In the second, they span 5 families across 3 orders. Standard indices like Shannon and Simpson would assign identical diversity scores to both — yet the second community is clearly more diverse in an evolutionary and functional sense. Taxonomic diversity indices solve this problem by incorporating the hierarchical relationships among species.

Ozkan pTO Method

Ozkan (2018) introduced a method that uses Deng entropy — a generalization of Shannon entropy rooted in Dempster-Shafer evidence theory — to measure how species are distributed across a taxonomic hierarchy.

The method works in three stages:

Run 1 — Deterministic calculation: At each taxonomic level (genus, family, order, etc.), Deng entropy measures how evenly species are grouped. A level where all species fall into one group contributes zero entropy (no diversity), while a level where species are spread evenly across many groups contributes high entropy. The product of these level-wise entropies gives pTO (taxonomic diversity). When species-level Shannon entropy is also included in the product, the result is pTO+ (taxonomic distance), which captures both taxonomic structure and within-community evenness.

Run 2 — Stochastic resampling (slicing): Species are removed one at a time, starting with the least abundant. After each removal, all indices are recalculated. This reveals each species’ contribution to overall diversity: removing a “happy” species decreases diversity (it was contributing positively), while removing an “unhappy” species increases diversity (it was redundant in the taxonomic structure). The maximum pTO value across all slicing steps represents the community’s optimal taxonomic organization.

Run 3 — Max-informative level variants: Some taxonomic levels may carry no information (e.g., when all species share the same order, Deng entropy at that level is zero). Run 3 repeats the calculations using only the levels where Deng entropy is greater than zero, producing the _max variants of each index.

This three-stage pipeline yields 8 complementary indices: uTO, TO, uTO_plus, TO_plus, and their _max counterparts.

Clarke & Warwick Taxonomic Distinctness

Clarke & Warwick (1995, 1998, 2001) proposed a family of indices based on the pairwise taxonomic path length between species in a classification tree.

  • Delta (Δ) — average taxonomic distance between two randomly chosen individuals, weighted by abundance
  • Delta* (Δ*) — same as Delta but excluding same-species pairs, isolating the pure taxonomic component
  • AvTD (Δ+) — average taxonomic distinctness based on presence/absence only. Because it does not depend on abundance, AvTD is independent of sample size, making it ideal for comparing datasets collected with different sampling efforts
  • VarTD (Λ+) — the variance in taxonomic path lengths. High VarTD indicates an uneven taxonomic tree (some branches are deep, others shallow)

To assess whether an observed AvTD or VarTD value is statistically significant, simulate_td() draws random subsets of species from a master species pool and computes expected distributions. plot_funnel() then plots these as 95% confidence funnels — if a community falls below the funnel, its taxonomic diversity is significantly lower than expected by chance.

References

Primary Methods

  • Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346. doi: 10.18182/tjf.441061

  • Deng, Y. (2016). Deng entropy. Chaos, Solitons & Fractals, 91, 549-553. doi: 10.1016/j.chaos.2016.08.011

Taxonomic Distinctness

  • Warwick, R.M. & Clarke, K.R. (1995). New ‘biodiversity’ measures reveal a decrease in taxonomic distinctness with increasing stress. Marine Ecology Progress Series, 129, 301-305. doi: 10.3354/meps129301

  • Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35(4), 523-531. doi: 10.1046/j.1365-2664.1998.3540523.x

  • Clarke, K.R. & Warwick, R.M. (1999). The taxonomic distinctness measure of biodiversity: weighting of step lengths between hierarchical levels. Marine Ecology Progress Series, 184, 21-29. doi: 10.3354/meps184021

  • Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Marine Ecology Progress Series, 216, 265-278. doi: 10.3354/meps216265

Classical Diversity

Evidence Theory

  • Dempster, A.P. (1967). Upper and lower probabilities induced by a multivalued mapping. The Annals of Mathematical Statistics, 38(2), 325-339. doi: 10.1214/aoms/1177698950

  • Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.

Bias Correction

  • Chao, A. & Shen, T.-J. (2003). Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10, 429-443. doi: 10.1023/A:1026096204727

Package Status

Metric Value
R CMD check 0 errors, 0 warnings, 0 notes
Unit tests 610 passing
Exported functions 26
S3 methods 13 (print, summary, plot)
R source files 19
Test files 12
Example datasets 3
Vignettes 7 (6 English + 1 Turkish)

Roadmap

Contributing

Contributions are welcome! If you encounter a bug or have an idea for a new feature, please open an issue using our templates:

  • Bug Report — for errors, unexpected behavior, or incorrect results
  • Feature Request — for new indices, visualizations, or enhancements

For general questions about the package or taxonomic diversity methods, feel free to start a GitHub Discussion or open a blank issue.

Citation

citation("taxdiv")
Gorgoz MM, Ozkan K, Negiz MG (2026). taxdiv: Taxonomic Diversity Indices
Using Deng Entropy. R package version 0.1.0.
https://github.com/mgorgoz/taxonomic-diversity-r

Ozkan K (2018). "A new proposed measure for estimating taxonomic diversity."
Turkish Journal of Forestry, 19(4), 336-346. doi:10.18182/tjf.441061.

License

MIT