Overview
taxdiv is an R package for computing taxonomic diversity indices from ecological community data. It provides a unified framework that brings together classical diversity measures, Clarke & Warwick’s taxonomic distinctness family, and the Deng entropy-based approach of Ozkan (2018) — all in a single package.
Traditional diversity indices such as Shannon and Simpson treat all species as equally distinct. However, a community of 10 species from 10 different families is taxonomically more diverse than 10 species from the same genus. taxdiv accounts for this taxonomic structure using two complementary frameworks:
- Clarke & Warwick (1995, 1998, 2001) — path-length-based taxonomic distinctness measures (Delta, Delta*, AvTD, VarTD) with simulation-based significance testing
- Ozkan (2018) — Deng entropy-based taxonomic diversity (pTO) using Dempster-Shafer evidence theory and a slicing procedure, producing 8 complementary indices
Why taxdiv?
| Feature | vegan | ape | taxdiv |
|---|---|---|---|
| Shannon / Simpson | yes | – | yes |
| Clarke & Warwick full suite (Delta, Delta*, AvTD, VarTD) | – | partial | yes |
| Ozkan pTO (8 indices, Run 1+2+3) | – | – | yes |
| Simulation-based significance testing (funnel plots) | – | – | yes |
| Taxonomic rarefaction with bootstrap CI | – | – | yes |
| Stochastic resampling + sensitivity analysis | – | – | yes |
| Excel-to-results in one command | – | – | yes |
| Bias-corrected Shannon (Miller-Madow, Grassberger, Chao-Shen) | – | – | yes |
Installation
# install.packages("devtools")
devtools::install_github("mgorgoz/taxonomic-diversity-r")
Quick Start
1. From R vectors
library(taxdiv)
# Species abundances
community <- c(
Quercus_robur = 15,
Pinus_nigra = 8,
Fagus_orientalis = 12,
Abies_nordmanniana = 5,
Juniperus_excelsa = 3
)
# Taxonomic hierarchy
tax_tree <- build_tax_tree(
species = names(community),
Genus = c("Quercus", "Pinus", "Fagus", "Abies", "Juniperus"),
Family = c("Fagaceae", "Pinaceae", "Fagaceae", "Pinaceae", "Cupressaceae"),
Order = c("Fagales", "Pinales", "Fagales", "Pinales", "Pinales")
)
# All 14 indices at once
compare_indices(community, tax_tree)
# Ozkan pTO — 8 values matching Excel macro output (Run 1+2+3)
ozkan_pto(community, tax_tree)
2. From Excel: one command
library(taxdiv)
library(readxl)
# Read your Excel file
data <- as.data.frame(read_excel("my_data.xlsx"))
# Compute all indices for all sites — automatic column detection
batch_analysis(data)
Output (16 columns):
Site N_Species Shannon Simpson Delta Delta_star AvTD VarTD uTO TO uTO_plus TO_plus uTO_max TO_max uTO_plus_max TO_plus_max
A1 6 1.494 0.757 1.622 2.138 2.333 0.667 2.14 3.49 3.891 5.244 2.14 3.49 3.891 5.244
A2 5 1.577 0.784 1.719 2.243 2.500 0.500 1.98 3.21 3.456 4.872 1.98 3.21 3.456 4.872
A ready-to-use Excel template is included: inst/templates/taxdiv_template.xlsx
Features
Diversity Indices (26 exported functions)
| Category | Functions | Description |
|---|---|---|
| Classical | shannon(), simpson() | Shannon-Wiener H’ (with 3 bias corrections), Gini-Simpson |
| Clarke & Warwick | delta(), delta_star(), avtd(), vartd() | Taxonomic diversity (Delta), taxonomic distinctness (Delta*), average taxonomic distinctness (AvTD), variation in taxonomic distinctness (VarTD) |
| Ozkan pTO | ozkan_pto(), pto_components() | 8 Deng entropy-based indices: uTO, TO, uTO+, TO+ (all levels) and their max-informative-level variants |
| Ozkan Pipeline | ozkan_pto_resample(), ozkan_pto_sensitivity(), ozkan_pto_jackknife(), ozkan_pto_full() | Stochastic resampling (Run 2), sensitivity analysis (Run 3), jackknife leave-one-out, full pipeline |
| Batch | batch_analysis(), compare_indices() | Multi-site analysis from data frame, multi-community comparison |
| Simulation | simulate_td() | Random subsampling from species pool for significance testing |
| Rarefaction | rarefaction_taxonomic() | Bootstrap rarefaction curves for 8 different indices |
| Distance | tax_distance_matrix(), build_tax_tree(), deng_entropy_level() | Taxonomic distance matrices, tree construction, per-level Deng entropy |
Visualization (7 plot types)
| Function | Plot Type |
|---|---|
| plot_funnel() | Funnel plot for AvTD/VarTD significance testing |
| plot_rarefaction() | Rarefaction curves with bootstrap confidence intervals |
| plot_iteration() | Stochastic resampling iteration trajectories |
| plot_radar() | Radar/spider chart for multi-community comparison |
| plot_heatmap() | Taxonomic similarity heatmap |
| plot_bubble() | Bubble plot of community composition |
| plot_taxonomic_tree() | Dendrogram of taxonomic hierarchy |
S3 Class System
All main output objects have dedicated print() and summary() methods:
result <- batch_analysis(data)
print(result) # Clean formatted output
summary(result) # Min/max/mean/SD per index across sites
result <- ozkan_pto(community, tax_tree)
print(result) # All 8 pTO values + Deng entropy by level
Excel Macro Equivalence
taxdiv produces the same 8 Ozkan pTO values as the TD_OMD Excel macro:
Excel Macro (TD_OMD)    R function output
────────────────────────────────────────────
Run 1:  uT0+        ->  uTO_plus
        T0+         ->  TO_plus
Run 2:  uT0         ->  uTO
        T0          ->  TO
Run 3:  uT0+max     ->  uTO_plus_max
        T0+max      ->  TO_plus_max
        uT0max      ->  uTO_max
        T0max       ->  TO_max
The “max” variants use only informative taxonomic levels where Deng entropy > 0, matching the Excel macro’s Run 3 behavior.
Theoretical Background
The Problem: Why Species Counts Are Not Enough
Consider two forest plots, each containing 10 species. In the first plot, all 10 species belong to the same genus. In the second, they span 5 families across 3 orders. Given the same abundance distribution, standard indices like Shannon and Simpson assign identical diversity scores to both plots, yet the second community is clearly more diverse in an evolutionary and functional sense. Taxonomic diversity indices address this by incorporating the hierarchical relationships among species.
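The two-plot comparison can be reproduced in a few lines of base R. This is a minimal sketch, not taxdiv code: the helper names `shannon_h()` and `gini_simpson()`, the equal abundances, and the family labels are all assumptions made for illustration.

```r
# Hypothetical helpers written for this illustration (not taxdiv functions)
shannon_h <- function(x) {
  p <- x / sum(x)
  -sum(p * log(p))
}
gini_simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

# Both plots: ten species with the same (assumed) abundances
abund <- rep(10, 10)

# Plot 1: all species in one family; Plot 2: species spread over five families
family_plot1 <- rep("Fam1", 10)
family_plot2 <- rep(paste0("Fam", 1:5), each = 2)

# Abundance-only indices cannot tell the plots apart
h  <- shannon_h(abund)      # log(10) for either plot
gs <- gini_simpson(abund)   # 0.9 for either plot

# The taxonomic structure differs, but never enters the formulas above
n_families <- c(plot1 = length(unique(family_plot1)),
                plot2 = length(unique(family_plot2)))
n_families
```

Neither helper takes the family labels as input, which is exactly the gap the taxonomic indices are meant to close.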
Ozkan pTO Method
Ozkan (2018) introduced a method that uses Deng entropy — a generalization of Shannon entropy rooted in Dempster-Shafer evidence theory — to measure how species are distributed across a taxonomic hierarchy.
The method works in three stages:
Run 1 — Deterministic calculation: At each taxonomic level (genus, family, order, etc.), Deng entropy measures how evenly species are grouped. A level where all species fall into one group contributes zero entropy (no diversity), while a level where species are spread evenly across many groups contributes high entropy. The product of these level-wise entropies gives pTO (taxonomic diversity). When species-level Shannon entropy is also included in the product, the result is pTO+ (taxonomic distance), which captures both taxonomic structure and within-community evenness.
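For readers unfamiliar with Deng entropy, the sketch below implements the general definition from Deng (2016) for a basic probability assignment, E_d = -Σ m(A) log2( m(A) / (2^|A| - 1) ). The function name `deng_entropy_sketch()` is hypothetical, and how taxdiv assigns masses and cardinalities at each taxonomic level is not stated here, so this illustrates the entropy itself rather than the package's implementation.

```r
# Deng entropy of a basic probability assignment (general definition).
# mass:        masses m(A) of the focal elements, summing to 1
# cardinality: number of elements |A| in each focal element
deng_entropy_sketch <- function(mass, cardinality) {
  stopifnot(length(mass) == length(cardinality),
            abs(sum(mass) - 1) < 1e-8)
  keep <- mass > 0                 # zero-mass elements contribute nothing
  m <- mass[keep]
  k <- cardinality[keep]
  -sum(m * log2(m / (2^k - 1)))
}

# Singleton focal elements only: reduces to Shannon entropy in bits
e_singletons <- deng_entropy_sketch(c(0.5, 0.25, 0.25), c(1, 1, 1))  # 1.5

# All mass on one singleton: no uncertainty
e_single <- deng_entropy_sketch(1, 1)                                # 0

# Same masses, but the first focal element now contains two elements:
# the extra non-specificity raises the entropy
e_set <- deng_entropy_sketch(c(0.5, 0.25, 0.25), c(2, 1, 1))
```

The singleton case shows in what sense Deng entropy generalizes Shannon entropy: the two coincide whenever every focal element has cardinality 1.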
Run 2 — Stochastic resampling (slicing): Species are removed one at a time, starting with the least abundant. After each removal, all indices are recalculated. This reveals each species’ contribution to overall diversity: removing a “happy” species decreases diversity (it was contributing positively), while removing an “unhappy” species increases diversity (it was redundant in the taxonomic structure). The maximum pTO value across all slicing steps represents the community’s optimal taxonomic organization.
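The slicing loop described above can be sketched in base R. To keep the example self-contained, Shannon entropy stands in for the pTO indices; that substitution, the helper name `shannon_h()`, and the choice to stop at two remaining species are assumptions for illustration only, and taxdiv's own resampling functions may proceed differently.

```r
# Stand-in index for illustration (taxdiv would recompute the pTO indices here)
shannon_h <- function(x) {
  p <- x / sum(x)
  -sum(p * log(p))
}

community <- c(
  Quercus_robur = 15, Pinus_nigra = 8, Fagus_orientalis = 12,
  Abies_nordmanniana = 5, Juniperus_excelsa = 3
)

# Removal order: least abundant species first
slice_order <- names(sort(community))

# Step k drops the k rarest species and recomputes the index
steps <- lapply(0:(length(community) - 2), function(k) {
  kept <- community[setdiff(names(community), head(slice_order, k))]
  data.frame(removed = k, n_species = length(kept), index = shannon_h(kept))
})
trajectory <- do.call(rbind, steps)

# The slicing step with the highest index value
best <- trajectory[which.max(trajectory$index), ]
trajectory
```

Comparing consecutive rows of `trajectory` shows whether dropping a given species raised or lowered the index, which is the basis for the "happy" and "unhappy" labels in the text.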
Run 3 — Max-informative level variants: Some taxonomic levels may carry no information (e.g., when all species share the same order, Deng entropy at that level is zero). Run 3 repeats the calculations using only the levels where Deng entropy is greater than zero, producing the _max variants of each index.
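The effect of dropping uninformative levels can be illustrated with a toy calculation. The per-level entropy values below are made up, and the sketch takes the description of the index as a product of level-wise entropies at face value; the package's actual formula may include further terms.

```r
# Made-up per-level Deng entropies; Class carries no information here
level_entropy <- c(Genus = 2.1, Family = 1.4, Order = 0.9, Class = 0)

# A product over all levels collapses to zero as soon as one level is zero
product_all <- prod(level_entropy)

# Keep only informative levels (entropy > 0) before taking the product
informative <- level_entropy[level_entropy > 0]
product_informative <- prod(informative)

names(informative)   # levels that remain: Genus, Family, Order
```

In this toy case the all-level product is zero, while the informative-level product still reflects the structure at genus, family, and order.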
This three-stage pipeline yields 8 complementary indices: uTO, TO, uTO_plus, TO_plus, and their _max counterparts.
Clarke & Warwick Taxonomic Distinctness
Clarke & Warwick (1995, 1998, 2001) proposed a family of indices based on the pairwise taxonomic path length between species in a classification tree.
- Delta (Δ) — average taxonomic distance between two randomly chosen individuals, weighted by abundance
- Delta* (Δ*) — same as Delta but excluding same-species pairs, isolating the pure taxonomic component
- AvTD (Δ+) — average taxonomic distinctness based on presence/absence only. Because it does not depend on abundance, AvTD is independent of sample size, making it ideal for comparing datasets collected with different sampling efforts
- VarTD (Λ+) — the variance in taxonomic path lengths. High VarTD indicates an uneven taxonomic tree (some branches are deep, others shallow)
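The four indices can be computed directly from a pairwise path-length matrix using the standard Clarke & Warwick formulas. The sketch below uses the Quick Start community and assumes one step per taxonomic rank (species in the same genus are 1 step apart, same family 2, same order 3, different orders 4). That weighting and all variable names are assumptions for illustration; taxdiv's own functions may scale distances differently.

```r
community <- c(
  Quercus_robur = 15, Pinus_nigra = 8, Fagus_orientalis = 12,
  Abies_nordmanniana = 5, Juniperus_excelsa = 3
)
tax <- cbind(
  Genus  = c("Quercus", "Pinus", "Fagus", "Abies", "Juniperus"),
  Family = c("Fagaceae", "Pinaceae", "Fagaceae", "Pinaceae", "Cupressaceae"),
  Order  = c("Fagales", "Pinales", "Fagales", "Pinales", "Pinales")
)

# Pairwise path length: 1 step for being different species,
# plus 1 for every rank at which the two species differ
S <- length(community)
omega <- matrix(0, S, S, dimnames = list(names(community), names(community)))
for (i in seq_len(S)) {
  for (j in seq_len(S)) {
    if (i != j) omega[i, j] <- 1 + sum(tax[i, ] != tax[j, ])
  }
}

up <- upper.tri(omega)            # each species pair counted once (i < j)
xx <- outer(community, community) # abundance products x_i * x_j
N  <- sum(community)

delta_val      <- sum(omega[up] * xx[up]) / (N * (N - 1) / 2)  # Delta
delta_star_val <- sum(omega[up] * xx[up]) / sum(xx[up])        # Delta*
avtd_val       <- mean(omega[up])                              # AvTD
vartd_val      <- mean((omega[up] - avtd_val)^2)               # VarTD

round(c(Delta = delta_val, Delta_star = delta_star_val,
        AvTD = avtd_val, VarTD = vartd_val), 3)
```

Delta and Delta* share a numerator. Delta divides by all pairs of individuals, including same-species pairs at distance zero, so it can never exceed Delta*.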
To assess whether an observed AvTD or VarTD value is statistically significant, simulate_td() draws random subsets of species from a master species pool and computes expected distributions. plot_funnel() then plots these as 95% confidence funnels — if a community falls below the funnel, its taxonomic diversity is significantly lower than expected by chance.
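The logic behind the funnel can be sketched without the package. The example below builds a synthetic master pool (24 species nested in 12 genera, 6 families, and 3 orders), draws random species subsets of several sizes, and records the 2.5% and 97.5% quantiles of AvTD. The pool, the one-step-per-rank distances, and all names are assumptions for illustration; `simulate_td()` and `plot_funnel()` may differ in their details.

```r
set.seed(1)

# Synthetic master pool: strictly nested genera, families, and orders
tax <- cbind(
  Genus  = rep(paste0("G", 1:12), each = 2),
  Family = rep(paste0("F", 1:6),  each = 4),
  Order  = rep(paste0("O", 1:3),  each = 8)
)
n_pool <- nrow(tax)

# Path length: 1 step for different species, plus 1 per differing rank
omega <- matrix(0, n_pool, n_pool)
for (i in seq_len(n_pool)) {
  for (j in seq_len(n_pool)) {
    if (i != j) omega[i, j] <- 1 + sum(tax[i, ] != tax[j, ])
  }
}

# AvTD of the species subset given by row indices idx
avtd_of <- function(idx) {
  sub <- omega[idx, idx]
  mean(sub[upper.tri(sub)])
}

# Expected AvTD and 95% limits for random subsets of m species
funnel <- t(sapply(c(5, 10, 15), function(m) {
  sims <- replicate(999, avtd_of(sample(n_pool, m)))
  q <- quantile(sims, c(0.025, 0.975), names = FALSE)
  c(m = m, lower = q[1], mean = mean(sims), upper = q[2])
}))
funnel

# A taxonomically clustered community: four species from one family plus one more
observed <- avtd_of(1:5)
below_funnel <- observed < funnel[funnel[, "m"] == 5, "lower"]
```

A community whose observed AvTD lies under the `lower` limit for its species count would plot below the funnel.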
References
Primary Methods
Ozkan, K. (2018). A new proposed measure for estimating taxonomic diversity. Turkish Journal of Forestry, 19(4), 336-346. doi: 10.18182/tjf.441061
Deng, Y. (2016). Deng entropy. Chaos, Solitons & Fractals, 91, 549-553. doi: 10.1016/j.chaos.2016.08.011
Taxonomic Distinctness
Warwick, R.M. & Clarke, K.R. (1995). New ‘biodiversity’ measures reveal a decrease in taxonomic distinctness with increasing stress. Marine Ecology Progress Series, 129, 301-305. doi: 10.3354/meps129301
Clarke, K.R. & Warwick, R.M. (1998). A taxonomic distinctness index and its statistical properties. Journal of Applied Ecology, 35(4), 523-531. doi: 10.1046/j.1365-2664.1998.3540523.x
Clarke, K.R. & Warwick, R.M. (1999). The taxonomic distinctness measure of biodiversity: weighting of step lengths between hierarchical levels. Marine Ecology Progress Series, 184, 21-29. doi: 10.3354/meps184021
Clarke, K.R. & Warwick, R.M. (2001). A further biodiversity index applicable to species lists: variation in taxonomic distinctness. Marine Ecology Progress Series, 216, 265-278. doi: 10.3354/meps216265
Classical Diversity
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423. doi: 10.1002/j.1538-7305.1948.tb01338.x
Simpson, E.H. (1949). Measurement of diversity. Nature, 163, 688. doi: 10.1038/163688a0
Evidence Theory
Dempster, A.P. (1967). Upper and lower probabilities induced by a multivalued mapping. The Annals of Mathematical Statistics, 38(2), 325-339. doi: 10.1214/aoms/1177698950
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
Bias Correction
Chao, A. & Shen, T.-J. (2003). Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10, 429-443. doi: 10.1023/A:1026096204727
Package Status
| Metric | Value |
|---|---|
| R CMD check | 0 errors, 0 warnings, 0 notes |
| Unit tests | 610 passing |
| Exported functions | 26 |
| S3 methods | 13 (print, summary, plot) |
| R source files | 19 |
| Test files | 12 |
| Example datasets | 3 |
| Vignettes | 7 (6 English + 1 Turkish) |
Contributing
Contributions are welcome! If you encounter a bug or have an idea for a new feature, please open an issue using our templates:
- Bug Report — for errors, unexpected behavior, or incorrect results
- Feature Request — for new indices, visualizations, or enhancements
For general questions about the package or taxonomic diversity methods, feel free to start a GitHub Discussion or open a blank issue.
Citation
citation("taxdiv")
Gorgoz MM, Ozkan K, Negiz MG (2026). taxdiv: Taxonomic Diversity Indices
Using Deng Entropy. R package version 0.1.0.
https://github.com/mgorgoz/taxonomic-diversity-r
Ozkan K (2018). "A new proposed measure for estimating taxonomic diversity."
Turkish Journal of Forestry, 19(4), 336-346. doi:10.18182/tjf.441061.