Classical Diversity Indices: Shannon and Simpson
Source:vignettes/classical-indices.Rmd
classical-indices.RmdWhat Are Classical Diversity Indices?
Shannon and Simpson indices are the two most widely used diversity measures in ecology. They quantify how species abundances are distributed within a community — essentially answering: “How diverse is this community based on how individuals are distributed among species?”
Both indices consider only the abundance distribution. They do not account for taxonomic, phylogenetic, or functional relationships between species. A community of 10 species from the same genus receives the same score as a community of 10 species spanning 10 different orders.
This is why taxdiv pairs them with taxonomic measures — classical indices capture the abundance structure, while taxonomic indices capture the hierarchical structure.
Shannon-Wiener Index (H’)
The idea
Shannon entropy, borrowed from information theory (Shannon, 1948), measures the uncertainty in predicting the species identity of a randomly chosen individual. High uncertainty means high diversity — if species are evenly distributed, it is hard to guess which species the next individual belongs to.
Key properties
- Minimum: when there is only one species (no uncertainty)
- Maximum: when all species have equal abundance (maximum uncertainty)
- Units: Measured in “nats” when using natural logarithm, “bits” when using log base 2
- Sensitivity: Moderately sensitive to both rare and abundant species
Usage in taxdiv
# Default: natural logarithm
H <- shannon(community)
cat("Shannon H':", round(H, 4), "\n")
#> Shannon H': 2.0948
cat("Maximum possible H' for", length(community), "species:",
round(log(length(community)), 4), "\n")
#> Maximum possible H' for 10 species: 2.3026
cat("Evenness (H'/H'max):", round(H / log(length(community)), 4), "\n")
#> Evenness (H'/H'max): 0.9098Bias Correction
When sample sizes are small, the observed Shannon index underestimates the true value because rare species are likely missing from the sample. taxdiv provides three correction methods:
cat("Uncorrected: ", round(shannon(community), 4), "\n")
#> Uncorrected: 2.0948
cat("Miller-Madow: ", round(shannon(community, correction = "miller_madow"), 4), "\n")
#> Miller-Madow: 2.1292
cat("Grassberger: ", round(shannon(community, correction = "grassberger"), 4), "\n")
#> Grassberger: 2.1338
cat("Chao-Shen: ", round(shannon(community, correction = "chao_shen"), 4), "\n")
#> Chao-Shen: 2.1014Which correction to use?
- No correction: When sample size is large relative to species richness (N >> S). This is the standard approach used in most published studies.
- Miller-Madow: Simple first-order correction. Adds to the estimate. Appropriate when you want a lightweight adjustment.
- Grassberger: Uses the digamma function for a more accurate correction. Performs well across a range of sample sizes.
- Chao-Shen: Uses Horvitz-Thompson estimation to account for unseen species. Best when you suspect many rare species are missing from the sample.
Simpson Index
The idea
Simpson’s index (Simpson, 1949) measures the probability that two randomly chosen individuals belong to the same species. A community dominated by one species has a high probability (low diversity); an even community has a low probability (high diversity).
Three variants
taxdiv provides all three common Simpson variants:
# Dominance (D): probability of same-species pair
D <- simpson(community, type = "dominance")
cat("Simpson dominance (D): ", round(D, 4), "\n")
#> Simpson dominance (D): 0.1424
# Gini-Simpson (1-D): probability of different-species pair
GS <- simpson(community, type = "gini_simpson")
cat("Gini-Simpson (1-D): ", round(GS, 4), "\n")
#> Gini-Simpson (1-D): 0.8576
# Inverse Simpson (1/D): effective number of species
inv <- simpson(community, type = "inverse")
cat("Inverse Simpson (1/D): ", round(inv, 4), "\n")
#> Inverse Simpson (1/D): 7.0246Understanding the variants
| Variant | Formula | Range | Interpretation |
|---|---|---|---|
| Dominance (D) | 0 to 1 | Higher = less diverse (one species dominates) | |
| Gini-Simpson (1-D) | 0 to 1 | Higher = more diverse (common choice) | |
| Inverse Simpson (1/D) | 1 to S | Effective number of equally abundant species |
The inverse Simpson is often the most intuitive: a value of 6.5 means the community is as diverse as one with 6.5 perfectly even species.
Shannon vs Simpson: When to Use Which?
# Even community
even <- c(sp1 = 20, sp2 = 20, sp3 = 20, sp4 = 20, sp5 = 20)
# Uneven community (same species, different abundances)
uneven <- c(sp1 = 90, sp2 = 4, sp3 = 3, sp4 = 2, sp5 = 1)
cat("=== Even community ===\n")
#> === Even community ===
cat("Shannon:", round(shannon(even), 4), "\n")
#> Shannon: 1.6094
cat("Simpson (1-D):", round(simpson(even, type = "gini_simpson"), 4), "\n\n")
#> Simpson (1-D): 0.8
cat("=== Uneven community ===\n")
#> === Uneven community ===
cat("Shannon:", round(shannon(uneven), 4), "\n")
#> Shannon: 0.4531
cat("Simpson (1-D):", round(simpson(uneven, type = "gini_simpson"), 4), "\n")
#> Simpson (1-D): 0.187Key difference: Shannon is more sensitive to rare species (because of the logarithm), while Simpson is more sensitive to dominant species (because of the squaring). When a community has many rare species, Shannon will detect them; Simpson may not.
| Scenario | Better index |
|---|---|
| Comparing sites with different rare species | Shannon |
| Detecting dominance shifts | Simpson |
| Need sample-size independence | Neither (use AvTD) |
| Need taxonomic information | Neither (use pTO or Delta) |
The Limitation: Why You Need Taxonomic Indices Too
Classical indices treat all species as interchangeable. Consider:
# Community A: 5 species from 5 different orders
comm_A <- c(sp1 = 20, sp2 = 20, sp3 = 20, sp4 = 20, sp5 = 20)
# Community B: 5 species from the same genus
comm_B <- c(sp6 = 20, sp7 = 20, sp8 = 20, sp9 = 20, sp10 = 20)
cat("Community A (5 orders) - Shannon:", round(shannon(comm_A), 4), "\n")
#> Community A (5 orders) - Shannon: 1.6094
cat("Community B (1 genus) - Shannon:", round(shannon(comm_B), 4), "\n")
#> Community B (1 genus) - Shannon: 1.6094
cat("Identical scores, yet A is far more taxonomically diverse.\n")
#> Identical scores, yet A is far more taxonomically diverse.This is exactly why taxdiv includes Clarke & Warwick and Ozkan pTO indices — they incorporate the taxonomic hierarchy to distinguish between these communities. See the Clarke & Warwick and Ozkan pTO articles for details.
References
- Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379-423.
- Simpson, E.H. (1949). Measurement of diversity. Nature, 163, 688.
- Chao, A. & Shen, T.-J. (2003). Nonparametric estimation of Shannon’s index of diversity when there are unseen species in sample. Environmental and Ecological Statistics, 10, 429-443.