skembel / picante

R tools for integrating phylogenies and ecology

trialswap and independentswap algorithms don't preserve sample species richness

joelnitta opened this issue

Unless I'm missing something here, the trialswap and independentswap algorithms in randomizeMatrix() don't seem to "maintain species occurrence frequency and sample species richness" as described in the docs (or original publications Gotelli 2000 and Miklos & Podani 2004, for that matter). In either case, only species occurrence frequency (total of each column) is preserved, not sample species richness (total of each row).

library(picante)
#> Loading required package: ape
#> Loading required package: vegan
#> Loading required package: permute
#> Loading required package: lattice
#> This is vegan 2.5-7
#> Loading required package: nlme

data(phylocom)
comm <- phylocom$sample

set.seed(12345)

# Randomize community data matrix abundances within species (maintains species occurrence frequency)
comm_rand_f <- randomizeMatrix(comm, null.model = "frequency")

isTRUE(all.equal(rowSums(comm_rand_f), rowSums(comm))) # total abundance per site: expect different
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_f), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE

# Randomize community data matrix abundances within samples (maintains sample species richness)
comm_rand_r <- randomizeMatrix(comm, null.model = "richness")

isTRUE(all.equal(rowSums(comm_rand_r), rowSums(comm))) # total abundance per site: expect same
#> [1] TRUE
isTRUE(all.equal(colSums(comm_rand_r), colSums(comm))) # total individuals per species: expect different
#> [1] FALSE

# Randomize community data matrix with the trial-swap algorithm (Miklos & Podani 2004) maintaining species occurrence frequency and sample species richness
comm_rand_t <- randomizeMatrix(comm, null.model = "trialswap", iterations = 1000)

isTRUE(all.equal(rowSums(comm_rand_t), rowSums(comm))) # total abundance per site: expect same
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_t), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE

# Randomize community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness
comm_rand_i <- randomizeMatrix(comm, null.model = "independentswap", iterations = 1000)

isTRUE(all.equal(rowSums(comm_rand_i), rowSums(comm))) # total abundance per site: expect same
#> [1] FALSE
isTRUE(all.equal(colSums(comm_rand_i), colSums(comm))) # total individuals per species: expect same
#> [1] TRUE

Created on 2021-10-21 by the reprex package (v2.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Asia/Tokyo                  
#>  date     2021-10-21                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  ape         * 5.5     2021-04-25 [1] CRAN (R 4.1.0)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
#>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
#>  cluster       2.1.2   2021-04-17 [1] CRAN (R 4.1.1)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
#>  digest        0.6.28  2021-09-23 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
#>  knitr         1.36    2021-09-29 [1] CRAN (R 4.1.0)
#>  lattice     * 0.20-44 2021-05-02 [1] CRAN (R 4.1.1)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  MASS          7.3-54  2021-05-03 [1] CRAN (R 4.1.1)
#>  Matrix        1.3-4   2021-06-01 [1] CRAN (R 4.1.1)
#>  mgcv          1.8-36  2021-06-01 [1] CRAN (R 4.1.1)
#>  nlme        * 3.1-152 2021-02-04 [1] CRAN (R 4.1.1)
#>  permute     * 0.9-5   2019-03-12 [1] CRAN (R 4.1.0)
#>  picante     * 1.8.2   2020-06-10 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.10.1  2020-08-26 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
#>  reprex        2.0.0   2021-04-02 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 4.1.0)
#>  tibble        3.1.5   2021-09-30 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  vegan       * 2.5-7   2020-11-28 [1] CRAN (R 4.1.0)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.27    2021-10-18 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

Hi @joelnitta, I think this is not a bug: species occurrence frequency does not account for individual abundances, only species presence/absence. comm is a matrix of individual abundances. E.g.: isTRUE(all.equal(rowSums(comm_rand_i > 0), rowSums(comm > 0))) gives TRUE. I can't remember the details now, but randomizing to give fixed row and column individual abundance may be impossible.
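To make the distinction concrete, here is a minimal sketch that checks all four margins at once. The helper compare_margins() is hypothetical (it is not part of picante), and the 2 x 2 matrices are hand-made for illustration. The picante call at the end only runs if the package is installed.

```r
# Hypothetical helper (not part of picante): compare the four margins of an
# original and a randomized community matrix (rows = samples, cols = species).
compare_margins <- function(original, randomized) {
  same <- function(a, b) isTRUE(all.equal(a, b))
  c(
    sample_richness       = same(rowSums(randomized > 0), rowSums(original > 0)),
    species_frequency     = same(colSums(randomized > 0), colSums(original > 0)),
    abundance_per_sample  = same(rowSums(randomized), rowSums(original)),
    abundance_per_species = same(colSums(randomized), colSums(original))
  )
}

# Hand-made example: two samples, two species, one checkerboard swap applied.
original <- matrix(c(5, 0,
                     0, 1),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("s1", "s2"), c("spA", "spB")))
swapped <- matrix(c(0, 1,
                    5, 0),
                  nrow = 2, byrow = TRUE,
                  dimnames = dimnames(original))

toy_result <- compare_margins(original, swapped)
print(toy_result)
# sample_richness, species_frequency and abundance_per_species are TRUE;
# abundance_per_sample is FALSE (row totals went from 5, 1 to 1, 5).

# The same check on the data from the report, if picante is available.
if (requireNamespace("picante", quietly = TRUE)) {
  data("phylocom", package = "picante")
  comm <- phylocom$sample
  comm_rand_i <- picante::randomizeMatrix(comm, null.model = "independentswap",
                                          iterations = 1000)
  print(compare_margins(comm, comm_rand_i))
}
```

The first two entries are the presence/absence margins that the docs refer to; the last two are the abundance margins that the original reprex tested.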

Hi, as mentioned by @camwebb, this is a 'feature' of these null models, not a bug in picante. These null models do not maintain total abundance per row/column, only the total frequency (richness and frequency of occurrence). I am not aware of any matrix-randomization based null model that would allow maintaining abundances per site or per species - these methods are all based on shuffling occurrences, so they maintain richness/frequency only.

In your code example above, you are not showing that the richness is changing, but rather that the total abundance per site is changing. This is a limitation of the null model itself, which swaps abundances in a checkerboard fashion, so it maintains sample richness and species occurrence frequency but not the total abundance of individuals per sample. This is why in the docs we say these methods "maintain species occurrence frequency and sample species richness" but we do not make such claims about maintaining the number of individuals per species or sample.
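A rough base-R sketch of a checkerboard swap on an abundance matrix may help show why richness survives while per-sample totals do not. This is an illustration only, not picante's compiled implementation: the function name checkerboard_swap() is made up, and exchanging values between the two rows within each column is an assumption chosen to match the reprex output above (species totals kept, sample totals changed).

```r
# Illustrative checkerboard swap in base R (a sketch, not picante's code).
# Rows are samples, columns are species.
checkerboard_swap <- function(m, iterations = 1000) {
  stopifnot(is.matrix(m), nrow(m) >= 2, ncol(m) >= 2)
  for (n in seq_len(iterations)) {
    rows <- sample.int(nrow(m), 2)  # two distinct samples
    cols <- sample.int(ncol(m), 2)  # two distinct species
    sub <- m[rows, cols]
    # A "checkerboard" has presences on one diagonal and absences on the other.
    diag_only <- sub[1, 1] > 0 && sub[2, 2] > 0 && sub[1, 2] == 0 && sub[2, 1] == 0
    anti_only <- sub[1, 2] > 0 && sub[2, 1] > 0 && sub[1, 1] == 0 && sub[2, 2] == 0
    if (diag_only || anti_only) {
      # Exchange the two rows within each column (assumption, see lead-in).
      m[rows, cols] <- sub[2:1, ]
    }
  }
  m
}

set.seed(1)
abund <- matrix(c(5, 0, 2,
                  0, 1, 0,
                  3, 0, 4),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:3), c("spA", "spB", "spC")))
shuffled <- checkerboard_swap(abund, iterations = 200)

# These hold by construction for any number of iterations:
all(rowSums(shuffled > 0) == rowSums(abund > 0))  # sample richness
all(colSums(shuffled > 0) == colSums(abund > 0))  # species occurrence frequency
all(colSums(shuffled) == colSums(abund))          # abundance per species

# This is not guaranteed, because swapped cells can hold different counts:
rbind(before = rowSums(abund), after = rowSums(shuffled))
```

Each accepted swap leaves every row and column with the same number of non-zero cells, but the counts carried along with those cells can differ, so the number of individuals per sample can drift.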

There is some discussion of this issue in this article, where I showed that the phylogenetic signal in abundances can influence the ability of these null models to detect community assembly processes:
S.W. Kembel. 2009. Disentangling niche and neutral influences on community assembly: assessing the performance of community phylogenetic structure tests. Ecology Letters 12:949-960.

I see now, thanks for the clarification!

I think some of my confusion stemmed from the fact that independentswap and trialswap were designed with binary (presence-absence) data in mind. In the papers that describe the algorithms (e.g. Gotelli 2000 Ecology) they mention "maintaining row and column sums". So I (mistakenly) assumed that should also apply to abundance data.

This is a fairly subtle difference, and may be worth fleshing out a bit in the documentation. IMHO, "species occurrence frequency" could be read as either species presence/absence or abundance.

Follow-up: adding to my confusion was the fact that null.model = "frequency" does maintain abundance per species, yet it is described in the docs in nearly the same way as independentswap and trialswap, which (I now know) don't maintain abundance per species ("maintains species occurence [sic] frequency" and "maintaining species occurrence frequency", respectively).

Details
Currently implemented null models (arguments to null.model):

frequency
Randomize community data matrix abundances within species (maintains species occurence frequency)

richness
Randomize community data matrix abundances within samples (maintains sample species richness)

independentswap
Randomize community data matrix with the independent swap algorithm (Gotelli 2000) maintaining species occurrence frequency and sample species richness

trialswap
Randomize community data matrix with the trial-swap algorithm (Miklos & Podani 2004) maintaining species occurrence frequency and sample species richness

Package picante version 1.8.2
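For comparison, the "frequency" and "richness" descriptions quoted above say abundances are randomized within species and within samples. A base-R sketch of that idea (the function names are hypothetical, and this is not picante's implementation) shows why those two models keep abundance totals along one margin each:

```r
# Sketch only: base-R analogues of shuffling within species (columns) and
# within samples (rows). Assumes a matrix with at least two rows and columns.
shuffle_within_species <- function(m) {
  out <- apply(m, 2, sample)     # permute each column independently
  dimnames(out) <- dimnames(m)   # restore the original row/column labels
  out
}

shuffle_within_samples <- function(m) {
  out <- t(apply(m, 1, sample))  # permute each row independently
  dimnames(out) <- dimnames(m)
  out
}

set.seed(42)
abund <- matrix(c(5, 0, 2,
                  0, 1, 0,
                  3, 0, 4),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:3), c("spA", "spB", "spC")))

by_species <- shuffle_within_species(abund)
by_samples <- shuffle_within_samples(abund)

# Permuting within a column cannot change that column's total or its
# number of non-zero cells; the same holds for rows in the other function.
all(colSums(by_species) == colSums(abund))
all(rowSums(by_samples) == rowSums(abund))
all(rowSums(by_samples > 0) == rowSums(abund > 0))
```

Because a within-column permutation moves whole counts, it preserves both abundance and presence/absence totals for each species, which is one reading of why "frequency" behaves differently from the swap algorithms on abundance data.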