Guidelines for the *comspat* package
James Tsakalos (james.tsakalos@gmail.com), Stefano Chelli, Giandiego Campetella, Roberto Canullo, Enrico Simonetti, Sandor Bartha
04.03.2022
Source:vignettes/comspat.Rmd
comspat.Rmd
Introduction
The comspat (within-community spatial organization) R package introduces two diversity measures - the Number of Realized Species Combinations (NRC) and Compositional Diversity (CD) to model plant species co-occurrence patterns as a function of increasing sampling resolution. These diversity measures represent two commonly used measures emerging from a family of models grounded in Information Theory (see Juhász-Nagy, 1967, 1976, 1984a, 1984b, 1993; Juhász-Nagy & Podani, 1983).
This document provides a step-by-step guide demonstrating the
workflow of 'comspat'
with practical examples and expanded
explanatory text.
Step 1. Exploring the simulated and real data sets
The raw data (x) accepted by all functions in the package is a matrix (x) codifying the presence of m species in n primary sampling units as rows. Sites collected as a grid must contain columns (named ‘X’ and ‘Y’) referring to a specific sampling unit positioned in a grid sampling design. The cartesian coordinate 1, 1 refers to the first sampling unit of a grid. Sites collected as transect must contain the column named ‘X’, referring to a specific sampling unit recorded within a contiguous number sequence (starting at 1). For a crosswalk between grid and transect sample designs and the accepted data format see Fig S1.
# Lets have a look at the first few lines and structure of the example data sets
# View the grid data set
library(comspat)
head(grid_patchy_associated)
#> Species X Y
#> 1 Sp1 7 1
#> 2 Sp1 40 1
#> 3 Sp1 55 1
#> 4 Sp1 60 1
#> 5 Sp1 21 2
#> 6 Sp1 26 2
str(grid_patchy_associated)
#> 'data.frame': 1551 obs. of 3 variables:
#> $ Species: Factor w/ 6 levels "Sp1","Sp2","Sp3",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ X : num 7 40 55 60 21 26 28 31 36 50 ...
#> $ Y : num 1 1 1 1 2 2 2 2 2 2 ...
# View the transect data set
head(tran_grass_s)
#> X Species
#> 1 1 Cladonia convoluta
#> 2 1 Parmelia pokorny
#> 3 2 Cladonia convoluta
#> 4 2 Stipa borysthenica
#> 5 2 Tortella inclinata
#> 6 2 Parmelia pokorny
str(tran_grass_s)
#> 'data.frame': 713 obs. of 2 variables:
#> $ X : Factor w/ 426 levels "1","2","3","4",..: 1 1 2 2 2 2 3 3 3 4 ...
#> $ Species: Factor w/ 10 levels "Arenaria serpyllifolia",..: 2 7 2 8 10 7 9 10 7 8 ...
The package contains one simulation and two other grids derived by randomization and two real transect data sets (Table 1). The first grid data set (patchy associated) features patchy patterns with interspecific associations between species due to limited dispersal and interactions between species and due to their different responses to environmental heterogeneity. The second grid data set (patchy no isc) was a created using a random shift randomization where the interspecific species associations were randomized but intraspecific spatial dependence relationships were maintained. The third grid data set (complete random) was created using a complete spatial randomization where no intraspecific autocorrelations and no interspecific spatial associations are present. Please see Step 6 below for detailed information on the applied randomizations.
We provide two transects with real data sampled in open sand grassland in Hungary (Table 1). These data sets were derived from Bartha et al.’s (2008) long-term study on the effect of climate change on the interannual variability of grassland communities. The transect data sets represent: (1) the spatial pattern data of frequent species (species with more than 25 presences along the transect were selected as a threshold from standard textbooks), and (2) the spatial pattern data of plant functional types (PFTs) (in this case data of species which belong to the same functional groups were merged). These data represent the same transect and same community. However, for PFTs the data of rare species were also considered within the particular PFT. For simplicity, we have only selected a 25 m example with a typical pattern; this extent (transect length) is representative of the community pattern. For more details please see Bartha et al. (2008).
Table 1. Properties of the randomized grid and real transect data sets included within the package. The table shows the name, the sampling type, dimensions (Dim), number of species (NSP), simulated drivers of the spatial patterns and the source of the data sets provided by the package.
Data set name | Type | Dim | NSP | Type of patterns | Reference |
---|---|---|---|---|---|
grid_patchy_associated | Grid | 64 ⨯ 64 | 6 | interspecific and intraspecific spatial dependence due to simulated abiotic and biotic interactions | Czárán (1993) & Bartha et al. (1998) |
grid_patchy_n_isc | Grid | 64 ⨯ 64 | 6 | only intraspecific spatial dependence after removing interspecific relationships from the original patterns by randomizations | Bartha & Kertész (1998) |
grid_random | Grid | 64 ⨯ 64 | 6 | random distributions after removing both intra- and interspecific relationships by randomizing the original patterns | Bartha & Kertész (1998) |
tran_grass_s | Transect | 500 ⨯1 | 10 | – | Bartha et al. (2008) |
tran_grass_t | Transect | 500 ⨯1 | 6 | – | Bartha et al. (2008) |
# View the species spatial distributions
# Store user default parameter arguments to reset on exit
oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
# Set plot parameters
par(mar = c(0.5, 0.5, 0.5, 0.5), pty = "s", ps = 12, mfrow = c(1, 3))
colvec <- sample(c("#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e",
"#e6ab02"), 6)
# Plot the patchy associated data frame
plot(y = grid_patchy_associated$Y, x = grid_patchy_associated$X,
xlab = "", ylab = "", xaxt = "none", yaxt = "none", cex = 0)
points(y = grid_patchy_associated$Y, x = grid_patchy_associated$X,
pch = c(22, 17, 18, 2, 20, 18)[grid_patchy_associated$Species],
col = colvec[grid_patchy_associated$Species],
bg = c("yellow2")[grid_patchy_associated[grid_patchy_associated[, 3] ==
"Sp1", "Species"]],
cex = c(0.8, 0.8, 0.8, 0.6, 0.8, 1)[grid_patchy_associated$Species],
lwd = 2)
box(which = "plot", lwd = 2)
legend("topleft", as.expression(bquote(bold("A"))),
inset = c(-0.3, -0.12), bty = "n", cex = 2)
# Plot the patchy random data frame
plot(y = grid_patchy_n_isc$Y, x = grid_patchy_n_isc$X, xlab = "", ylab = "",
xaxt = "none", yaxt = "none", cex = 0)
points(y = grid_patchy_n_isc$Y, x = grid_patchy_n_isc$X,
pch = c(22, 17, 18, 2, 20, 18)[grid_patchy_n_isc$Species],
col = colvec[grid_patchy_n_isc$Species],
bg = c("yellow2")[grid_patchy_n_isc[grid_patchy_n_isc[, 3]
== "Sp1", "Species"]],
cex = c(0.8, 0.8, 0.8, 0.6, 0.8, 1)[grid_patchy_n_isc$Species],
lwd = 2)
box(which = "plot", lwd = 2)
legend("topleft", as.expression(bquote(bold("C"))),
inset = c(-0.3, -0.12), bty = "n", cex = 2)
# Plot the mixed random data frame
plot(y = grid_random$Y, x = grid_random$X, xlab = "", ylab = "",
xaxt = "none", yaxt = "none", cex = 0)
points(y = grid_random$Y, x = grid_random$X,
pch = c(22, 17, 18, 2, 20, 18)[grid_random$Species],
col = colvec[grid_random$Species],
bg = c("yellow2")[grid_random[grid_random[, 3] == "Sp1", "Species"]],
cex = c(0.8, 0.8, 0.8, 0.6, 0.8, 1)[grid_random$Species],
lwd = 2)
box(which = "plot", lwd = 2)
legend("topleft", as.expression(bquote(bold("B"))),
inset = c(-0.3, -0.12), bty = "n", cex = 2)
These steps show how to visualize the species frequency of occurrence within the grid data sets.
# View the species frequency of occurrence
# Set plot parameters
par(mar = c(4, 5, 2, 1), cex = 0.8, mfrow = c(1, 3))
colvec <- c("lightgoldenrod", "sienna1", "mediumspringgreen",
"forestgreen", "red4", "blue")
# Plot the patchy associated data frame
barplot(table(grid_patchy_associated$Species), col = colvec, las = 2,
ylim = c(0, 800))
mtext("Species frequency", side = 2, line = 3.5, cex = 0.8, font = 2)
box(which = "plot", lwd = 2)
# Plot the patchy random data frame
barplot(table(grid_patchy_n_isc$Species), col = colvec, las = 2,
ylim = c(0, 800))
mtext("Species frequency", side = 2, line = 3.5, cex = 0.8, font = 2)
box(which = "plot", lwd = 2)
# Plot the mixed random data frame
barplot(table(grid_random$Species), col = colvec, las = 2, ylim = c(0, 800))
mtext("Species frequency", side = 2, line = 3.5, cex = 0.8, font = 2)
box(which = "plot", lwd = 2)
Now that we have demonstrated that the grid data sets all feature the
same species abundance distributions we might also like to look at how
the distributions of species combinations vary between the grids. To
achieve this goal we have defined a helper function. Please see the
'freq_comb'
function and documentation.
# Plot the distribution of species combinations
# Get the species frequency at the Primary Sampling unit Scale
a <- freq_comb(grid_patchy_associated, type = "Grid", dim_max = 64)
b <- freq_comb(grid_patchy_n_isc, type = "Grid", dim_max = 64)
c <- freq_comb(grid_random, type = "Grid", dim_max = 64)
# Now we need to prepare the data for a simple bar plot
# Assign a name to the data set
a$dat <- "patchy associated"
b$dat <- "patchy no isc"
c$dat <- "random"
# Join the data together
a <- rbind(a, b, c)
# Reshape the data
data_base <- reshape(a, idvar = "dat", timevar = "combinations",
direction = "wide")
data_base[is.na(data_base)] <- 0
row.names(data_base) <- data_base$dat
data_base$dat <- NULL
data_base <- as.matrix(data_base)
colnames(data_base) <- gsub(pattern = "Freq.", replacement = "",
colnames(data_base))
# Create the barplot (note that we are using log transformation so was can better view the different combinations)
par(mar = c(8, 4, 1, 1))
barplot(height = log1p(data_base), beside = TRUE, cex.names = 0.7,
las = 2, col = c("red", "blue", "green"))
legend("topright", legend = row.names(data_base),
col = c("red", "blue", "green"), pch = c(15, 15, 15), pt.cex = 1.5)
axis(side = 1, line = 4, at = 15, expression(italic("grain size = 0.04 sq.m*")),
tick = FALSE, cex.axis = 0.75)
mtext("Species combinations", 1, 6.5, font = 2)
mtext("Freqency (log %)", 2, 2.5, font = 2)
box(lwd = 2)
These steps show how to visualize the species spatial distributions within the transect data sets.
# View the species spatial distributions
# Set plot parameters
par(mar = c(8, 3, 2, 1), mfrow = c(1, 2), cex = 0.8)
# Plots the transect plant functional type data frame
plot(as.numeric(tran_grass_s$Species) ~ as.numeric(tran_grass_s$X),
xaxt = 'n', yaxt = 'n', ann = FALSE, xlim = c(0,500),
pch = c(3, 4, 15:22)[tran_grass_s$Species],
col=c("red","blue","green","pink","orange","black",
"red","blue","green","pink")[tran_grass_s$Species])
axis(1, at = c(0, 500), labels = c(0, 500))
mtext("Species", side = 2, line = 1.5, font = 2, cex = 0.8)
mtext("Plant presences in 5x5cm units", side = 1, line = 1, font = 2, cex = 0.8)
box(which = "plot", lwd = 2)
legend(x = 20, y = -2.3, legend = c(levels(tran_grass_s$Species)),
pch = c(3, 4, 15:22),
col = c("red","blue","green","pink","orange","black",
"red","blue","green","pink"),
ncol = 2, cex = 0.8,
xpd = TRUE)
# Plots the transect plant functional type data frame
plot(as.numeric(tran_grass_t$Species) ~ as.numeric(tran_grass_t$X),
xaxt = 'n', yaxt = 'n', ann = FALSE, xlim = c(0,500),
pch = c(3, 4, 15:18)[tran_grass_t$Species],
col = c("red", "blue", "green", "pink", "orange", "black")[tran_grass_t$Species])
axis(1, at = c(0, 500), labels = c(0, 500))
mtext("Plant functional types", side = 2, line = 1.5, font = 2, cex = 0.8)
mtext("PFT presences in 5x5cm units", side = 1, line = 1, font = 2, cex = 0.8)
box(which = "plot", lwd = 2)
legend(x = 50, y = -1.2, legend = c(levels(tran_grass_t$Species)),
pch = c(3, 4, 15:18),
col = c("red", "blue", "green", "pink", "orange", "black"),
ncol = 2, cex = 0.8,
xpd = TRUE)
These steps show how to visualize the species frequency of occurrence within the transect data sets.
# View the species frequency of occurrence
# Set plot parameters
par(mar = c(7.5, 5, 1, 1), mfrow = c(1, 2), cex = 0.8, cex.axis = 0.8)
# This part plots the transect species data frame
barplot(table(tran_grass_s$Species), las = 2, ylim = c(0, 200))
mtext("Species frequency", side = 2, line = 3.5, font = 2)
box(which = "plot", lwd = 2)
# This part plots the patchy associated data frame
barplot(table(tran_grass_t$Species), las = 2, ylim = c(0, 400))
mtext("PFT frequency", side = 2, line = 3.5, font = 2)
box(which = "plot", lwd = 2)
Step 2. Preparing the parameter data set
The 'comspat'
function performs computerized sampling
(called secondary sampling) from the primary (grid or transect) data.
During secondary sampling users can change the sampling grain (by
merging adjacent sampling units into larger ones. To control the
secondary sampling procedure, the user must specify several sampling
parameters. The secondary sampling parameters are supplied to the params
argument as a data frame. The parameter data frame stores the secondary
sampling parameters as a series of steps stored as rows. The parameter
data frame also contains columns which specify the length and height of
the secondary sampling units. The units supplied to the length and
height columns must correspond to the number of primary sampling units
in each dimension. The height dimension is not required where sampling
was conducted as a transect. We have used Wensel’s (1975) method within
our package, where incomplete sampling units are complemented by
fragments taken from the opposite edge of the mapped grid. While we
highly recommend following the guideline sequentially if you would like
to visualize this secondary sampling routine please inspect our gif
animations embedded as Fig. S10. Please note that these
will appear as static images in pdf format.
We recommended that the size of the secondary sampling units follow a logarithmic spatial scale during the first approximation of the functions (Fig. S8). In this manner, exponential enlargement of the secondary sampling units is achieved. The largest secondary sampling unit should be less than a quarter of the total sampling area to reduce the edge effect. The ultimate selection of secondary sampling units should consider the distribution of maxima functions. After having the first results, the estimates can be improved by modifying of the secondary sampling (using more or different sampling unit sizes).
# Load and visualize the parameter data set
data("param_grid")
# The sample unit size of our simulated data was 5 cm x 5 cm
su_area <- param_grid[,2] * param_grid[,3] * (0.05 * 0.05)
# Plot the area
par(mar = c(4.5, 4.5, 0.5, 0.5), ps = 12)
plot(su_area ~ c(1:16), type = 'b', pch = 19, las = 2,
ylab = "Sample Unit Area",
xlab = "Steps of Scaling")
mtext(text = expression('(cm'^2*')'),
side = 2, line = 1.9, cex = 0.6)
Step 3. Calculation of the Information Theory models
Here, using visual aids, we demonstrate NRC and CD’s calculation at two scaling steps (spatial-scale 1 and -scale 2) using example grid data (Figure S8). The mathematical formula follows Juhász-Nagy (1967, 1976, 1984b, 1984a, 1993) summarized here.
CD is a descriptor of the spatial variability of the realized species combinations within a community expressed as the diversity of the species combinations in a sample: \[ H_{j} (A,B, ..., Y) = -\sum_{k=1}^{2^s} p_{kj} \times log_{2}(p_{kj}) \] Where A, B, …, Y are the s species of the community; pkj is the probability of finding a particular species combination (i.e. the kth combination of species) within the sample at a specific sampling unit size j; k ranges from 1 to 2s and is the number of the realized combination (NRC) of species.
Step 4. Computerized sampling of the data sets
Complete sampling, i.e. sampling from every possible positions with overlaps (cf. Bartha et al. 1995) within the sample universe was used to calculate NRC and CD.
While a picture paints a thousand words, perhaps an animation of the secondary sampling will solidify your understanding further. Here we present the complete sampling of every position within the sample universe of a grid (sized at 10 x 10 grids) at three spatial scaling steps.
Step 5. comspat without null models.
The 'comspat'
function allows users to calculate
Information Theory models with and without null models. Analysis without
the use of null models is a useful feature during the data exploration
phase. It is useful because it allows users to optimize their parameter
data set and to visually assess the data without the requirement of
parallel computing (initiated when null models are generated) and
waiting for potentially time-intensive calculations to complete. Here we
demonstrate how the 'comspat_plot'
function can be used to
prepare a graphical output. Because the plotting function was created
using R’s base plotting code it is compatible with several common
plotting functions (including 'par'
).
# Calculate the NRC and CD for the grid data sets.
g_r <- comspat(data = grid_random, params = param_grid,
dim_max = 64, type = "Grid")
g_pr <- comspat(grid_patchy_n_isc, param_grid, 64, "Grid")
g_pa <- comspat(grid_patchy_associated, param_grid, 64, "Grid")
# Usage of comspat_plot
# Lets visualize CD for all grid data sets on one plot.
data <- list(g_r, g_pr, g_pa)
p_col <- list("red", "blue", "green")
par(mar = c(3.5, 3, 0.5, 0.5), mfrow = c(1, 2))
# Let us start by adding the Compositional Diversity (i.e. CD)
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "CD",
su_size = 0.01, ymin = 0, ymax = 5.5, xmax = 1, p_col = p_col)
# Hint, try changing the measure argument to "NRC"
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "NRC",
su_size = 0.01, ymin = 0, ymax = 65, xmax = 1, p_col = p_col)
Visual inspection of Figure S11 suggests that NRC and CD peak at low levels and at larger sampling units within the simulated (grid_patchy_n_isc) data set compared to the grid_random and grid_patchy_n_isc data sets.
When studying plant species co-occurrence patterns, one can never be sure if there is spatial dependence among species. Under natural (real world) conditions, the causes of the spatial association are not discernible from the examination of the patterns alone. For revealing mechanisms, we should use repeated measures (through time) or manipulative experiments (Rejmanek & Leps 1966). Indeed, designing an experiment to unravel the causes of spatial association in species-rich systems presents a unique (and time and resource-intensive) challenge. Indeed, there are several problems with applying standard statistical tests. In short, this is because sampling units overlap, they are not strictly independent (Podani 1984; Podani et al. 1993). Fortunately, we can compare the observed data against random references and use Monte-Carlo simulations as a basis of the significance test (i.e. null models). Null models are commonly applied to address many ecological questions, especially where experimentation is impossible or where classical statistical methods fail (Gotelli & Graves 1996).
Step 6. comspat with null models.
Of the many types of randomizations potentially available to produce random references we have selected to use complete spatial randomness (CSR) and random shift (RS). Both of these randomizations have and extensive history in ecological studies and been used with JNP models in the past. In the case of CSR, all relationships (within species and between species relationships) are randomized. In the random shift, only interspecific relationships are randomized while maintaining the spatial dependence of species.
Random shifts are a type of randomization procedure that consists of moving a map (i.e. map of the coordinates of species) about a two-dimensional torus in four cardinal directions (see Harms 1997). Random shifts applied to transects can only occur along two cardinal directions. The essence of the RS algorithm is the symmetry transformation, i.e. a specific transformation when within-species pattern should remain identical (it means that all distances and spatial relationship between individuals within a single species will not change. To increase the number of potential random cases, RS is often used together with random rotations and reflections. When data are reflected or rotated, the within-species pattern remains identical (i.e. it is symmetrically transformed). Here Symmetry means the figure’s identity like our face in the mirror. The random shift does the same; for example, when we shift the corner of the picture, the whole picture itself (i.e. the within species pattern) does not change. In summary, random shift keeps the within-species relationships intact (invariant) and because you shift every species independently and differently, the interspecific relationships will change (this type of randomizations makes changes in interspecific relationships change but intraspecific relationships have not been changed).
To illustrate the process of a random shift (together with random rotations and reflections), we have provided diagrams demonstrating the potential reflections, rotations and random shifts (Figure S12-14). Please note that the randomization steps occur independently for each species for each randomization run (i.e. 1 run of 999 or any other number that you decide for the randomizations); each step is repeated n times (where n is the number of species).
Here is a diagram of several shifts on a 4 ⨯ 4 grid. The position (1,1) represents the starting position. The position (2,1) represents one shift to the right, the position (2, 2) represents two shifts to the right and two shifts down. For demonstration purposes we have only shown the movement along two cardinal directions (i.e. down and right); our code does in fact move in all directions. While the demonstration shows a systematic shift pattern, our code randomly selects a new x and y coordinate for each randomization (for each species).
Case Study 1: Simulated grid data
This section reproduces the analyses of the grid data sets presented
in Figure 1 (Tsakalos et al. 2022). It demonstrates how to use the
'comspat'
function with null models on grid data. It also
demonstrates how the 'comspat_plot'
function can be used in
combination with R’s base plotting system to create complex graphical
outputs.
# Case study 1: Simulated grid data sets
# Calculate NRC and CD
# Note that running 999 iterations can take considerable time
rand_csr <- comspat(data = grid_random, params = param_grid, dim_max = 64,
type = "Grid", randomization_type = "CSR", iterations = 999)
rand_rs <- comspat(data = grid_random, params = param_grid, dim_max = 64,
type="Grid", randomization_type = "RS", iterations = 999)
patch_csr <- comspat(data = grid_patchy_n_isc, params = param_grid,
dim_max = 64, type = "Grid", randomization_type = "CSR",
iterations = 999)
patch_rs <- comspat(data = grid_patchy_n_isc, params = param_grid,
dim_max = 64, type = "Grid", randomization_type = "RS",
iterations = 999)
assoc_csr <- comspat(data = grid_patchy_associated, params = param_grid,
dim_max = 64, type = "Grid", randomization_type = "CSR",
iterations = 999)
assoc_rs <- comspat(data = grid_patchy_associated, params = param_grid,
dim_max = 64, type = "Grid", randomization_type = "RS",
iterations = 999)
# Set plot parameters
colA <- sample(
c("#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e", "#e6ab02"),6)
par(mfrow = c(3, 3), mar = c(0, 0, 0, 0), oma = c(5, 5, 0.5, 0.5),
mgp = c(2, 1, 0), tcl = -0.5)
################################################################################
# Part 1: the spatial distribution
# Plot the patchy associated data frame
plot(y = grid_patchy_associated$Y, x = grid_patchy_associated$X,
xlab = "", ylab = "", xaxt = "none", yaxt = "none", cex = 0)
points(y = grid_patchy_associated$Y, x = grid_patchy_associated$X,
pch = c(22, 17, 18, 2, 20, 18)[grid_patchy_associated$Species],
col = colA[grid_patchy_associated$Species],
bg = c("yellow2")[grid_patchy_associated[grid_patchy_associated[, 3]
== "Sp1", "Species"]],
cex = c(0.8, 0.8, 0.8, 0.6, 0.8, 1)[grid_patchy_associated$Species],
lwd = 2)
text(x = 3,y = 62, "A", font = 2, cex = 1)
# Plot the patchy random data frame
plot(y = grid_patchy_n_isc$Y, x = grid_patchy_n_isc$X,
xlab = "", ylab = "", xaxt = "none", yaxt = "none", cex = 0)
points(y = grid_patchy_n_isc$Y, x = grid_patchy_n_isc$X,
pch = c(22, 17, 18, 2, 20, 18)[grid_patchy_n_isc$Species],
col = colA[grid_patchy_n_isc$Species],
bg = c("yellow2")[grid_patchy_n_isc[grid_patchy_n_isc[, 3]
== "Sp1", "Species"]],
cex = c(0.8, 0.8, 0.8, 0.6, 0.8, 1)[grid_patchy_n_isc$Species],
lwd = 2)
text(x = 3, y = 62,"B", font = 2, cex = 1)
# Plot the mixed random data frame
plot(y = grid_random$Y, x = grid_random$X,
xlab = "", ylab = "", xaxt = "none", yaxt = "none", cex = 0)
points(y = grid_random$Y, x = grid_random$X,
pch = c(22, 17, 18, 2, 20, 18)[grid_random$Species],
col = colA[grid_random$Species],
bg = c("yellow2")[grid_random[grid_random[, 3] == "Sp1", "Species"]],
cex = c(0.8, 0.8, 0.8, 0.6, 0.8, 1)[grid_random$Species],
lwd = 2)
text(x = 3, y = 62, "C", font = 2, cex = 1)
################################################################################
# Part 2: the NRC plots with 95% CI
p_col = list("blue","red")
# Plot the mixed random data frame
data <- list("CSR" = assoc_csr[[2]], "RS" = assoc_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "NRC",
su_size = 0.01, ymin = 0, ymax = 65, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = TRUE, p_col = p_col, stats_output = TRUE)
mtext("NRC", side = 2, line = 3, font = 2)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.025)
text(x = 0.0125, y = 63,"D", font = 2, cex = 1)
# Plot the patchy random data frame
data <- list("CSR" = patch_csr[[2]],"Random Shift" = patch_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "NRC",
su_size = 0.01, ymin = 0, ymax = 65, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = FALSE, p_col = p_col, stats_output = TRUE)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.025)
axis(2, at = round(seq(0, 60, 10)),
labels = FALSE, cex.axis = 0.75, tck = 0.015)
text(x = 0.0125, y = 63,"E", font = 2, cex = 1)
# Plot the patchy associated data frame
data <- list("CSR" = rand_csr[[2]], "Random Shift" = rand_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "NRC",
su_size = 0.01, ymin = 0, ymax = 65, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = FALSE, p_col = p_col, stats_output = TRUE)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.025)
axis(2, at = round(seq(0, 60, 10)),
labels = FALSE, cex.axis = 0.75, tck = 0.015)
text(x = 0.0125,y = 63,"F", font = 2, cex = 1)
# Part 3: the CD plots with 95% CI
# Plot the mixed random data frame
data <- list("CSR" = assoc_csr[[2]], "Random Shift" = assoc_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "CD",
su_size = 0.01, ymin = 0, ymax = 5.5, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = TRUE, p_col = p_col, stats_output = TRUE)
mtext("CD", side = 2, line = 3, font = 2)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.025)
text(x = 0.0125, y = 5.3, "G", font = 2, cex = 1)
# Plot the patchy random data frame
data<-list("CSR" = patch_csr[[2]], "Random Shift" = patch_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "CD",
su_size = 0.01, ymin = 0, ymax = 5.5, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = FALSE, p_col = p_col, stats_output = TRUE)
mtext("Length of sampling units (m)", side = 1, line = 3, font =2)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.025)
axis(2, at = round(seq(0, 5, 1)), labels = FALSE, cex.axis = 0.75, tck = 0.015)
text(x = 0.0125, y = 5.3,"H", font = 2, cex = 1)
# Plot the patchy associated data frame
data <- list("CSR" = rand_csr[[2]],"Random Shift" = rand_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Grid", measure = "CD",
su_size = 0.01, ymin = 0, ymax = 5.5, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = FALSE, p_col = p_col, stats_output = TRUE)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.025)
axis(2, at = round(seq(0, 5, 1)), labels = FALSE, cex.axis = 0.75, tck = 0.015)
text(x = 0.0125, y = 5.3, "I", font = 2, cex = 1)
Unlike our interpretations of Figure S11 we can infer (supported by null distributions and statistical inference) which assembly mechanisms may produce the observed spatial patterns of the simulated communities.
As the level of complexity decreases (i.e. the number of assembly mechanisms involved in forming the simulated data) from left to right, we can observe an increase in the amplitude of the observed peaks correlated to an increase in the area where the maxima (peaks) of the unimodal curves are observed.
Firstly, both NRC and CD peak at high amplitudes (64 and 5.3, resp.) and small spatial scale (0.2 m). These observed patterns do not deviate from the CSR and RS 95% confidence intervals suggesting that the pattern forming processes are product of random expectation (Figure S15 C, F & I).
Second, both NRC and CD peak at intermediary amplitudes (54 and 4.19, resp.) and intermediate spatial scales (0.3 m). The observed patterns deviate strongly from the CSR 95% confidence intervals suggesting that the pattern forming processes are non-random. The observed patterns do not deviate from the RS 95% confidence intervals suggesting that the patterns are driven by spatial autocorrelations within individual species (for example due to abiotic filtering mechanisms or limited dispersal). (Figure S15, B, E, H).
Lastly, both NRC and CD peak at low amplitudes (59 and 4.2, resp.) and large spatial scales (0.30–0.35 m and 0.45 m, resp.). The observed patterns deviate strongly from both CSR and RS 95% confidence intervals (at large sample unit sizes the observed pattern starts to overlap) hinting at the role of mechanisms such as filtering (plant-environment relationships) and limiting similarity (plant-plant interactions) where plant spatial associations result from resource partitioning (Figure S15 A, D & G).
When using the 'comspat'
function with randomizations
the function outputs two list objects. The first list object stores the
specified Information Theory models (i.e. NRC and CD) for every
secondary sampling unit for every randomization. The second list
provides the summary statistics for each of the specified Information
Theory models (i.e. NRC and CD) for every secondary sampling unit
(Table 2).
The summary statistics can be accessed with the following code. These summary results may be helpful in interpreting specific results obtained from the randomizations.
knitr::kable(assoc_csr$`Summary statistics`[["CD"]][,1:10],format = "html", table.attr = "style='width:100%;'")
step_1 | step_2 | step_3 | step_4 | step_5 | step_6 | step_7 | step_8 | step_9 | step_10 | |
---|---|---|---|---|---|---|---|---|---|---|
Original | 1.8417 | 3.2104 | 3.8128 | 4.0063 | 4.0374 | 4.0464 | 4.0816 | 4.1351 | 4.1565 | 4.1345 |
Mean | 1.8504 | 4.0761 | 5.2368 | 5.3835 | 4.8631 | 3.9844 | 2.9926 | 2.0623 | 1.3037 | 0.7536 |
Max | 1.8681 | 4.1211 | 5.3113 | 5.5009 | 5.0036 | 4.2193 | 3.2684 | 2.4154 | 1.7069 | 1.1186 |
Min | 1.8309 | 4.0268 | 5.1499 | 5.2820 | 4.7433 | 3.7836 | 2.6821 | 1.7186 | 0.9700 | 0.4188 |
Quartile 1 | 1.8460 | 4.0670 | 5.2190 | 5.3600 | 4.8360 | 3.9420 | 2.9270 | 1.9820 | 1.2180 | 0.6786 |
Quartile 3 | 1.8550 | 4.0850 | 5.2540 | 5.4080 | 4.8900 | 4.0250 | 3.0550 | 2.1430 | 1.3860 | 0.8291 |
std | 0.0065 | 0.0143 | 0.0259 | 0.0351 | 0.0407 | 0.0637 | 0.0965 | 0.1184 | 0.1227 | 0.1132 |
cv | 0.0035 | 0.0035 | 0.0049 | 0.0065 | 0.0084 | 0.0160 | 0.0322 | 0.0574 | 0.0941 | 0.1502 |
p o < r | 0.0951 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.8458 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
p o > r | 0.9049 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.1542 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
95%UL | 1.8550 | 4.0850 | 5.2540 | 5.4080 | 4.8900 | 4.0250 | 3.0550 | 2.1430 | 1.3860 | 0.8291 |
95%LL | 1.8460 | 4.0670 | 5.2190 | 5.3600 | 4.8360 | 3.9420 | 2.9270 | 1.9820 | 1.2180 | 0.6786 |
Table 2. Summary statistic variables and descriptions exported by comspat when the randomization argument is specified.
Variable | Description |
---|---|
Original | Observed results prior to randomization. |
Mean | Mean of all randomized values |
Max | Maximum of all randomized values |
Min | Minimum of all randomized values |
Quartile 1 | 25th percentile, indicating that 25% of the data are less than or equal to this value |
Quartile 3 | 75th percentile indicating that 75% of the data are less than or equal to this value |
std | Standard deviation is a measure of the amount of variation or dispersion of a set of values (defined by the number of iterations |
cv | Coefficient of variation (CV) is a measure of relative variability. It is the ratio of the standard deviation to the mean |
p o<r | Number of occurrences where the observed value was less than the reference divided by the number of iterations |
p o>r | Number of occurrences where the observed value was greater than the reference divided by the number of iterations |
95% UL | Upper limit of the 95% confidence interval |
95% LL | Lower limit of the 95% confidence interval |
Case Study 2: Real transect data
This section analyses the real transect data sets provided by the
package. It demonstrates how to use the 'comspat'
function
with null models on transect data. It also demonstrates how the
'comspat_plot'
function can be used in combination with R’s
base plotting system to create complex graphical outputs.
# Case study 2: Real transect data sets
# Calculate NRC and CD
# Note that running 999 iterations can take considerable time
s_csr <- comspat(data = tran_grass_s, params = param_tran, dim_max = 500,
type = "Transect", randomization_type = "CSR",
iterations = 999)
s_rs <- comspat(data = tran_grass_s, params = param_tran, dim_max = 500,
type = "Transect", randomization_type = "RS",
iterations = 999)
t_csr <- comspat(data = tran_grass_t, params = param_tran, dim_max = 500,
type = "Transect", randomization_type = "CSR",
iterations = 999)
t_rs <- comspat(data = tran_grass_t, params = param_tran, dim_max = 500,
type = "Transect", randomization_type = "RS",
iterations = 999)
# Set plot parameters
par(mfrow = c(2, 2), mar = c(0.5, 2.2, 0, 0), oma = c(7, 3, 0.5, 0.5),
mgp = c(2, 1, 0), tcl = -0.5)
p_col = list("blue","red")
################################################################################
# Part 1: the NRC plots with 95% CI
# Plot the transect species data frame
data <- list("CSR" = s_csr[[2]], "RS" = s_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Transect",
measure = "NRC", su_size = 0.01, p_col = p_col,
ymin = 0, ymax = 200, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = TRUE, stats_output = TRUE)
mtext("NRC", side = 2, line = 3, font = 2)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2), labels = FALSE,
cex.axis = 0.75, tck = 0.025)
text(x = 0.0125, y = 197,"A", font = 2, cex = 1)
# Plot the transect PFT data frame
data <- list("CSR" = t_csr[[2]], "RS" = t_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Transect",
measure = "NRC", su_size = 0.01, p_col = p_col,
ymin = 0, ymax = 45, xmin = 0.01, xmax = 10,
xaxt = FALSE, yaxt = TRUE, stats_output = TRUE)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = 0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2), labels = FALSE,
cex.axis = 0.75, tck = 0.025)
text(x=0.0125,y=44,"B", font = 2, cex = 1)
# Part 2: the CD plots with 95% CI
# Plot the transect species data frame
data<-list("CSR"=s_csr[[2]],"Random Shift"=s_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Transect",
measure = "CD", su_size = 0.01, p_col = p_col,
ymin = 0, ymax = 8, xmin = 0.01, xmax = 10,
xaxt = TRUE, yaxt = TRUE, stats_output = TRUE)
mtext("CD (bits)", side = 2, line = 3, font = 2)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.015)
axis(1, at = round(c(0.01, 0.1, 1, 10, 100), 2), labels = FALSE,
cex.axis = 0.75, tck = -0.025)
mtext("Length of sampling units (m)", side =1, line = 4, font =2)
text(x = 0.0125, y = 7.8,"C", font = 2, cex = 1)
# Plot the transect PFT data frame
data <- list("CSR" = t_csr[[2]], "Random Shift" = t_rs[[2]])
comspat_plot(data = data, params = param_grid, type = "Transect",
measure = "CD", su_size = 0.01, p_col = p_col,
ymin = 0, ymax = 5, xmin = 0.01, xmax = 10,
xaxt = TRUE, yaxt = TRUE, stats_output = TRUE)
axis(1, at = round(c(0.01, seq(0.02, 0.09, 0.01), 0.1, 1, 10, 100), 2),
labels = FALSE, cex.axis = 0.75, tck = -0.015)
axis(1, at = c(0.01, 0.1, 1, 10, 100), labels = TRUE,
cex.axis = 0.75, tck = -0.025)
text(x = 0.0125, y = 5,"D", font = 2, cex = 1)
legend(x = 0.013, y = -1.5, legend = c("Observed", "CSR 95% CI", "RS 95% CI"),
lty = c(1, 1, 1),
col = c("black", "blue", "red"),
ncol = 2, cex = 0.8,
xpd = NA)
The real transect data sets were derived from Bartha et al.’s (2008) long-term study on the effect of climate change on the interannual variability of grassland communities. The observed patterns deviate significantly from CSR expectation but they did not differ from the null distribution produced by RS. This results suggest that interspecific interactions might have limited role for shaping the spatial patterns of species in this open grasslands while abiotic factors (soil heterogeneity, weather extremes) might be more important . Here we have analyzed only a single measure in time; the manuscript (and its strength) lies strongly in its long term (9 monitoring years) repeated measure sampling design.
References
Bartha, S. and M. Kertész. 1998. The importance of neutral-models in detecting interspecific spatial associations from ‘trainsect’ data. Tiscia 31, 85–98.
Bartha, S., Czárán, T., & Oborny, B. (1995). Spatial constraints masking community assembly rules: a simulation study. Folia geobotanica, 30, 471–482.
Bartha, S., Campetella, G., Ruprecht, E., Kun, A., Házi, J., Horváth,
A., Virágh, K., & Molnár, Zs. (2008). Will
interannual variability in sand grassland communities increase with
climate change? Community Ecology, 9, 13–21.
Czárán, T. 1993. PATPRO: A Monte-Carlo simulation program for multispecies neighborhood competition. Abstracta Botanica, 17, 275–281.
Gotelli, N.J., & Graves, G.R. (1996). Null Models in Ecology. Smithsonian Institution Press, Washington D.C.
Harms, K.E. (1997). Habitat specialization and seed dispersal limitation in a Neotropical forest. PhD thesis, Princeton University, Princeton, New Jersey, USA
Juhász-Nagy, P. (1967). On association among plant populations I. Acta Biologica Debrecina, 5, 43–56.
Juhász-Nagy, P. (1976). Spatial dependence of plant populations. Part 1. Equivalence analysis (an outline for a new model). Acta Botanica Academiae Scientiarum Hungaricae, 22, 61–78.
Juhász-Nagy, P. (1984a). Notes on diversity. Part I. Introduction. Abstracta Botanica, 8, 43–55.
Juhász-Nagy, P. (1984b). Spatial dependence of plant populations. Part 2. A family of new models. Acta Botanica Hungarica, 30, 363–402.
Juhász-Nagy, P. (1993). Notes on compositional diversity. Hydrobiologia, 249, 173–182.
Juhász-Nagy, P., & Podani, J. (1983). Information theory methods for the study of spatial processes and succession. Vegetatio, 51, 129–140.
Podani, J. (1984). Analysis of mapped and simulated vegetation patterns by means of computerized sampling techniques Acta Botanica Hungarica, 30, 403–425.
Podani, J., Czárán, T., & Bartha, S (1993). Pattern, area and diversity: the importance of spatial scale in species assemblages. Abstracta Botanica, 17, 37–51.
Rejmánek, M., & Lepš, J. (1966). Negative associations can reveal interspecific competition and reversal of competitive hierarchies during succession. Oikos, 76, 161–168.
Tsakalos, J.L., Chelli, S., Campetella, G., Canullo, R., Simonetti, E., Bartha, S. (2022). comspat: An R package to analyze within community spatial association.
Wensel, L. C. (1975). The treatment of boundary-line overlap in a forest sampling simulator. Hilgardia, 43, 143–159.