vignettes/Italian_Biomes.Rmd
Italian_Biomes.Rmd
climenv
includes three data sets that feature Biome
information for Italy. The data sets include it_py
,
it_pt
and it_data
. The prefix it stands for
“Italy”. Within it_py
the Biomes are represented by
polygons (py), within it_pt
the Biomes have been sampled by
a gridded (c. 5km) point (pt) data set. Lastly, it_data
is
the output of the ce_extract()
function generated using
WorldClim 2 climate and Mapzen digital elevation.
# Make a map of the Italian polygons data set
plot(it_py, key.pos = NULL, border = NA, main = "")
From Fig 1, it is evident that the Italian polygons data set effectively represents the spatial distribution of the Mediterranean and Nemoral Biomes derived from the aggregation of vegetation typologies across the entire geography of Italy. This data set offers valuable insights into the extent and boundaries of different Biomes within the region. Nevertheless, it is essential for users to recognise that should they have smaller or larger scaled maps, which capture the spatial boundaries of different vegetation groups such as communities, habitats, disturbance classes, etc., they would serve as an ideal candidate for analyses with our package.
# Make a map of the Italian points data set
plot(it_pt, pch = 19, cex = 0.5, key.pos = NULL, main = "")
By design Fig 2, spans almost the same spatial extent as in Fig 1. Each of the points (n = 9963) within the Italian point data set is spaced at c. 5m in a grid configuration. Here users containing georeferenced features such as quadrat locations, mapping notes of vegetation typologies, disturbance regimes, etc., they would serve as an ideal candidate for analyses with our package. By utilising these polygon or point mapping structures as a foundation, our package enables users to leverage the inherent structure and spatial representation they require to conduct their own targeted investigations effectively. This flexibility accommodates a wide range of research needs and empowers users to explore the intricate relationships of vegetation groups, habitats, disturbance classes, structural classes, etc., on their terms.
The function ce_download
enables users to download both
climate and elevation data. Users have the option to select climate data
from either WorldClim 2 (Fick & Hijmans, 2017) or CHELSA (Karger et
al., 2017; Karger et al., 2021). Regarding elevation data, users can
choose either the NASA Earth Explorer’s SRTM (Farr et al., 2007) or
Mapzen terrain tiles (Hollister & Shah, 2018).
In this specific example, we have utilised the default settings of
ce_download
to download WorldClim 2 climate data and Mapzen
elevation data for the Italian polygon data set. However, please note
that the following code chunk is intentionally set not to evaluate
because its execution requires time. Fortunately, to facilitate the
presented examples, we have already stored the downloaded and extracted
(using ce_extract
) data within the package (see below for
more details).
# Do not execute this code if you are in a hurry as it will take time depending
# on the strength and speed of your internet connection.
# This part downloads the data and takes time
ce_download(
"C/example_output..",
location = it_py
)
# Once the data is downloaded, this function is quick.
it_data <- ce_extract(
"C/example_output..",
location = it_py,
location_g = "GB"
)
# Fortunately you can avoid the above steps by simply loading the data
data("it_data")
climenv provides three plotting functions, one for the traditional
Holdridge life zone classification scheme (plot_h
), one for
the Walter-Lieth climate diagrams (plot_wl
) and a new
custom plot (plot_c
) integrating features from both systems
as well as the commonly used bioclim variables (Hijmans et al.,
2005).
One of our graphical outputs is the Holdridge (1967) life zone classification plot. Holdridge’s life zone classification plot, also known as the Holdridge Life Zone System or Holdridge Bioclimatic Classification System, is based on three main factors that influence the distribution of vegetation and ecosystems. By combining temperature, precipitation, and potential evapotranspiration Holdridge’s classification plot divides the Earth’s surface into distinct life zones or biomes (sensu Holdridge).
# Make Holdridge's (1967) life zone classification diagram
plot_h(data = it_data, "MED")
plot_h(data = it_data, "NEM")
Another common graphical output is the Walter-Lieth (1960) climatic diagram. Here our package is a wrapper for the existing diagwl function of the climatol R package (Guijarro, 2019). This diagram consists of two primary components: temperature and precipitation, which when combined in a single diagram is supposed to allow for a comprehensive visualisation of climate patterns.
# Make Walter-Leigh's (1960) climate diagram
plot_wl(data = it_data, "MED")
plot_wl(data = it_data, "NEM")
Lastly, we present our custom diagrams which incorporate elements from Holdridge’s (1967) life zone classification, Walter-Lieth’s climatic diagram (1960), and the widely utilised bioclimatic variables (Hijmans et al., 2005) commonly employed in ecological and environmental research. These variables are crucial in describing environmental factors that shape the distribution and behaviour of organisms, including plants, as evidenced by the high citation count of works by Holdridge, Walter-Lieth, and Hijmans.
# Make the custom climate diagram
oldpar <- par(mar = c(1.5, 2.8, 2, 17))
plot_c(data = it_data, "MED")
plot_c(data = it_data, "NEM")
par(oldpar)
To ensure the selection of climatic variables for discriminatory analyses, numerous data-driven approaches can be employed. For instance, with lots of data, one could use CART, random forests, boosted regression trees, or any number of machine learning algorithms to determine which variables are the most appropriate (robust and ecologically meaningful) for delineating between classes , for the purposes of discriminatory analyses. In this extended vignette, we will delve into an exploration of which climatic variables are most appropriate for delineating our highly specific example data set of Italian Biomes (Italian point data set). By employing these rigorous methods, one can make a more informed decisions about the choice of climatic variables that play a major role in characterising and distinguishing the climatic envelopes of the various biomes in their own study areas.
# Let's start by extracting the climate data for all points in the
# italy_pt data.
it_data_extended <- ce_extract(
path = "C:/Users/jamie/OneDrive/Desktop/temp",
location = it_pt
)
# If you haven't downloaded the climate and elevation data by following the code
# above you could simply load the data()
data("it_data_extended")
# But this data needs to be reformatted a bit.
# Let's start by calculating the bioclim variables:
bioclim <- data.frame(dismo::biovars(
prec = as.matrix(it_data_extended$prec[, 1:12]),
tmin = as.matrix(it_data_extended$tmin[, 1:12]),
tmax = as.matrix(it_data_extended$tmax[, 1:12])
))
# Let's make some more intuitive names for these variables:
colnames(bioclim) <- c(
"Mean_annual_temperature",
"Mean_diurnal_range",
"Isothermality",
"Temperature_seasonality",
"Max_temperature_of_warmest_month",
"Min_temperature_of_coldest_month",
"Temperature_annual_range",
"Mean_temperature_of_the_wettest_quarter",
"Mean_temperature_of_driest_quarter",
"Mean_temperature_of_warmest_quarter",
"Mean_temperature_of_coldest_quarter",
"Total_annual_precipitation",
"Precipitation_of_wettest_month",
"Precipitation_of_driest_month",
"Precipitation_seasonality",
"Precipitation_of_wettest_quarter",
"Precipitation_of_driest_quarter",
"Precipitation_of_warmest_quarter",
"Precipitation_of_coldest_quarter"
)
# Now get the Holdridge variables
hold <- bioclimate(
as.matrix(it_data_extended$tavg[, 1:12]),
as.matrix(it_data_extended$prec[, 1:12])
)
# Make some more intuitive names for these variables
colnames(hold)[1:3] <- c(
"Mean_Annual_Biotemperature",
"Total_Annual_Precipitation",
"Potential_Evapotranspiration_Ratio"
)
# Join them together along with elevation and altitude
dat <- cbind(
bioclim, hold[, c("Mean_Annual_Biotemperature",
"Potential_Evapotranspiration_Ratio")],
elev = it_data_extended$elev,
lat = it_data_extended$lat
)
# Let's just fix the last two column names
colnames(dat)[c(22, 23)] <- c("Elevation", "Latitude")
# Now let's add the Biome type as a column
dat$class <- as.factor(it_pt$GB)
Random Forest is an ensemble learning algorithm that combines multiple decision trees to make predictions. Decision trees are structures that make predictions by splitting data based on if-else conditions. They create a tree-like model of decisions. Random Forest adds randomness by using a technique called bootstrapping, which involves randomly sampling the data with replacement. It also introduces variability by randomly selecting a subset of predictor variables for each split in a decision tree. In Random Forest, a collection of decision trees is built. Each tree is trained on a different subset of the data and uses a random subset of predictor variables. Random Forest models are known for their robustness, accuracy, and resistance to overfitting. They handle high-dimensional data well, are less affected by outliers, and can capture complex relationships between variables. Additionally, Random Forest provides an estimate of variable importance, which helps with feature selection and interpretation. For more information please refer to Breiman (2001).
# Now we are about ready to start the random forests analyses
# Load the required package
library("randomForest")
# Now we need to split the dataset into training and testing sets
set.seed(123) # For reproducibility
train_indices <- sample(
seq_len(dim(dat)[1]), dim(dat)[1] * 0.8
) # 80% for training
train_data <- dat[train_indices, ]
test_data <- dat[-train_indices, ]
# Train the Random Forest model
model <- randomForest(formula = class ~ ., data = train_data)
# Make predictions on the test set
predictions <- predict(model, newdata = test_data)
# Evaluate the model's performance
confusion_matrix <- table(predictions, test_data$class)
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
print(confusion_matrix)
#>
#> predictions MED NEM
#> MED 475 39
#> NEM 54 1425
print(paste("Accuracy:", accuracy))
#> [1] "Accuracy: 0.95333667837431"
Based on the provided results, the model demonstrated a high accuracy of approximately 95% in predicting the Mediterranean and Nemoral Biomes in Italy. The individual error rates were observed to be around 8% for the Mediterranean Biome and 4% for the Nemoral Biome. While cross-validation and other methods could potentially enhance the model’s performance, exploring them falls beyond the scope of this brief vignette. However, it might be valuable to assess the importance of variables in this model and determine whether they correspond to the variables present in our climate diagrams.
Considering this objective, the Gini coefficient is commonly employed to assess the performance of the predictive variables, specifically climate and elevation variables. The Gini coefficient, ranging from 0 to 1, serves as a measure of the relative importance of these predictor variables within the model. Higher Gini coefficients signify greater significance of the corresponding variables in predicting the desired outcome. By visualising the variable importance rankings, the plot sheds light on the crucial predictors that contribute to the model’s performance. Variables with higher Gini coefficients hold more influence within the Random Forest model, emphasising their importance in the prediction process.
# Get variable importance measures
importance <- importance(model)
# Print the importance measures
print(importance)
#> MeanDecreaseGini
#> Mean_annual_temperature 279.25592
#> Mean_diurnal_range 44.48494
#> Isothermality 42.86125
#> Temperature_seasonality 114.87925
#> Max_temperature_of_warmest_month 38.21648
#> Min_temperature_of_coldest_month 487.73150
#> Temperature_annual_range 78.86297
#> Mean_temperature_of_the_wettest_quarter 56.62279
#> Mean_temperature_of_driest_quarter 98.47761
#> Mean_temperature_of_warmest_quarter 70.30854
#> Mean_temperature_of_coldest_quarter 487.77039
#> Total_annual_precipitation 51.77637
#> Precipitation_of_wettest_month 44.92184
#> Precipitation_of_driest_month 132.60699
#> Precipitation_seasonality 236.84034
#> Precipitation_of_wettest_quarter 51.59033
#> Precipitation_of_driest_quarter 144.96324
#> Precipitation_of_warmest_quarter 91.28258
#> Precipitation_of_coldest_quarter 48.69874
#> Mean_Annual_Biotemperature 335.61126
#> Potential_Evapotranspiration_Ratio 64.91563
#> Elevation 49.51127
#> Latitude 185.55613
# Plot variable importance
oldpar <- par(mar = c(4, 3, 0, 1) + 0.1)
varImpPlot(model, main = "")
par(oldpar)
Breiman, L. (2001). Random forests. Machine learning, 45, 5–32.
Farr, T. G., Rosen, P. A., Caro, E., Crippen, R., Duren, R., Hensley, S. et al. (2007). The Shuttle Radar Topography Mission. Reviews of Geophysics, 45.
Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: new 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37, 4302–4315.
Guijarro, J. A. (2019). climatol: Climate Tools (Series Homogenization and Derived Products). Retrieved from https://cran.r-project.org/package=climatol
Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G., & Jarvis, A. (2005). Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965–1978.
Holdridge (1967), [Life zone ecology]. Tropical Science Center, San José: Tropical Science Center. [Life zone ecology]: https://reddcr.go.cr/sites/default/files/centro-de-documentacion/holdridge_1966_-_life_zone_ecology.pdf
Hollister, J., & Shah, T. (2018). elevatr: Access elevation data from various APIs (Version 0.2.0). Retrieved from https://CRAN.R-project.org/package=elevatr.
Karger, D.N., Conrad, O., Bőhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W. et al. (2017) Climatologies at high resolution for the earth’s land surface areas. Scientific Data, 4, 170122.
Karger, D.N., Conrad, O., Bőhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W. et al. (2021) Climatologies at high resolution for the earth’s land surface areas. EnviDat.
Mucina, L., Divíšek, J., & Tsakalos, J.L. (2023) Europe, Ecosystems of. In: Encyclopedia of biodiversity, vol X (in print).
von Walter, H.B., & Lieth, H. (1960). Klimadiagramm-Weltatlas. VEB Gustav Fischer Verlag, Jena.