Create and process Ecological Niche Models
ENMTML(
pred_dir,
proj_dir = NULL,
result_dir = NULL,
occ_file,
sp,
x,
y,
min_occ = 10,
thin_occ = NULL,
eval_occ = NULL,
colin_var = NULL,
imp_var = FALSE,
sp_accessible_area = NULL,
pseudoabs_method,
pres_abs_ratio = 1,
part,
save_part = FALSE,
save_final = TRUE,
algorithm,
thr,
msdm = NULL,
ensemble = NULL,
extrapolation = FALSE,
cores = 1
)
pred_dir
character. Directory path with predictors (supported file formats are ASC, BIL, TIFF, or TXT).
proj_dir
character. Directory path containing folders with predictors for different regions or time periods used to project models (supported file formats are ASC, BIL, TIFF, or TXT).
result_dir
character. Path of the folder in which model results will be recorded.
NULL: Results will be recorded in a default Result folder, at the same level as the pred_dir folder.
Simple name: A folder with the specified name will be created at the same level as the pred_dir folder (e.g. usage result_dir="MyFolderName")
Complete path: A folder will be created at the specified path (e.g. result_dir="C:/Users/mypc/Documents/MyFolderName").
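The three result_dir cases can be sketched in base R. The helper `resolve_result_dir()` below is hypothetical and only illustrates the rules described above; it is not taken from the ENMTML source, and treating any value without a path separator as a "simple name" is an assumption.

```r
# Hypothetical helper: maps the three result_dir cases to a folder path.
resolve_result_dir <- function(pred_dir, result_dir = NULL) {
  parent <- dirname(pred_dir)
  if (is.null(result_dir)) {
    # NULL: default 'Result' folder at the same level as pred_dir
    return(file.path(parent, 'Result'))
  }
  if (identical(basename(result_dir), result_dir)) {
    # Simple name (assumed: no path separator): folder next to pred_dir
    return(file.path(parent, result_dir))
  }
  # Complete path: used as given
  result_dir
}

resolve_result_dir('/data/env')
resolve_result_dir('/data/env', 'MyFolderName')
resolve_result_dir('/data/env', 'C:/Users/mypc/Documents/MyFolderName')
```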
occ_file
character. Path to the tab-delimited TXT file, which must contain at least three columns: species names, and the latitude and longitude of species occurrences.
sp
character. Name of the column with information about species names.
x
character. Name of the column with information about longitude.
y
character. Name of the column with information about latitude.
min_occ
integer. Minimum number of unique occurrences (species with fewer occurrences than this number will be excluded).
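As a minimal sketch of the input described by occ_file, sp, x, y and min_occ, the code below writes a small tab-delimited file and applies a min_occ-style cutoff. The data are made up, and counting unique coordinate pairs per species is an assumption about what "unique occurrences" means; ENMTML may count them differently (for example, per raster cell).

```r
# Made-up occurrences; the column names correspond to sp = 'species', x = 'x', y = 'y'.
occ_example <- data.frame(
  species = c(rep('Sp_a', 4), rep('Sp_b', 2)),
  x = c(-60.1, -60.1, -59.8, -59.5, -58.0, -57.5),  # longitude
  y = c(-10.2, -10.2, -10.4, -10.9, -12.0, -12.3)   # latitude
)

# Write and re-read a tab-delimited TXT file, the format expected for occ_file.
occ_path <- file.path(tempdir(), 'occ_example.txt')
utils::write.table(occ_example, occ_path, sep = '\t', row.names = FALSE, quote = FALSE)
occ_check <- utils::read.delim(occ_path, stringsAsFactors = FALSE)

# Count unique records per species (assumption: unique coordinate pairs)
# and keep only the species that reach the cutoff.
min_occ <- 3
n_unique <- table(unique(occ_check)$species)
kept_species <- names(n_unique)[as.vector(n_unique) >= min_occ]
kept_species
```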
thin_occ
character. Perform spatial filtering (thinning, based on the spThin package) on the presences. This argument requires a vector whose elements are named 'method', or 'method' and 'distance' (more information below). Three thinning methods are available (default NULL):
MORAN: Distance defined by Moran Variogram, usage thin_occ=c(method='MORAN').
CELLSIZE: Distance defined by 2x cellsize (Haversine Transformation), usage thin_occ=c(method='CELLSIZE').
USER-DEFINED: User-defined distance. This option requires a vector with two values. Usage thin_occ=c(method='USER-DEFINED', distance='300'). The second value is the distance in km used for thinning, so distance='300' means that records closer than 300 km to a retained record will be removed.
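To illustrate what a distance-based thinning does, here is a minimal base R sketch. It is a simple greedy filter and not the spThin algorithm used by the package; `haversine_km()` and `thin_greedy()` are hypothetical helpers, and the coordinates are made up.

```r
# Great-circle (haversine) distance in km between one point and many points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: keep a record only if it is farther than distance_km
# from every record already kept (greedy and order-dependent).
thin_greedy <- function(lon, lat, distance_km) {
  keep <- integer(0)
  for (i in seq_along(lon)) {
    d <- haversine_km(lon[i], lat[i], lon[keep], lat[keep])
    if (all(d > distance_km)) keep <- c(keep, i)
  }
  keep
}

# The argument is a named character vector, so the distance is parsed to numeric.
thin_occ <- c(method = 'USER-DEFINED', distance = '300')
dist_km <- as.numeric(thin_occ[['distance']])

lon <- c(-60.0, -60.5, -55.0, -40.0)
lat <- c(-10.0, -10.2, -10.0, -10.0)
kept <- thin_greedy(lon, lat, dist_km)
kept  # indices of the retained records
```

The second record lies roughly 60 km from the first and is dropped; the others are more than 300 km from every retained record.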
eval_occ
character. Path to a tab-delimited TXT file with species names, latitude, and longitude. These three columns must have the same names as in the database used in the occ_file argument. This external occurrence database will be used for external model validation (i.e., it will not be used for model fitting) (default NULL).
colin_var
character. Method to reduce variable collinearity:
PCA: Perform a Principal Component Analysis on predictors and use Principal Components as environmental variables, usage colin_var=c(method='PCA').
VIF: Variance Inflation Factor; usage colin_var=c(method='VIF').
PEARSON: Select variables by Pearson correlation. A maximum correlation threshold must be specified by the user. Usage colin_var=c(method='PEARSON', threshold='0.7').
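The idea behind the PEARSON option can be sketched in base R as follows. This is only an illustration under stated assumptions: `select_by_pearson()` is a hypothetical helper, the data are simulated, and the rule ENMTML uses to decide which variable of a correlated pair to drop may differ.

```r
set.seed(1)
n <- 200
base_var <- rnorm(n)
env_values <- data.frame(
  bio1 = base_var,
  bio2 = base_var + rnorm(n, sd = 0.1),  # nearly a copy of bio1
  bio3 = rnorm(n)                        # independent of the others
)

colin_var <- c(method = 'PEARSON', threshold = '0.7')
max_cor <- as.numeric(colin_var[['threshold']])

# Hypothetical helper: go through the variables in order and keep one only if
# its absolute correlation with every variable already kept is <= max_cor.
select_by_pearson <- function(values, max_cor) {
  cors <- abs(stats::cor(values, method = 'pearson'))
  kept <- character(0)
  for (v in colnames(cors)) {
    if (all(cors[v, kept] <= max_cor)) kept <- c(kept, v)
  }
  kept
}

selected <- select_by_pearson(env_values, max_cor)
selected
```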
imp_var
logical. Calculate variable importance and the data for response curves for the selected algorithms? (default FALSE)
sp_accessible_area
character. Restrict, for each species, the accessible area, i.e., the area used for model fitting. A vector must be provided for this argument. Three methods are implemented:
BUFFER: the area used for model fitting is delimited by a buffer with a width equal to the maximum distance among pairs of occurrences for each species. Usage sp_accessible_area=c(method='BUFFER', type='1').
BUFFER: the area used for model fitting is delimited by a buffer with a width defined by the user in km. Note that this buffer width will be used for all species. Usage sp_accessible_area=c(method='BUFFER', type='2', width='300').
MASK: this method delimits the area used for model fitting based on the polygons in which a species' occurrences fall. For instance, the calibration area can be delimited based on an ecoregion shapefile. This option requires the path to the file that will be used as the mask. The following file formats can be loaded: '.bil', '.asc', '.tif', '.shp', and '.txt'. Usage sp_accessible_area=c(method='MASK', filepath='C:/Users/mycomputer/ecoregion/olson.shp').
USER-DEFINED: users can provide their own masks for the accessible area. In this case the program requires a folder with species-specific masks, one for each species; each mask name must match the species name in the occurrence file. This option requires the path to the folder containing the accessible areas. The following file formats can be loaded: '.bil', '.asc', '.tif', '.shp', and '.txt'. Usage sp_accessible_area=c(method='USER-DEFINED', filepath='C:/Users/mycomputer/accessibleareafolder').
pseudoabs_method
character. Pseudo-absence allocation method. A vector must be provided for this argument. Only one method can be chosen. The following methods are implemented:
RND: Random allocation of pseudo-absences throughout the area used for model fitting. Usage pseudoabs_method=c(method='RND').
ENV_CONST: Pseudo-absences are environmentally constrained to a region with lower suitability values predicted by a Bioclim model. Usage pseudoabs_method=c(method='ENV_CONST').
GEO_CONST: Pseudo-absences are allocated far from occurrences based on a geographical buffer. This method requires a second value that expresses the buffer width in km. Usage pseudoabs_method=c(method='GEO_CONST', width='50').
GEO_ENV_CONST: Pseudo-absences are constrained environmentally (based on a Bioclim model) but distributed geographically far from occurrences based on a geographical buffer. This method requires a second value that expresses the buffer width in km. Usage pseudoabs_method=c(method='GEO_ENV_CONST', width='50').
GEO_ENV_KM_CONST: Pseudo-absences are constrained by a three-level procedure; it is similar to GEO_ENV_CONST, with an additional step that distributes the pseudo-absences in the environmental space using k-means cluster analysis. This method requires a second value that expresses the buffer width in km. Usage pseudoabs_method=c(method='GEO_ENV_KM_CONST', width='50').
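The geographic constraint behind GEO_CONST can be sketched in base R: candidate points closer than the buffer width to any occurrence are discarded, and pseudo-absences are drawn from the rest. This is an illustration, not the package's implementation; the coordinates are made up, `haversine_km()` is a hypothetical helper, and the sketch samples free points instead of raster cells.

```r
# Great-circle (haversine) distance in km between one point and many points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

pseudoabs_method <- c(method = 'GEO_CONST', width = '50')
width_km <- as.numeric(pseudoabs_method[['width']])

occ <- data.frame(x = c(-60.0, -59.5), y = c(-10.0, -10.5))  # made-up presences

set.seed(42)
candidates <- data.frame(x = runif(500, -62, -57), y = runif(500, -13, -8))

# Distance from each candidate to its nearest occurrence.
nearest_km <- vapply(seq_len(nrow(candidates)), function(i) {
  min(haversine_km(candidates$x[i], candidates$y[i], occ$x, occ$y))
}, numeric(1))

# Keep candidates outside the buffer, then draw as many pseudo-absences as presences.
allowed <- candidates[nearest_km > width_km, ]
pseudo_abs <- allowed[sample(nrow(allowed), size = nrow(occ)), ]
pseudo_abs
```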
pres_abs_ratio
numeric. Presence-absence ratio (values between 0 and 1).
part
character. Partition method for model validation. Only one method can be chosen. A vector must be provided for this argument. The following methods are implemented:
BOOT: Random bootstrap partition. Usage part=c(method='BOOT', replicates='2', proportion='0.7'). 'replicates' refers to the number of replicates and takes a value >=1. 'proportion' refers to the proportion of occurrences used for model fitting and takes a value >0 and <=1. In this example, proportion='0.7' means that 70% of the data will be used for model training and 30% for model testing.
KFOLD: Random partition in k-fold cross-validation. Usage part=c(method= 'KFOLD', folds='5'). 'folds' refers to the number of folds for data partitioning and takes a value >=1.
BANDS: Geographic partition structured as bands arranged latitudinally (type 1) or longitudinally (type 2). Usage part=c(method= 'BANDS', type='1'). 'type' refers to the arrangement of the bands.
BLOCK: Geographic partition structured as a checkerboard (a.k.a. block cross-validation). Usage part=c(method= 'BLOCK').
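The two random partition schemes can be sketched in base R. This only illustrates how the 'folds', 'replicates' and 'proportion' values translate into training and testing sets; the number of occurrences is made up, and sampling without replacement in the BOOT case is an assumption about the package's behaviour.

```r
set.seed(123)
n_occ <- 23  # number of occurrences of one species (made up)

# KFOLD: assign each occurrence to one of k folds at random.
part <- c(method = 'KFOLD', folds = '5')
k <- as.integer(part[['folds']])
fold_id <- sample(rep(seq_len(k), length.out = n_occ))
for (f in seq_len(k)) {
  test_idx  <- which(fold_id == f)   # records held out in this fold
  train_idx <- which(fold_id != f)   # records used for fitting
  cat('fold', f, ': train', length(train_idx), 'test', length(test_idx), '\n')
}

# BOOT: repeat a random training/testing split 'replicates' times.
part <- c(method = 'BOOT', replicates = '2', proportion = '0.7')
n_rep <- as.integer(part[['replicates']])
prop  <- as.numeric(part[['proportion']])
boot_splits <- lapply(seq_len(n_rep), function(r) {
  train_idx <- sample(n_occ, size = round(prop * n_occ))
  list(train = train_idx, test = setdiff(seq_len(n_occ), train_idx))
})
lengths(boot_splits[[1]])
```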
save_part
logical. If TRUE, the function will save .tif files of partial models, i.e. the models created for each occurrence partition (default FALSE).
save_final
logical. If TRUE, the function will save .tif files of the final model, i.e. the model fitted with all occurrence data (default TRUE).
algorithm
character. Algorithm used to construct ecological niche models (more than one algorithm can be used):
BIO: Bioclim
MAH: Mahalanobis
DOM: Domain
ENF: Ecological Niche Factor Analysis
MXS: Maxent Simple (only linear and quadratic features, based on MaxNet package)
MXD: Maxent Default (all features, based on MaxNet package)
SVM: Support Vector Machine
SVM-B: Support Vector Machine (using Background instead of Pseudo-Absences)
GLM: Generalized Linear Model
GAM: Generalized Additive Model
BRT: Boosted Regression Tree
RDF: Random Forest
MLK: Maximum Likelihood
GAU: Gaussian Process
thr
character. Threshold used for presence-absence predictions. More than one threshold type can be used. A vector must be provided for this argument:
LPT: The highest threshold at which there is no omission. Usage thr=c(type='LPT').
MAX_TSS: Threshold at which the sum of the sensitivity and specificity is the highest. Usage thr=c(type='MAX_TSS').
MAX_KAPPA: The threshold at which kappa is the highest ("max kappa"). Usage thr=c(type='MAX_KAPPA').
SENSITIVITY: A threshold value specified by the user. Usage thr=c(type='SENSITIVITY', sens='0.6'). 'sens' is the value that will be used to binarize the models. Note that the same 'sens' value is applied to all algorithms and species.
JACCARD: The threshold at which Jaccard is the highest. Usage thr=c(type='JACCARD').
SORENSEN: The threshold at which Sorensen is highest. Usage thr=c(type='SORENSEN').
When more than one threshold type is used, the names of the threshold types must be concatenated, e.g., thr=c(type=c('LPT', 'MAX_TSS', 'JACCARD')). When the SENSITIVITY threshold is combined with others, the desired sensitivity value must also be specified, e.g., thr=c(type=c('LPT', 'MAX_TSS', 'SENSITIVITY'), sens='0.8').
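As a minimal sketch of two of these thresholds, the base R code below computes LPT and MAX_TSS from made-up suitability values. It follows the definitions given above (no omission; highest sum of sensitivity and specificity) and assumes a cell is predicted present when its suitability is greater than or equal to the threshold; it is not the package's own code.

```r
# Made-up suitability values at presences and at (pseudo-)absences.
suit_pres <- c(0.9, 0.8, 0.7, 0.6, 0.2)
suit_abs  <- c(0.5, 0.4, 0.3, 0.15, 0.1)

# LPT: the highest threshold with no omission is the lowest presence suitability.
lpt_thr <- min(suit_pres)

# MAX_TSS: evaluate TSS = sensitivity + specificity - 1 at each observed value.
candidates <- sort(unique(c(suit_pres, suit_abs)))
tss <- vapply(candidates, function(t) {
  sensitivity <- mean(suit_pres >= t)  # presences predicted as present
  specificity <- mean(suit_abs < t)    # absences predicted as absent
  sensitivity + specificity - 1
}, numeric(1))
max_tss_thr <- candidates[which.max(tss)]

c(LPT = lpt_thr, MAX_TSS = max_tss_thr)
```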
msdm
character. Include spatial restrictions in model projections. These methods restrict ecological niche models so that they predict less potential area and become closer to species distribution models. They are classified as 'a Priori' and 'a Posteriori' methods. The former include geographical layers as predictors during model fitting, whereas the latter constrain models based on occurrence and suitability patterns. Only one method can be given for this argument; for the MCP-B method, msdm is filled in a different way (see below):
a Priori methods (the layers created are added as predictors at the moment of model fitting):
XY: Create two layers, one with latitude and one with longitude. Usage msdm=c(method='XY').
MIN: Create a layer with information of the distance from each cell to the closest occurrence. Usage msdm=c(method='MIN').
CML: Create a layer with information of the summed distance from each cell to all occurrences. Usage msdm=c(method='CML').
KER: Create a layer with a Gaussian-Kernel on the occurrence data. Usage msdm=c(method='KER').
a Posteriori methods
OBR: Occurrence-based restriction; uses the distance between points to exclude distant suitable patches (Mendes et al., in prep). Usage msdm=c(method='OBR').
LR: Lower Quantile; selects the nearest 25% of patches (Mendes et al., in prep). Usage msdm=c(method='LR').
PRES: Selects only the patches with confirmed occurrence data (Mendes et al., in prep). Usage msdm=c(method='PRES').
MCP: Excludes suitable cells outside the Minimum Convex Polygon (MCP) built from the occurrence data. Usage msdm=c(method='MCP').
MCP-B: Creates a buffer (with a width defined by the user in km) around the MCP. Usage msdm=c(method='MCP-B', width=100). In this case, width=100 means that a buffer 100 km wide will be created around the MCP.
ensemble
character. Method used to ensemble different algorithms. More than one method can be used. A vector must be provided for this argument. The SUP, W_MEAN, and PCA_SUP methods require an evaluation metric (i.e., AUC, Kappa, TSS, Jaccard, Sorensen, or Fpb); see below (default NULL):
MEAN: Simple average of the different models. Usage ensemble=c(method='MEAN').
W_MEAN: Weighted average of models based on their performance. An evaluation metric must be provided. Usage ensemble=c(method='W_MEAN', metric='TSS').
SUP: Average of the best models (e.g., TSS over the average). An evaluation metric must be provided. Usage ensemble=c(method='SUP', metric='TSS').
PCA: Performs a Principal Component Analysis (PCA) and returns the first axis. Usage ensemble=c(method='PCA').
PCA_SUP: PCA of the best models (e.g., TSS over the average). An evaluation metric must be provided. Usage ensemble=c(method='PCA_SUP', metric='Fpb').
PCA_THR: PCA performed only with those cells with suitability values above the selected threshold. Usage ensemble=c(method='PCA_THR').
When more than one ensemble method is used, the names of the ensemble methods must be concatenated within the argument, e.g., ensemble=c(method=c('MEAN', 'PCA')) or ensemble=c(method=c('MEAN', 'W_MEAN', 'PCA_SUP'), metric='Fpb').
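The MEAN, W_MEAN and SUP ensembles can be sketched on plain vectors in base R. The suitability and TSS values are made up, the weighting by normalized TSS is an assumption about how "weighted by performance" is computed, and real ensembles are computed on raster layers instead of three cells.

```r
# Made-up suitability of three cells (rows) predicted by three algorithms (columns).
suit <- cbind(
  SVM = c(0.2, 0.6, 0.9),
  RDF = c(0.1, 0.7, 0.8),
  MXD = c(0.3, 0.5, 1.0)
)
# Made-up evaluation metric (TSS) per algorithm.
tss <- c(SVM = 0.7, RDF = 0.8, MXD = 0.3)

# MEAN: simple average across algorithms.
ens_mean <- rowMeans(suit)

# W_MEAN: average weighted by performance (assumption: weights = TSS / sum(TSS)).
weights <- tss / sum(tss)
ens_wmean <- as.vector(suit %*% weights)

# SUP: average of the models whose TSS is above the mean TSS.
best <- tss > mean(tss)
ens_sup <- rowMeans(suit[, best, drop = FALSE])

rbind(MEAN = ens_mean, W_MEAN = ens_wmean, SUP = ens_sup)
```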
extrapolation
logical. If TRUE, the function will calculate extrapolation based on Mobility-Oriented Parity analysis (MOP) for current conditions. If the argument proj_dir is used, the extrapolation layers for other regions or time periods will also be calculated.
cores
numeric. Define the number of CPU cores used to run modeling procedures in parallel (default 1).
require(ENMTML)
require(raster)
##%######################################################%##
# #
#### Directories and data creation ####
# #
##%######################################################%##
# The ENMTML package includes some bioclimatic variables
# used to test the ENMTML function.
# To simulate the files and folders needed by the ENMTML function,
# several folders with some data will be created.
# First, a folder will be created in the working directory
getwd() # Working directory of the R session
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference"
d_ex <- file.path(getwd(), 'ENMTML_example')
d_ex
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example"
dir.create(d_ex)
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example' already exists
# Some ENMTML data sets will be saved to the ENMTML_example folder
# Virtual species occurrences
data("occ")
d_occ <- file.path(d_ex, 'occ.txt')
utils::write.table(occ, d_occ, sep = '\t', row.names = FALSE)
# Five bioclimatic variables for current conditions
data("env")
d_env <- file.path(d_ex, 'current_env_var')
dir.create(d_env)
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\current_env_var' already exists
raster::writeRaster(env, file.path(d_env, names(env)), bylayer=TRUE, format='GTiff', overwrite=TRUE)
# Five bioclimatic variables for future conditions
# (for more details see predictors_future help)
data("env_fut")
d_fut <- file.path(d_ex, 'future_env_var')
dir.create(d_fut)
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\future_env_var' already exists
d0 <- file.path(d_fut, names(env_fut))
sapply(d0, dir.create)
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\future_env_var\2080_4.5' already exists
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\future_env_var\2080_8.5' already exists
#> C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/future_env_var/2080_4.5
#> FALSE
#> C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/future_env_var/2080_8.5
#> FALSE
raster::writeRaster(env_fut$`2080_4.5`, file.path(d0[1],
names(env_fut$`2080_4.5`)), bylayer=TRUE, format='GTiff', overwrite=TRUE)
raster::writeRaster(env_fut$`2080_8.5`, file.path(d0[2],
names(env_fut$`2080_8.5`)), bylayer=TRUE, format='GTiff', overwrite=TRUE)
# Polygon of terrestrial ecoregions
data("ecoregions")
d_eco <- file.path(d_ex, 'ecoregions')
dir.create(d_eco)
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\ecoregions' already exists
d_eco <- file.path(d_eco, paste0('eco','.shp'))
shapefile(ecoregions, d_eco, overwrite=TRUE)
# shell.exec(d_ex) # open the directory and folders created
rm(list = c('d0', 'd_ex', 'ecoregions', 'env', 'env_fut', 'occ'))
#> Warning: object 'ecoregions' not found
#> Warning: object 'env' not found
#> Warning: object 'env_fut' not found
#> Warning: object 'occ' not found
# Now we have the minimum data needed to create models with the ENMTML package:
# a directory with environmental rasters and a .txt file with occurrences
##%######################################################%##
# #
#### Construction ENM with ENMTML ####
# #
##%######################################################%##
args(ENMTML)
#> function (pred_dir, proj_dir = NULL, result_dir = NULL, occ_file,
#> sp, x, y, min_occ = 10, thin_occ = NULL, eval_occ = NULL,
#> colin_var = NULL, imp_var = FALSE, sp_accessible_area = NULL,
#> pseudoabs_method, pres_abs_ratio = 1, part, save_part = FALSE,
#> save_final = TRUE, algorithm, thr, msdm = NULL, ensemble = NULL,
#> extrapolation = FALSE, cores = 1)
#> NULL
# ENMTML provides a variety of tools to build different models
# depending on the modeling objectives.
# A single modeling procedure is provided here.
# For more examples and exploration of models
# see <https://github.com/andrefaa/ENMTML>
# Models will be fitted for five virtual species under current conditions
# (proj_dir is NULL in the call below; set proj_dir = d_fut to also project
# onto the future scenarios). Please read the ENMTML arguments.
# The next objects contain the directory and file paths that will be used
d_occ # file path with species occurrences
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/occ.txt"
d_env # directory path with current environmental conditions (raster in tiff format)
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/current_env_var"
d_fut # directory path with folders with future environmental conditions (raster in tiff format)
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/future_env_var"
d_eco # file path with shapefile used to constrain models
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/ecoregions/eco.shp"
ENMTML(
pred_dir = d_env,
proj_dir = NULL,
result_dir = NULL,
occ_file = d_occ,
sp = 'species',
x = 'x',
y = 'y',
min_occ = 10,
thin_occ = NULL,
eval_occ = NULL,
colin_var = c(method='PCA'),
imp_var = FALSE,
sp_accessible_area = c(method='BUFFER', type='2', width='500'),
pseudoabs_method = c(method = 'RND'),
pres_abs_ratio = 1,
part=c(method= 'KFOLD', folds='2'),
save_part = FALSE,
save_final = TRUE,
algorithm = c('SVM', 'RDF', 'MXD'),
thr = c(type='MAX_TSS'),
msdm = NULL,
ensemble = c(method='PCA'),
extrapolation = FALSE,
cores = 1
)
#> Checking for function arguments ...
#> Loading environmental variables ...
#> RasterBrick successfully created!
#> Performing a reduction of variables collinearity ...
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\current_env_var\PCA' already exists
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\current_env_var\PCA\Tables' already exists
#> Warning: 'C:\Users\santi\Documents\GitHub\ENMTML_RStudio\docs\reference\ENMTML_example\Projection_PCA' already exists
#> Loading and processing species occurrence data ...
#> Warning: Result folder already exists, files may be overwritten!
#> Results can be found at:
#> C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result
#> Generating masks for species acessible area ...
#> GeoMasks already exist for all species! Using already-created restrictions!
#> Performing partition of species occurrence data ...
#> Adjusting fold....1
#> Total species to be modeled 5
#> Fitting Models....
#> Models fitted!
#> Adjusting fold....2
#> Total species to be modeled 5
#> Fitting Models....
#> Models fitted!
#> Performing Ensemble....
#> Ensemble created!
#> Models were created successfully!
#> Outputs are in:
#> C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result
# The ENMTML function creates a folder named Result at the same level as
# the directory specified in the pred_dir argument
d_env # Directory used to define environmental variables
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/current_env_var"
d_rslt <- file.path(dirname(d_env), 'Result')
d_rslt
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result"
# shell.exec(d_rslt) # for Windows users
# List of txt files and subdirectories
list.files(d_rslt)
#> [1] "Algorithm" "CrossValidationGroups.txt"
#> [3] "CrossValidation_Moran_MESS.txt" "Ensemble"
#> [5] "Evaluation_Table.txt" "Extent_Masks"
#> [7] "InfoModeling.txt" "Number_Unique_Occurrences.txt"
#> [9] "Occurrences_Cleaned.txt" "Occurrences_Evaluation.txt"
#> [11] "Occurrences_Fitting.txt" "Thresholds_Algorithms.txt"
#> [13] "Thresholds_Ensemble.txt"
list.dirs(d_rslt)
#> [1] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result"
#> [2] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Algorithm"
#> [3] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Algorithm/MXD"
#> [4] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Algorithm/MXD/MAX_TSS"
#> [5] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Algorithm/RDF"
#> [6] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Algorithm/RDF/MAX_TSS"
#> [7] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Algorithm/SVM"
#> [8] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Algorithm/SVM/MAX_TSS"
#> [9] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Ensemble"
#> [10] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Ensemble/PCA"
#> [11] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Ensemble/PCA/MAX_TSS"
#> [12] "C:/Users/santi/Documents/GitHub/ENMTML_RStudio/docs/reference/ENMTML_example/Result/Extent_Masks"
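After a run, the text outputs listed above can be inspected from R. The sketch below is an assumption-laden illustration: `read_result_table()` is a hypothetical helper, the tables are assumed to be tab-delimited, and no column names are assumed because they are not shown in this page.

```r
# Hypothetical helper: read one of the .txt tables from a Result folder,
# returning NULL when the file is not there (e.g. the model has not been run yet).
read_result_table <- function(result_dir, file_name) {
  path <- file.path(result_dir, file_name)
  if (!file.exists(path)) return(NULL)
  utils::read.delim(path, stringsAsFactors = FALSE)  # assumes tab-delimited text
}

# Result folder created by the example above (relative to the working directory).
d_rslt <- file.path(getwd(), 'ENMTML_example', 'Result')

eval_tab <- read_result_table(d_rslt, 'Evaluation_Table.txt')
if (!is.null(eval_tab)) utils::str(eval_tab)

# Raster outputs (.tif) saved under the Algorithm and Ensemble subfolders, if any.
list.files(d_rslt, pattern = '\\.tif$', recursive = TRUE)
```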