We structured ENMTML as a single function with multiple arguments which, once filled, require a single Ctrl+R to fit, project, and evaluate models and present the results to users in a clear and simple way.
The main function (ENMTML) has several arguments, which users need to specify according to their modeling needs.
Because we know this is not a simple task, we indicate in our paper the publications that proposed each of these methods, together with a brief explanation of them.
ENMTML(pred_dir,
proj_dir = NULL,
result_dir = NULL,
occ_file,
sp,
x,
y,
min_occ = 10,
thin_occ = NULL,
eval_occ = NULL,
colin_var = NULL,
imp_var = FALSE,
sp_accessible_area = NULL,
pseudoabs_method,
pres_abs_ratio = 1,
part, save_part = FALSE,
save_final = TRUE,
algorithm,
thr,
msdm = NULL,
ensemble = NULL,
extrapolation = FALSE,
cores = 1)
See the possible input options below.
pred_dir: character. Directory path with the predictors (supported file formats are ASC, BIL, TIFF, or TXT).
proj_dir: character. Directory path containing folders with predictors for different regions or time periods used to project models (supported file formats are ASC, BIL, TIFF, or TXT).
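As an illustration only (the folder and file names below are hypothetical), proj_dir is expected to contain one sub-folder per region or time period, each holding its own set of predictor layers:

# proj_dir/
#   current/      bio01.tif, bio12.tif, ...
#   rcp45_2050/   bio01.tif, bio12.tif, ...
#   rcp85_2070/   bio01.tif, bio12.tif, ...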
result_dir: character. Directory path with the folder in which model results will be recorded:
result_dir=NULL
result_dir="MyFolderName"
result_dir="C:/Users/mypc/Documents/MyFolderName"
occ_file: character. Path to a tab-delimited TXT file containing at least three columns with the species names and the longitude and latitude of the species occurrences (an illustrative example is shown after the y argument below).
sp: character. Name of the column with information about species names.
x: character. Name of the column with information about longitude.
y: character. Name of the column with information about latitude.
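For illustration (the file and column names here are hypothetical), an occurrence file and the matching column arguments could look like this:

# Hypothetical tab-delimited occurrence file "occurrences.txt":
# species            lon      lat
# Panthera_onca     -55.21   -12.43
# Panthera_onca     -54.98   -11.87
# Tapirus_terrestris -60.10   -3.20
occ <- read.table("occurrences.txt", header = TRUE, sep = "\t")
head(occ)
# The matching ENMTML arguments would then be:
# occ_file = "occurrences.txt", sp = "species", x = "lon", y = "lat"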
min_occ: integer. Minimum number of unique occurrences (species with fewer than this number will be excluded).
thin_occ: character. Perform spatial filtering (thinning, based on the spThin package) on the presences. For this argument it is necessary to provide a vector whose elements are named 'method', or 'method' and 'distance' (more information below). Three thinning methods are available (default NULL):
thin_occ=c(method='MORAN')
thin_occ=c(method='CELLSIZE')
thin_occ=c(method='USER-DEFINED', distance='300')
The second value refers to the distance in km that will be used for thinning, so distance='300' means that all records within a radius of 300 km will be deleted.
eval_occ: character. Path to a tab-delimited TXT file with species names, latitude, and longitude; these three columns must have the same column names as the database used in the occ_file argument. This external occurrence database will be used for external model validation (i.e., it will not be used for model fitting) (default NULL).
colin_var: character. Method used to reduce variable collinearity:
colin_var=c(method='PCA')
colin_var=c(method='VIF')
colin_var=c(method='PEARSON', threshold='0.7')
imp_var: logical. Compute variable importance and response curve data for the selected algorithms? (default FALSE)
sp_accessible_area: character. Restrict the accessible area for each species, i.e., the area used for model fitting. It is necessary to provide a vector for this argument. Three methods are implemented:
sp_accessible_area=c(method='BUFFER', type='1')
sp_accessible_area=c(method='BUFFER', type='2', width='300')
sp_accessible_area=c(method='MASK', filepath='C:/Users/mycomputer/ecoregion/olson.shp')
pseudoabs_method: character. Pseudo-absence allocation method. It is necessary to provide a vector for this argument, and only one method can be chosen. The following methods are implemented:
pseudoabs_method=c(method='RND')
pseudoabs_method=c(method='ENV_CONST')
pseudoabs_method=c(method='GEO_CONST', width='50')
pseudoabs_method=c(method='GEO_ENV_CONST', width='50')
pseudoabs_method=c(method='GEO_ENV_KM_CONST', width='50')
pres_abs_ratio: numeric. Presence-absence ratio (values between 0 and 1).
part: character. Partition method for model validation. Only one method can be chosen, and it is necessary to provide a vector for this argument. The following methods are implemented:
part=c(method='BOOT', replicates='2', proportion='0.7')
replicates refers to the number of replicates and assumes a value >=1; proportion refers to the proportion of occurrences used for model fitting and assumes a value >0 and <=1. In this example proportion='0.7' means that 70% of the data will be used for model training and 30% for model testing.
part=c(method='KFOLD', folds='5')
folds refers to the number of folds for data partitioning and assumes a value >=1.
part=c(method='BANDS', type='1')
type refers to the disposition of the bands.
part=c(method='BLOCK')
save_part: logical. If TRUE, the function will save .tif files of the partial models, i.e., the models created from each occurrence partition (default FALSE).
save_final: logical. If TRUE, the function will save .tif files of the final models, i.e., those fitted with all occurrence data (default TRUE).
algorithm: character. Algorithm(s) used to construct the ecological niche models (it is possible to use more than one).
thr: character. Threshold used for presence-absence predictions. It is possible to use more than one threshold type, and it is necessary to provide a vector for this argument:
thr=c(type='LPT')
thr=c(type='MAX_TSS')
thr=c(type='MAX_KAPPA')
thr=c(type='SENSITIVITY', sens='0.6')
'sens' indicates that models will be binarized at this suitability value. Note that this method assumes the same 'sens' value for all algorithms and species.
thr=c(type='JACCARD')
thr=c(type='SORENSEN')
When more than one threshold type is used, the threshold names must be concatenated, e.g., thr=c(type=c('LPT', 'MAX_TSS', 'JACCARD')). When the SENSITIVITY threshold is used in combination with others, the desired sensitivity value must be specified, e.g., thr=c(type=c('LPT', 'MAX_TSS', 'SENSITIVITY'), sens='0.8').
msdm: character. Include spatial restrictions in the model projection. These methods restrict ecological niche models so that they predict less potential area, moving them closer to species distribution models. They are classified as 'a priori' and 'a posteriori' methods. The former encompasses methods that include geographical layers as predictors during model fitting, whereas the latter constrains the models based on occurrence and suitability patterns. This argument is filled with a single method; if the MCP-B method is used, msdm is filled in a different way, as shown below (default NULL):
A priori methods (the created layer is added as a predictor at the moment of model fitting):
msdm=c(method='XY')
msdm=c(method='MIN')
msdm=c(method='CML')
msdm=c(method='KER')
A posteriori methods:
msdm=c(method='OBR')
msdm=c(method='LR')
msdm=c(method='PRES')
msdm=c(method='MCP')
msdm=c(method='MCP-B', width=100)
In this case width=100 means that a buffer of 100 km width will be created around the MCP.
ensemble: character. Method used to ensemble the different algorithms. It is possible to use more than one method, and a vector must be provided for this argument. For the SUP, W_MEAN, and PCA_SUP methods it is necessary to provide an evaluation metric (i.e., AUC, Kappa, TSS, Jaccard, Sorensen, or Fpb); see below (default NULL):
ensemble=c(method='MEAN')
ensemble=c(method='W_MEAN', metric='TSS')
ensemble=c(method='SUP', metric='TSS')
ensemble=c(method='PCA')
ensemble=c(method='PCA_SUP', metric='Fpb')
ensemble=c(method='PCA_THR')
When more than one ensemble method is used, the method names must be concatenated within the argument, e.g., ensemble=c(method=c('MEAN', 'PCA')) or ensemble=c(method=c('MEAN', 'W_MEAN', 'PCA_SUP'), metric='Fpb').
extrapolation: logical. If TRUE, the function will calculate extrapolation based on the Mobility-Oriented Parity (MOP) analysis for current conditions. If the proj_dir argument is used, extrapolation layers will also be calculated for the other regions or time periods (default FALSE).
cores: numeric. Number of CPU cores used to run the modeling procedures in parallel (default 1).
Within the result_dir folder you will find several sub-folders: Algorithm, Ensemble (decision-based), Projection (decision-based), Extrapolation (decision-based), BLOCK (decision-based), and Extent Masks (decision-based).
There are also several .txt files (some will only be created under certain modeling settings):
Evaluation_Table.txt: Contains the results of model evaluation, with several metrics.
InfoModeling.txt: Information on the chosen modeling parameters.
Number_Unique_Occurrences.txt: Number of unique occurrences for each species.
Occurrences_Cleaned.txt: Dataset produced after selecting a single occurrence per grid cell (unique occurrences).
Occurrences_Filtered.txt: Dataset produced after occurrences were corrected for spatial sampling bias (thinned occurrences).
Thresholds_Algorithm.txt: Information about the thresholds used to create the presence-absence maps for each algorithm (presence-absence maps are created from the threshold of the complete models).
Thresholds_Ensemble.txt: Information about the thresholds used to create the presence-absence maps for the ensembled models.
Moran_&_Mess: Contains information about autocorrelation and environmental similarity between the datasets used to fit and evaluate the models.
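As a quick sketch (assuming the result folder used in the examples above and that the table is tab-delimited), the evaluation results can be inspected directly in R:

# Hypothetical result folder; replace with your own result_dir
eval_tab <- read.table(
  'C:/Users/mypc/Documents/MyFolderName/Evaluation_Table.txt',
  header = TRUE, sep = '\t'
)
head(eval_tab)  # evaluation metrics for each modeled species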
Andrade, A.F.A., Velazco, S.J.E., De Marco Jr, P., 2020. ENMTML: An R package for a straightforward construction of complex ecological niche models. Environmental Modelling & Software 125, 104615. https://doi.org/10.1016/j.envsoft.2019.104615
Test the package and give us feedback here or send an e-mail to andrefaandrade@gmail.com or sjevelazco@gmail.com!