Implementing a classic pixel-based supervised classification of Sentinel-2 multispectral images for an area in central Romania. I used the framework provided by the caret R package (“Classification and Regression Training”) and compared predictions from random forests with those from SVM and neural networks.
Bands used: B2, B3, B4, B5, B6, B7, B8, B8A, B11 and B12, so ten out of the thirteen spectral bands. I did not use the 60 m bands “Band 1 - Coastal aerosol”, “Band 9 - Water vapour” and “Band 10 - SWIR - Cirrus” (wiki table), which mainly carry atmospheric information (aerosols, water vapour, cirrus clouds). No further thought was invested into which combination of bands to use.
Downloaded the “Level-2A” processed data sets (Level-2A = bottom-of-atmosphere reflectance in cartographic geometry). For downloading I used the framework provided by the R package getSpatialData, which connects to the Copernicus Open Access Hub.
Digitization of training polygons was carried out in QGIS using the Bing aerial imagery and the True Color Image (TCI) at 10 m resolution in the background for visual validation. Five classes were identified for this exercise: agriculture, construction, forest, pasture and water.
Disclaimer: This is work in progress based on my own exploration of the methods. Treat this content as a blog post and nothing more. It does not claim to be an exhaustive exercise nor a replacement for your critical thinking. If you find mistakes/bottlenecks/bugs in my work, please let me know by opening an issue, or you can also reach me on Twitter.
# Packages for spatial data processing & visualization
library(rgdal)
library(gdalUtils)
library(raster)
library(sf)
library(sp)
library(RStoolbox)
library(getSpatialData)
library(rasterVis)
library(mapview)
library(RColorBrewer)
library(plotly)
library(grDevices)
# Machine learning packages
library(caret)
library(randomForest)
library(ranger)
library(MLmetrics)
library(nnet)
library(NeuralNetTools)
library(LiblineaR)
# Packages for general data processing and parallel computation
library(data.table)
library(dplyr)
library(stringr)
library(doParallel)
library(snow)
library(parallel)
# set the temporary folder for raster package operations
rasterOptions(tmpdir = "./cache/temp")
## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] parallel grid stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] snow_0.4-3 doParallel_1.0.14 iterators_1.0.10
## [4] foreach_1.4.4 stringr_1.4.0 dplyr_0.8.0.1
## [7] data.table_1.12.0 LiblineaR_2.10-8 NeuralNetTools_1.5.2
## [10] nnet_7.3-12 MLmetrics_1.1.1 ranger_0.11.2
## [13] randomForest_4.6-14 caret_6.0-81 plotly_4.8.0
## [16] ggplot2_3.1.0 mapview_2.6.3 rasterVis_0.45
## [19] latticeExtra_0.6-28 RColorBrewer_1.1-2 lattice_0.20-38
## [22] getSpatialData_0.0.4 RStoolbox_0.2.4 sf_0.7-3
## [25] gdalUtils_2.0.1.14 rgdal_1.4-3 ncdf4_1.16.1
## [28] raster_2.8-19 sp_1.3-1 png_0.1-7
## [31] rmarkdown_1.12 epuRate_0.1
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.4-1 class_7.3-14 leaflet_2.0.2
## [4] rprojroot_1.3-2 satellite_1.0.1 base64enc_0.1-3
## [7] fs_1.2.7 hexbin_1.27.2 remotes_2.0.2
## [10] prodlim_2018.04.18 lubridate_1.7.4 xml2_1.2.0
## [13] codetools_0.2-15 splines_3.5.2 R.methodsS3_1.7.1
## [16] knitr_1.22 pkgload_1.0.2 jsonlite_1.6
## [19] R.oo_1.22.0 rgeos_0.4-2 shiny_1.2.0
## [22] compiler_3.5.2 httr_1.4.0 backports_1.1.3
## [25] assertthat_0.2.1 Matrix_1.2-15 lazyeval_0.2.2
## [28] cli_1.1.0 later_0.8.0 htmltools_0.3.6
## [31] prettyunits_1.0.2 tools_3.5.2 gtable_0.2.0
## [34] glue_1.3.1 reshape2_1.4.3 Rcpp_1.0.1
## [37] nlme_3.1-137 crosstalk_1.0.0 timeDate_3043.102
## [40] gower_0.2.0 xfun_0.5 ps_1.3.0
## [43] mime_0.6 devtools_2.0.1 XML_3.98-1.19
## [46] zoo_1.8-5 getPass_0.2-2 MASS_7.3-51.1
## [49] scales_1.0.0 ipred_0.9-8 promises_1.0.1
## [52] mapedit_0.5.0 yaml_2.2.0 memoise_1.1.0
## [55] geosphere_1.5-7 rpart_4.1-13 stringi_1.4.3
## [58] desc_1.2.0 e1071_1.7-1 pkgbuild_1.0.3
## [61] lava_1.6.5 rlang_0.3.2 pkgconfig_2.0.2
## [64] evaluate_0.13 purrr_0.3.2 recipes_0.1.5
## [67] htmlwidgets_1.3 processx_3.3.0 tidyselect_0.2.5
## [70] plyr_1.8.4 magrittr_1.5 R6_2.4.0
## [73] generics_0.0.2 DBI_1.0.0 pillar_1.3.1
## [76] withr_2.1.2 units_0.6-2 survival_2.43-3
## [79] tibble_2.1.1 crayon_1.3.4 usethis_1.4.0
## [82] callr_3.2.0 ModelMetrics_1.2.2 digest_0.6.18
## [85] classInt_0.3-1 webshot_0.5.1 xtable_1.8-3
## [88] tidyr_0.8.3 httpuv_1.5.0 R.utils_2.8.0
## [91] stats4_3.5.2 munsell_0.5.0 viridisLite_0.3.0
## [94] sessioninfo_1.1.1
If you want to install a snapshot of the packages as they existed on CRAN at the creation of this document, then you can use the checkpoint package:
# /////////////////////////////////////////////////////////////////////////
#
# Create a checkpoint at "2019-03-23".
#
# This should guarantee the same versions of the packages as they existed at the
# specified point in time.
#
# Scans for packages in the project folder and its subfolders. It scans all R code
# (.R, .Rmd, and .Rpres files) for `library()` and `require()` statements. Then
# creates a local library into which it installs a copy of the packages required
# in the project as they existed on CRAN at the specified snapshot date. See
# details with ?checkpoint after you load library(checkpoint), or here:
# https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html
#
# Warning: Installing older versions of the packages in the .checkpoint local
# library may take up several hundred MB of disk space.
#
# /////////////////////////////////////////////////////////////////////////
# install.packages("checkpoint")
library(checkpoint) # checkpoint, version 0.4.5
checkpoint(snapshotDate = "2019-03-23")
# Check that library path is set to ~/.checkpoint/2019-03-23/ ...
.libPaths()
grepl(pattern = "\\.checkpoint/2019-03-23/", x = .libPaths()[1]) # should be TRUE
# Optional checks ---------------------------------------------------------
# Check that CRAN mirror is set to MRAN snapshot
getOption("repos")
# At the first run, after installing the packages, you should see something like:
# nowosaddrat
# "https://mran.microsoft.com/snapshot/2019-03-23" "https://nowosad.github.io/drat/"
# Check which packages are installed in checkpoint library
installed.packages(.libPaths()[1])[, "Package"]
# Experimental - use unCheckpoint() to reset .libPaths to the default user library.
# Note that this does not undo any of the other side-effects. Specifically, all
# loaded packages remain loaded, and the value of getOption("repos") remains
# unchanged. See details with ?unCheckpoint
Define an area of interest (AOI). Here I used a matrix that defines the corners of a square (unprojected coordinates in WGS84 datum):
# long lat
aoi <- matrix(data = c(22.85, 45.93, # Upper left corner
22.95, 45.93, # Upper right corner
22.95, 45.85, # Bottom right corner
22.85, 45.85, # Bottom left corner
22.85, 45.93), # Upper left corner - closure
ncol = 2, byrow = TRUE)
set_aoi(aoi)
## Warning: Argument 'aoi' is a matrix, assuming '+proj=longlat +datum=WGS84
## +no_defs' projection.
Note that, in case you have a shapefile of the AOI, you can use it instead of building the matrix as above. You can read it with rgdal::readOGR() or sf::st_read(), for example:
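A minimal sketch, assuming a hypothetical shapefile stored at ./data/aoi/aoi.shp (as far as I know, set_aoi() also accepts sf/sp objects, not only matrices):
# Read the AOI polygon with rgdal ...
aoi_poly <- rgdal::readOGR(dsn = "./data/aoi", layer = "aoi")
# ... or with sf
aoi_poly <- sf::st_read("./data/aoi/aoi.shp")
# Then pass it to set_aoi()
set_aoi(aoi_poly)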
First, you have to register to the Copernicus Open Access Hub (COAH). Their user guide illustrates how to do that.
# Set login credentials
login_CopHub(username = "happy_user", password = "something-more-than-1234")
# Set the archive directory (where the raw data will be downloaded)
set_archive("./data")
Search for data and filter the records:
records <- getSentinel_query(time_range = c("2017-05-15",
as.character(Sys.Date())),
platform = "Sentinel-2")
records_filtered <- records %>%
filter(processinglevel == "Level-2A") %>%
filter(cloudcoverpercentage <= 1)
Preview the records on a mapview map together with the AOI. I was hoping to get good data for the lowest cloud coverage, but when visualizing it, I realized that the data is incomplete. I am better off with some other record with very low cloud coverage. This shows that one needs to visually check the data in order to be sure one gets usable records.
idx_min_clowd_cov <- which.min(records_filtered$cloudcoverpercentage)
getSentinel_preview(record = records_filtered[idx_min_clowd_cov,])
getSentinel_preview(record = records_filtered[2,])
Get and unzip the data:
datasets <- getSentinel_data(records = records_filtered[2,])
## Downloading 'S2B_MSIL2A_20181016T092029_N0209_R093_T34TFR_20181016T143428' to
## './data//get_data/Sentinel-2//S2B_MSIL2A_20181016T092029_N0209_R093_T34TFR_20181016T143428.zip'...
## |====================================================================| 100%
## Successfull download, MD5 check sums match.
## All downloads have been succesfull (1 attempts).
unzip(zipfile = datasets, exdir = "./data")
Read and crop the raster bands: B2, B3, B4, B5, B6, B7, B8, B8A, B11 and B12:
# Path to jp2 files
jp2_path <- "./data/S2B_MSIL2A_20181016T092029_N0209_R093_T34TFR_20181016T143428.SAFE/GRANULE/L2A_T34TFR_A008413_20181016T092720/IMG_DATA/"
# From the folder "R10m", read B02, B03, B04 and B08. They are the only files containing "B".
jp2_10m <- list.files(paste0(jp2_path, "R10m"), pattern = ".*B.*jp2$", full.names = TRUE)
# From the folder "R20m", read B5, B6, B7, B8A, B11 and B12.
jp2_20m <- list.files(paste0(jp2_path, "R20m"), pattern = ".*B.*[56781].*jp2$", full.names = TRUE)
# Read all bands
rst_lst <- lapply(c(jp2_10m, jp2_20m), FUN = raster)
Prepare the extent for cropping from the AOI. Don’t forget to project the coordinates to the CRS of the Sentinel raster datasets, which is +proj=utm +zone=34 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0.
extent_crop <- aoi %>%
project(proj = proj4string(rst_lst[[1]])) %>%
apply(MARGIN = 2, FUN = range) %>% # get the range for the two columns - long and lat
t %>% # transpose so to get the 2x2 matrix expected by extent()
extent # make extent from 2x2 matrix (first row: xmin, xmax; second row: ymin, ymax)
Crop the bands:
rst_crop_lst <- lapply(rst_lst, FUN = raster::crop, y = extent_crop)
# Short name for each bands (each element of the list of cropped bands)
names(rst_crop_lst) <- str_extract(sapply(rst_crop_lst, names), "B.{2}")
Visualize the crop in Natural Color (R = Red, G = Green, B = Blue).
## Warning in rasterCheckSize(x, maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ;
## the supplied Raster* has 722879
## ... decreasing Raster* resolution to 5e+05 pixels
## to view full resolution set 'maxpixels = 722879 '
Save the cropped bands so that we can delete the raw originals in order to save some space (this is optional, but I need the space…). To build this document faster, I cached the results, which are read in the setup chunk at the top of the Rmd file. Also, I can’t realistically download data from the Copernicus API every time I build this document from its R Markdown file :)
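One possible way to cache the cropped bands (a sketch; the cache folder and file names are made up and this is not necessarily how the caching was actually done):
# Write each cropped band to a GeoTIFF in a cache folder
dir.create("./cache/crop", recursive = TRUE, showWarnings = FALSE)
for (b in names(rst_crop_lst)) {
  raster::writeRaster(rst_crop_lst[[b]],
                      filename = file.path("./cache/crop", paste0(b, ".tif")),
                      overwrite = TRUE)
}
# Later, the bands can be read back with raster() on the saved files, e.g.
# lapply(list.files("./cache/crop", full.names = TRUE), raster),
# restoring the names() afterwards.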
The 20 m bands were resampled to 10 m. Resampling is needed because otherwise one cannot build a brick raster object to be used for predictions - see visualize classifications.
rst_for_prediction <- vector(mode = "list", length = length(rst_crop_lst))
names(rst_for_prediction) <- names(rst_crop_lst)
for (b in c("B05", "B06", "B07", "B8A", "B11", "B12")){
beginCluster(n = round(3/4 * detectCores()))
try(
rst_for_prediction[[b]] <- raster::resample(x = rst_crop_lst[[b]],
y = rst_crop_lst$B02)
)
endCluster()
}
b_10m <- c("B02", "B03", "B04", "B08")
rst_for_prediction[b_10m] <- rst_crop_lst[b_10m]
brick_for_prediction <- brick(rst_for_prediction)
This process involves subtracting the mean and dividing by the standard deviation for each variable/feature/band. It is not that the data needs normalization in the sense of linear regression, which assumes a Gaussian distribution of the response variable; rather, the transformation is important for the ML algorithms, especially for the neural networks, as mentioned here or depicted here.
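The normalization code is not displayed at this point; a minimal sketch of how the normalized brick used below (brick_for_prediction_norm) could be obtained, assuming RStoolbox::normImage (the package is loaded above), which centers and scales each layer:
# Subtract the mean and divide by the standard deviation for each band
brick_for_prediction_norm <- normImage(brick_for_prediction, norm = TRUE)
names(brick_for_prediction_norm) <- names(brick_for_prediction)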
Read the training polygons.
poly <- rgdal::readOGR(dsn = "./data/train_polys",
layer = "train_polys",
stringsAsFactors = FALSE)
# Need to have a numeric id for each class - helps with rasterization later on.
poly@data$id <- as.integer(factor(poly@data$class))
setDT(poly@data)
Plot them:
# Prepare colors for each class.
cls_dt <- unique(poly@data) %>%
arrange(id) %>%
mutate(hex = c(agriculture = "#ff7f00",
construction = "#e41a1c",
forest = "#4daf4a",
pasture = "#984ea3",
water = "#377eb8"))
view_aoi(color = "#a1d99b") +
mapView(poly, zcol = "class", col.regions = cls_dt$hex)
The polygons need to be projected to the CRS of the Sentinel rasters.
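The projection call itself is not shown; a sketch that would create the poly_utm object used below (assuming sp::spTransform and the CRS of band B02):
poly_utm <- sp::spTransform(poly, CRSobj = raster::crs(rst_crop_lst$B02))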
Rasterize the polygons to 10 m resolution, convert the raster to points and then use them to extract values from the Sentinel bands.
# Create raster template
template_rst <- raster(extent(rst_crop_lst$B02), # band B2 has resolution 10 m
resolution = 10,
crs = projection(rst_crop_lst$B02))
poly_utm_rst <- rasterize(poly_utm, template_rst, field = 'id')
poly_dt <- as.data.table(rasterToPoints(poly_utm_rst))
setnames(poly_dt, old = "layer", new = "id_cls")
points <- SpatialPointsDataFrame(coords = poly_dt[, .(x, y)],
data = poly_dt,
proj4string = poly_utm_rst@crs)
If the data.table syntax is too foreign for you, then you can check the dplyr syntax tab.
dt <- brick_for_prediction_norm %>%
extract(y = points) %>%
as.data.table %>%
.[, id_cls := points@data$id_cls] %>% # add the class names to each row
merge(y = unique(poly@data), by.x = "id_cls", by.y = "id", all = TRUE, sort = FALSE) %>%
.[, id_cls := NULL] %>% # this column is extra now, delete it
.[, class := factor(class)]
# View the first 6 rows
head(dt)
dt2 <- brick_for_prediction_norm %>%
extract(y = points) %>%
as.data.frame %>%
mutate(id_cls = points@data$id_cls) %>% # add the class names to each row
left_join(y = unique(poly@data), by = c("id_cls" = "id")) %>%
mutate(id_cls = NULL) %>% # this column is extra now, delete it
mutate(class = factor(class))
setDT(dt2)
identical(dt, dt2)
## [1] TRUE
dt %>%
select(-"class") %>%
melt(measure.vars = names(.)) %>%
ggplot() +
geom_histogram(aes(value)) +
geom_vline(xintercept = 0, color = "gray70") +
facet_wrap(facets = vars(variable), ncol = 3)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The training dataset will be used for model tuning by cross-validation and grid search. Then the final tuned models will be used on the test dataset to build confusion matrices (as shown in the intro vignette of the caret package, here).
set.seed(321)
# A stratified random split of the data
idx_train <- createDataPartition(dt$class,
p = 0.7, # percentage of data as training
list = FALSE)
dt_train <- dt[idx_train]
dt_test <- dt[-idx_train]
table(dt_train$class)
##
## agriculture construction forest pasture water
## 4572 2771 3762 2388 1169
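The second frequency table below presumably comes from the analogous call on the test split (my assumption; the command is not displayed above):
table(dt_test$class)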
##
## agriculture construction forest pasture water
## 1959 1187 1612 1023 500
The training dataset is used for carrying out cross-validation (CV) and grid search for model tuning. Once the optimal/best parameters are found, a final model is fit to the entire training dataset using those findings. Then we can check how these final models behave on unseen data (the testing dataset).
Details are provided in the intro vignette of the caret package, here, and also in this scikit-learn documentation.
The CV indices need to match when comparing multiple models, so that we get a fair comparison. Therefore, folds will be passed to the index argument of trainControl for each type of model. See also the example from help(trainControl).
# create cross-validation folds (splits the data into n random groups)
n_folds <- 10
set.seed(321)
folds <- createFolds(1:nrow(dt_train), k = n_folds)
# Set the seed at each resampling iteration. Useful when running CV in parallel.
seeds <- vector(mode = "list", length = n_folds + 1) # +1 for the final model
for(i in 1:n_folds) seeds[[i]] <- sample.int(1000, n_folds)
seeds[n_folds + 1] <- sample.int(1000, 1) # seed for the final model
Note that, for each type of model, we pass the following to trainControl:
ctrl <- trainControl(summaryFunction = multiClassSummary,
method = "cv",
number = n_folds,
search = "grid",
classProbs = TRUE, # not implemented for SVM; will just get a warning
savePredictions = TRUE,
index = folds,
seeds = seeds)
All in all, this is important for being able to compare the different types of models; see details in the chapter “Selecting models: a case study in churn prediction” in Data Camp - Machine Learning Toolbox.
Models can also be tuned manually. caret does it automatically anyway, choosing some default values for us. Manual tuning means picking desired values for the tuning parameters instead of the default ones, so you have more fine-grained control over them. Manual tuning is done via the tuneGrid argument and, of course, differs from model to model, e.g.:
- for random forests (rf): the mtry parameter (the number of randomly selected predictors, k, to choose from at each split);
- for the linear SVM (svmLinear3): the cost parameter and the Loss function;
- for the neural network (nnet): the size and decay parameters.
See details in Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. New York: Springer, or James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R.
A list of available models supported by caret is given on the Train Models By Tag page.
# Register a doParallel cluster, using 3/4 (75%) of total CPU-s
cl <- makeCluster(3/4 * detectCores())
registerDoParallel(cl)
model_rf <- caret::train(class ~ . , method = "rf", data = dt_train,
importance = TRUE, # passed to randomForest()
# run CV process in parallel;
# see https://stackoverflow.com/a/44774591/5193830
allowParallel = TRUE,
tuneGrid = data.frame(mtry = c(2, 3, 4, 5, 8)),
trControl = ctrl)
stopCluster(cl); remove(cl)
# Unregister the doParallel cluster so that we can use sequential operations
# if needed; details at https://stackoverflow.com/a/25110203/5193830
registerDoSEQ()
saveRDS(model_rf, file = "./cache/model_rf.rds")
Tuning here was done via the mtry argument, which can vary from 2 up to the total number of predictors (bands) used (here, 10). So, the optimization was done via cross-validation and grid search (by grid I refer to tuneGrid = data.frame(mtry = c(2, 3, 4, 5, 8))). The final (optimal) model, stored in model_rf, corresponds to mtry = 5, which gave the highest accuracy (0.956).
## user system elapsed
## 16.57 0.51 73.96
Compute the confusion matrix and associated statistics using the test data. A confusion matrix indicates how “confused” the model is between the given classes and highlights instances in which one class is confused for another. The main (first) diagonal of the matrix shows the cases when the model is “correct” (Data Camp - Machine Learning Toolbox).
The by-class statistics are reported for each level of the class factor (agriculture, construction, forest, pasture, water). See also this Wikipedia link, this one, or help(confusionMatrix) for more details on confusion matrix terminology.
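The code producing the confusion matrix below is not displayed at this point; presumably it mirrors the SVM and neural network sections further down, i.e. something like:
cm_rf <- confusionMatrix(data = predict(model_rf, newdata = dt_test),
                         dt_test$class)
cm_rf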
## Confusion Matrix and Statistics
##
## Reference
## Prediction agriculture construction forest pasture water
## agriculture 1935 6 1 47 0
## construction 0 1181 0 0 0
## forest 1 0 1610 0 0
## pasture 23 0 1 976 0
## water 0 0 0 0 500
##
## Overall Statistics
##
## Accuracy : 0.9874
## 95% CI : (0.9843, 0.99)
## No Information Rate : 0.3119
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9836
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: agriculture Class: construction Class: forest
## Sensitivity 0.9877 0.9949 0.9988
## Specificity 0.9875 1.0000 0.9998
## Pos Pred Value 0.9729 1.0000 0.9994
## Neg Pred Value 0.9944 0.9988 0.9996
## Prevalence 0.3119 0.1890 0.2566
## Detection Rate 0.3081 0.1880 0.2563
## Detection Prevalence 0.3167 0.1880 0.2565
## Balanced Accuracy 0.9876 0.9975 0.9993
## Class: pasture Class: water
## Sensitivity 0.9541 1.00000
## Specificity 0.9954 1.00000
## Pos Pred Value 0.9760 1.00000
## Neg Pred Value 0.9911 1.00000
## Prevalence 0.1629 0.07961
## Detection Rate 0.1554 0.07961
## Detection Prevalence 0.1592 0.07961
## Balanced Accuracy 0.9747 1.00000
You can also get a confusion matrix for the final model using the entire train dataset. This is different from the approach above. However, you would usually want to see a confusion matrix based on the testing dataset. In both cases one can see that the model is the most confused about distinguishing between the pasture and agriculture land use classes. On the other hand, the model “seems super certain” when classifying water.
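The printout below, with its out-of-bag (OOB) confusion matrix, is presumably obtained by printing the underlying randomForest object stored by caret:
model_rf$finalModel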
##
## Call:
## randomForest(x = x, y = y, mtry = param$mtry, importance = TRUE, allowParallel = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 5
##
## OOB estimate of error rate: 1.45%
## Confusion matrix:
## agriculture construction forest pasture water class.error
## agriculture 4506 3 5 58 0 0.014435696
## construction 18 2751 2 0 0 0.007217611
## forest 0 0 3762 0 0 0.000000000
## pasture 124 0 2 2262 0 0.052763819
## water 0 0 0 0 1169 0.000000000
Simple rule: higher values mean the variables are more important.
So, band 11 seems to be very important for detecting pasture patches.
For more in-depth details about predictor importance, check Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. New York: Springer, specifically chapters 18 (Measuring Predictor Importance) and 8.5 (Random Forests). Also check this Stack Overflow link.
Some selected ideas:
From the R syntax point of view, there are several ways to get importance values and metrics for each band:
- caret::varImp(), a generic method, so it also works for the svm and nnet models;
- randomForest::importance() & randomForest::varImpPlot(), which work only for randomForest models.
modelsNote that, by default the scale
argument in both caret::varImp()
and randomForest::importance()
is set to TRUE. However, in caret::varImp()
it means that the importance values are scaled between 0 and 100 %, while this doesn’t seem to be true in the case of randomForest::importance()
. One way or another, the conveyed information is the same, as you can see from the two heatmaps below. To get identical results, set scale = FALSE
in caret::varImp()
.
caret::varImp(model_rf)$importance %>%
as.matrix %>%
plot_ly(x = colnames(.), y = rownames(.), z = ., type = "heatmap",
width = 350, height = 300)
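The second heatmap, based on randomForest::importance(), is not shown above; a possible sketch, mirroring the plotly call for caret::varImp() (keeping only the per-class importance columns via cls_dt$class is my assumption):
randomForest::importance(model_rf$finalModel) %>%
  .[, cls_dt$class] %>% # drop the MeanDecreaseAccuracy & MeanDecreaseGini columns
  plot_ly(x = colnames(.), y = rownames(.), z = ., type = "heatmap",
          width = 350, height = 300)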
The method "svmLinear3" stands for “L2 Regularized Support Vector Machine (dual) with Linear Kernel”. To try other SVM options, see the SVM tags.
Note that importance = TRUE is not applicable here, so I didn’t mention it in train(). The same goes for the class probabilities classProbs = TRUE defined in ctrl above. However, I didn’t bother to make another ctrl object for SVM; it works to recycle the one used for the random forest models and ignore the warning: Class probabilities were requested for a model that does not implement them.
# Grid of tuning parameters
svm_grid <- expand.grid(cost = c(0.2, 0.5, 1),
Loss = c("L1", "L2"))
cl <- makeCluster(3/4 * detectCores())
registerDoParallel(cl)
model_svm <- caret::train(class ~ . , method = "svmLinear3", data = dt_train,
allowParallel = TRUE,
tuneGrid = svm_grid,
trControl = ctrl)
stopCluster(cl); remove(cl)
registerDoSEQ()
# Warning message:
# In train.default(x, y, weights = w, ...) :
# Class probabilities were requested for a model that does not implement them
# (see why above)
saveRDS(model_svm, file = "./cache/model_svm.rds")
## user system elapsed
## 1.44 0.31 16.55
# The confusion matrix using the test dataset
cm_svm <- confusionMatrix(data = predict(model_svm, newdata = dt_test),
dt_test$class)
cm_svm
## Confusion Matrix and Statistics
##
## Reference
## Prediction agriculture construction forest pasture water
## agriculture 1818 13 0 90 0
## construction 5 1173 0 0 0
## forest 1 0 1612 3 0
## pasture 135 1 0 930 0
## water 0 0 0 0 500
##
## Overall Statistics
##
## Accuracy : 0.9605
## 95% CI : (0.9554, 0.9652)
## No Information Rate : 0.3119
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9487
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: agriculture Class: construction Class: forest
## Sensitivity 0.9280 0.9882 1.0000
## Specificity 0.9762 0.9990 0.9991
## Pos Pred Value 0.9464 0.9958 0.9975
## Neg Pred Value 0.9677 0.9973 1.0000
## Prevalence 0.3119 0.1890 0.2566
## Detection Rate 0.2894 0.1868 0.2566
## Detection Prevalence 0.3058 0.1875 0.2573
## Balanced Accuracy 0.9521 0.9936 0.9996
## Class: pasture Class: water
## Sensitivity 0.9091 1.00000
## Specificity 0.9741 1.00000
## Pos Pred Value 0.8724 1.00000
## Neg Pred Value 0.9822 1.00000
## Prevalence 0.1629 0.07961
## Detection Rate 0.1481 0.07961
## Detection Prevalence 0.1697 0.07961
## Balanced Accuracy 0.9416 1.00000
To try other neural network options, see the Neural Network tags.
# Grid of tuning parameters
nnet_grid <- expand.grid(size = c(5, 10, 15),
decay = c(0.001, 0.01, 0.1))
cl <- makeCluster(3/4 * detectCores())
registerDoParallel(cl)
model_nnet <- train(class ~ ., method = 'nnet', data = dt_train,
importance = TRUE,
maxit = 1000, # set high enough so to be sure that it converges
allowParallel = TRUE,
tuneGrid = nnet_grid,
trControl = ctrl)
stopCluster(cl); remove(cl)
registerDoSEQ()
saveRDS(model_nnet, file = "./cache/model_nnet.rds")
## user system elapsed
## 46.16 0.58 182.03
# The confusion matrix using the test dataset
cm_nnet <- confusionMatrix(data = predict(model_nnet, newdata = dt_test),
dt_test$class)
cm_nnet
## Confusion Matrix and Statistics
##
## Reference
## Prediction agriculture construction forest pasture water
## agriculture 1938 3 0 24 0
## construction 0 1184 0 0 0
## forest 0 0 1610 0 0
## pasture 21 0 2 999 0
## water 0 0 0 0 500
##
## Overall Statistics
##
## Accuracy : 0.992
## 95% CI : (0.9895, 0.9941)
## No Information Rate : 0.3119
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.9896
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: agriculture Class: construction Class: forest
## Sensitivity 0.9893 0.9975 0.9988
## Specificity 0.9938 1.0000 1.0000
## Pos Pred Value 0.9863 1.0000 1.0000
## Neg Pred Value 0.9951 0.9994 0.9996
## Prevalence 0.3119 0.1890 0.2566
## Detection Rate 0.3085 0.1885 0.2563
## Detection Prevalence 0.3128 0.1885 0.2563
## Balanced Accuracy 0.9915 0.9987 0.9994
## Class: pasture Class: water
## Sensitivity 0.9765 1.00000
## Specificity 0.9956 1.00000
## Pos Pred Value 0.9775 1.00000
## Neg Pred Value 0.9954 1.00000
## Prevalence 0.1629 0.07961
## Detection Rate 0.1591 0.07961
## Detection Prevalence 0.1627 0.07961
## Balanced Accuracy 0.9861 1.00000
Get the relative variable importance and plot the neural network. Check out the NeuralNetTools package for more details. This Q&A link can be helpful as well.
cols <- grDevices::colorRampPalette(colors = brewer.pal(n = 9, name = "YlGnBu"))(10)
garson(model_nnet) +
scale_y_continuous('Rel. Importance') +
scale_fill_gradientn(colours = cols)
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
Plot as a neural interpretation diagram, though not super useful here.
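The plotting call itself is not shown; presumably it is something like the following (NeuralNetTools also provides methods for caret train objects, as used with garson() above):
plotnet(model_nnet)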
We’ll compare the three types of models using the framework provided by the caret package via the resamples() function, which works as long as the resampling indices of the observations match (which we made sure of by passing the same folds and seeds to trainControl). Here we compare the results obtained via cross-validation on the train dataset when we tuned the models.
# Create model_list
model_list <- list(rf = model_rf, svm = model_svm, nnet = model_nnet)
# Pass model_list to resamples()
resamples <- caret::resamples(model_list)
## Warning in resamples.default(model_list): Some performance measures were
## not computed for each model: AUC, logLoss, prAUC
In general, the model with the higher median accuracy is the “winner”, especially if it also has a smaller range between its minimum and maximum accuracy.
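The output below lists the performance measures stored in the resamples object; a sketch of how it can be inspected (summary() and bwplot() are standard helpers for resamples objects; the exact calls used here are my assumption):
# Box-and-whisker plots and summaries of the resampled accuracies, e.g.:
# bwplot(resamples, metric = "Accuracy")
# summary(resamples, metric = "Accuracy")
# The performance measures computed for each CV fold:
resamples$metrics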
## [1] "Accuracy" "Kappa"
## [3] "Mean_Balanced_Accuracy" "Mean_Detection_Rate"
## [5] "Mean_F1" "Mean_Neg_Pred_Value"
## [7] "Mean_Pos_Pred_Value" "Mean_Precision"
## [9] "Mean_Recall" "Mean_Sensitivity"
## [11] "Mean_Specificity"
Paired t-tests:
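The output below was presumably produced by something along these lines (diff() on a resamples object computes pairwise differences and paired t-tests for the chosen metric):
diff(resamples, metric = "Accuracy") %>% summary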
##
## Call:
## summary.diff.resamples(object = .)
##
## p-value adjustment: bonferroni
## Upper diagonal: estimates of the difference
## Lower diagonal: p-value for H0: difference = 0
##
## Accuracy
## rf svm nnet
## rf -0.0003107 -0.0255460
## svm 1 -0.0252353
## nnet 8.869e-12 5.781e-10
The paired t-tests with Bonferroni correction for multiple testing show that there are significant differences between the neural network model and the other two, while there is no significant difference between the random forest and SVM models. In this case we can choose the neural network model as the “best” kind of model. However, the differences in accuracy are really small: the neural network is better than the random forest by only about 2.55 percentage points. If you want to run the tests on all metrics, just skip the metric argument.
Note that the synchronizing capabilities of the mapview package are interesting to use, but for now there is a bug in displaying the labels correctly in the legends (I submitted an issue on GitHub here).
system.time({
predict_rf <- raster::predict(object = brick_for_prediction_norm,
model = model_rf, type = 'raw')
predict_svm <- raster::predict(object = brick_for_prediction_norm,
model = model_svm, type = 'raw')
predict_nnet <- raster::predict(object = brick_for_prediction_norm,
model = model_nnet, type = 'raw')
})
## user system elapsed
## 43.39 1.72 45.79
sync(viewRGB(brick(rst_crop_lst[1:3]), r = 3, g = 2, b = 1) +
mapView(poly, zcol = "class", col.regions = cls_dt$hex),
mapView(predict_rf, col.regions = cls_dt$hex),
mapView(predict_svm, col.regions = cls_dt$hex),
mapView(predict_nnet, col.regions = cls_dt$hex))
A work by Valentin Stefan - last update 02 April 2019