R - using Random Forests, Support Vector Machines and Neural Networks for a pixel based supervised classification of Sentinel-2 multispectral images

Summary

Implementing a classic pixel-based supervised classification (supervised learning) of Sentinel-2 multispectral images for an area in central Romania. I used the framework provided by the caret R package (“Classification and Regression Training”) and compared predictions from Random Forests with those from Support Vector Machines and Neural Networks.

Bands used: B2, B3, B4, B5, B6, B7, B8, B8A, B11 and B12, so ten out of the thirteen spectral bands. I didn’t use the 60 m bands “Band 1 - Coastal aerosol”, “Band 9 - Water vapour” & “Band 10 - SWIR - Cirrus” (wiki table), which mainly carry information about the atmosphere (aerosols, water vapour, cirrus clouds). I did not invest further thought into which band combinations to use.

Downloaded the “Level-2A” processed data sets (Level-2A = bottom-of-atmosphere reflectance in cartographic geometry). For downloading I used the framework provided by the R package getSpatialData, which connects to the Copernicus Open Access Hub.

Digitization of the training polygons was carried out in QGIS, using Bing aerial imagery and the True Color Image (TCI) at 10 m resolution in the background for visual validation. Five classes were identified for this exercise: agriculture, construction, forest, pasture and water.

Disclaimer: This is work in progress based on my own exploration of the methods. Treat this content as a blog post and nothing more. It does not pretend to be an exhaustive exercise nor a replacement for your critical thinking. If you find mistakes/bottlenecks/bugs in my work, please let me know by opening an issue, or you can also reach me on Twitter.

The R environment

R Session

## R version 3.5.2 (2018-12-20)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] parallel  grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] snow_0.4-3           doParallel_1.0.14    iterators_1.0.10    
##  [4] foreach_1.4.4        stringr_1.4.0        dplyr_0.8.0.1       
##  [7] data.table_1.12.0    LiblineaR_2.10-8     NeuralNetTools_1.5.2
## [10] nnet_7.3-12          MLmetrics_1.1.1      ranger_0.11.2       
## [13] randomForest_4.6-14  caret_6.0-81         plotly_4.8.0        
## [16] ggplot2_3.1.0        mapview_2.6.3        rasterVis_0.45      
## [19] latticeExtra_0.6-28  RColorBrewer_1.1-2   lattice_0.20-38     
## [22] getSpatialData_0.0.4 RStoolbox_0.2.4      sf_0.7-3            
## [25] gdalUtils_2.0.1.14   rgdal_1.4-3          ncdf4_1.16.1        
## [28] raster_2.8-19        sp_1.3-1             png_0.1-7           
## [31] rmarkdown_1.12       epuRate_0.1         
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1   class_7.3-14       leaflet_2.0.2     
##  [4] rprojroot_1.3-2    satellite_1.0.1    base64enc_0.1-3   
##  [7] fs_1.2.7           hexbin_1.27.2      remotes_2.0.2     
## [10] prodlim_2018.04.18 lubridate_1.7.4    xml2_1.2.0        
## [13] codetools_0.2-15   splines_3.5.2      R.methodsS3_1.7.1 
## [16] knitr_1.22         pkgload_1.0.2      jsonlite_1.6      
## [19] R.oo_1.22.0        rgeos_0.4-2        shiny_1.2.0       
## [22] compiler_3.5.2     httr_1.4.0         backports_1.1.3   
## [25] assertthat_0.2.1   Matrix_1.2-15      lazyeval_0.2.2    
## [28] cli_1.1.0          later_0.8.0        htmltools_0.3.6   
## [31] prettyunits_1.0.2  tools_3.5.2        gtable_0.2.0      
## [34] glue_1.3.1         reshape2_1.4.3     Rcpp_1.0.1        
## [37] nlme_3.1-137       crosstalk_1.0.0    timeDate_3043.102 
## [40] gower_0.2.0        xfun_0.5           ps_1.3.0          
## [43] mime_0.6           devtools_2.0.1     XML_3.98-1.19     
## [46] zoo_1.8-5          getPass_0.2-2      MASS_7.3-51.1     
## [49] scales_1.0.0       ipred_0.9-8        promises_1.0.1    
## [52] mapedit_0.5.0      yaml_2.2.0         memoise_1.1.0     
## [55] geosphere_1.5-7    rpart_4.1-13       stringi_1.4.3     
## [58] desc_1.2.0         e1071_1.7-1        pkgbuild_1.0.3    
## [61] lava_1.6.5         rlang_0.3.2        pkgconfig_2.0.2   
## [64] evaluate_0.13      purrr_0.3.2        recipes_0.1.5     
## [67] htmlwidgets_1.3    processx_3.3.0     tidyselect_0.2.5  
## [70] plyr_1.8.4         magrittr_1.5       R6_2.4.0          
## [73] generics_0.0.2     DBI_1.0.0          pillar_1.3.1      
## [76] withr_2.1.2        units_0.6-2        survival_2.43-3   
## [79] tibble_2.1.1       crayon_1.3.4       usethis_1.4.0     
## [82] callr_3.2.0        ModelMetrics_1.2.2 digest_0.6.18     
## [85] classInt_0.3-1     webshot_0.5.1      xtable_1.8-3      
## [88] tidyr_0.8.3        httpuv_1.5.0       R.utils_2.8.0     
## [91] stats4_3.5.2       munsell_0.5.0      viridisLite_0.3.0 
## [94] sessioninfo_1.1.1

Checkpoint

If you want to install a snapshot of the packages as they existed on CRAN at the creation of this document, then you can use the checkpoint package:

# /////////////////////////////////////////////////////////////////////////
#
# Create a checkpoint at "2019-03-23".
#
# This should guarantee the same versions of the packages as they existed at the
# specified point in time.
# 
# Scans for packages in the project folder and its subfolders. It scans all R code
# (.R, .Rmd, and .Rpres files) for `library()` and `require()` statements. Then
# creates a local library into which it installs a copy of the packages required
# in the project as they existed on CRAN at the specified snapshot date. See
# details with ?checkpoint after you load library(checkpoint), or here:
# https://cran.r-project.org/web/packages/checkpoint/vignettes/checkpoint.html
#
# Warning: Installing older versions of the packages in the .checkpoint local
# library may take up some hundreds of MB of disk space.
#
# /////////////////////////////////////////////////////////////////////////

# install.packages("checkpoint")
library(checkpoint) # checkpoint, version 0.4.5

checkpoint(snapshotDate = "2019-03-23")

# Check that library path is set to ~/.checkpoint/2019-03-23/ ...
.libPaths()
grepl(pattern = "\\.checkpoint/2019-03-23/", x = .libPaths()[1]) # should be TRUE


# Optional checks ---------------------------------------------------------

# Check that CRAN mirror is set to MRAN snapshot
getOption("repos")
# At first run, after installing the packages, should see something like:
#                                                                            nowosaddrat 
# "https://mran.microsoft.com/snapshot/2019-03-23"     "https://nowosad.github.io/drat/" 

# Check which packages are installed in checkpoint library
installed.packages(.libPaths()[1])[, "Package"]

# Experimental - use unCheckpoint() to reset .libPaths to the default user library. 
# Note that this does not undo any of the other side-effects. Specifically, all
# loaded packages remain loaded, and the value of getOption("repos") remains
# unchanged. See details with ?unCheckpoint

Data preparation

Get raster data

Define AOI

AOI as matrix

Define an area of interest (AOI). Here I used a matrix that defines the corners of a square (unprojected coordinates in WGS84 datum):
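
A minimal sketch of this step; the corner coordinates below are placeholders (not the actual study area), and set_aoi() assumes the getSpatialData 0.0.4 API:

library(getSpatialData)

aoi_mat <- matrix(c(24.50, 46.30,    # lower-left  (lon, lat)
                    24.80, 46.30,    # lower-right
                    24.80, 46.55,    # upper-right
                    24.50, 46.55),   # upper-left
                  ncol = 2, byrow = TRUE)

set_aoi(aoi_mat)   # stores the AOI used by the queries below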

## Warning: Argument 'aoi' is a matrix, assuming '+proj=longlat +datum=WGS84
## +no_defs' projection.

AOI as shapefile

Note that, if you have a shapefile of the AOI, you can use it instead of building the matrix as above. You can read it with rgdal::readOGR() or sf::st_read(), for example:
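
A minimal sketch (the path is hypothetical; passing sf/sp objects to set_aoi() is an assumption about the getSpatialData 0.0.4 API):

library(sf)

aoi_sf <- st_read("data/aoi.shp")   # or: rgdal::readOGR("data", "aoi")
set_aoi(aoi_sf)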

Query API & download

First, you have to register to the Copernicus Open Access Hub (COAH). Their user guide illustrates here how to do that.



Search for data and filter the records:
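
A sketch of the login/query/filter step, assuming the getSpatialData 0.0.4 API (login_CopHub(), getSentinel_query()) and the lowercase column names returned by the Copernicus Hub; the username, dates and cloud-cover threshold are placeholders:

library(getSpatialData)

login_CopHub(username = "my_user")   # asks for the password interactively

records <- getSentinel_query(time_range = c("2018-07-01", "2018-08-31"),
                             platform   = "Sentinel-2")

# Keep only Level-2A products with very low cloud cover (assumed column names)
records_filtered <- records[records$processinglevel == "Level-2A" &
                              records$cloudcoverpercentage <= 2, ]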



Preview the records on a mapview map together with the AOI. I was hoping to get good data from the record with the lowest cloud coverage, but when visualizing it, I realized that the data is incomplete. I am better off with another record that also has very low cloud coverage. This shows that one needs to visually check the data in order to be sure one gets usable records.
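
A sketch of the preview call (function and argument names assume the getSpatialData 0.0.4 API):

getSentinel_preview(record = records_filtered[1, ])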


Left - record for the lowest cloud coverage (is incomplete); right - final selected record for analysis



Get and unzip the data:
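
A sketch, assuming the getSpatialData 0.0.4 API (with getSentinel_data() returning the path of the downloaded zip archive) and a hypothetical output folder:

library(getSpatialData)

datasets <- getSentinel_data(records = records_filtered[1, ])
unzip(zipfile = datasets[1], exdir = "data/sentinel")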

Crop raster images

Read and crop the raster bands: B2, B3, B4, B5, B6, B7, B8, B8A, B11 and B12:



Prepare the extent for cropping from the AOI. Don’t forget to project the coordinates to the CRS of the Sentinel raster datasets, which is +proj=utm +zone=34 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0.
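
A minimal sketch with sp/rgdal, assuming the aoi_mat corner matrix defined earlier:

library(sp)
library(raster)

aoi_pts  <- SpatialPoints(aoi_mat,
                          proj4string = CRS("+proj=longlat +datum=WGS84 +no_defs"))
aoi_utm  <- spTransform(aoi_pts,
                        CRS("+proj=utm +zone=34 +datum=WGS84 +units=m +no_defs"))
ext_crop <- extent(aoi_utm)   # extent object used for cropping below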



Crop the bands
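
A sketch of reading and cropping the bands; the folder and file pattern are placeholders, and reading .jp2 files requires a GDAL build with JPEG2000 support. Note that the 20 m bands (B5, B6, B7, B8A, B11, B12) may additionally need raster::resample() to the 10 m grid before stacking:

library(raster)

band_files <- list.files("data/sentinel",
                         pattern = "B(02|03|04|05|06|07|08|8A|11|12).*\\.jp2$",
                         recursive = TRUE, full.names = TRUE)
bands      <- lapply(band_files, raster)          # one RasterLayer per band
bands_crop <- lapply(bands, crop, y = ext_crop)   # crop to the projected AOI extent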



Visualize the crop in Natural Color (R = Red, G = Green, B = Blue).
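
A sketch with mapview::viewRGB(), assuming B2, B3 and B4 are the first three elements of bands_crop:

library(raster)
library(mapview)

rgb_stack <- stack(bands_crop[[3]], bands_crop[[2]], bands_crop[[1]])  # B4, B3, B2
viewRGB(rgb_stack, r = 1, g = 2, b = 3)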

## Warning in rasterCheckSize(x, maxpixels): maximum number of pixels for Raster* viewing is 5e+05 ; 
## the supplied Raster* has 722879 
##  ... decreasing Raster* resolution to 5e+05 pixels
##  to view full resolution set 'maxpixels =  722879 '



Save the cropped bands so that we can delete the raw original ones in order to save some space (this is optional, but I need the space…). To build this document faster, I cached the results, which are read in the setup chunk at the top of the Rmd file. Also, I can’t be expected to download data from the Copernicus API every time I build this document from its R Markdown file :)
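
A sketch of writing the cropped bands as GeoTIFFs (the output folder is hypothetical):

library(raster)

dir.create("data/crop", showWarnings = FALSE)
for (i in seq_along(bands_crop)) {
  writeRaster(bands_crop[[i]],
              filename  = file.path("data/crop", paste0(names(bands_crop[[i]]), ".tif")),
              overwrite = TRUE)
}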

Center & scale raster images

This process involves subtracting the mean and dividing by the standard deviation for each variable/feature/band. It is not that the data needs normalization as in the case of linear regression, which assumes a Gaussian distribution of the response/predicted variable, but the transformation is important for the ML algorithms, especially for the neural networks, as mentioned here or depicted here.
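
A minimal sketch using raster::scale(), assuming all cropped bands were brought to a common 10 m grid and stacked:

library(raster)

brick_crop <- brick(stack(bands_crop))                        # the 10 cropped bands
brick_cs   <- scale(brick_crop, center = TRUE, scale = TRUE)  # layer-wise (x - mean) / sd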

Data from training polygons

Polygons to points

Read the training polygons.
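
A minimal sketch (the path and the attribute name are hypothetical):

library(sf)

poly <- st_read("data/train_polygons.shp")  # five classes stored in a "class" column (assumed)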



Plot them:
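
For a quick look (sf plot method; mapview(poly) would also work):

plot(poly["class"], main = "Training polygons")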

The polygons need to be projected using the Sentinel’s CRS.
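
A one-liner with sf, assuming the tiles are in UTM zone 34N / WGS 84 (EPSG:32634), as stated above:

library(sf)

poly_utm <- st_transform(poly, crs = 32634)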



Rasterize the polygons to 10 m resolution, convert the raster to points and then use them to extract values from the Sentinel bands.
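
A sketch of this step; the object and column names (poly_utm, brick_cs, class) are assumptions carried over from the sketches above:

library(raster)
library(data.table)

poly_sp <- as(poly_utm, "Spatial")                       # sf -> sp for rasterize()
poly_sp$class_id <- as.integer(factor(poly_sp$class))    # numeric id per class (alphabetical order)

poly_rast <- rasterize(poly_sp, brick_cs[[1]], field = "class_id")  # 10 m grid
pts       <- rasterToPoints(poly_rast, spatial = TRUE)

dt_all <- data.table(class_id = pts@data[[1]],
                     raster::extract(brick_cs, pts))
# Attach the class labels as a factor and drop the numeric id
dt_all[, class := factor(class_id, labels = c("agriculture", "construction",
                                              "forest", "pasture", "water"))]
dt_all[, class_id := NULL]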

Extract band values to points

data.table syntax

If the data.table syntax is too foreign for you, then you can check the dplyr syntax tab.

Split into train and test

The training dataset will be used for model tuning by cross-validation and grid search. Then we will use the final tuned models on the test dataset to build confusion matrices (as shown in the intro vignette of the caret package, here).
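
A sketch using caret::createDataPartition for a stratified split; the 70/30 proportion and the seed are assumptions based on the class counts below:

library(caret)
library(data.table)

set.seed(2019)
idx <- createDataPartition(dt_all$class, p = 0.7, list = FALSE)[, 1]
dt_train <- dt_all[idx]     # ~70 % of the points, stratified by class
dt_test  <- dt_all[-idx]    # the remaining ~30 %

table(dt_train$class)
table(dt_test$class)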

## 
##  agriculture construction       forest      pasture        water 
##         4572         2771         3762         2388         1169
## 
##  agriculture construction       forest      pasture        water 
##         1959         1187         1612         1023          500

Fit models

The training dataset is used for cross-validation (CV) and grid search for model tuning. Once the optimal/best parameters are found, a final model is fit to the entire training dataset using those values. We can then check how these final models behave on unseen data (the testing dataset).

Details are provided in the intro vignette of the caret package, here, and also in this scikit-learn documentation.

The CV indices need to match when comparing multiple models, so that we get a fair comparison. Therefore, the same folds will be passed to the trainControl argument for each type of model (see the sketch below). See also the example from help(trainControl).
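
A minimal sketch of that setup, assuming the dt_train table from above; the seed and the fold count are assumptions:

library(caret)

set.seed(321)
# Row indices of the training portion of each fold, computed once and reused by all models
folds <- createFolds(dt_train$class, k = 10, returnTrain = TRUE)

ctrl <- trainControl(
  method          = "cv",
  index           = folds,             # same resampling indices for every model
  summaryFunction = multiClassSummary, # multi-class metrics (uses MLmetrics)
  classProbs      = TRUE,              # class probabilities, where supported
  savePredictions = "final",
  allowParallel   = TRUE
)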



Note that, for each model, we need to provide the same settings in trainControl (in particular the same resampling indices via index and the same summary function), so that the resampling results can later be compared with resamples().

All in all, this is important for being able to compare the different types of models; see details in the chapter “Selecting models: a case study in churn prediction” in Data Camp - Machine Learning Toolbox.


Models can also be tuned manually; caret does it automatically anyway, choosing some default values for us. Manual tuning means picking the desired values for the tuning parameters instead of the default ones, which gives you more fine-grained control.

Manual tuning is done via the tuneGrid argument, and of course, differs from model to model, e.g.:

  • for the random forest models there can be the mtry parameter (the number of randomly selected predictors, k, to choose from at each split);
  • for SVM models, these can be cost and the Loss function;
  • for neural networks, size and decay.

See details in Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). New York: Springer, or James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. New York: Springer.


List of available models with caret - train Models By Tag

Random forest

Model summary & confusion matrix

Tuning here was done via the mtry argument, which can vary from 2 up to the total number of predictors (bands) used (here, 10).

So, the optimization was done via cross-validation and grid search (here, by grid I refer to tuneGrid = data.frame(mtry = c(2, 3, 4, 5, 8)); a sketch of the call follows below). The final/optimal model, stored in model_rf, corresponds to mtry = 5, which gave the highest accuracy (0.956).
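
A minimal sketch of the fit; dt_train and ctrl come from the sketches above, band_cols and the parallel backend are assumptions:

library(caret)
library(doParallel)

band_cols <- setdiff(names(dt_train), "class")   # the 10 band columns (assumed names)

cl <- makeCluster(4)        # optional: parallel backend for the resampling
registerDoParallel(cl)

set.seed(321)               # hypothetical seed
system.time(
  model_rf <- train(
    x          = dt_train[, band_cols, with = FALSE],
    y          = dt_train$class,
    method     = "rf",                                # randomForest via caret
    importance = TRUE,                                # keep predictor importance
    tuneGrid   = data.frame(mtry = c(2, 3, 4, 5, 8)),
    trControl  = ctrl
  )
)

stopCluster(cl)
registerDoSEQ()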

##    user  system elapsed 
##   16.57    0.51   73.96



Compute the confusion matrix and associated statistics using the test data. A confusion matrix indicates how “confused” the model is between the given classes and highlights instances in which one class is confused for another. The main (first) diagonal of the matrix shows the cases when the model is “correct” (Data Camp - Machine Learning Toolbox).

  • Accuracy looks really high, far better than the “no-information-rate” model, which always predicts the dominant class (here, the first level of the class factor - agriculture);
  • Sensitivity (Recall) refers to the true positive rate (model correctly detects the class);
  • Specificity is the true negative rate (model correctly rejects the class)

See also this Wikipedia link, this one or help(confusionMatrix) for more details on confusion matrix terminology.
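
A minimal sketch, assuming model_rf, dt_test and band_cols from above:

library(caret)

pred_rf <- predict(model_rf, newdata = dt_test[, band_cols, with = FALSE])
confusionMatrix(data = pred_rf, reference = dt_test$class)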

## Confusion Matrix and Statistics
## 
##               Reference
## Prediction     agriculture construction forest pasture water
##   agriculture         1935            6      1      47     0
##   construction           0         1181      0       0     0
##   forest                 1            0   1610       0     0
##   pasture               23            0      1     976     0
##   water                  0            0      0       0   500
## 
## Overall Statistics
##                                         
##                Accuracy : 0.9874        
##                  95% CI : (0.9843, 0.99)
##     No Information Rate : 0.3119        
##     P-Value [Acc > NIR] : < 2.2e-16     
##                                         
##                   Kappa : 0.9836        
##  Mcnemar's Test P-Value : NA            
## 
## Statistics by Class:
## 
##                      Class: agriculture Class: construction Class: forest
## Sensitivity                      0.9877              0.9949        0.9988
## Specificity                      0.9875              1.0000        0.9998
## Pos Pred Value                   0.9729              1.0000        0.9994
## Neg Pred Value                   0.9944              0.9988        0.9996
## Prevalence                       0.3119              0.1890        0.2566
## Detection Rate                   0.3081              0.1880        0.2563
## Detection Prevalence             0.3167              0.1880        0.2565
## Balanced Accuracy                0.9876              0.9975        0.9993
##                      Class: pasture Class: water
## Sensitivity                  0.9541      1.00000
## Specificity                  0.9954      1.00000
## Pos Pred Value               0.9760      1.00000
## Neg Pred Value               0.9911      1.00000
## Prevalence                   0.1629      0.07961
## Detection Rate               0.1554      0.07961
## Detection Prevalence         0.1592      0.07961
## Balanced Accuracy            0.9747      1.00000



You can also get a confusion matrix for the final model, based on the out-of-bag samples of the entire train dataset. This is different from the approach above; usually you would want to see a confusion matrix based on the testing dataset. In both cases one can see that the model is the most confused when distinguishing between the pasture and agriculture land use classes. On the other hand, the model “seems super certain” when classifying water.
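
A sketch; this simply prints the underlying randomForest object, whose confusion matrix is computed from the out-of-bag samples:

model_rf$finalModel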

## 
## Call:
##  randomForest(x = x, y = y, mtry = param$mtry, importance = TRUE,      allowParallel = TRUE) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 5
## 
##         OOB estimate of  error rate: 1.45%
## Confusion matrix:
##              agriculture construction forest pasture water class.error
## agriculture         4506            3      5      58     0 0.014435696
## construction          18         2751      2       0     0 0.007217611
## forest                 0            0   3762       0     0 0.000000000
## pasture              124            0      2    2262     0 0.052763819
## water                  0            0      0       0  1169 0.000000000

Predictor importance

Simple rule: higher values mean the variables are more important.

So, band 11 seems to be very important for detecting pasture patches.


For more in-depth details about predictor importance, check Kuhn, M., & Johnson, K. (2013). Applied predictive modeling (Vol. 26). New York: Springer, specifically chapters 18 (Measuring Predictor Importance) and 8.5 (Random Forests). Also check this Stack Overflow link.

Some selected ideas:

  • The importance values indicate the loss in performance when the effect of the predictor is negated. So, a substantial drop in performance is indicative of an important predictor;
  • Correlations between predictors can have a significant impact on the importance values (e.g., uninformative predictors that are highly correlated with informative ones can get abnormally high importance);
  • Correlated important predictors can dilute each other (render each other artificially unimportant)

From the R syntax point of view, there are several ways to get importance values and metrics for each band:

  • caret::varImp(), a generic method, so it will also work for the SVM and nnet models;
  • randomForest::importance() & randomForest::varImpPlot(), which work only for randomForest models

Note that, by default the scale argument in both caret::varImp() and randomForest::importance() is set to TRUE. However, in caret::varImp() it means that the importance values are scaled between 0 and 100 %, while this doesn’t seem to be true in the case of randomForest::importance(). One way or another, the conveyed information is the same, as you can see from the two heatmaps below. To get identical results, set scale = FALSE in caret::varImp().
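
A minimal sketch of the two approaches, using the model_rf object fitted above:

library(caret)
library(randomForest)

caret::varImp(model_rf, scale = FALSE)         # generic; also works for the SVM/nnet fits
randomForest::importance(model_rf$finalModel)  # randomForest-specific values
randomForest::varImpPlot(model_rf$finalModel)  # dotchart of the same information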


SVM

“L2 Regularized Support Vector Machine (dual) with Linear Kernel”. To try other SVM options see SVM tags.

Note that importance = TRUE is not applicable anymore, so I didn’t mention it in train(). The same goes for the class probabilities (classProbs = TRUE) defined in ctrl above. However, I didn’t bother to make another ctrl object for SVM, so I recycled the one used for the random forest models, ignoring the warning: Class probabilities were requested for a model that does not implement them.

Model summary & confusion matrix
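
A sketch of the SVM fit with the caret method "svmLinear3" (LiblineaR backend); the tuning grid values are illustrative assumptions, and ctrl/band_cols come from the sketches above:

library(caret)
library(LiblineaR)

set.seed(321)
system.time(
  model_svm <- train(
    x         = dt_train[, band_cols, with = FALSE],
    y         = dt_train$class,
    method    = "svmLinear3",                       # L2-regularized linear SVM
    tuneGrid  = expand.grid(cost = c(0.25, 0.5, 1),
                            Loss = c("L1", "L2")),
    trControl = ctrl    # recycled; the classProbs warning can be ignored
  )
)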

##    user  system elapsed 
##    1.44    0.31   16.55

## Confusion Matrix and Statistics
## 
##               Reference
## Prediction     agriculture construction forest pasture water
##   agriculture         1818           13      0      90     0
##   construction           5         1173      0       0     0
##   forest                 1            0   1612       3     0
##   pasture              135            1      0     930     0
##   water                  0            0      0       0   500
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9605          
##                  95% CI : (0.9554, 0.9652)
##     No Information Rate : 0.3119          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9487          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: agriculture Class: construction Class: forest
## Sensitivity                      0.9280              0.9882        1.0000
## Specificity                      0.9762              0.9990        0.9991
## Pos Pred Value                   0.9464              0.9958        0.9975
## Neg Pred Value                   0.9677              0.9973        1.0000
## Prevalence                       0.3119              0.1890        0.2566
## Detection Rate                   0.2894              0.1868        0.2566
## Detection Prevalence             0.3058              0.1875        0.2573
## Balanced Accuracy                0.9521              0.9936        0.9996
##                      Class: pasture Class: water
## Sensitivity                  0.9091      1.00000
## Specificity                  0.9741      1.00000
## Pos Pred Value               0.8724      1.00000
## Neg Pred Value               0.9822      1.00000
## Prevalence                   0.1629      0.07961
## Detection Rate               0.1481      0.07961
## Detection Prevalence         0.1697      0.07961
## Balanced Accuracy            0.9416      1.00000

Neural Network

To try other Neural Network options see Neural Network tags.

Model summary & confusion matrix
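
A sketch of the neural network fit (caret method "nnet", a single-hidden-layer network); the size/decay grid and the extra nnet arguments are illustrative assumptions:

library(caret)
library(nnet)

set.seed(321)
system.time(
  model_nnet <- train(
    x         = dt_train[, band_cols, with = FALSE],
    y         = dt_train$class,
    method    = "nnet",
    tuneGrid  = expand.grid(size  = c(5, 10),             # hidden units
                            decay = c(0.001, 0.01, 0.1)), # weight decay
    maxit     = 500,      # more iterations than the nnet default of 100
    trace     = FALSE,    # silence the optimisation log
    trControl = ctrl
  )
)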

##    user  system elapsed 
##   46.16    0.58  182.03

## Confusion Matrix and Statistics
## 
##               Reference
## Prediction     agriculture construction forest pasture water
##   agriculture         1938            3      0      24     0
##   construction           0         1184      0       0     0
##   forest                 0            0   1610       0     0
##   pasture               21            0      2     999     0
##   water                  0            0      0       0   500
## 
## Overall Statistics
##                                           
##                Accuracy : 0.992           
##                  95% CI : (0.9895, 0.9941)
##     No Information Rate : 0.3119          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9896          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: agriculture Class: construction Class: forest
## Sensitivity                      0.9893              0.9975        0.9988
## Specificity                      0.9938              1.0000        1.0000
## Pos Pred Value                   0.9863              1.0000        1.0000
## Neg Pred Value                   0.9951              0.9994        0.9996
## Prevalence                       0.3119              0.1890        0.2566
## Detection Rate                   0.3085              0.1885        0.2563
## Detection Prevalence             0.3128              0.1885        0.2563
## Balanced Accuracy                0.9915              0.9987        0.9994
##                      Class: pasture Class: water
## Sensitivity                  0.9765      1.00000
## Specificity                  0.9956      1.00000
## Pos Pred Value               0.9775      1.00000
## Neg Pred Value               0.9954      1.00000
## Prevalence                   0.1629      0.07961
## Detection Rate               0.1591      0.07961
## Detection Prevalence         0.1627      0.07961
## Balanced Accuracy            0.9861      1.00000



Get the relative importance of the variables and plot the neural network. Check out the NeuralNetTools package for more details. This Q&A link can also be helpful.
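
A sketch using NeuralNetTools on the underlying nnet object; the exact call is an assumption (see ?garson and ?olden), and the warning below stems from further customising the y scale of the returned ggplot:

library(NeuralNetTools)

olden(model_nnet$finalModel)   # signed relative importance of each band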

## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.



Plot the network as a neural interpretation diagram, though it is not super useful here.
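
A minimal sketch with plotnet() from NeuralNetTools:

library(NeuralNetTools)

plotnet(model_nnet$finalModel)   # neural interpretation diagram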

Compare models

We’ll compare the three types of models using the framework provided by the caret package via the resamples() function, which requires that the resampling indices of the models match (which we made sure of by reusing the same folds and setting specific seeds). Here we compare the results obtained via cross-validation on the train dataset when we tuned the models.
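
A sketch of collecting the resampling results; the list names rf/svm/nnet match the comparison tables below:

library(caret)

model_list <- list(rf = model_rf, svm = model_svm, nnet = model_nnet)
resamps    <- resamples(model_list)
summary(resamps)
bwplot(resamps, metric = "Accuracy")   # lattice box-and-whisker comparison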

## Warning in resamples.default(model_list): Some performance measures were
## not computed for each model: AUC, logLoss, prAUC



In general, the model with the higher median accuracy is the “winner”; a smaller range between the min and max accuracy is also desirable.
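
The full list of metrics collected during cross-validation (via multiClassSummary) is stored in the resamples object created above:

resamps$metrics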

##  [1] "Accuracy"               "Kappa"                 
##  [3] "Mean_Balanced_Accuracy" "Mean_Detection_Rate"   
##  [5] "Mean_F1"                "Mean_Neg_Pred_Value"   
##  [7] "Mean_Pos_Pred_Value"    "Mean_Precision"        
##  [9] "Mean_Recall"            "Mean_Sensitivity"      
## [11] "Mean_Specificity"



Paired t-tests:
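
A sketch, assuming the resamps object from above:

# Pairwise differences in cross-validated accuracy; the p-values of the paired
# t-tests are Bonferroni-adjusted
resamps %>% diff(metric = "Accuracy") %>% summary()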

## 
## Call:
## summary.diff.resamples(object = .)
## 
## p-value adjustment: bonferroni 
## Upper diagonal: estimates of the difference
## Lower diagonal: p-value for H0: difference = 0
## 
## Accuracy 
##      rf        svm        nnet      
## rf             -0.0003107 -0.0255460
## svm  1                    -0.0252353
## nnet 8.869e-12 5.781e-10

The paired t-tests with Bonferroni corrections on the p-values show that there are significant differences between the neural network model and the other two, while there is no significant difference between the random forest and SVM models. In this case we can choose the neural network model as the “best” kind of model, although the differences in accuracy are small in absolute terms: the neural network is better than the random forest by only about 2.55 percentage points. If you want to run the tests on all metrics, just skip the metric argument.

Visualize classifications

Note that the synchronizing capabilities of the mapview package are interesting to use, but for now there is a bug in displaying the legend labels correctly (I submitted an issue on GitHub here).
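
A sketch of the per-pixel prediction and the synchronized maps; brick_cs and the three models come from the sketches above, the layer names of brick_cs are assumed to match the column names used for training, and, depending on the mapview version, sync() may live in the leafsync package:

library(raster)
library(mapview)

system.time({
  map_rf   <- raster::predict(brick_cs, model_rf)   # factor predictions stored as integer codes
  map_svm  <- raster::predict(brick_cs, model_svm)
  map_nnet <- raster::predict(brick_cs, model_nnet)
})

sync(mapview(map_rf), mapview(map_svm), mapview(map_nnet))  # linked panning/zooming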

##    user  system elapsed 
##   43.39    1.72   45.79
 




A work by Valentin Stefan - last update 02 April 2019