3.6. Land Cover Classification

The term Land Cover Classification is defined here as the process of assigning each data pixel a likelihood (probability) of representing a certain land cover class by applying statistical decision rules in the multispectral/multi-temporal domain. The decision rules are generated by building a decision tree model from a training population of pixels with assigned land cover classes.


The decision tree is a powerful tool for supervised image classification. The tree model is built automatically using a population of training objects (image pixels manually mapped by an expert). Each pixel in the training population has data on its land cover class (defined by the image interpretation expert, called the “dependent variable”) and on its spectral/temporal properties (multi-temporal metrics values, called “independent variables”). The tree model separates the multi-dimensional space of independent variables into compartments (hypervolumes called “nodes”) so that most training pixels within each node belong to the same information class. When the classification tree model is implemented for an entire image, it will predict the land cover class for every pixel.


The prediction, however, may be false due to inconsistent or insufficient training data. The model can be improved through the iterative process of adding training data, known as “active learning”. The active learning method consists of iterations of model construction, application, evaluation of results, and addition of new training data until the desired map quality is achieved.


In the following section, we describe the application of the supervised decision tree classification tool, which employs a dependent variable (training data) in the form of vector polygon layers and independent variables in the form of phenological metrics. The provided tools operate with only two land cover classes: a target class and a background class. The output layer shows the likelihood (in percent) that each pixel belongs to the target class.

1. Collecting training data

The training data consist of two polygon shapefiles, one with areas marking training class pixels (“target”), and the other marking other pixels (“background”). Both shapefiles should be in the same coordinate system as the phenological metrics (+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs). The classification tool uses only the object shape data; all attributes are ignored. Correct topology is not required as long as the data can be correctly rasterized. Polygons may overlap, both within a shapefile and between the “target” and “background” shapefiles. Where the two classes overlap, the area under the “target” class polygons is erased from the “background” layer.
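
The overlap rule can be illustrated with a minimal Python sketch. It assumes both layers have already been rasterized into sets of (row, column) pixel coordinates. The function name and the sample coordinates are hypothetical and are not part of the GLAD tools.

```python
def resolve_overlap(target_pixels, background_pixels):
    """Return (target, background) with target pixels erased from background."""
    target = set(target_pixels)
    # Any pixel covered by a target polygon is removed from the background set.
    background = set(background_pixels) - target
    return target, background


# Hypothetical rasterized training pixels: (0, 1) is covered by both layers.
target, background = resolve_overlap(
    [(0, 0), (0, 1)],
    [(0, 1), (1, 1), (2, 2)],
)
print(sorted(background))
```

In this example the shared pixel (0, 1) stays in the target set only.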


The polygon layers may be created in any GIS software. This manual demonstrates the use of QGIS 2.18 for shapefile editing. The following checklist summarizes the requirements for training data collection:

  • QGIS 2.18 with OpenLayers, Freehand Editing, Send2GE plugins.
  • Image mosaic (VRT or raster format) of selected metrics used for data visualization.
  • Two empty shapefiles in geographic coordinates WGS84.  

To collect training data, follow the routine described below:

  • Create classification workspace. The workspace (folder) should include:
  • Open QGIS (new project) and load required plugins.
  • Add raster layers (mosaics of selected metrics). Optionally: load Bing Maps layer using OpenLayers plugin.
  • Load target.shp and background.shp files. Put the target layer onto the top of the background layer in the Layer Panel.
  • Start editing (Toggle Editing button) for both shapefiles
  • Use “Add Polygon” or “Freehand Drawing” tools to add training samples. Avoid creating large training polygons. Distribute samples over the entire area of the image.

Sample drawing example (figure panels: image composite; target training; background training overlaid with target training).

  • Save layers and project (periodically)

2. Applying classification

Before applying classification, check that all required software is installed on your computer:

The following software should be installed to generate metrics:

To apply classification, follow the routine described below:

  • Save all edits and close the QGIS project.
  • Edit the classification parameter file.


Parameters defined in the file:

- Metric type
- Multi-temporal metrics source folder
- Topography metrics source folder
- Year (for multi-temporal metrics)
- Target class shapefile name
- Background class shapefile name
- Name of the tile list file
- Output file name
- Mask file name (none: no mask)
- Number of trees (odd number in the range 1-25)
- Sampling rate (percent of training data extracted for each tree)
- Tree pruning rule
- Number of parallel processes
- Number of parallel processes for a tree model
- Link to the OSGeo4w.bat file (check your local installation), for example: ogr=C:/Program Files/QGIS 2.18/OSGeo4w.bat

You may modify the parameter file depending on the computer capacity, training size, etc. Specifically:

- Increasing the maxtrees parameter will slow classification but improve model generalization.

- Increasing mindev will reduce tree complexity; reducing it will increase tree complexity.

- Reduce the sampling parameter if sample areas are too large. Increase it if the maxtrees parameter is reduced.

- Reduce the threads and treethreads parameters on a low-capacity computer (minimum value 1).
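
Before running the tool, the tuning parameters can be sanity-checked with a short script. The sketch below is not part of the GLAD tools. It assumes the parameter file uses key=value lines (as in the ogr and mask entries), and the function names and example values are hypothetical.

```python
def parse_params(text):
    """Parse 'key=value' lines into a dict, skipping lines without '='."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def check_params(params):
    """Return a list of problems found in the tuning parameters."""
    problems = []
    maxtrees = int(params["maxtrees"])
    # The manual asks for an odd number of trees in the range 1-25.
    if not (1 <= maxtrees <= 25) or maxtrees % 2 == 0:
        problems.append("maxtrees must be an odd number in the range 1-25")
    # Both thread counts have a minimum value of 1.
    for name in ("threads", "treethreads"):
        if int(params[name]) < 1:
            problems.append(name + " must be at least 1")
    return problems


# Hypothetical parameter values for illustration only.
example = """maxtrees=21
threads=4
treethreads=0
"""
print(check_params(parse_params(example)))
```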


  • Open cmd, navigate to the folder with tile list, and run the program:

> perl C:/GLAD_1.0/classification.pl param_pheno_B.txt

  • Wait for the process to complete.
  • Open QGIS and load the classification result (TIF file). To visualize target class, use transparency threshold 0-49. To show only background class, apply transparency to the interval 50-100.

3. Understanding classification outputs

A decision tree is a hierarchical classifier that predicts class membership by recursively partitioning a data set into more homogeneous subsets. This splitting procedure continues until a perfect tree (one in which every pixel is discriminated from pixels of other classes, if possible) is created with all pure terminal nodes, or until preset conditions for terminating the tree’s growth are met. Our approach employs a deviance measure to split data into nodes that are more homogeneous with respect to class membership than the parent node. The reduction in deviance (D) is defined as:

D = D_s − D_t − D_u

where s is the parent node, and t and u are the splits from s. Right and left splits along the digital counts for all data are examined. When D is maximized, the best split has been found; the data are divided at that digital count, and the process is repeated on the two new nodes of the tree. The deviance of node i is calculated as follows:

D_i = −2 Σ_k n_ik ln(p_ik)

where n_ik is the number of pixels in class k in node i and p_ik is the probability of class k in node i. Classes with no pixels in the node contribute zero to the sum.
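
The deviance calculation can be checked numerically with a minimal Python sketch. It assumes the node deviance has the standard classification-tree form D_i = −2 Σ_k n_ik ln(p_ik), which reproduces the deviance values printed in the decision tree file example later in this section. The function names are illustrative.

```python
import math


def node_deviance(counts):
    """Deviance of a node: -2 * sum(n_k * ln(p_k)) over classes with n_k > 0."""
    total = sum(counts)
    return -2.0 * sum(n * math.log(n / total) for n in counts if n > 0)


def deviance_reduction(parent, left, right):
    """D = D_s - D_t - D_u for parent node s and child nodes t and u."""
    return node_deviance(parent) - node_deviance(left) - node_deviance(right)


# Node 9 of the example tree: 8 pixels with class proportions (0.75, 0.25),
# that is, 6 pixels of class 1 and 2 pixels of class 2.
print(round(node_deviance([6, 2]), 6))

# Node 9 is split into two pure nodes (18 and 19), so the reduction
# equals the whole deviance of node 9.
print(round(deviance_reduction([6, 2], [0, 2], [6, 0]), 6))
```

Both printed values agree with the deviance of 8.997362 listed for node 9, and a pure node has zero deviance.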


If a decision tree model is implemented without constraints, it will attempt to create pure nodes (nodes that consist of training pixels of the same class). However, due to noise and errors in the training data, such a complex tree may produce incorrect results. To avoid over-fitting of the tree model, we implement pruning based on the deviance decrease parameter (mindev). Increasing the deviance decrease parameter reduces the complexity of the tree, which produces a more generalized result.


To improve the model performance in case of errors in training data and noise in independent variables (metrics), we implement the bagging (Bootstrap Aggregation) technique. The essence of bagging is to generate an ensemble of decision trees, each built from an independent subset of the training sample chosen randomly with replacement. The output class likelihood is calculated as the median output of all trees in the ensemble. A user may adjust the number of bagged trees (maxtrees). The number should not be too high (using more than 25 trees has a negligible effect on model performance). The number of trees should be odd (to simplify median calculation).
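
The two ingredients of bagging described above, random sampling with replacement and the median of the per-tree outputs, can be sketched in Python. This is an illustration under stated assumptions, not the tool's implementation: the function names, the 20 percent sampling rate, and the per-tree likelihood values are hypothetical.

```python
import random
import statistics


def bootstrap_sample(training, rate_percent, rng):
    """Draw rate_percent of the training records randomly, with replacement."""
    size = max(1, round(len(training) * rate_percent / 100))
    return [rng.choice(training) for _ in range(size)]


def ensemble_likelihood(tree_outputs):
    """Median of the per-tree target likelihoods for one pixel."""
    return statistics.median(tree_outputs)


rng = random.Random(0)
training = list(range(1000))  # stand-ins for training pixels

# One independent sample per tree; 7 trees, 20 percent sampling rate.
samples = [bootstrap_sample(training, 20, rng) for _ in range(7)]
print(len(samples), len(samples[0]))

# Hypothetical likelihoods (percent) predicted by the 7 trees for one pixel.
print(ensemble_likelihood([80, 95, 10, 90, 100, 85, 70]))
```

With an odd number of trees the median is always the output of one actual tree. With an even number, the two middle values would have to be averaged.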


The decision tree models are stored in the “trees” folder of the classification workspace.


The decision tree file structure:

Input file: D:/xx_GLAD_ARD_WEB/04_classification/sample1.txt
Rows: 7963
Columns : 255
Tree type: Classification tree
mincut: 1
minsize: 2
mindev: 0.000100
Output type of y: label and probabilities
Number of classes: 2
Number of all nodes: 369
Number of terminal nodes: 185

Totally 122 Variables used in model building:
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X15 …
Overall misclassification rate on training data: 0.000000

node), split, n, deviance, yval, (1 2)
                   * denotes terminal node
1) root 7963 11036.273815 1 (0.509356 0.490644)
2) X133 < 1637.5 4206 3621.415894 2 (0.154541 0.845459)
4) X114 < 5264 345 24.589972 1 (0.994203 0.005797)
8) X173 < 1926.5 337 -0.000000 1 (1.000000 0.000000) *
9) X173 >= 1926.5 8 8.997362 1 (0.750000 0.250000)
18) X8 < 306 2 -0.000000 2 (0.000000 1.000000) *
19) X8 >= 306 6 -0.000000 1 (1.000000 0.000000) *
5) X114 >= 5264 3861 2143.461403 2 (0.079513 0.920487)
10) X102 < 1945.5 2843 377.376031 2 (0.012311 0.987689)
20) X141 < 1207.5 2009 45.036198 2 (0.001493 0.998507)
40) X68 < 6829 1961 -0.000000 2 (0.000000 1.000000) * …

Explanation of the file contents:

  • Tree header. Rows: number of sampled training pixels. Columns: number of metrics. Class IDs: 1 is background, 2 is target.
  • Variables used in model building. List of all metrics used to build the model. For metric names, refer to metrics_list_pheno_<type>.txt in C:/GLAD_1.0
  • Misclassification rate. 0 indicates a perfect tree (all training pixels were separated).
  • Tree model. First the root node (all training pixels), then child nodes and terminal nodes (marked with *).

Parameters for each node, in the order they appear on a line:

  • Node number
  • Metric used to produce the split
  • Threshold value
  • Number of pixels after the split
  • Deviance after the split
  • Assigned class
  • Class probabilities (classes 1 and 2)

The classification output is stored as a raster file (LZW-compressed GeoTIFF). All tiles are mosaicked together. The pixel value is in the range 0-100 and represents the target class likelihood. The commonly used threshold to identify the target class is 50% (values 0-49 represent the background class, and values 50-100 the target class). However, the threshold may be adjusted if needed. The likelihood should not be treated as “probability” or “similarity”, as it depends on the training population. Nor does it show the percent of the target class within a pixel.
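
The thresholding rule can be sketched in Python as follows. The example operates on a small nested list of likelihood values rather than on a GeoTIFF, and the function name and sample values are hypothetical.

```python
def to_class_mask(likelihood_rows, threshold=50):
    """Return 1 for target (value >= threshold) and 0 for background."""
    return [
        [1 if value >= threshold else 0 for value in row]
        for row in likelihood_rows
    ]


# Hypothetical 2 x 3 block of likelihood values (0-100).
likelihood = [[0, 49, 50], [100, 12, 73]]
print(to_class_mask(likelihood))                # default 50% threshold
print(to_class_mask(likelihood, threshold=30))  # adjusted threshold
```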


Another output of classification is tree_report.txt, which includes the analysis of metric importance. The metric importance is calculated as the total deviance reduction from all nodes that use a particular metric for a split. The deviance decrease is summed over all decision tree models in the ensemble. The “percent_decrease_of_root” value expresses the total deviance decrease for each metric as a percent of the root deviance. The higher the value, the more important the metric is for separating the classes.
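
The importance calculation described above can be sketched in Python. This is an illustration of the stated definition, not the code that produces tree_report.txt; the function name is hypothetical. The three splits are taken from the decision tree file example in this section.

```python
from collections import defaultdict


def metric_importance(splits, root_deviance):
    """Sum deviance reductions per metric, as percent of the root deviance."""
    totals = defaultdict(float)
    for metric, parent_dev, left_dev, right_dev in splits:
        totals[metric] += parent_dev - left_dev - right_dev
    return {m: 100.0 * t / root_deviance for m, t in totals.items()}


# (metric, parent deviance, child deviances) from the example listing.
splits = [
    ("X114", 3621.415894, 24.589972, 2143.461403),  # node 2 -> nodes 4, 5
    ("X173", 24.589972, 0.0, 8.997362),             # node 4 -> nodes 8, 9
    ("X8", 8.997362, 0.0, 0.0),                     # node 9 -> nodes 18, 19
]
importance = metric_importance(splits, root_deviance=11036.273815)
for metric, percent in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(metric, round(percent, 4))
```

For an ensemble, the splits from all trees would be passed in one list so that the reductions accumulate per metric.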

4. Iterating classification

Due to the high complexity of land cover, the accuracy of a classification based on a small, subjectively selected training population is usually low. To improve the classification accuracy, we implement an active learning method. After obtaining the initial classification output, we evaluate it and add new training sites in areas where commission or omission errors are evident. To perform iterative active learning, follow this routine:

  • Open the QGIS project and load classification results.
  • Start editing for training shapefiles.
  • Visually check the map (using both target and background class masks) and add training polygons to the shapefiles.
  • Save shapefiles and the project and close QGIS.
  • Perform classification. Classification results will be updated.

5. Hierarchical classification: using masks

The classification tool only operates with two classes at a time. After a map of a certain class (e.g., “forest”) is completed, it is possible to use classification results to map a sub-class (e.g., “evergreen forest”) within the “forest” class. The following steps illustrate the application of hierarchical classification using classification output:

  • Create a new classification workspace. Copy empty training shapefiles, the list of tiles, and the classification parameter file into it.
  • Create a copy of previous classification results (e.g., forest class likelihood) in the new classification workspace.
  • Create a text file that contains a recode table. See recode table example: recode_table.txt. Each line of the recode table file has three elements: the minimal value in the input range, the maximal value in the input range, and the output value. The example file contains the table to recode the likelihood file into a mask with values 0 (background class) and 1 (target class).
  • Download the utility recode_8bit.exe to C:/GLAD_1.0
  • Open CMD. Navigate to the new classification workspace. Run the following command:

C:/GLAD_1.0/recode_8bit.exe <input>.tif <output>.tif recode_table.txt

  • The resulting file (e.g. mask.tif) may be used as a mask for classification. The training data will be only collected within a non-zero portion of the mask, and the results will be only applied if the mask file value is >0.
  • Create QGIS project and collect training.
  • Save shapefiles and project and close QGIS.
  • In the classification parameter file, replace “mask=none” with “mask=mask.tif” (or another mask file name).
  • Run classification.
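
The recode table logic used in the steps above can be sketched in Python. The sketch does not reproduce recode_8bit.exe. It assumes whitespace-separated integer values on each line, and the table contents shown are a hypothetical example that turns a likelihood value into a 0/1 mask at the 50% threshold.

```python
def load_recode_table(text):
    """Parse lines of 'min max output' into a list of integer triples."""
    table = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            table.append(tuple(int(p) for p in parts))
    return table


def recode(value, table, default=0):
    """Return the output value of the first range that contains value."""
    for low, high, out in table:
        if low <= value <= high:
            return out
    return default


# Hypothetical recode table: likelihood 0-49 -> 0, likelihood 50-100 -> 1.
table_text = """0 49 0
50 100 1
"""
table = load_recode_table(table_text)
print([recode(v, table) for v in (0, 49, 50, 100)])
```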