3.3.1. Phenological metrics type A (pheno_A)
- Purpose: annual land cover and vegetation structure mapping.
- Reference: Potapov et al., 2019 (https://doi.org/10.1016/j.rse.2019.111278)
- Interval: annual (January 1 – December 31).
- 16-day interval data: four years of data is recommended for metric processing (e.g., to generate 2018 metrics we recommend downloading data from 2015 to 2018). See details in the description below.
- Naming convention: Metrics_pheno_A.xlsx
- Metric generation code: https://glad.umd.edu/gladtools/Tools/compute_metrics_pheno_A.zip
- Classification code parameters: Use keyword “pheno_A” to specify this metric set.
- Requires at least 13GB RAM.
- The metrics dataset size for one tile / one year is 9GB.
This metric set is designed to support the extrapolation of annual land cover and vegetation structure mapping models in space and time. Although the annual metric set is generated primarily from observations collected during the corresponding year, data from the three preceding years are used to fill gaps in the observation time series that may affect the consistency of the phenological metrics. To generate an annual set of phenological metrics, we recommend using 16-day Landsat ARD from four years (the current and three preceding years). Metric generation may also be applied to an incomplete set of 16-day intervals (i.e., only the intervals for the corresponding year); in this case, the gap-filling algorithm is disabled automatically. Before processing the phenological metrics, use Landsat ARD Download to download the 16-day data time series.
The process of phenological metrics construction includes two stages: (1) selecting clear-sky observations and filling gaps in the observation time series; and (2) extracting reflectance distribution statistics from the selected observation time-series.
First, we compile a time-series of annual observations with the lowest atmospheric contamination (Figure A). The per-pixel criterion for 16-day data selection is defined automatically based on the distribution of quality flags within the four years of data. If clear-sky land or water observations are present in the time-series data, only those are used for subsequent analysis. If no such observations are found, the code successively changes the quality threshold for data inclusion, first allowing observations with proximity to clouds and shadows, then allowing all available observations.
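The successive relaxation of the quality threshold can be sketched as follows. This is only an illustration of the fallback logic described above, not the GLAD implementation: the quality class names (`clear`, `near_cloud`, `other`) and the function `select_observations` are hypothetical, and the actual ARD quality-flag codes are not reproduced here.

```python
# Hypothetical quality classes; the real ARD quality-flag codes are not used here.
CLEAR = "clear"            # clear-sky land or water
NEAR_CLOUD = "near_cloud"  # observation close to a cloud or shadow
OTHER = "other"            # any other available observation


def select_observations(observations):
    """Return the observations from the strictest quality tier that is not empty.

    observations: list of (interval_id, quality_class) tuples for one pixel.
    """
    tiers = [
        {CLEAR},                     # first choice: clear-sky only
        {CLEAR, NEAR_CLOUD},         # then allow proximity to clouds/shadows
        {CLEAR, NEAR_CLOUD, OTHER},  # finally allow everything available
    ]
    for allowed in tiers:
        selected = [obs for obs in observations if obs[1] in allowed]
        if selected:
            return selected
    return []
```

For a pixel with at least one clear-sky observation, only the clear-sky observations are returned; the looser tiers are used only when the stricter ones are empty.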
To create an annual gap-filled observation time series for metric extraction, the code analyzes the duration of the gaps between existing 16-day observations of the current year (Year i). If a gap exceeds two months (four 16-day intervals), the code searches for clear-sky observations within the gap date range in the previous years, starting with Year i-1 and continuing through Year i-3. When clear-sky observations are found, they are added to the gap-filled time series, and the gap analysis is repeated until all gaps longer than two months are filled or no further data are available within the four-year interval.
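The gap-filling loop can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this section: intervals are numbered 1 to `n_intervals` within a year (the default of 23 is an assumption), a gap is filled only when it is strictly longer than four intervals, and the input is a hypothetical dictionary of clear-sky interval numbers per year. The function names are illustrative only.

```python
def find_gaps(filled, n_intervals):
    """Return (start, end) runs of consecutive missing intervals (1-based, inclusive)."""
    gaps, start = [], None
    for k in range(1, n_intervals + 1):
        if k not in filled:
            if start is None:
                start = k
        elif start is not None:
            gaps.append((start, k - 1))
            start = None
    if start is not None:
        gaps.append((start, n_intervals))
    return gaps


def fill_gaps(clear_by_year, year, n_intervals=23, max_gap=4, lookback=3):
    """Map each usable interval to the year its observation comes from.

    clear_by_year: {year: set of interval numbers with clear-sky data} (hypothetical input).
    """
    filled = {k: year for k in clear_by_year.get(year, set())}
    changed = True
    while changed:
        changed = False
        for start, end in find_gaps(filled, n_intervals):
            if end - start + 1 <= max_gap:
                continue  # gap is short enough, leave it
            # Search Year i-1 first, then Year i-2, then Year i-3.
            for prev in range(year - 1, year - lookback - 1, -1):
                found = [k for k in range(start, end + 1)
                         if k in clear_by_year.get(prev, set())]
                if found:
                    for k in found:
                        filled[k] = prev
                    changed = True
                    break
            if changed:
                break  # repeat the gap analysis on the updated series
    return filled
```

When only the current year is supplied, no earlier data exist to search, so the result contains the current-year observations unchanged.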
After the annual gap-filled observation time series is compiled, the code computes a set of normalized band ratios (indices) of the form (Band A - Band B)/(Band A + Band B) for each selected observation. A spectral variability vegetation index (SVVI, Coulter et al., 2016) is calculated using the standard deviation of spectral reflectance values.
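As a small illustration of the normalized ratio formula, the sketch below computes (Band A - Band B)/(Band A + Band B) for a single observation. The function name and the choice to return `None` when the denominator is zero are assumptions made for this example; the value scaling used in the stored metrics is not shown.

```python
def normalized_ratio(band_a, band_b):
    """Compute (Band A - Band B) / (Band A + Band B) for one observation.

    Returns None when the sum of the two bands is zero (an assumption of this sketch).
    """
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total
```

For example, with NIR reflectance as Band A and red reflectance as Band B, `normalized_ratio(nir, red)` gives NDVI.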
Multi-temporal metrics are generated from the time series of normalized reflectance and indices using four independent ranking approaches (Figure B). First, all observations are ranked by each spectral band reflectance or index value individually. From these individual ranks, we select the highest and lowest values, the second-highest and second-lowest values, and the values corresponding to the first, second, and third quartiles. In addition to individual observations, we calculate averages for all observations between selected ranks and amplitudes between selected metrics. The remaining three approaches rank the observation dates by the corresponding values of (i) NDVI, (ii) SVVI, and (iii) brightness temperature. For each of these rankings, we extract the observation dates corresponding to the highest and lowest, second-highest and second-lowest, and first, second, and third quartile values of the ranking variable. For the metric set, we record the normalized surface reflectance of these observations and calculate averages and amplitudes for observations between selected ranks. The amplitudes are not written to files; the classification software calculates them on the fly.
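The difference between ranking a band by its own values and ranking it by a corresponding variable can be sketched as follows. This is an illustration under stated assumptions, not the GLAD code: the statistic keys (`min`, `q1`, `avg_min_max`, and so on) are hypothetical labels, and quartiles are taken with a simple nearest-rank rule that may differ from the one used by the metric generation software.

```python
def pick_by_rank(ranked):
    """Extract selected values from a list that is already in rank order."""
    n = len(ranked)
    if n == 0:
        raise ValueError("no observations")

    def at(p):
        # Nearest-rank position for fraction p (an assumption of this sketch).
        return ranked[int(p * (n - 1) + 0.5)]

    return {
        "min": ranked[0],
        "min2": ranked[min(1, n - 1)],
        "q1": at(0.25),
        "q2": at(0.50),
        "q3": at(0.75),
        "max2": ranked[max(n - 2, 0)],
        "max": ranked[-1],
        "avg_min_max": sum(ranked) / n,  # average of all observations between min and max
    }


def individual_rank_metrics(values):
    """Rank the observations by the band's (or index's) own values."""
    return pick_by_rank(sorted(values))


def corresponding_rank_metrics(values, ranking):
    """Order the band values by the rank of a corresponding variable (e.g., NDVI)."""
    order = sorted(range(len(values)), key=lambda i: ranking[i])
    return pick_by_rank([values[i] for i in order])
```

With blue-band values and NDVI values for the same dates, `corresponding_rank_metrics(blue, ndvi)["max"]` returns the blue reflectance on the date of the highest NDVI, which is generally not the highest blue reflectance.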
Metric types and naming convention
The metrics for each tile are stored in a separate folder as single-band, 16-bit unsigned integer (UInt16) GeoTIFF files. The generic naming convention is the following:
YYYY_B_S_C.tif
YYYY – corresponding year
B – spectral band or index
S – statistic
C – corresponding band or index used for ranking (only for metrics extracted from ranks defined by a corresponding value)
Example: 2018_blue_max_RN.tif - This metric represents the normalized surface reflectance of the Landsat blue band for the 16-day interval that has the highest red/NIR normalized ratio (also known as NDVI) value during the year 2018.
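A metric file name can be split into its components (YYYY, B, S, and the optional C) with a few lines of code. The sketch below assumes that the components are separated by underscores and contain no underscores themselves, which holds for the example above but is not verified against the full list in Metrics_pheno_A.xlsx. The function name and dictionary keys are hypothetical.

```python
from pathlib import PurePath


def parse_metric_name(filename):
    """Split a metric file name into year, band, statistic, and ranking variable.

    Assumes underscore-separated components with no embedded underscores.
    """
    parts = PurePath(filename).stem.split("_")
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected metric file name: {filename}")
    return {
        "year": int(parts[0]),
        "band": parts[1],
        "statistic": parts[2],
        # C is present only for metrics extracted from corresponding ranks.
        "ranking": parts[3] if len(parts) == 4 else None,
    }
```

Applied to `2018_blue_max_RN.tif`, the function returns year 2018, band `blue`, statistic `max`, and ranking variable `RN`.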
The table Metrics_pheno_A.xlsx has details on the bands, indices, and computed statistics.
In addition to spectral metrics, the metric generation software produces a set of technical layers including the number of cloud-free 16-day composites, gap-filling algorithm outputs, data quality, and water presence.
The following software should be installed to generate metrics:
- ActivePerl (https://www.activestate.com/products/activeperl/)
- GLAD_1.0 core package (https://glad.umd.edu/gladtools/Complete_package/GLAD_1.0_master.zip)
Follow these steps to generate metrics:
- Download all required 16-day composites
- Download and install software
- Make a list of tiles to process (single column, tile names only – see example tiles.txt).
- Use the following command to compute metrics:
> perl C:/GLAD_1.0/metrics_pheno.pl A <tile_list> <year> <input folder> <output folder> <threads>
> perl C:/GLAD_1.0/metrics_pheno.pl A tiles.txt 2018 D:/Data D:/Metrics 1
The command parameters are:
Input folder: the folder with 16-day composite data. It should contain tile data in subfolders.
Output folder: the folder for the output metrics. It will be created by the code, and tile data will be recorded into subfolders.
Threads: the number of parallel processes. The parameter should be increased only if:
- The computer has a multi-core processor (e.g., Intel Xeon)
- The RAM can hold several processes simultaneously. Each process will use 13GB RAM. To calculate the total RAM usage, multiply 13GB by the number of processes.
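The RAM rule above can be turned into a quick check before choosing the threads parameter. This is a minimal sketch: the function names are hypothetical, and capping the result at the number of processor cores is an assumption added for the example. The 13GB figure is the per-process requirement stated above.

```python
RAM_PER_PROCESS_GB = 13  # per-process RAM requirement stated for pheno_A


def total_ram_gb(threads):
    """Total RAM needed to run the given number of parallel processes."""
    return RAM_PER_PROCESS_GB * threads


def max_threads(available_ram_gb, cpu_cores):
    """Largest thread count that fits in RAM, capped at the core count.

    Returns 0 when even a single process does not fit.
    """
    by_ram = available_ram_gb // RAM_PER_PROCESS_GB
    return min(by_ram, cpu_cores)
```

For example, a computer with 64GB of RAM and 8 cores can hold four processes (52GB in total), while three processes need 39GB.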