The primary aim of this hands-on exercise is to produce wall-to-wall maps of aboveground biomass (AGB; Mg ha⁻¹) for a selected area of interest (AOI) using AI-based modelling approaches in R. The workflow follows the same strategy introduced in Exercise 04. Participants will compute LiDAR-derived metrics, train a Random Forest model using field plot data, and generate a continuous AGB prediction map. Unlike the previous exercise, the analysis is performed using a LAScatalog composed of multiple LiDAR tiles rather than a single point cloud. The data are already normalised (Z = height above ground, m), so terrain modelling steps such as ground classification, DTM generation, and height normalisation are not repeated. The same methodology is therefore applied operationally over a larger area.
The modelling logic is consistent with Exercise 04:
- A LAScatalog enables memory-safe, chunk-based processing.
- The same f_metrics() function is used for plot-level and wall-to-wall metrics (training–prediction consistency).
- catalog_apply() computes metric tiles, and AGB is predicted per tile.
LAScatalog, chunk processing, and metric extraction
packages <- c("lidR","terra","sf","dplyr","ranger","future","tcltk","ggplot2")
missing_pkgs <- packages[!packages %in% installed.packages()[, "Package"]]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs, dependencies = TRUE)
suppressMessages(lapply(packages, library, character.only = TRUE))
options(lidR.progress = FALSE)
terra_temp <- file.path(tempdir(), "terra_tmp")
dir.create(terra_temp, recursive = TRUE, showWarnings = FALSE)
terra::terraOptions(tempdir = terra_temp)
Select (1) the folder containing normalised LAZ tiles, and (2) the plots file containing AGB.
cat("Select the folder containing NORMALIZED LAZ tiles (Z already normalized)...\n")
## Select the folder containing NORMALIZED LAZ tiles (Z already normalized)...
norm_dir <- tcltk::tk_choose.dir(caption = "Select NORMALIZED LAZ tile folder")
if (is.na(norm_dir) || norm_dir == "") stop("No normalized folder selected.")
cat("Select the plot file (GeoPackage / Shapefile / GeoJSON) containing AGB...\n")
## Select the plot file (GeoPackage / Shapefile / GeoJSON) containing AGB...
plots_path <- tcltk::tk_choose.files(caption = "Select plot file containing AGB", multi = FALSE)
if (length(plots_path) != 1 || plots_path == "") stop("No plot file selected.")
These parameters control plot geometry handling, CRS metadata assignment, mapping resolution, chunk buffering, height filtering, parallelisation, and Random Forest settings.
id_col <- "plotID_best"
agb_col <- "AGB"
plots_are_points <- FALSE
plot_radius_m <- 20
# If CRS missing in tiles, assign it here (metadata only)
epsg_target <- 32617
# Metrics resolution for mapping
metrics_res <- 40
metrics_start <- c(0, 0)
# Catalog chunking
chunk_size <- 0
chunk_buffer <- 30
# Height filtering (normalized heights)
max_norm_height <- 80
# Parallel
use_parallel <- TRUE
n_workers <- 4
omp_threads <- 1 # IMPORTANT with future
# RF
rf_num_trees <- 600
train_frac <- 0.70
# Metrics to EXCLUDE (problematic bands)
metrics_to_exclude <- c("TotalReturns")
All products are saved under a single root folder:
- ForestSAT2026_05_Exercise/01_plot_clips_norm/
- ForestSAT2026_05_Exercise/02_metrics_tiles/
- ForestSAT2026_05_Exercise/03_agb_tiles_from_metrics/
out_dir <- file.path(dirname(norm_dir), "ForestSAT2026_05_Exercise")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
dir_clips <- file.path(out_dir, "01_plot_clips_norm")
dir_metrics <- file.path(out_dir, "02_metrics_tiles")
dir_pred <- file.path(out_dir, "03_agb_tiles_from_metrics")
dir.create(dir_clips, showWarnings = FALSE, recursive = TRUE)
dir.create(dir_metrics, showWarnings = FALSE, recursive = TRUE)
dir.create(dir_pred, showWarnings = FALSE, recursive = TRUE)
model_rds <- file.path(out_dir, "final_model_rf.rds")
train_csv <- file.path(out_dir, "training_table.csv")
metrics_csv <- file.path(out_dir, "rf_accuracy_metrics.csv")
Parallelism is handled by future at the chunk/tile level. To avoid nested parallelism, keep lidR's internal threads at 1 (omp_threads <- 1); the sketch below shows one way to register the plan.
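The document sets these options but does not show the plan registration itself; a minimal sketch, assuming the use_parallel, n_workers, and omp_threads values defined above:

if (use_parallel) {
  # One R worker per chunk/tile; lidR's catalog engine dispatches chunks to workers
  future::plan(future::multisession, workers = n_workers)
} else {
  future::plan(future::sequential)
}
# Keep lidR's own OpenMP threads at 1 so workers do not nest parallelism
lidR::set_lidr_threads(omp_threads)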
A LAScatalog enables scalable reading and processing of
many tiles. Filters are applied to remove withheld/overlap points and
enforce a valid height range.
ctg_norm <- lidR::readLAScatalog(norm_dir)
if (is.na(sf::st_crs(ctg_norm))) {
message("Catalog CRS is missing. Assigning EPSG:", epsg_target)
sf::st_crs(ctg_norm) <- sf::st_crs(epsg_target)
}
if (!lidR::is.indexed(ctg_norm)) {
message("No .lax index found. Creating .lax files (recommended)...")
tryCatch(lidR:::catalog_laxindex(ctg_norm),
error = function(e) message("Could not create .lax indexes. Continue anyway."))
}
lidR::opt_select(ctg_norm) <- "xyzicrn"
lidR::opt_filter(ctg_norm) <- paste(
"-drop_withheld -drop_overlap",
sprintf("-drop_z_below %g", 0),
sprintf("-drop_z_above %g", max_norm_height)
)
lidR::opt_chunk_size(ctg_norm) <- chunk_size
lidR::opt_chunk_buffer(ctg_norm) <- chunk_buffer
lidR::opt_wall_to_wall(ctg_norm) <- TRUE
Plots are read and transformed to match the catalog CRS. If plots are points, they are buffered into circular plots of radius plot_radius_m.
plots <- sf::st_read(plots_path, quiet = TRUE)
if (!all(c(id_col, agb_col) %in% names(plots))) {
stop("Plots file must contain columns: ", id_col, " and ", agb_col)
}
plots <- sf::st_transform(plots, sf::st_crs(ctg_norm))
if (plots_are_points) {
plots <- sf::st_buffer(plots, dist = plot_radius_m)
}
plots <- plots %>% dplyr::select(dplyr::all_of(c(id_col, agb_col)), geometry)
Each plot polygon is used to clip the catalog and write plot-level LAZ/LAS files. These clips are later used to compute plot-level metrics consistently.
plots_clip <- plots %>% dplyr::rename(ID = !!id_col)
message("Clipping normalized point clouds by plot geometry (writing files)...")
ctg_clip <- ctg_norm
lidR::opt_output_files(ctg_clip) <- file.path(dir_clips, "plot_{ID}")
lidR::opt_laz_compression(ctg_clip) <- TRUE
lidR::clip_roi(ctg_clip, plots_clip)
## Chunk 1 of 10 (10%): state ✓
## Chunk 2 of 10 (20%): state ✓
## Chunk 3 of 10 (30%): state ✓
## Chunk 4 of 10 (40%): state ✓
## Chunk 5 of 10 (50%): state ✓
## Chunk 7 of 10 (60%): state ✓
## Chunk 6 of 10 (70%): state ✓
## Chunk 8 of 10 (80%): state ✓
## Chunk 9 of 10 (90%): state ✓
## Chunk 10 of 10 (100%): state ✓
##
## class : LAScatalog (v1.3 format 3)
## extent : 541864, 542652.3, 4134595, 4136540 (xmin, xmax, ymin, ymax)
## coord. ref. : WGS 84 / UTM zone 17N
## area : 15997.72 m²
## points : 654.5 thousand points
## type : airborne
## density : 40.9 points/m²
## density : 14.5 pulses/m²
## num. files : 10
This step ensures consistency between model training and spatial prediction. For each field plot, LiDAR point clouds are summarized into structural metrics describing canopy height and vertical structure, using the same definitions later applied to the wall-to-wall data. Plots with insufficient or unreliable information are excluded, and the remaining plots are linked to their field-measured aboveground biomass (AGB). Records with missing or invalid values are removed, and only informative predictors are retained. The resulting harmonized dataset contains observed AGB and corresponding LiDAR metrics and is used to calibrate the biomass model.
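The workflow reuses f_metrics() from Exercise 04. If that definition is not already in your session, a minimal illustrative stand-in could look like the following (this is not the Exercise 04 definition; the metric names are placeholders, and it deliberately includes TotalReturns, which is later dropped via metrics_to_exclude):

# Hypothetical stand-in for the Exercise 04 f_metrics(); returns a named list of
# single-number metrics, as required by pixel_metrics() and the plot-level code below
f_metrics <- function(x, y, z, i) {
  list(
    zmax  = max(z),
    zmean = mean(z),
    zsd   = stats::sd(z),
    zq25  = stats::quantile(z, 0.25, names = FALSE),
    zq50  = stats::quantile(z, 0.50, names = FALSE),
    zq75  = stats::quantile(z, 0.75, names = FALSE),
    zq95  = stats::quantile(z, 0.95, names = FALSE),
    pzabove2 = 100 * mean(z > 2),      # % of returns above 2 m
    imean = mean(i, na.rm = TRUE),     # mean intensity
    TotalReturns = length(z)           # excluded later via metrics_to_exclude
  )
}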
plot_metrics_one <- function(las, min_pts = 30) {
if (is.null(las) || lidR::is.empty(las)) return(NULL)
d <- las@data
d <- d[is.finite(d$Z) & d$Z >= 0 & d$Z <= max_norm_height, , drop = FALSE]
if (nrow(d) < min_pts) return(NULL)
intensity <- if ("Intensity" %in% names(d)) d$Intensity else rep(NA_real_, nrow(d))
m <- f_metrics(d$X, d$Y, d$Z, intensity)
as.data.frame(as.list(m))
}
message("Computing plot-level metrics from plot clips...")
metrics_list <- lapply(clip_files, function(f) {
las <- lidR::readLAS(f)
mdf <- plot_metrics_one(las, min_pts = 30)
if (is.null(mdf)) return(NULL)
pid <- sub("^plot_", "", tools::file_path_sans_ext(basename(f)))
mdf$plot_id <- as.character(pid)
mdf
})
metrics_df <- dplyr::bind_rows(metrics_list)
if (nrow(metrics_df) == 0) stop("All plot clips were empty or produced no metrics.")
plots_tbl <- sf::st_drop_geometry(plots) %>%
dplyr::transmute(
plot_id = as.character(.data[[id_col]]),
y = as.numeric(.data[[agb_col]])
)
train_df <- plots_tbl %>%
dplyr::inner_join(metrics_df, by = "plot_id") %>%
dplyr::filter(is.finite(y)) %>%
dplyr::select(where(~ !all(is.na(.))))
write.csv(train_df, train_csv, row.names = FALSE)
message("Training table saved: ", train_csv)In this step, LiDAR-derived plot metrics are used as predictors to model plot-level aboveground biomass (AGB). The dataset is first restricted to numeric predictors (as required by XGBoost), cleaned of missing or non-finite values, and then split into 70% training and 30% test sets to evaluate performance on independent data. Random Forest (RF) models were implemented using the ranger package. RF was chosen because it is robust to noise, captures non-linear relationships, handles correlated predictors effectively, and generally provides stable performance with minimal parameter tuning. Predictions were generated for both the training and test sets, and model performance was visually summarized to assess generalization. The best-performing model, primarily based on test performance, was then selected and carried forward to the subsequent steps of the workflow.
pred_cols <- setdiff(names(train_df), c("plot_id", "y"))
is_num <- sapply(train_df[, pred_cols, drop = FALSE], is.numeric)
pred_cols <- pred_cols[is_num]
# Exclude problematic metrics from modeling
pred_cols <- setdiff(pred_cols, metrics_to_exclude)
dat_use <- train_df[, c("plot_id", "y", pred_cols)]
dat_use <- dat_use[stats::complete.cases(dat_use[, c("y", pred_cols)]), , drop = FALSE]
n <- nrow(dat_use)
if (n < 4) stop("Not enough plots for training. Need at least 4 complete cases.")
set.seed(42)  # reproducible train/test split (matches the RF seed below)
idx <- sample(seq_len(n))
n_train <- max(2, floor(train_frac * n))
train_id <- idx[1:n_train]
test_id <- setdiff(idx, train_id)
train <- dat_use[train_id, , drop = FALSE]
test <- dat_use[test_id, , drop = FALSE]
target <- "y"
features <- pred_cols
rf_fit <- ranger(
dependent.variable.name = target,
data = train[, c(target, features)],
num.trees = rf_num_trees,
mtry = max(1, floor(sqrt(length(features)))),
importance = "impurity",
seed = 42
)
pred_train <- predict(rf_fit, data = train[, features, drop = FALSE])$predictions
pred_test <- predict(rf_fit, data = test[, features, drop = FALSE])$predictions
metrics_out <- dplyr::bind_rows(
extract_metrics(train$y, pred_train, model_name = "RF_ranger", part_name = "train") |>
dplyr::mutate(n = nrow(train), .before = 1),
extract_metrics(test$y, pred_test, model_name = "RF_ranger", part_name = "test") |>
dplyr::mutate(n = nrow(test), .before = 1)
)
write.csv(metrics_out, metrics_csv, row.names = FALSE)
saveRDS(rf_fit, model_rds)
message("RF model saved: ", model_rds)
message("Accuracy metrics saved: ", metrics_csv)
# Safety: ensure excluded metrics are not in 'features'
features <- setdiff(features, metrics_to_exclude)
Model performance is evaluated by comparing predicted and observed plot-level aboveground biomass (AGB) for both the training and the independent test sets. Accuracy is quantified with complementary metrics describing goodness of fit, prediction error magnitude, and systematic bias: the coefficient of determination (R²; Eq1), the Root Mean Square Error (RMSE; Eq2) with its relative form (RMSE%; Eq3), the mean difference or bias (MD; Eq4), and the Mean Absolute Error (MAE; Eq5) with its relative form (MAE%; Eq6). Relative forms are normalised by the mean observed AGB. Using both absolute and relative metrics provides a robust assessment of model accuracy and generalisation performance.
\[ R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \qquad \text{(Eq1)} \] \[ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \qquad \text{(Eq2)} \] \[ RMSE\% = \frac{RMSE}{\bar{y}} \times 100 \qquad \text{(Eq3)} \]
\[ MD = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right) \qquad \text{(Eq4)} \]
\[ MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \qquad \text{(Eq5)} \] \[ MAE\% = \frac{MAE}{\bar{y}} \times 100 \qquad \text{(Eq6)} \]
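The helper extract_metrics() used above comes from Exercise 04. If it is not available, a minimal implementation consistent with Eq1–Eq6 might look like this (illustrative; column names chosen to match the output table below):

extract_metrics <- function(obs, pred, model_name, part_name) {
  ybar <- mean(obs)
  r2   <- 1 - sum((obs - pred)^2) / sum((obs - ybar)^2)  # Eq1
  rmse <- sqrt(mean((pred - obs)^2))                     # Eq2
  bias <- mean(pred - obs)                               # Eq4 (MD)
  mae  <- mean(abs(pred - obs))                          # Eq5
  data.frame(
    model = model_name, part = part_name, r2 = r2,
    rmse = rmse, rmseR = 100 * rmse / ybar,              # Eq3
    bias = bias, biasR = 100 * bias / ybar,
    mae  = mae,  maeR  = 100 * mae / ybar                # Eq6
  )
}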
## n model part r2 rmse rmseR bias biasR mae
## 1 7 RF_ranger train 0.9034344 15.60472 14.36903 0.1845695 0.169954 14.35551
## 2 3 RF_ranger test 0.3100952 27.67790 23.22882 -7.2091154 -6.050287 23.34014
## maeR
## 1 13.21873
## 2 19.58833
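Note that with only 10 plots in total (7 training, 3 test), the test-set metrics above are highly unstable and should be read as indicative rather than a definitive assessment of generalisation.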
Wall-to-wall predictor tiles are generated by computing
pixel_metrics() per catalog chunk. The chunk buffer reduces
edge effects; outputs are cropped back to the core chunk extent to
remove overlap.
metrics_chunk <- function(chunk, res, start, max_h) {
las <- lidR::readLAS(chunk)
if (is.null(las) || lidR::is.empty(las)) return(NULL)
las <- lidR::filter_poi(las, Z >= 0 & Z <= max_h)
if (lidR::is.empty(las)) return(NULL)
m <- lidR::pixel_metrics(
las = las,
func = ~f_metrics(X, Y, Z, Intensity),
res = res,
start = start,
pkg = "terra"
)
if (is.null(m)) return(NULL)
# Crop to chunk extent (remove buffer overlap)
m <- terra::crop(m, terra::ext(chunk))
m
}
message(" computing METRICS tiles with catalog_apply()...")
lidR::opt_output_files(ctg_norm) <- file.path(dir_metrics, "METRICS_{XLEFT}_{YBOTTOM}")
lidR::opt_progress(ctg_norm) <- FALSE
options_engine_metrics <- list(
need_output_file = TRUE,
need_buffer = TRUE,
automerge = FALSE,
overwrite = TRUE
)
metrics_files <- lidR::catalog_apply(
ctg_norm,
metrics_chunk,
res = metrics_res,
start = metrics_start,
max_h = max_norm_height,
.options = options_engine_metrics
)
metrics_files <- unlist(metrics_files, use.names = FALSE)
metrics_files <- metrics_files[file.exists(metrics_files)]
if (length(metrics_files) == 0) stop("No METRICS tiles were written.")
message("METRICS tiles written to: ", dir_metrics)Each metrics tile is read, aligned to the model predictors (drop excluded metrics; pad missing predictors with NA), and predicted using the trained RF model. Outputs are saved as AGB GeoTIFF tiles.
message("Step F2: predicting AGB tiles from METRICS tiles using terra::predict()...")
agb_files <- character(0)
# Use the same target/predictor set as in training (metrics_to_exclude already removed)
target <- "y"
features <- pred_cols
for (f in metrics_files) {
metrics_r <- terra::rast(f)
# Drop excluded metric bands if present (e.g., TotalReturns)
drop_now <- intersect(metrics_to_exclude, names(metrics_r))
if (length(drop_now) > 0) {
metrics_r <- metrics_r[[setdiff(names(metrics_r), drop_now)]]
}
# Pad missing predictors (if needed)
missing <- setdiff(features, names(metrics_r))
if (length(missing) > 0) {
na_layer <- metrics_r[[1]]
na_layer[] <- NA_real_
for (m in missing) {
metrics_r <- c(metrics_r, na_layer)
names(metrics_r)[terra::nlyr(metrics_r)] <- m
}
}
x_r <- metrics_r[[features]]
names(x_r) <- features
out_path <- file.path(dir_pred, paste0("AGB_", tools::file_path_sans_ext(basename(f)), ".tif"))
terra::predict(
x_r,
model = rf_fit,
# ranger's predict() returns a list; extract the $predictions vector for terra
fun = function(model, ...) predict(model, ...)$predictions,
na.rm = TRUE,
cores = 1,
filename = out_path,
overwrite = TRUE
)
agb_files <- c(agb_files, out_path)
}
agb_files <- agb_files[file.exists(agb_files)]
if (length(agb_files) == 0) stop("No AGB tiles were written from metrics.")
message("AGB tiles written to: ", dir_pred)All predicted tiles are mosaicked into a single AGB map. The mosaic is exported as GeoTIFF and visualised as a PNG figure.
agb_coll <- sprc(agb_files)
agb_mosaic <- mosaic(agb_coll, fun = "mean")
# Save final mosaic GeoTIFF (single root folder)
out_mosaic_tif <- file.path(out_dir, "AGB_40m_mosaic.tif")
writeRaster(agb_mosaic, out_mosaic_tif, overwrite = TRUE)
message("Final AGB mosaic saved: ", out_mosaic_tif)
# ggplot map
df_pred <- as.data.frame(agb_mosaic, xy = TRUE, na.rm = TRUE)
names(df_pred) <- c("x", "y", "AGB")
p_agb <- ggplot(df_pred) +
geom_raster(aes(x = x, y = y, fill = AGB)) +
scale_fill_viridis_c(option = "C", name = "AGB (Mg/ha)") +
coord_equal() +
labs(title = "Predicted Aboveground Biomass", x = NULL, y = NULL) +
theme_minimal(base_size = 13) +
theme(
panel.grid = element_blank(),
plot.title = element_text(face = "bold"),
legend.position = "right"
)
print(p_agb)
# Save ggplot figure (single root folder)
out_png <- file.path(out_dir, "AGB_40m_mosaic.png")
ggsave(out_png, plot = p_agb, width = 8, height = 6, dpi = 400)
message("AGB mosaic figure saved: ", out_png)
# Save tile list (helpful for QA)
writeLines(agb_files, con = file.path(out_dir, "AGB_tile_list.txt"))
message("ALL outputs are under: ", out_dir)LAScatalog enables scalable processing for large
AOIs.
Chunk-based reading, buffering, and tile outputs allow wall-to-wall
mapping across hundreds of files.
Training–mapping consistency is essential.
Using the same f_metrics() function for plot metrics and
wall-to-wall metrics improves robustness and reduces predictor
mismatch.
Most mapping artefacts come from predictor rasters and
edge handling.
Buffers, cropping to chunk extent, and quality checks of
metric/prediction tiles are critical before interpretation.
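As a closing example, a minimal per-tile QA pass over the agb_files vector built above (illustrative; the thresholds for "mostly empty" and "implausibly high AGB" are assumptions to adapt to your AOI):

qa <- do.call(rbind, lapply(agb_files, function(f) {
  r <- terra::rast(f)
  v <- terra::values(r, mat = FALSE)
  data.frame(
    tile    = basename(f),
    na_frac = mean(is.na(v)),                          # share of empty pixels
    agb_min = suppressWarnings(min(v, na.rm = TRUE)),
    agb_max = suppressWarnings(max(v, na.rm = TRUE))
  )
}))
# Flag tiles that are mostly empty or show implausible AGB values
qa[qa$na_frac > 0.5 | qa$agb_max > 1000, ]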