TY - JOUR
T1 - Regional flood frequency analysis using generalized additive models, random forest, and extreme gradient boosting for South-East Australia
AU - Pan, Xiao
AU - Yildirim, Gokhan
AU - Rahman, Ataur
AU - Ouarda, Taha B.M.J.
PY - 2026/1
Y1 - 2026/1
N2 - This study develops a new regional flood frequency analysis (RFFA) model using Generalized Additive Models (GAM), Random Forest (RF), and XGBoost (XG) within the Peaks Over Threshold (POT) modelling framework. These machine learning techniques attempt to overcome the limitations of traditional linear regression-based RFFA models by better capturing the complexity of the non-linear rainfall-runoff process. Analysing data from 145 catchments in south-east Australia, we assess the ability of each of the three models to predict flood quantiles across various return periods. GAM is found to be superior in accuracy, with a median absolute relative error of 33%, compared to 37% for RF and 40% for XG. Spatial analysis shows GAM’s robustness, significantly reducing errors in regions with high stream densities. It is also found that RF and XG models tend to overestimate flood quantiles in catchments with high stream densities. This research demonstrates that the integration of advanced machine learning methods within the POT framework significantly enhances the accuracy of flood quantile estimation, supporting more resilient flood risk management and infrastructure planning in flood-affected regions. The findings of this study will assist in upgrading Australian Rainfall and Runoff (ARR), the national guideline. Unlike prior POT-RFFA studies based on linear/regularised regressions (and AM/GEV-focused GAM/ML work), we provide the first comprehensive comparison of GAM, RF, and XGBoost in a POT framework across 12EY–10ARI, with consistent cross-validation and spatial error diagnostics for SE Australia.
AB - This study develops a new regional flood frequency analysis (RFFA) model using Generalized Additive Models (GAM), Random Forest (RF), and XGBoost (XG) within the Peaks Over Threshold (POT) modelling framework. These machine learning techniques attempt to overcome the limitations of traditional linear regression-based RFFA models by better capturing the complexity of the non-linear rainfall-runoff process. Analysing data from 145 catchments in south-east Australia, we assess the ability of each of the three models to predict flood quantiles across various return periods. GAM is found to be superior in accuracy, with a median absolute relative error of 33%, compared to 37% for RF and 40% for XG. Spatial analysis shows GAM’s robustness, significantly reducing errors in regions with high stream densities. It is also found that RF and XG models tend to overestimate flood quantiles in catchments with high stream densities. This research demonstrates that the integration of advanced machine learning methods within the POT framework significantly enhances the accuracy of flood quantile estimation, supporting more resilient flood risk management and infrastructure planning in flood-affected regions. The findings of this study will assist in upgrading Australian Rainfall and Runoff (ARR), the national guideline. Unlike prior POT-RFFA studies based on linear/regularised regressions (and AM/GEV-focused GAM/ML work), we provide the first comprehensive comparison of GAM, RF, and XGBoost in a POT framework across 12EY–10ARI, with consistent cross-validation and spatial error diagnostics for SE Australia.
KW - ARR
KW - Generalized additive model (GAM)
KW - Machine learning
KW - Peaks over threshold (POT)
KW - Random forest
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=105027728747&partnerID=8YFLogxK
U2 - 10.1007/s12665-025-12800-5
DO - 10.1007/s12665-025-12800-5
M3 - Article
AN - SCOPUS:105027728747
SN - 1866-6280
VL - 85
JO - Environmental Earth Sciences
JF - Environmental Earth Sciences
IS - 2
M1 - 67
ER -