TY - JOUR
T1 - The influence of pH and temperature on benthic chlorophyll-a
T2 - insights from SHAP-XGBoost and random forest models
AU - Khan, Sangar
AU - Juvigny-Khenafou, Noël P.D.
AU - Dalu, Tatenda
AU - Milham, Paul J.
AU - Hamid, Yasir
AU - Eltohamy, Kamel Mohamed
AU - Ullah, Habib
AU - Amiri, Bahman Jabbarian
AU - Chen, Hao
AU - Wu, Naicheng
PY - 2025
Y1 - 2025
AB - Biological threats to river health relate to algal biomass, for which benthic chlorophyll–a (chl–a) is an indicator; consequently, predicting chl–a helps to understand ecosystem dynamics. Little information exists on machine learning models for predicting benthic chl–a, or on their input parameters, in lotic ecosystems. To fill this gap, we predicted benthic chl–a levels in China's Thousand Islands Lake (TIL) watershed using machine learning algorithms. Water samples for nutrient and metal analysis were collected at 147 sites in the TIL catchment. We employed Random Forest (RF), eXtreme gradient boosting (XGBoost) and SHAP-enhanced eXtreme gradient boosting (SHAP XGBoost) models, alongside Support Vector Regression (SVR), to predict chl–a levels in diverse reaches and identify the key determinants. The XGBoost model outperformed the RF model on the training, validation and test datasets. In the SHAP XGBoost model, pH was the most important feature, followed by mean average temperature (AT). The SVR indicated that AT is vital in the upper and middle catchment reaches, while pH is more important in the lower reaches. In partial dependence plots, the chl–a concentration depended strongly on pH and AT. High pH and AT released P from stream colloids and lowered colloid adsorption, thereby increasing chl–a concentration. We conclude that the SHAP XGBoost model can be used to identify the key determinants of chl–a from chemical and physical variables in the lotic system.
KW - Dissolved inorganic phosphorus
KW - Freshwater
KW - Periphyton
KW - Random forest
KW - SHAP XGBoost
UR - http://www.scopus.com/inward/record.url?scp=105013251617&partnerID=8YFLogxK
U2 - 10.1016/j.ecoinf.2025.103355
DO - 10.1016/j.ecoinf.2025.103355
M3 - Article
AN - SCOPUS:105013251617
SN - 1574-9541
VL - 91
JO - Ecological Informatics
JF - Ecological Informatics
M1 - 103355
ER -