TY - JOUR
T1 - Developing machine learning models with multi-source environmental data to predict wheat yield in China
AU - Li, Linchao
AU - Wang, Bin
AU - Feng, Puyu
AU - Liu, De Li
AU - He, Qinsi
AU - Zhang, Yajie
AU - Wang, Yakai
AU - Li, Siyi
AU - Lu, Xiaoliang
AU - Yue, Chao
AU - Li, Yi
AU - He, Jianqiang
AU - Feng, Hao
AU - Yang, Guijun
AU - Yu, Qiang
N1 - Publisher Copyright: © 2022
PY - 2022/3
Y1 - 2022/3
AB - Crop yield is controlled by different environmental factors. Multi-source data for site-specific soils, climates, and remotely sensed vegetation indices are essential for yield prediction. Algorithms of data-model fusion for crop growth monitoring and yield prediction are complicated and need to be optimized to deal with model uncertainty. This study integrated multi-source environmental variables (e.g., satellite-based vegetation indices, climate data, and soil properties) into random forest (RF) and support vector machine (SVM) models for wheat yield prediction in China. The performance of both RF and SVM models was investigated using different types of vegetation indices associated with other predictors. Relative importance and partial dependence analyses were used to identify the main predictors and their relationships with wheat yield. We found that using remotely sensed vegetation indices improved our model precision, and that near-infrared reflectance of terrestrial vegetation (NIRv) was slightly better than normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI) in predicting yield. NIRv was better in detecting climate stress on crops, and could capture more information regarding crop growth and yield formation. Compared with the SVM model, the RF model with NIRv and other covariates had better performance in wheat yield prediction, with R² and RMSE being 0.74 and 758 kg/ha, respectively. We also found that NIRv from jointing to heading was the most important predictor in determining yield, followed by solar radiation (especially during tillering–heading), relative humidity (during planting–tillering), soil organic carbon, and wind speed (throughout the growing season). In addition, wheat yield exhibited threshold-like responses to most factors based on our RF model. These threshold values can help to better understand how different environmental factors limit wheat yield, which will provide useful information for climate-adaptive crop management. Our findings demonstrated the potential of using NIRv for yield prediction. This approach is broadly applicable to other regions globally using publicly available data.
KW - NIRv
KW - Random forest
KW - Support vector machine
KW - Vegetation indices
KW - Wheat
KW - Yield prediction
UR - http://www.scopus.com/inward/record.url?scp=85124797314&partnerID=8YFLogxK
U2 - 10.1016/j.compag.2022.106790
DO - 10.1016/j.compag.2022.106790
M3 - Article
AN - SCOPUS:85124797314
SN - 0168-1699
VL - 194
JO - Computers and Electronics in Agriculture
JF - Computers and Electronics in Agriculture
M1 - 106790
ER -