TY - GEN
T1 - Air quality data pre-processing : a novel algorithm to impute missing values in univariate time series
AU - Wijesekara, Lakmini
AU - Liyanage, Liwan
PY - 2021
Y1 - 2021
N2 - Missing values are ubiquitous in air pollution datasets as the data is being collected through sensors. Preprocessing these data plays a vital role in obtaining accurate results in the downstream analyses. This task becomes even more challenging as time is an implicit variable that cannot be ignored. Existing methods that deal with missing data in time series perform reasonably well in situations where the percentage of missing values is relatively low and the gap size is small. However, the need for the development of robust methods, particularly for large gaps, is still persistent. This paper proposes a novel algorithm (FBReg) to impute univariate air pollution variables by applying a bi-directional method based on regularized regression models. The performance of the method is evaluated against two baseline models, Mean imputation and Last observation carried forward (LOCF), as well as two well-established methods, Kalman smoothing on structural time series models and Kalman smoothing on ARIMA (Auto-Regressive Integrated Moving Average) models. The proposed algorithm outperforms the considered methods and exhibits consistent performance with exponentially distributed missing values under the MCAR (Missing Completely at Random) mechanism, as well as with large gaps.
UR - https://hdl.handle.net/1959.7/uws:62401
U2 - 10.1109/ICTAI52525.2021.00159
DO - 10.1109/ICTAI52525.2021.00159
M3 - Conference Paper
SN - 9781665408981
SP - 996
EP - 1001
BT - Proceedings of the 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2021), Virtual Conference, 1-3 November 2021
PB - IEEE
T2 - International Conference on Tools with Artificial Intelligence
Y2 - 1 November 2021
ER -