Abstract
Missing values are ubiquitous in air pollution datasets because the data are collected by sensors. Preprocessing these data plays a vital role in obtaining accurate results in downstream analyses. The task is made harder because time is an implicit variable that cannot be ignored. Existing methods for handling missing data in time series perform reasonably well when the percentage of missing values is relatively low and the gaps are small. However, robust methods are still needed, particularly for large gaps. This paper proposes a novel algorithm (FBReg) that imputes univariate air pollution variables using a bi-directional method based on regularized regression models. The method is evaluated against two baseline models, Mean imputation and Last observation carried forward (LOCF), and against two well-established methods, Kalman smoothing on structural time series models and Kalman smoothing on ARIMA (Auto-Regressive Integrated Moving Average) models. The proposed algorithm outperforms these methods and performs consistently both with exponentially distributed missing values under the MCAR (Missing Completely at Random) mechanism and with large gaps.
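The abstract does not give FBReg's details, so the sketch below shows only one plausible reading of "a bi-directional method based on regularized regression", next to the two baselines it is compared against (Mean imputation and LOCF). Assumptions, all hypothetical and not taken from the paper: a ridge (L2-penalized) autoregressive model of order `p` is fitted once on the series and once on the reversed series, each gap is filled recursively from both ends, and the two fills are blended with weights that favour whichever side is closer. The function names and the parameters `p` and `lam` are illustrative only. The Kalman-smoothing comparisons are not shown.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing reading


def mean_impute(x: Series) -> List[float]:
    """Baseline: replace every missing value with the mean of the observed values."""
    obs = [v for v in x if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in x]


def locf(x: Series) -> List[float]:
    """Baseline: last observation carried forward (a leading gap takes the first observed value)."""
    last = next(v for v in x if v is not None)
    out = []
    for v in x:
        last = last if v is None else v
        out.append(last)
    return out


def _solve(a, b):
    """Solve the small dense system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w


def _fit_ridge_ar(x: Series, p: int, lam: float):
    """Fit x[t] ~ w . x[t-p:t] by ridge regression on fully observed windows (no intercept)."""
    rows, ys = [], []
    for t in range(p, len(x)):
        window = x[t - p:t]
        if x[t] is not None and all(v is not None for v in window):
            rows.append(window)
            ys.append(x[t])
    # Penalized normal equations: (X^T X + lam * I) w = X^T y; lam > 0 keeps them solvable.
    xtx = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return _solve(xtx, xty)


def _recursive_fill(x: Series, w) -> Series:
    """Fill gaps left to right, feeding earlier predictions back in as lagged inputs."""
    p, out = len(w), list(x)
    for t in range(p, len(out)):
        if out[t] is None and all(v is not None for v in out[t - p:t]):
            out[t] = sum(wi * v for wi, v in zip(w, out[t - p:t]))
    return out


def bidirectional_ridge_impute(x: Series, p: int = 3, lam: float = 1.0) -> List[float]:
    """Hypothetical forward/backward ridge imputation; illustrative, not the published FBReg."""
    n = len(x)
    obs = [v for v in x if v is not None]
    mu = sum(obs) / len(obs)
    z = [None if v is None else v - mu for v in x]  # centre so a no-intercept model is reasonable
    fwd = _recursive_fill(z, _fit_ridge_ar(z, p, lam))
    zr = z[::-1]
    bwd = _recursive_fill(zr, _fit_ridge_ar(zr, p, lam))[::-1]
    out = list(z)
    i = 0
    while i < n:
        if z[i] is not None:
            i += 1
            continue
        j = i
        while j < n and z[j] is None:
            j += 1  # [i, j) is one maximal gap
        for k in range(i, j):
            wf = (j - k) / (j - i + 1)  # forward weight shrinks towards the right end of the gap
            f, b = fwd[k], bwd[k]
            if f is not None and b is not None:
                out[k] = wf * f + (1 - wf) * b
            else:  # only one direction could reach this point; fall back to the mean if neither did
                out[k] = f if f is not None else (b if b is not None else 0.0)
        i = j
    return [x[k] if x[k] is not None else out[k] + mu for k in range(n)]
```

For example, on the linear series `0, 1, ..., 29` with readings 10 to 14 removed, `mean_impute` fills all five with 15.0, while `bidirectional_ridge_impute` extrapolates the trend into the gap from both sides. Blending the two directions is one way to keep a long gap from depending only on a single recursive forecast, whose errors grow with distance from the last observed value.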
Original language | English |
---|---|
Title of host publication | Proceedings of the 33rd IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2021), Virtual Conference, 1-3 November 2021 |
Publisher | IEEE |
Pages | 996-1001 |
Number of pages | 6 |
ISBN (Print) | 9781665408981 |
Publication status | Published - 2021 |
Event | International Conference on Tools with Artificial Intelligence - Duration: 1 Nov 2021 → … |
Publication series
Name | |
---|---|
ISSN (Print) | 2375-0197 |
Conference
Conference | International Conference on Tools with Artificial Intelligence |
---|---|
Period | 1/11/21 → … |
Bibliographical note
Publisher Copyright: © 2021 IEEE.