TY - JOUR
T1 - Dynamic swarm class rebalancing for the process mining of rare events
AU - Li, Jinyan
AU - Wu, Yaoyang
AU - Fong, Simon
AU - Wong, Raymond K.
AU - Chu, Victor W.
AU - Ong, Kok-leong
AU - Wong, Kelvin K. L.
PY - 2021
Y1 - 2021
AB - Process mining is becoming an indispensable method in workflow model reconstruction, offering insights into mission-critical systems. The efficacy of process mining depends on whether the underlying data mining algorithms can accurately classify or predict future events from process logs. However, exceptional events are scarce in most operational processes. Hence, the process logs generated from these processes are highly imbalanced. A model learned from imbalanced data often tends to be overly generalized toward the normal classes but under-trained to recognize the rare classes. In this paper, we propose three methods to rectify this class imbalance problem. They are founded upon a meta-heuristic swarm intelligence algorithm. The first method, which is also the base of the remaining two methods, is the Dynamic Multi-objective Rebalancing Algorithm, which considers both high accuracy and a high confidence level of classification in its objective function, and it draws upon the particle swarm optimization algorithm. The other two algorithms are hybrid methods that combine the base method with over-sampling and under-sampling techniques. Experiments are conducted in which the dataset is rebalanced using the three above-mentioned methods, as well as other classic resampling methods for comparison. According to the results, our proposed methods perform satisfactorily relative to the comparison methods, and we extracted meaningful decision rules from a rebalanced dataset in process mining.
UR - https://hdl.handle.net/1959.7/uws:61043
U2 - 10.1007/s11227-020-03500-x
DO - 10.1007/s11227-020-03500-x
M3 - Article
SN - 0920-8542
VL - 77
SP - 7549
EP - 7583
JO - Journal of Supercomputing
JF - Journal of Supercomputing
IS - 7
ER -