TY - JOUR
T1 - Probabilistic lossy counting
T2 - An efficient algorithm for finding heavy hitters
AU - Dimitropoulos, Xenofontas
AU - Hurley, Paul
AU - Kind, Andreas
PY - 2008
Y1 - 2008
N2 - Knowledge of the largest traffic flows in a network is important for many network management applications. The problem of finding these flows is known as the heavy-hitter problem and has been the subject of many studies in the past years. One of the most efficient and well-known algorithms for finding heavy hitters is lossy counting [29]. In this work we introduce probabilistic lossy counting (PLC), which enhances lossy counting in computing network traffic heavy hitters. PLC uses on a tighter error bound on the estimated sizes of traffic flows and provides probabilistic rather than deterministic guarantees on its accuracy. The probabilistic-based error bound substantially improves the memory consumption of the algorithm. In addition, PLC reduces the rate of false positives of lossy counting and achieves a low estimation error, although slightly higher than that of lossy counting. We compare PLC with state-of-the-art algorithms for finding heavy hitters. Our experiments using real traffic traces find that PLC has 1) between 34.4% and 74% lower memory consumption, 2) between 37.9% and 40.5% fewer false positives than lossy counting, and 3) a small estimation error.
AB - Knowledge of the largest traffic flows in a network is important for many network management applications. The problem of finding these flows is known as the heavy-hitter problem and has been the subject of many studies in the past years. One of the most efficient and well-known algorithms for finding heavy hitters is lossy counting [29]. In this work we introduce probabilistic lossy counting (PLC), which enhances lossy counting in computing network traffic heavy hitters. PLC uses on a tighter error bound on the estimated sizes of traffic flows and provides probabilistic rather than deterministic guarantees on its accuracy. The probabilistic-based error bound substantially improves the memory consumption of the algorithm. In addition, PLC reduces the rate of false positives of lossy counting and achieves a low estimation error, although slightly higher than that of lossy counting. We compare PLC with state-of-the-art algorithms for finding heavy hitters. Our experiments using real traffic traces find that PLC has 1) between 34.4% and 74% lower memory consumption, 2) between 37.9% and 40.5% fewer false positives than lossy counting, and 3) a small estimation error.
KW - Data streams
KW - Heavy hitters
UR - http://www.scopus.com/inward/record.url?scp=49049087701&partnerID=8YFLogxK
U2 - 10.1145/1341431.1341433
DO - 10.1145/1341431.1341433
M3 - Article
AN - SCOPUS:49049087701
SN - 0146-4833
VL - 38
SP - 5
EP - 16
JO - Computer Communications Review
JF - Computer Communications Review
IS - 1
ER -