TY - GEN
T1 - Finding critical thresholds for defining bursts in event logs
AU - Lahiri, Bibudh
AU - Akrotirianakis, Ioannis
AU - Moerchen, Fabian
PY - 2013
Y1 - 2013
N2 - A burst, i.e., an unusally high frequency of occurrence of an event in a time-window, is interesting in many monitoring applications that give rise to temporal data as it often indicates an abnormal activity. While the problem of detecting bursts from time-series data has been well addressed, the question of what choice of thresholds, on the number of events as well as on the window size, makes a window "unusally bursty" remains a relevant one. We consider the problem of finding critical values of both these thresholds. Since for most applications, we hardly have any apriori idea of what combination of thresholds is critical, the range of possible values for either threshold can be very large. We formulate finding the combination of critical thresholds as a two-dimensional search problem and design efficient deteministic and randomized divide-and-conquer heuristics. For the deterministic heuristic, we show that under some weak assumptions, the computational overhead is logarithmic in the sizes of the ranges. Under identical assumptions, the expected computational overhead of the randomized heuristic in the worst case is also logarithmic. Using data obtained from logs of medical equipment, we conduct extensive simulations that reinforce our theoretical results, and show that on average, the randomized heuristic beats its deteministic counterpart in practice.
AB - A burst, i.e., an unusally high frequency of occurrence of an event in a time-window, is interesting in many monitoring applications that give rise to temporal data as it often indicates an abnormal activity. While the problem of detecting bursts from time-series data has been well addressed, the question of what choice of thresholds, on the number of events as well as on the window size, makes a window "unusally bursty" remains a relevant one. We consider the problem of finding critical values of both these thresholds. Since for most applications, we hardly have any apriori idea of what combination of thresholds is critical, the range of possible values for either threshold can be very large. We formulate finding the combination of critical thresholds as a two-dimensional search problem and design efficient deteministic and randomized divide-and-conquer heuristics. For the deterministic heuristic, we show that under some weak assumptions, the computational overhead is logarithmic in the sizes of the ranges. Under identical assumptions, the expected computational overhead of the randomized heuristic in the worst case is also logarithmic. Using data obtained from logs of medical equipment, we conduct extensive simulations that reinforce our theoretical results, and show that on average, the randomized heuristic beats its deteministic counterpart in practice.
KW - Analytics for temporal data
KW - Massive data analytics: Algorithms
UR - https://www.scopus.com/pages/publications/84894177779
UR - https://www.scopus.com/inward/citedby.url?scp=84894177779&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-37574-3_4
DO - 10.1007/978-3-642-37574-3_4
M3 - Conference contribution
AN - SCOPUS:84894177779
SN - 9783642375736
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 89
EP - 112
BT - Transactions on Large-Scale Data- and Knowledge-Centered Systems VIII - Special Issue on Advances in Data Warehousing and Knowledge Discovery
T2 - 13th International Conference on Data Warehousing and Knowledge Discovery, DaWaK 2011
Y2 - 29 August 2011 through 2 September 2011
ER -