Context-aware incremental clustering of alerts in monitoring systems

Lior Turgeman, Yaniv Avrashi, Gabriella Vagner, Nadeem Azaizah, Someshwar Katkar

Research output: Contribution to journal › Article › peer-review

6 Scopus citations

Abstract

The highly complex nature of modern hybrid IT applications presents a growing challenge for operations teams that rely on traditional monitoring approaches. In monitoring systems, incidents occur frequently due to a variety of causes, from software and hardware updates to changes in the operating environment. These incidents can significantly degrade the system's availability and customers’ satisfaction. In many cases, investigating an incident in such an environment can feel like looking for a needle in a haystack, and you may not even know what the needle looks like until you see it. In that regard, one of the main challenges is how to efficiently analyze, in real time, multiple sets of alert messages stemming from disparate monitoring tools and collectors across the application stack. Such an analysis can provide trustworthy detection of system states at various critical points, thus helping teams to detect, frame, analyze, and resolve incidents or failures in a relatively short time, especially when accurate topological dependencies of the system are absent. In this work, we suggest a new approach to determining relations among alerts, grouping related alerts into “events”. The suggested approach directly models the event's likelihood by first embedding the alerts’ corresponding metrics into a common latent space, where the distance among metrics can be naturally defined, using a word2vec model, and then clustering the alerts with a tailored incremental clustering algorithm. The suggested approach allows controlling the trade-off between the model's sensitivity and the clusters’ robustness to noise, thus spanning a wide range of clustering mechanisms, and adapting the clustering outcomes to the level and properties of the noise expected in the input data.
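The embed-then-cluster idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes metric embeddings are already available (the toy vectors below are made-up values standing in for skip-gram output), and it swaps in a simple leader-style incremental clustering with cosine similarity. The names `IncrementalClusterer`, `cluster_stream`, `EMBEDDINGS`, and the `threshold` parameter are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class IncrementalClusterer:
    """Leader-style incremental clustering (illustrative stand-in only)."""

    def __init__(self, threshold):
        # Hypothetical knob: a higher threshold yields more, tighter clusters;
        # a lower one merges alerts more aggressively.
        self.threshold = threshold
        self.centroids = []  # running mean vector per cluster
        self.members = []    # alert ids per cluster

    def add(self, alert_id, vector):
        """Assign one alert to the most similar cluster, or open a new one."""
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vector, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            self.centroids.append(list(vector))
            self.members.append([alert_id])
            return len(self.centroids) - 1
        # Update the running mean of the chosen cluster.
        n = len(self.members[best])
        self.centroids[best] = [
            (c * n + x) / (n + 1) for c, x in zip(self.centroids[best], vector)
        ]
        self.members[best].append(alert_id)
        return best


# Toy embeddings: assumed values, not learned from real monitoring data.
EMBEDDINGS = {
    "cpu.load": [0.9, 0.1, 0.0],
    "cpu.steal": [0.8, 0.2, 0.1],
    "disk.latency": [0.0, 0.9, 0.3],
    "disk.queue": [0.1, 0.8, 0.4],
}


def cluster_stream(metric_ids, threshold):
    """Cluster a stream of alerts, each identified by its metric ID."""
    clusterer = IncrementalClusterer(threshold)
    for alert_id, metric in enumerate(metric_ids):
        clusterer.add(alert_id, EMBEDDINGS[metric])
    return clusterer.members


stream = ["cpu.load", "disk.latency", "cpu.steal", "disk.queue"]
events = cluster_stream(stream, threshold=0.8)
```

In this sketch the single `threshold` plays the role of the sensitivity knob: raising it splits the stream into more events, while lowering it merges alerts into fewer, noisier ones. The paper's tailored algorithm may control this trade-off quite differently.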

Original language: English
Article number: 118489
Journal: Expert Systems with Applications
Volume: 210
State: Published - 30 Dec 2022

Bibliographical note

Publisher Copyright:
© 2022 Elsevier Ltd

Keywords

  • Alerts
  • Clustering
  • Embedding
  • Metric ID
  • Monitoring
  • Negative sampling
  • Pair-wise similarity
  • Skip-gram
