CISPA
Browse
cispa_all_3413.pdf (810.19 kB)

Mining Easily Understandable Models from Complex Event Logs

Download (810.19 kB)
conference contribution
posted on 2023-11-29, 18:16 authored by Boris WiegandBoris Wiegand, Dietrich Klakow, Jilles VreekenJilles Vreeken
We consider the problem of discovering accurate, yet easily understandable graph-based models from complex event sequence data. Real-world event data, such as production logs, exhibit complex behaviors. These include sequences, choices, loops, optionals, and combinations thereof that make it hard to gain insight into what is going on, and how we can improve the process. Current approaches do not solve this problem satisfyingly, as their modeling language is too restricted to capture complex behavior or they return models that are still too difficult to understand. We formalize the problem in terms of the Minimum Description Length (MDL) principle, by which we say that the best model provides the shortest lossless description of the data. The resulting problem is NP-hard, and hence we propose the greedy PROSEQO algorithm to discover good models from data. PROSEQO iteratively simplifies the current description by removing nodes, edges, and applying patterns, until MDL tells us to stop. For whenever this result is still too complex, we propose PROSIMPLE, which iteratively removes further edges until we satisfy a user-specified threshold. Through an extensive set of experiments, we show both methods perform very well in practice. They return simple models that reconstruct the ground truth well, need only little data to do so, are robust against noise, and scale well. A case study shows that, unlike the state of the art, we discover easily understandable models that capture the key aspects of the data generation process.

History

Preferred Citation

Boris Wiegand, Dietrich Klakow and Jilles Vreeken. Mining Easily Understandable Models from Complex Event Logs. In: SIAM International Conference on Data Mining (SDM). 2021.

Primary Research Area

  • Algorithmic Foundations and Cryptography

Name of Conference

SIAM International Conference on Data Mining (SDM)

Legacy Posted Date

2021-05-09

Open Access Type

  • Green

BibTeX

@inproceedings{cispa_all_3413, title = "Mining Easily Understandable Models from Complex Event Logs", author = "Wiegand, Boris and Klakow, Dietrich and Vreeken, Jilles", booktitle="{SIAM International Conference on Data Mining (SDM)}", year="2021", }

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC