CISPA
Browse
3580305.3599264.pdf (1.12 MB)

Below the Surface: Summarizing Event Sequences with Generalized Sequential Patterns

Download (1.12 MB)
conference contribution
posted on 2024-04-04, 11:14 authored by Joscha CüppersJoscha Cüppers, Jilles VreekenJilles Vreeken
We study the problem of succinctly summarizing a database of event sequences in terms of generalized sequential patterns. That is, we are interested in patterns that are not exclusively defined over observed surface-level events, as is usual, but rather may additionally include generalized events that can match a set of events. To avoid spurious and redundant results we define the problem in terms of the Minimum Description Length principle, by which we are after that set of patterns and generalizations that together best compress the data without loss. The resulting optimization problem does not lend itself for exact search, which is why we propose the heuristic Flock algorithm to efficiently find high-quality models in practice. Extensive experiments on synthetic and real-world data show that Flock results in compact and easily interpretable models that accurately recover the ground truth, including rare instances of generalized patterns. Additionally Flock recovers how generalized events within patterns depend on each other, and overall provides clearer insight into the data-generating process than using state of the art algorithms that only consider surface-level patterns.

History

Primary Research Area

  • Trustworthy Information Processing

Name of Conference

ACM International Conference on Knowledge Discovery and Data Mining (KDD)

Journal

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

Page Range

348-357

Publisher

Association for Computing Machinery (ACM)

Open Access Type

  • Not Open Access

BibTeX

@conference{Cüppers:Vreeken:2023, title = "Below the Surface: Summarizing Event Sequences with Generalized Sequential Patterns", author = "Cüppers, Joscha" AND "Vreeken, Jilles", year = 2023, month = 8, journal = "Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining", pages = "348--357", publisher = "Association for Computing Machinery (ACM)", doi = "10.1145/3580305.3599264" }

Usage metrics

    Categories

    No categories selected

    Licence

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC