We study the problem of succinctly summarizing a database of event sequences in terms of generalized sequential patterns. That is, we are interested in patterns that are not exclusively defined over observed surface-level events, as is usual, but rather may additionally include generalized events that can match a set of events. To avoid spurious and redundant results we define the problem in terms of the Minimum Description Length principle, by which we are after that set of patterns and generalizations that together best compress the data without loss. The resulting optimization problem does not lend itself for exact search, which is why we propose the heuristic Flock algorithm to efficiently find high-quality models in practice. Extensive experiments on synthetic and real-world data show that Flock results in compact and easily interpretable models that accurately recover the ground truth, including rare instances of generalized patterns. Additionally Flock recovers how generalized events within patterns depend on each other, and overall provides clearer insight into the data-generating process than using state of the art algorithms that only consider surface-level patterns.
History
Primary Research Area
Trustworthy Information Processing
Name of Conference
ACM International Conference on Knowledge Discovery and Data Mining (KDD)
Journal
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Page Range
348-357
Publisher
Association for Computing Machinery (ACM)
Open Access Type
Not Open Access
BibTeX
@conference{Cüppers:Vreeken:2023,
title = "Below the Surface: Summarizing Event Sequences with Generalized Sequential Patterns",
author = "Cüppers, Joscha" AND "Vreeken, Jilles",
year = 2023,
month = 8,
journal = "Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining",
pages = "348--357",
publisher = "Association for Computing Machinery (ACM)",
doi = "10.1145/3580305.3599264"
}