We are interested in discovering those patterns in data whose empirical frequency differs significantly from what we expect. To avoid spurious results, yet achieve high statistical power, we propose to sequentially control for false discoveries during the search. To avoid redundancy, we propose to update our expectations whenever we discover a significant pattern. To efficiently consider the exponentially sized search space, we employ an easy-to-compute upper bound on significance, and propose an effective search strategy for sets of significant patterns. Through an extensive set of experiments on synthetic data, we show that our method, Spass, recovers the ground truth reliably, efficiently, and without redundancy. On real-world data, we show that it works well on both single and multiple classes and on low- and high-dimensional data, and we show through case studies that it discovers meaningful results.
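The abstract does not specify which sequential procedure Spass uses. As a general illustration of what "sequentially controlling for false discoveries" means, the sketch below implements alpha-investing (Foster and Stine). This is a well-known online testing procedure that controls a marginal variant of the FDR, and it is swapped in here only as an example, not as the paper's method. The sketch assumes that each candidate pattern arrives with a p-value, one at a time. The function name `alpha_investing` and its parameters `alpha` and `spend_fraction` are hypothetical. Each test spends part of the current "alpha-wealth". A discovery earns wealth back, so later tests can still have power.

```python
from typing import Iterable, List


def alpha_investing(p_values: Iterable[float], alpha: float = 0.05,
                    spend_fraction: float = 0.5) -> List[bool]:
    """Sketch of Foster-Stine alpha-investing over a stream of p-values.

    Assumptions (illustrative, not from the paper): initial wealth and the
    payout per discovery are both set to alpha, and each test spends a fixed
    fraction of the current wealth.
    """
    wealth = alpha   # current alpha-wealth available for testing
    payout = alpha   # wealth earned back for each discovery
    decisions: List[bool] = []
    for p in p_values:
        # Wealth we are willing to lose if this test does not reject.
        cost = spend_fraction * wealth
        # Choose the test level alpha_j so that alpha_j / (1 - alpha_j) == cost.
        level = cost / (1.0 + cost)
        if p <= level:
            decisions.append(True)
            wealth += payout   # a discovery replenishes the wealth
        else:
            decisions.append(False)
            wealth -= cost     # a non-discovery costs alpha_j / (1 - alpha_j)
    return decisions


print(alpha_investing([0.001, 0.2, 0.01, 0.5]))  # [True, False, True, False]
```

Because each test spends only a fraction of the remaining wealth, the wealth stays positive. The level available to later tests depends on how many discoveries came before, and this is what makes the control sequential rather than a fixed correction applied once over all candidates.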
Preferred Citation
Sebastian Dalleiger and Jilles Vreeken. Discovering Significant Patterns under Sequential False Discovery Control. In: ACM International Conference on Knowledge Discovery and Data Mining (KDD). 2022.
Primary Research Area
Trustworthy Information Processing
Name of Conference
ACM International Conference on Knowledge Discovery and Data Mining (KDD)
Legacy Posted Date
2022-07-15
Open Access Type
Unknown
BibTeX
@inproceedings{cispa_all_3726,
  title     = "Discovering Significant Patterns under Sequential False Discovery Control",
  author    = "Dalleiger, Sebastian and Vreeken, Jilles",
  booktitle = "{ACM International Conference on Knowledge Discovery and Data Mining (KDD)}",
  year      = "2022",
}