CISPA
Browse

Sets of Robust Rules, and How to Find Them

Download (1.44 MB)
conference contribution
posted on 2023-11-29, 18:10 authored by Jonas Fischer, Jilles VreekenJilles Vreeken
Association rules are among the most important concepts in data mining. Rules of the form X → Y are simple to understand, simple to act upon, yet can model important local dependencies in data. The problem is, however, that there are so many of them. Both traditional and state-of-the-art frameworks typically yield millions of rules, rather than identifying a small set of rules that capture the most important dependencies of the data. In this paper, we define the problem of association rule mining in terms of the Minimum Description Length principle. That is, we identify the best set of rules as the one that most succinctly describes the data. We show that the resulting optimization problem does not lend itself for exact search, and hence propose Grab, a greedy heuristic to efficiently discover good sets of rules directly from data. Through an extensive set of experiments we show that, unlike the state-of-the-art, Grab does reliably recover the ground truth. On real world data we show it finds reasonable numbers of rules, that upon close inspection give clear insight in the local distribution of the data.

History

Preferred Citation

Jonas Fischer and Jilles Vreeken. Sets of Robust Rules, and How to Find Them. In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Database (ECML PKDD). 2019.

Primary Research Area

  • Empirical and Behavioral Security

Name of Conference

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Database (ECML PKDD)

Legacy Posted Date

2019-06-23

Open Access Type

  • Unknown

BibTeX

@inproceedings{cispa_all_2924, title = "Sets of Robust Rules, and How to Find Them", author = "Fischer, Jonas and Vreeken, Jilles", booktitle="{European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Database (ECML PKDD)}", year="2019", }

Usage metrics

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC