CISPA

Explainable Data Decompositions

conference contribution
posted on 2023-11-29, 18:12 authored by Sebastian Dalleiger, Jilles Vreeken
Our goal is to discover the components of a dataset, characterize why we deem them components, explain how these components differ from each other, and identify what properties they share. As is usual, we consider regions in the data to be components if they show significantly different distributions. What is not usual, however, is that we parameterize these distributions with patterns that are informative for one or more components. We do so because these patterns allow us to characterize what is going on in our data as well as explain our decomposition. We define the problem in terms of a regularized maximum likelihood, in which we use the Maximum Entropy principle to model each data component with a set of patterns. As the search space is large and unstructured, we propose the deterministic Disc algorithm to efficiently discover high-quality decompositions via an alternating optimization approach. Empirical evaluation on synthetic and real-world data shows that Disc efficiently discovers meaningful components and accurately characterizes these in easily understandable terms.
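The alternating optimization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the Disc algorithm: it assumes binary data, uses only single-item frequencies as "patterns" (for which the maximum entropy distribution is the independent Bernoulli model), and performs no pattern discovery. All function names, the Laplace smoothing, and the BIC-style penalty are illustrative assumptions rather than details taken from the paper.

```python
import math


def fit_component(rows, n_items):
    """Laplace-smoothed item frequencies for one component.

    Assumption: with only single-item frequency constraints, the maximum
    entropy model over binary rows is an independent Bernoulli model.
    """
    n = len(rows)
    return [(sum(r[j] for r in rows) + 1.0) / (n + 2.0) for j in range(n_items)]


def log_likelihood(row, probs):
    """Log-probability of one binary row under an independent Bernoulli model."""
    return sum(math.log(p) if x else math.log(1.0 - p) for x, p in zip(row, probs))


def penalized_score(data, labels, models):
    """Negative log-likelihood plus a BIC-style penalty (an assumed regularizer)."""
    nll = -sum(log_likelihood(r, models[c]) for r, c in zip(data, labels))
    n_params = sum(len(m) for m in models)
    return nll + 0.5 * n_params * math.log(len(data))


def decompose(data, init_labels, k, max_iter=50):
    """Alternate between refitting component models and reassigning rows."""
    n_items = len(data[0])
    labels = list(init_labels)
    models = []
    for _ in range(max_iter):
        # Step 1: refit every component on the rows currently assigned to it.
        models = [
            fit_component([r for r, c in zip(data, labels) if c == comp], n_items)
            for comp in range(k)
        ]
        # Step 2: move every row to the component that explains it best.
        new_labels = [
            max(range(k), key=lambda comp: log_likelihood(r, models[comp]))
            for r in data
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return labels, models
```

Given a fixed initial assignment, the procedure is deterministic. In this simplified setting, the smoothed item frequencies of each component play the role of its description; the method in the paper instead describes components with sets of patterns.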

Preferred Citation

Sebastian Dalleiger and Jilles Vreeken. Explainable Data Decompositions. In: National Conference of the American Association for Artificial Intelligence (AAAI). 2020.

Primary Research Area

  • Trustworthy Information Processing

Secondary Research Area

  • Empirical and Behavioral Security

Name of Conference

National Conference of the American Association for Artificial Intelligence (AAAI)

Legacy Posted Date

2019-12-09

Open Access Type

  • Unknown

BibTeX

@inproceedings{cispa_all_3006,
  title = "Explainable Data Decompositions",
  author = "Dalleiger, Sebastian and Vreeken, Jilles",
  booktitle = "{National Conference of the American Association for Artificial Intelligence (AAAI)}",
  year = "2020",
}
