CISPA

Discovering Reliable Correlations in Categorical Data

conference contribution
posted on 2023-11-29, 18:11 authored by Panagiotis Mandros, Mario Boley, Jilles Vreeken
In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises several questions: how to reliably and interpretably measure the correlation among a multivariate set of attributes; how to do so without making assumptions about the distribution of the data or the type of correlation; and how to efficiently discover the most reliably correlated attribute sets in the data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, which yields a reliable, naturally interpretable, non-parametric measure of correlation over multivariate sets. For discovering the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function that supports exact, approximate, and heuristic search. Empirical evaluation shows that even for small sample sizes the estimator leads to low-regret optimization outcomes, and that the algorithms are highly effective on both large and high-dimensional data. Two case studies confirm that our discovery framework identifies interesting and meaningful correlations.
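To make the idea of a chance-corrected correlation score concrete, here is a minimal Python sketch. It is not the authors' estimator or search algorithm; it rests on stated assumptions. (1) Total correlation is estimated with plug-in entropies, TC = sum_i H(X_i) - H(X_1, ..., X_d), and normalized by the upper bound sum_i H(X_i) - max_i H(X_i); the paper's exact normalizer is not given in this abstract. (2) The chance correction is approximated by subtracting the Monte Carlo average of the score over independent column permutations, a swapped-in stand-in for the paper's correction. (3) Top-k discovery uses plain exhaustive enumeration rather than the paper's bounding-function framework, which is not reproduced here. The function names (entropy, normalized_tc, corrected_ntc, top_k_sets) are hypothetical.

```python
import itertools
import math
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(values: Sequence[Hashable]) -> float:
    """Plug-in (empirical) Shannon entropy, in bits, of a sequence of symbols."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def normalized_tc(columns: Sequence[Sequence[Hashable]]) -> float:
    """Plug-in total correlation sum_i H(X_i) - H(X_1..X_d), divided by the
    ASSUMED normalizer sum_i H(X_i) - max_i H(X_i), so the result lies in [0, 1]."""
    marginals = [entropy(col) for col in columns]
    joint = entropy(list(zip(*columns)))
    bound = sum(marginals) - max(marginals)
    if bound == 0:  # at most one non-constant attribute: nothing to correlate
        return 0.0
    return (sum(marginals) - joint) / bound


def corrected_ntc(columns, n_perm: int = 100, seed: int = 0) -> float:
    """Observed score minus its Monte Carlo mean under independent column
    permutations (a stand-in for an exact chance correction)."""
    rng = random.Random(seed)
    observed = normalized_tc(columns)
    baseline = 0.0
    for _ in range(n_perm):
        # Shuffling every column but the first keeps the marginals fixed
        # while destroying any dependence between the columns.
        shuffled = [list(columns[0])] + [rng.sample(list(col), len(col)) for col in columns[1:]]
        baseline += normalized_tc(shuffled)
    return observed - baseline / n_perm


def top_k_sets(data: Dict[str, List[Hashable]], k: int = 3, min_size: int = 2,
               n_perm: int = 100) -> List[Tuple[float, Tuple[str, ...]]]:
    """Exhaustively score every attribute set of size >= min_size and return
    the k best. Exponential in the number of attributes; no pruning."""
    names = sorted(data)
    scored = []
    for size in range(min_size, len(names) + 1):
        for combo in itertools.combinations(names, size):
            scored.append((corrected_ntc([data[a] for a in combo], n_perm), combo))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    a = ["x", "y"] * 50
    # B duplicates A exactly; C is empirically independent of A.
    data = {"A": a, "B": list(a), "C": ["p", "q", "q", "p"] * 25}
    for score, attrs in top_k_sets(data, k=3):
        print(attrs, round(score, 3))
```

In this toy data the pair (A, B) should rank first. The uncorrected score of an independent pair such as (A, C) is exactly 0, and subtracting the permutation baseline pushes it slightly below 0. That is the intended effect of a chance correction: spurious dependence that plug-in estimates pick up from finite samples is discounted. Exhaustive enumeration is only feasible for a handful of attributes, which is why the paper relies on a bounding function to prune the search.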

Preferred Citation

Panagiotis Mandros, Mario Boley and Jilles Vreeken. Discovering Reliable Correlations in Categorical Data. In: IEEE International Conference on Data Mining (ICDM). 2019.

Primary Research Area

  • Algorithmic Foundations and Cryptography

Secondary Research Area

  • Empirical and Behavioral Security

Name of Conference

IEEE International Conference on Data Mining (ICDM)

Legacy Posted Date

2019-12-09

Open Access Type

  • Unknown

BibTeX

@inproceedings{cispa_all_3004,
  title     = "Discovering Reliable Correlations in Categorical Data",
  author    = "Mandros, Panagiotis and Boley, Mario and Vreeken, Jilles",
  booktitle = "{IEEE International Conference on Data Mining (ICDM)}",
  year      = "2019",
}