Generating Realistic Synthetic Population Datasets

journal contribution
posted on 2023-11-29, 18:07 authored by Hao Wu, Yue Ning, Prithwish Chakraborty, Jilles Vreeken, Nikolaj Tatti, Naren Ramakrishnan
Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level, populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating confidentialities of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model such that in a statistically well-founded way we can optimally utilize given prior information about the data, and are unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application.
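
To make the maximum entropy idea in the abstract concrete, the following is a minimal sketch, not the inference algorithm from the paper. It assumes a toy setting with two categorical variables whose single-variable marginals are the only prior information, and it uses iterative proportional fitting from a uniform start as a stand-in technique, which yields the maximum entropy joint distribution consistent with those marginals. All names (`fit_maxent_two_way`, `sample_population`, the `age` and `employed` marginals) are hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_maxent_two_way(row_marginal, col_marginal, n_iter=50):
    """Fit a joint table by iterative proportional fitting (IPF).

    Starting from the uniform distribution, IPF converges to the
    maximum entropy joint distribution that matches both marginals.
    This is an illustrative stand-in, not the paper's algorithm.
    """
    r = np.asarray(row_marginal, dtype=float)
    c = np.asarray(col_marginal, dtype=float)
    # Uniform start: no assumptions beyond the given constraints.
    p = np.full((r.size, c.size), 1.0 / (r.size * c.size))
    for _ in range(n_iter):
        p *= (r / p.sum(axis=1))[:, None]  # rescale rows to match r
        p *= (c / p.sum(axis=0))[None, :]  # rescale columns to match c
    return p


def sample_population(p, n, seed=0):
    """Draw n synthetic individuals (one row of category indices each)."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(p.size, size=n, p=p.ravel())
    return np.column_stack(np.unravel_index(flat, p.shape))


# Hypothetical marginals: three age groups, two employment states.
age = [0.2, 0.5, 0.3]
employed = [0.6, 0.4]

joint = fit_maxent_two_way(age, employed)
people = sample_population(joint, n=1000)
```

With only single-variable marginals as constraints, the fitted table is simply the product of the two marginals, which reflects the "unbiased otherwise" property: no dependence between variables is assumed unless the prior information demands it. Richer constraints, such as joint frequencies over several variables, would require a more general fitting procedure than this sketch.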

Preferred Citation

Hao Wu, Yue Ning, Prithwish Chakraborty, Jilles Vreeken, Nikolaj Tatti and Naren Ramakrishnan. Generating Realistic Synthetic Population Datasets. In: ACM Transactions on Knowledge Discovery and Data Mining. 2018.

Primary Research Area

  • Trustworthy Information Processing

Legacy Posted Date

2019-06-07

Journal

ACM Transactions on Knowledge Discovery and Data Mining

Pages

1 - 45

Open Access Type

  • Unknown

Sub Type

  • Article

BibTeX

@article{cispa_all_2912,
  title = "Generating Realistic Synthetic Population Datasets",
  author = "Wu, Hao and Ning, Yue and Chakraborty, Prithwish and Vreeken, Jilles and Tatti, Nikolaj and Ramakrishnan, Naren",
  journal = "{ACM Transactions on Knowledge Discovery and Data Mining}",
  year = "2018",
}
