Completing missing prevalence rates for multiple chronic diseases by jointly leveraging both intra- and inter-disease population health data correlations

Yujie Feng, Jiangtao Wang, Yasha Wang, Sumi Helal

    Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding > peer-review



    Population health data are becoming more publicly available on the Internet than ever before. Such datasets offer great potential for enabling a better understanding of the health of populations, and for informing health professionals and policy makers for better resource planning, disease management and prevention across different regions. However, due to the laborious and high-cost nature of collecting such public health data, it is commonplace to find many missing entries in these datasets, which challenges the utility of the data and hinders reliable analysis and understanding. To tackle this problem, this paper proposes a deep-learning-based approach, called Compressive Population Health (CPH), to infer and recover (i.e., complete) the missing prevalence rate entries of multiple chronic diseases. The key insight of CPH is the combined exploitation of both intra-disease and inter-disease correlations. Specifically, we first propose a Convolutional Neural Network (CNN) based approach to extract and model both types of correlations, and then adopt a Generative Adversarial Network (GAN) based prevalence inference model to jointly fuse them to facilitate the recovery of prevalence rate data for missing entries. We extensively evaluate the inference model on real-world public health datasets publicly available on the Web. Results show that our inference method outperforms other baseline methods in various settings and with significantly improved accuracy (from 14.8% to 9.1%).
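
To make the completion task concrete, the following is a minimal sketch of a naive baseline, not the CPH model (which the abstract describes as CNN feature extraction fused by a GAN). It assumes a regions-by-diseases matrix with `None` marking missing prevalence rates, and it reads "intra-disease" as the same disease observed in other regions and "inter-disease" as the other diseases observed in the same region; that reading, the function name `complete_prevalence`, and the blending weight `alpha` are all assumptions made for illustration.

```python
from statistics import mean


def complete_prevalence(rates, alpha=0.5):
    """Fill None entries of a regions x diseases prevalence matrix.

    Hypothetical baseline for illustration only; it is not the CPH method.
    alpha weights the intra-disease estimate against the inter-disease one.
    """
    n_regions, n_diseases = len(rates), len(rates[0])

    # Intra-disease signal: mean observed rate of each disease across regions.
    disease_mean = []
    for j in range(n_diseases):
        observed = [rates[i][j] for i in range(n_regions)
                    if rates[i][j] is not None]
        disease_mean.append(mean(observed) if observed else None)

    completed = [row[:] for row in rates]
    for i in range(n_regions):
        # Inter-disease signal: how far this region sits above or below the
        # typical rate on the diseases it does report (1.0 means typical).
        ratios = [rates[i][j] / disease_mean[j] for j in range(n_diseases)
                  if rates[i][j] is not None and disease_mean[j]]
        region_level = mean(ratios) if ratios else 1.0

        for j in range(n_diseases):
            # Entries of a disease with no observations at all stay None.
            if rates[i][j] is None and disease_mean[j] is not None:
                intra = disease_mean[j]
                inter = region_level * disease_mean[j]
                completed[i][j] = alpha * intra + (1 - alpha) * inter
    return completed


# Made-up toy data: 3 regions, 2 diseases, one missing entry.
toy = [
    [10.0, None],
    [30.0, 60.0],
    [20.0, 40.0],
]
print(complete_prevalence(toy))
```

Setting `alpha=1.0` uses only the same-disease average, while `alpha=0.0` relies entirely on the region's standing across its other diseases; a learned model such as the one the abstract describes would replace this fixed blend.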

    Original language: English
    Title of host publication: The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021
    Editors: Jure Leskovec, Marko Grobelnik, Marc Najork, Jie Tang, Leila Zia
    Publisher: Association for Computing Machinery, Inc
    Number of pages: 11
    ISBN (Electronic): 9781450383127
    Publication status: Published - 19 Apr 2021
    Event: 2021 World Wide Web Conference - Ljubljana, Slovenia
    Duration: 19 Apr 2021 to 23 Apr 2021

    Publication series

    Name: Proceedings of the Web Conference 2021


    Conference: 2021 World Wide Web Conference
    Abbreviated title: WWW 2021

    Bibliographical note

    This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

    Funding Information:
    This work was supported by NSFC (National Natural Science Foundation of China) under Grant No. 61872010, the National Science and Technology Major Project (No. 2018ZX10201002), and the Project 2019BD005 supported by PKU-Baidu fund.

    Publisher Copyright:
    © 2021 ACM.


    • Generative adversarial network
    • Missing data recovery
    • Population health

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Software

