Population health data are more publicly available on the Internet than ever before. Such datasets offer great potential for a better understanding of the health of populations, and can inform health professionals and policy makers in resource planning, disease management and prevention across different regions. However, because collecting public health data is laborious and costly, it is commonplace to find many missing entries in these datasets, which limits the utility of the data and hinders reliable analysis and understanding. To tackle this problem, this paper proposes a deep-learning-based approach, called Compressive Population Health (CPH), to infer and recover (i.e., complete) the missing prevalence rate entries of multiple chronic diseases. The key insight of CPH is the combined exploitation of both intra-disease and inter-disease correlations. Specifically, we first propose a Convolutional Neural Network (CNN) based approach to extract and model both types of correlation, and then adopt a Generative Adversarial Network (GAN) based prevalence inference model that jointly fuses them to facilitate the recovery of missing prevalence rate entries. We extensively evaluate the inference model on real-world public health datasets available on the Web. Results show that our inference method outperforms baseline methods in various settings, with significantly improved accuracy (from 14.8% to 9.1%).
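The abstract does not describe how the CNN and GAN are implemented, so the sketch below only illustrates, under stated assumptions, why the two correlation types help with completion. It is a simple statistical baseline, not CPH. It treats the data as a hypothetical region × disease matrix (`None` marks a missing prevalence rate). A missing entry is predicted from the most strongly correlated other disease observed in the same region, using a least-squares line (an inter-disease signal). When no usable correlate exists, it falls back to that disease's mean across regions (a crude intra-disease signal). The names `impute`, `mae` and `min_pairs` are illustrative assumptions.

```python
from math import sqrt
from typing import List, Optional

# Hypothetical layout: rows are regions, columns are diseases, None = missing.
Matrix = List[List[Optional[float]]]


def _pairs(m: Matrix, j: int, k: int):
    """(x_k, y_j) pairs from regions where both disease j and disease k are observed."""
    return [(row[k], row[j]) for row in m if row[j] is not None and row[k] is not None]


def _fit(pairs):
    """Least-squares line y = a + b*x plus Pearson correlation r; None if undefined."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant column has no defined correlation
    b = sxy / sxx
    return my - b * mx, b, sxy / sqrt(sxx * syy)


def impute(m: Matrix, min_pairs: int = 3) -> List[List[float]]:
    """Fill missing entries: best-correlated other disease first, else the disease mean."""
    n_dis = len(m[0])
    # Intra-disease fallback: mean of each disease's observed rates across regions.
    means = []
    for j in range(n_dis):
        obs = [row[j] for row in m if row[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    out = []
    for row in m:
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
                continue
            best = None  # (|r|, prediction) of the strongest inter-disease predictor
            for k, xk in enumerate(row):
                if k == j or xk is None:
                    continue
                pairs = _pairs(m, j, k)
                if len(pairs) < min_pairs:
                    continue  # too few co-observed regions to trust a fit
                fit = _fit(pairs)
                if fit is None:
                    continue
                a, b, r = fit
                if best is None or abs(r) > best[0]:
                    best = (abs(r), a + b * xk)
            new_row.append(best[1] if best else means[j])
        out.append(new_row)
    return out


def mae(truth, pred, held_out):
    """Mean absolute error over the (region, disease) cells that were hidden."""
    return sum(abs(truth[i][j] - pred[i][j]) for i, j in held_out) / len(held_out)


if __name__ == "__main__":
    truth = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 4.0], [4.0, 8.0, 1.0]]
    observed = [row[:] for row in truth]
    observed[2][1] = None  # hide one rate of disease 1 (which equals 2 x disease 0)
    filled = impute(observed)
    print(filled[2][1], mae(truth, filled, [(2, 1)]))  # close to 6.0 and 0.0
```

Evaluating on deliberately hidden entries, as `mae` does, mirrors the general idea of measuring recovery accuracy on held-out cells. The actual metric and masking protocol used in the paper may differ.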
| Field | Value |
|---|---|
| Title of host publication | The Web Conference 2021 - Proceedings of the World Wide Web Conference, WWW 2021 |
| Editors | Jure Leskovec, Marko Grobelnik, Marc Najork, Jie Tang, Leila Zia |
| Publisher | Association for Computing Machinery, Inc |
| Number of pages | 11 |
| Publication status | Published - 19 Apr 2021 |
| Event | 2021 World Wide Web Conference - Ljubljana, Slovenia |
| Duration | 19 Apr 2021 → 23 Apr 2021 |
| Name | Proceedings of the Web Conference 2021 |
| Conference | 2021 World Wide Web Conference |
| Abbreviated title | WWW 2021 |
Bibliographical note: This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
This work was supported by NSFC (National Natural Science Foundation of China) under Grant No. 61872010, the National Science and Technology Major Project (No. 2018ZX10201002), and the Project 2019BD005 supported by PKU-Baidu fund.
© 2021 ACM.
- Generative adversarial network
- Missing data recovery
- Population health
ASJC Scopus subject areas
- Computer Networks and Communications