Abstract
As a fundamental component of the public health system, population health monitoring plays an important role in shaping health policy. However, traditional data collection is costly, so many sparse-sampling completion algorithms have been proposed to reduce this cost. Existing data-completion methods usually rely on correlations between spatially adjacent areas, but these correlations are not sufficient for accurate inference when prevalence data for an area's neighbors are also missing because of cost constraints. To tackle this problem, we propose a novel deep-learning-based prevalence inference model called the Spatial-attention and Demographic-augmented Generative Adversarial Imputation Network (SDA-GAIN). SDA-GAIN improves accuracy by learning novel “health semantic space similarities” between areas that are not necessarily adjacent. The key insight of SDA-GAIN is to use a Transformer-based model to learn health semantic similarities between areas and a GAN-based model to perform high-accuracy completion. We further introduce demographic data, processed with a CNN, to help the model learn a better health semantic representation. Extensive experiments show that SDA-GAIN outperforms other state-of-the-art approaches at low sampling rates (lower than 30%), which offers a significant benefit in reducing sampling costs. In addition, visualizing the health semantic similarity learned by SDA-GAIN shows that it closely matches the real situation.
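
The idea of borrowing information from semantically similar areas, rather than only from spatial neighbors, can be illustrated with a much simpler stand-in than SDA-GAIN itself. The sketch below is not the authors' model: it replaces the Transformer and GAN components with a single scaled dot-product attention step that fills each unsampled area with a similarity-weighted average of the sampled areas. The function name `attention_impute`, the feature vectors, and the toy prevalence values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_impute(features, prevalence, observed):
    """Fill unsampled areas with an attention-weighted average of sampled ones.

    features   -- one (hypothetical) demographic feature vector per area
    prevalence -- measured prevalence per area; ignored where observed is False
    observed   -- True where the area was actually sampled
    """
    sampled = [i for i, flag in enumerate(observed) if flag]
    if not sampled:
        raise ValueError("at least one area must be sampled")
    scale = math.sqrt(len(features[0]))

    completed = []
    for i, flag in enumerate(observed):
        if flag:
            # Sampled areas keep their measured value.
            completed.append(prevalence[i])
            continue
        # Similarity is computed against every sampled area, adjacent or not.
        scores = [dot(features[i], features[j]) / scale for j in sampled]
        weights = softmax(scores)
        completed.append(sum(w * prevalence[j] for w, j in zip(weights, sampled)))
    return completed


# Toy data (assumed): area 2 is unsampled and resembles area 0 more than area 1.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
toy_prevalence = [0.30, 0.10, None]
toy_observed = [True, True, False]
toy_completed = attention_impute(toy_features, toy_prevalence, toy_observed)
```

In this toy case the estimate for area 2 lies between the two sampled values and closer to that of area 0, the more similar area. A learned model would replace the fixed feature vectors with trained representations and the weighted average with a generator trained adversarially.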
Original language | English |
---|---|
Pages (from-to) | (In-Press) |
Number of pages | 14 |
Journal | IEEE Transactions on Big Data |
Volume | (In-Press) |
Early online date | 8 Dec 2022 |
Publication status | E-pub ahead of print - 8 Dec 2022 |
Keywords
- Population Health
- Missing Data Recovery
- Spatial Correlation
- Transformer
- Generative Adversarial Network