A Neighborhood-Similarity-Based Imputation Algorithm for Healthcare Data Sets: A Comparative Study

Colin Wilcox, Vasileios Giagos, Soufiene Djahel

Research output: Contribution to journal › Article › peer-review


The increasing computerisation of medical services has highlighted inconsistencies in the way patients' historic medical data were recorded. Differences in process and practice between medical services and facilities have led to many incomplete and inaccurate medical histories. To create a single point of truth going forward, these inconsistencies must be corrected. A common approach is to use imputation techniques, which predict missing data values from the known values in the data set. In this paper, we propose an imputation technique based on a neighborhood similarity measure. We compare its prediction accuracy with that of a number of traditional imputation methods, using both an incomplete, anonymized diabetes medical data set and a number of simulated data sets. The aim is to determine whether the accuracy of predicting a diabetes diagnosis can be improved, using the known outcomes in the diabetes patients' data set. The results demonstrate the effectiveness of the proposed approach compared with other state-of-the-art single-pass imputation techniques.
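The abstract does not specify the authors' neighborhood similarity measure. What follows is therefore only a generic sketch of neighborhood-based (k-nearest-neighbor) imputation, not the proposed algorithm. It sits alongside column-mean imputation, one example of a traditional single-pass baseline.

All of these are illustrative assumptions, not taken from the paper:

* the function names (`mean_impute`, `partial_distance`, `knn_impute`);
* the toy records;
* the distance over co-observed features.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def mean_impute(data: List[Row]) -> List[List[float]]:
    """Traditional baseline: replace each gap with its column mean."""
    n_cols = len(data[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def partial_distance(a: Row, b: Row) -> float:
    """Euclidean distance over features observed in both rows, scaled up
    to the full number of features (the 'nan-Euclidean' convention)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each gap with the mean of that feature over the k most similar
    rows (neighbors) that observed it; fall back to the column mean."""
    fallback = mean_impute(data)
    result = []
    for i, row in enumerate(data):
        filled = []
        for j, value in enumerate(row):
            if value is not None:
                filled.append(value)
                continue
            # Candidate donors: every other row that observed feature j
            # and shares at least one observed feature with this row.
            donors = []
            for m, other in enumerate(data):
                if m == i or other[j] is None:
                    continue
                d = partial_distance(row, other)
                if d != math.inf:
                    donors.append((d, other[j]))
            donors.sort(key=lambda pair: pair[0])
            nearest = donors[:k]
            if nearest:
                filled.append(sum(v for _, v in nearest) / len(nearest))
            else:
                filled.append(fallback[i][j])
        result.append(filled)
    return result


# Toy, made-up records (not the paper's data): three numeric features.
example: List[Row] = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 7.0],
    [1.5, None, 3.2],
]

if __name__ == "__main__":
    print("mean:", mean_impute(example))
    print("kNN :", knn_impute(example, k=1))
```

Take the missing third feature of the second record. Mean imputation fills it with 4.4, the average of 3.0, 7.0 and 3.2. The neighborhood approach with `k=1` instead copies 3.0 from the most similar record, the first one.

The partial distance rescales by the fraction of co-observed features, so rows with many gaps are not artificially "close". scikit-learn's `KNNImputer` uses the same convention by default.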

Original language: English
Article number: 4809
Number of pages: 18
Issue number: 23
Publication status: Published, 28 Nov 2023

Bibliographical note

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Keywords

  • healthcare
  • imputation algorithms
  • incomplete data
  • neighborhood similarity

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Signal Processing
  • Hardware and Architecture
  • Computer Networks and Communications
  • Electrical and Electronic Engineering
