TY - GEN
T1 - Machine Learning for Enhanced Underwriting: Predicting Premiums in Health Insurance
AU - Abdelshafy, Nada
AU - Abdelshafy, Mohamed
AU - Ghazanfar, Mustansar Ali
AU - Saglam, Rahime Belen
PY - 2025/4/9
Y1 - 2025/4/9
N2 - In the evolving field of insurance, Machine Learning (ML) has emerged as a transformative tool for enhancing the accuracy and efficiency of insurance underwriting, particularly in health insurance, a sector experiencing increased demand post-pandemic. This research develops an ML model aimed at improving premium prediction within this domain. Using a health insurance dataset from Kaggle, we conducted a comparative analysis of several ML models (XGBoost, Random Forest, Neural Network Regression, and a baseline Linear Regression) to establish performance benchmarks. Our methodology included rigorous data analysis and the application of ML model optimization techniques such as feature engineering and selection, hyperparameter tuning through Grid Search, Random Search, and Bayesian Optimization, as well as overfitting prevention strategies like Pruning and Early Stopping. The XGBoost model demonstrated superior performance, achieving a Mean Absolute Error (MAE) of 2582.3, a Root Mean Square Error (RMSE) of 4625.0, and an R² value of 86.59%. This research not only advances the application of ML in predictive premium pricing but also provides a structured approach for future studies of underwriting processes in the insurance industry.
AB - In the evolving field of insurance, Machine Learning (ML) has emerged as a transformative tool for enhancing the accuracy and efficiency of insurance underwriting, particularly in health insurance, a sector experiencing increased demand post-pandemic. This research develops an ML model aimed at improving premium prediction within this domain. Using a health insurance dataset from Kaggle, we conducted a comparative analysis of several ML models (XGBoost, Random Forest, Neural Network Regression, and a baseline Linear Regression) to establish performance benchmarks. Our methodology included rigorous data analysis and the application of ML model optimization techniques such as feature engineering and selection, hyperparameter tuning through Grid Search, Random Search, and Bayesian Optimization, as well as overfitting prevention strategies like Pruning and Early Stopping. The XGBoost model demonstrated superior performance, achieving a Mean Absolute Error (MAE) of 2582.3, a Root Mean Square Error (RMSE) of 4625.0, and an R² value of 86.59%. This research not only advances the application of ML in predictive premium pricing but also provides a structured approach for future studies of underwriting processes in the insurance industry.
KW - Feature Selection and Engineering
KW - Health Insurance
KW - Hyperparameter Tuning
KW - InsurTech
KW - Insurance Underwriting
KW - Machine Learning
KW - Neural Network
KW - Premium Prediction
KW - Random Forest
KW - Regression
KW - XGBoost
UR - https://www.scopus.com/pages/publications/105003409913
U2 - 10.1109/raai64504.2024.10949528
DO - 10.1109/raai64504.2024.10949528
M3 - Conference proceeding
SN - 9798331520045
T3 - 2024 4th International Conference on Robotics, Automation and Artificial Intelligence (RAAI)
SP - 330
EP - 339
BT - 2024 4th International Conference on Robotics, Automation and Artificial Intelligence (RAAI)
PB - IEEE
T2 - 4th International Conference on Robotics, Automation and Artificial Intelligence (RAAI)
Y2 - 19 December 2024 through 21 December 2024
ER -