BIMSSA: enhancing cancer prediction with salp swarm optimization and ensemble machine learning approaches

Pinakshi Panda, Sukant Kishoro Bisoy, Amrutanshu Panigrahi, Abhilash Pati, Bibhuprasad Sahu, Zheshan Guo, Haipeng Liu, Prince Jain

Research output: Contribution to journal, Article, peer-review

5 Citations (Scopus)
3 Downloads (Pure)

Abstract

Background: Cancer incidence is rising rapidly and is a major cause of mortality worldwide. According to the World Health Organization (WHO), 9.9 million people died from cancer in 2020. Machine learning (ML) can help identify cancer early, which can reduce deaths. An ML-based cancer diagnostic model can use the patient’s genetic information, such as microarray data. Microarray data are high dimensional, which can degrade the performance of ML-based models, so feature selection becomes essential.

Methods: The Salp Swarm Optimization Algorithm (SSA), Improved Maximum Relevance and Minimum Redundancy (IMRMR), and Boruta form the basis of this work’s ML-based model, BIMSSA. The BIMSSA model implements a pipelined feature selection method to handle high-dimensional microarray data effectively. First, Boruta and IMRMR were applied to select relevant gene expression features. SSA was then applied to optimize the size of the feature subset. On the optimized feature space, five machine learning classifiers, Support Vector Machine (SVM), Random Forest (RF), Extreme Learning Machine (ELM), AdaBoost, and XGBoost, were applied as the base learners. Majority voting was then used to build an ensemble of the top three algorithms. The ensemble ML-based model BIMSSA was evaluated using microarray data from four different cancer types: Adult acute lymphoblastic leukemia and Acute myelogenous leukemia (ALL-AML), Lymphoma, Mixed-lineage leukemia (MLL), and Small round blue cell tumors (SRBCT).

Results: In the empirical evaluations, the proposed BIMSSA (Boruta + IMRMR + SSA) achieved accuracies of 96.7% on the ALL-AML, 96.2% on the Lymphoma, 95.1% on the MLL, and 97.1% on the SRBCT cancer datasets.

Conclusion: The results show that the proposed approach can accurately predict different forms of cancer, which is useful for both physicians and researchers.
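
Illustrative sketch (not part of the published abstract): the ensemble step described in the Methods, in which the top three of five base learners are combined by majority voting, might look roughly like the following. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the "top three" are chosen by validation accuracy, which the abstract does not specify; the accuracies and predicted labels are made-up placeholders, and the function names `top_k_learners` and `majority_vote` are hypothetical.

```python
from collections import Counter


def top_k_learners(val_accuracy, k=3):
    """Return the names of the k learners with the highest validation accuracy.

    Assumption: ranking by validation accuracy; the abstract does not say
    how the top three algorithms were chosen.
    """
    return sorted(val_accuracy, key=val_accuracy.get, reverse=True)[:k]


def majority_vote(predictions):
    """Combine per-learner label lists into one label per sample (hard voting).

    `predictions` maps learner name -> list of predicted labels, all the same
    length. If several labels tie for the most votes, the label that was voted
    first wins, which is an arbitrary choice made for this sketch.
    """
    n_samples = len(next(iter(predictions.values())))
    combined = []
    for i in range(n_samples):
        votes = Counter(labels[i] for labels in predictions.values())
        combined.append(votes.most_common(1)[0][0])
    return combined


# Placeholder numbers for illustration only; these are NOT results from the paper.
val_accuracy = {"SVM": 0.93, "RF": 0.95, "ELM": 0.90, "AdaBoost": 0.91, "XGBoost": 0.96}
predictions = {
    "SVM": ["ALL", "AML", "ALL", "AML"],
    "RF": ["ALL", "AML", "AML", "AML"],
    "ELM": ["AML", "AML", "ALL", "ALL"],
    "AdaBoost": ["ALL", "ALL", "ALL", "AML"],
    "XGBoost": ["ALL", "ALL", "AML", "AML"],
}

selected = top_k_learners(val_accuracy)
ensemble = majority_vote({name: predictions[name] for name in selected})
print(selected)
print(ensemble)
```

With an odd number of voters and two classes, as in the ALL-AML placeholder labels above, a strict majority always exists; for the multi-class datasets a three-way split is possible, which is why the sketch states its tie-breaking rule explicitly.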
Original language: English
Article number: 1491602
Number of pages: 22
Journal: Frontiers in Genetics
Volume: 15
Early online date: 6 Jan 2025
Publication status: E-pub ahead of print - 6 Jan 2025

Bibliographical note

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by the Innovational Fund for Scientific and Technological Personnel of Hainan Province (Grant No. KJRC 2023L01), and the South China Sea Rising Star Project of Hainan Province.

Funder: South China Sea Rising Star Project of Hainan Province (no funder number listed)
Funder: Innovational Fund for Scientific and Technological Personnel of Hainan Province; Funder number: KJRC 2023L01

Keywords

• microarray data
• cancer prediction
• feature selection
• ensemble learning
• swarm intelligence
