A new performance measure for class imbalance learning: Application to bioinformatics problems

Rukshan Batuwita, Vasile Palade

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding › peer-review

39 Citations (Scopus)

Abstract

In class imbalance learning, the performance measure used for model selection plays a vital role. Past research has established that the most widely used performance measure, the overall accuracy of the model, can lead to sub-optimal classification models when learning from imbalanced datasets. To overcome this problem, other performance measures, such as the Geometric-mean (Gm) and F-measure (Fm), have been used for imbalanced dataset learning. Training a classifier on an imbalanced dataset (where the positive class is the minority class) usually produces sub-optimal models with a higher Specificity (SP) and a lower Sensitivity (SE). By applying class imbalance learning methods, SE can often be increased by sacrificing some amount of SP. In some types of real-world imbalanced classification problems, such as gene-finding problems in Bioinformatics, it is important to improve SE as much as possible while keeping the reduction of SP to a minimum. In this paper, we show that for this type of classification problem the existing performance measures used in class imbalance learning (Gm and Fm) can still result in sub-optimal classification models. To circumvent this problem, we introduce a new performance measure, called Adjusted Geometric-mean (AGm). We show, both analytically and empirically on two real-world Bioinformatics datasets, that AGm can perform better than the Gm and Fm metrics.
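As a rough illustration of the abstract's point about accuracy, the sketch below computes SE, SP, overall accuracy, Gm and Fm from confusion-matrix counts. It assumes the standard textbook definitions (Gm as the square root of SE times SP, Fm as the balanced F1 score), which may differ in detail from those used in the paper. The function name `imbalance_metrics` and the two sets of counts are hypothetical and chosen only for illustration. AGm is not computed because the abstract does not state its formula.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Return common metrics from confusion-matrix counts.

    Assumes the positive class is the minority class and uses
    textbook definitions (an assumption, not taken from the paper).
    """
    se = tp / (tp + fn) if (tp + fn) else 0.0          # Sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0          # Specificity (recall on negatives)
    accuracy = (tp + tn) / (tp + fn + tn + fp)         # Overall accuracy
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    gm = math.sqrt(se * sp)                            # Geometric mean of SE and SP
    # Balanced F-measure (F1); other beta weightings are possible.
    fm = (2 * precision * se / (precision + se)) if (precision + se) else 0.0
    return {"se": se, "sp": sp, "accuracy": accuracy, "gm": gm, "fm": fm}


# Hypothetical dataset: 50 positives, 950 negatives.
# Model A favours the majority class; model B trades some SP for SE.
model_a = imbalance_metrics(tp=10, fn=40, tn=945, fp=5)
model_b = imbalance_metrics(tp=40, fn=10, tn=855, fp=95)

for name, m in (("A", model_a), ("B", model_b)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these made-up counts, model A has the higher accuracy even though it detects only a fifth of the positives, while model B scores higher on both Gm and Fm.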

Original language: English
Title of host publication: 8th International Conference on Machine Learning and Applications, ICMLA 2009
Pages: 545-550
Number of pages: 6
Publication status: Published - 2009
Externally published: Yes
Event: 8th International Conference on Machine Learning and Applications, ICMLA 2009 - Miami Beach, United States
Duration: 13 Dec 2009 to 15 Dec 2009

Conference

Conference: 8th International Conference on Machine Learning and Applications, ICMLA 2009
Country/Territory: United States
City: Miami Beach
Period: 13/12/09 to 15/12/09

Keywords

  • Bioinformatics
  • Class imbalance learning
  • Model selection
  • Performance measures
  • SVMs

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
  • Software
