Abstract
In class imbalance learning, the performance measure used for model selection plays a vital role. Past research has established that the most widely used performance measure, the overall accuracy of the model, can lead to sub-optimal classification models when learning from imbalanced datasets. To overcome this problem, other performance measures, such as the Geometric-mean (Gm) and the F-measure (Fm), have been used for learning from imbalanced datasets. Training a classifier on an imbalanced dataset (where the positive class is the minority class) usually produces sub-optimal models with a higher Specificity (SP) and a lower Sensitivity (SE). Class imbalance learning methods can often increase SE by sacrificing some SP. In some types of real-world imbalanced classification problems, such as gene finding in Bioinformatics, it is important to improve SE as much as possible while keeping the reduction in SP to a minimum. In this paper, we show that for this type of classification problem the existing performance measures used in class imbalance learning (Gm and Fm) can still result in sub-optimal classification models. To circumvent this problem, we introduce a new performance measure, called the Adjusted Geometric-mean (AGm). We show, both analytically and empirically on two real-world Bioinformatics datasets, that AGm can perform better than the Gm and Fm metrics.
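The measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (SE as the true positive rate, SP as the true negative rate, Gm as the square root of SE × SP, and Fm as the balanced harmonic mean of precision and SE); the abstract does not spell these out, so they are assumptions here. The function name `imbalance_metrics` and the two confusion matrices are hypothetical and are not taken from the paper. AGm is not sketched because its definition is not given in the abstract.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Compute accuracy, SE, SP, Gm and Fm from confusion-matrix counts.

    Assumes the positive class is the minority class and that Fm is the
    balanced F-measure (equal weight on precision and SE).
    """
    se = tp / (tp + fn) if (tp + fn) else 0.0          # Sensitivity
    sp = tn / (tn + fp) if (tn + fp) else 0.0          # Specificity
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    gm = math.sqrt(se * sp)                            # Geometric-mean
    fm = (2 * precision * se / (precision + se)
          if (precision + se) else 0.0)                # F-measure
    return {"accuracy": accuracy, "se": se, "sp": sp, "gm": gm, "fm": fm}


# Hypothetical dataset: 50 positives (minority) and 950 negatives.
# Model A favours the majority class: high SP, low SE.
model_a = imbalance_metrics(tp=10, fn=40, tn=945, fp=5)
# Model B trades some SP for a much higher SE.
model_b = imbalance_metrics(tp=40, fn=10, tn=880, fp=70)

for name, scores in (("A", model_a), ("B", model_b)):
    summary = ", ".join(f"{key}={value:.3f}" for key, value in scores.items())
    print(f"model {name}: {summary}")
```

On these made-up counts, overall accuracy ranks model A above model B even though A detects only a fifth of the positives, while Gm and Fm both rank B higher. This is the accuracy problem the abstract refers to; it does not by itself show the remaining weakness of Gm and Fm that motivates AGm.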
Original language | English |
---|---|
Title of host publication | 8th International Conference on Machine Learning and Applications, ICMLA 2009 |
Pages | 545-550 |
Number of pages | 6 |
Publication status | Published - 2009 |
Externally published | Yes |
Event | 8th International Conference on Machine Learning and Applications, ICMLA 2009 - Miami Beach, United States; Duration: 13 Dec 2009 → 15 Dec 2009 |
Conference
Conference | 8th International Conference on Machine Learning and Applications, ICMLA 2009 |
---|---|
Country/Territory | United States |
City | Miami Beach |
Period | 13/12/09 → 15/12/09 |
Keywords
- Bioinformatics
- Class imbalance learning
- Model selection
- Performance measures
- SVMs
ASJC Scopus subject areas
- Computer Science Applications
- Human-Computer Interaction
- Software