Abstract
Random undersampling and random oversampling are simple, well-known resampling methods for addressing class imbalance. In this paper we show that random oversampling can produce better classification results than random undersampling, because oversampling increases the minority-class recognition rate while sacrificing less of the majority-class recognition rate than undersampling does. However, random oversampling increases the computational cost of SVM training, largely because of the new training examples it adds. We therefore investigate efficient resampling methods that produce classification results comparable to those of random oversampling while using less data. The main idea of the proposed methods is first to train an SVM on the original imbalanced dataset, then to use the resulting separating hyperplane to select the most informative examples (those located closest to the class boundary), and finally to resample only those examples. We demonstrate that classification results comparable to random oversampling can be obtained with two sets of efficient resampling methods that use 50% and 75% less data, respectively, than the datasets generated by random oversampling.
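The sketch below contrasts random oversampling and undersampling and shows one plausible reading of the boundary-based selection step. It is not the authors' implementation. It assumes a linear SVM has already been trained, so its hyperplane is supplied as hypothetical inputs `w` and `b`. The helper names and the `fraction` parameter are illustrative, and keeping the same boundary-nearest fraction of *both* classes is our assumption. A `fraction` of 0.5 or 0.25 is chosen only because it reproduces the 50% and 75% data reductions relative to random oversampling; the paper's exact selection rule may differ.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def random_oversample(minority: Sequence[Point], n_target: int,
                      rng: random.Random) -> List[Point]:
    """Duplicate randomly chosen minority examples until n_target are present.
    If the minority set already has n_target or more, it is returned unchanged."""
    out = list(minority)
    while len(out) < n_target:
        out.append(rng.choice(minority))
    return out


def random_undersample(majority: Sequence[Point], n_target: int,
                       rng: random.Random) -> List[Point]:
    """Keep a random subset of n_target majority examples (no replacement)."""
    return rng.sample(list(majority), n_target)


def hyperplane_distance(x: Point, w: Point, b: float) -> float:
    """Geometric distance |w.x + b| / ||w|| from x to the hyperplane w.x + b = 0.
    Raises ZeroDivisionError if w is the zero vector."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def closest_to_boundary(points: Sequence[Point], w: Point, b: float,
                        fraction: float) -> List[Point]:
    """Return the `fraction` of points nearest the hyperplane (at least one)."""
    k = max(1, int(round(fraction * len(points))))
    return sorted(points, key=lambda p: hyperplane_distance(p, w, b))[:k]


def boundary_oversample(minority: Sequence[Point], majority: Sequence[Point],
                        w: Point, b: float, fraction: float,
                        rng: random.Random) -> Tuple[List[Point], List[Point]]:
    """Hypothetical pipeline: keep the boundary-nearest `fraction` of each class,
    then randomly oversample the kept minority examples to match the kept majority."""
    maj_sel = closest_to_boundary(majority, w, b, fraction)
    min_sel = closest_to_boundary(minority, w, b, fraction)
    return random_oversample(min_sel, len(maj_sel), rng), maj_sel


if __name__ == "__main__":
    rng = random.Random(0)
    majority = [(float(i), 0.0) for i in range(-20, 0)]  # 20 majority points
    minority = [(float(i), 1.0) for i in range(1, 5)]    # 4 minority points
    w, b = (1.0, 0.0), 0.0  # assumed hyperplane x0 = 0
    full = 2 * len(majority)  # size after plain random oversampling
    for frac in (0.5, 0.25):
        mins, majs = boundary_oversample(minority, majority, w, b, frac, rng)
        print(frac, len(mins) + len(majs), "of", full)
```

On this toy data, plain random oversampling yields 40 training examples, while the boundary-based variant yields 20 (`fraction=0.5`) or 10 (`fraction=0.25`). This mirrors the 50% and 75% reductions stated above, but only under the selection assumption described in the lead-in.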
Original language | English |
---|---|
Title of host publication | 2010 IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010 |
Publisher | IEEE |
ISBN (Print) | 9781424469178 |
Publication status | Published - 2010 |
Externally published | Yes |
Event | 2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010 - Barcelona, Spain |
Duration | 18 Jul 2010 → 23 Jul 2010 |
Conference
Conference | 2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010 |
---|---|
Country/Territory | Spain |
City | Barcelona |
Period | 18/07/10 → 23/07/10 |
Keywords
- Support vector machines
- Training
- Testing
- Computational modeling
- Machine learning
- Digital signal processing
- Computational efficiency
ASJC Scopus subject areas
- Software
- Artificial Intelligence