Efficient resampling methods for training support vector machines with imbalanced datasets

Rukshan Batuwita, Vasile Palade

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding › peer-review

86 Citations (Scopus)

Abstract

Random undersampling and random oversampling are simple, well-known resampling methods for addressing class imbalance. In this paper we show that random oversampling can produce better classification results than random undersampling, because oversampling increases the minority class recognition rate while sacrificing less of the majority class recognition rate than undersampling does. However, random oversampling increases the computational cost of SVM training, largely because of the added training examples. We therefore present an investigation into efficient resampling methods that produce classification results comparable to those of random oversampling while using less data. The main idea of the proposed methods is to first select the most informative examples, those located closest to the class boundary region, by using the separating hyperplane found by training an SVM model on the original imbalanced dataset, and then to use only those examples in resampling. We demonstrate that results comparable to those of random oversampling can be obtained with two sets of efficient resampling methods that use 50% and 75% less data, respectively, than the datasets generated by random oversampling.
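The selection-then-resampling idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact procedure: the abstract does not specify how the selected examples are resampled, so the sketch assumes one plausible reading (keep the majority examples nearest the hyperplane, and randomly duplicate the nearest minority examples until the classes balance). The function names, the toy data, and the hand-set hyperplane `w`, `b` standing in for a trained SVM are all hypothetical.

```python
import random


def linear_score(w, b, x):
    """Signed score w.x + b; |score| grows with distance from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def nearest_to_boundary(y, scores, label, n_keep):
    """Indices of the n_keep examples of class `label` with the smallest |score|."""
    idx = [i for i, yi in enumerate(y) if yi == label]
    idx.sort(key=lambda i: abs(scores[i]))
    return idx[:n_keep]


def boundary_resample(X, y, scores, per_class, minority=1, majority=-1, seed=0):
    """Build a balanced set from near-boundary examples only (assumed scheme)."""
    rng = random.Random(seed)
    maj = nearest_to_boundary(y, scores, majority, per_class)
    mino = nearest_to_boundary(y, scores, minority, per_class)
    # Randomly duplicate selected minority examples until the classes balance.
    extra = [rng.choice(mino) for _ in range(len(maj) - len(mino))]
    chosen = maj + mino + extra
    return [X[i] for i in chosen], [y[i] for i in chosen]


# Toy imbalanced data: 40 majority (-1) and 8 minority (+1) points on a line.
X = [(-0.1 * k, 0.0) for k in range(1, 41)] + [(0.1 * k, 0.0) for k in range(1, 9)]
y = [-1] * 40 + [1] * 8

# Assumption: a hand-set hyperplane stands in for one learned by an SVM
# trained on the original imbalanced data.
w, b = (1.0, 0.0), 0.0
scores = [linear_score(w, b, x) for x in X]

# Random oversampling to full balance would yield 2 * 40 = 80 examples.
full_size = 2 * y.count(-1)
X_half, y_half = boundary_resample(X, y, scores, per_class=full_size // 4)
X_quarter, y_quarter = boundary_resample(X, y, scores, per_class=full_size // 8)

print(len(y_half), len(y_quarter))  # 40 20
```

Under the assumption that random oversampling balances the classes fully, the two calls produce datasets with 50% and 75% fewer examples than the 80-example oversampled set. In practice the scores would come from a trained SVM rather than a fixed hyperplane, for example from the `decision_function` method of scikit-learn's `SVC`.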

Original language: English
Title of host publication: 2010 IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010
Publisher: IEEE
ISBN (Print): 9781424469178
Publication status: Published - 2010
Externally published: Yes
Event: 2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010 - Barcelona, Spain
Duration: 18 Jul 2010 - 23 Jul 2010

Conference

Conference: 2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - 2010 International Joint Conference on Neural Networks, IJCNN 2010
Country/Territory: Spain
City: Barcelona
Period: 18/07/10 - 23/07/10

Keywords

  • Support vector machines
  • Training
  • Testing
  • Computational modeling
  • Machine learning
  • Digital signal processing
  • Computational efficiency

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
