Improving generalization ability of Instance-transfer Based Imbalanced Sentiment Classification of Turn-Level Interactive Chinese Texts

Feng Tian, Fan Wu, Xiang Fei, Nazaraf Shah, Qinhua Zheng, Yuanyuan Wang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Generally, a classification model with better generalization ability performs better on future incoming data rather than on the historical dataset. Increasing the generalization ability of multi-domain, imbalanced multi-class emotion classification of turn-level interactive Chinese texts is challenging because the feature space is high-dimensional and its feature values are sparse. Moreover, the different feature spaces or diverse data distributions across the domains of the target dataset (T) and the source dataset (S) make it difficult to employ multi-class and multi-domain instance transfer. To address these challenges, we propose a data-level sampling approach, inspired by transfer learning, for multi-class and multi-domain instance transfer. To validate the proposed method, an imbalanced dataset is taken as the target dataset, and three datasets serve as source datasets: one collected from the Bulletin Board System of Xi'an Jiaotong University and two collected from the Chinese microblog platform Weibo. The experimental results show that the proposed approach outperforms classic algorithms by effectively alleviating the imbalance problem in interactive texts. Moreover, a classification model trained on the immigrated datasets produced by the proposed method achieves the best generalization ability.
Original language: English
Pages (from-to): 155-167
Number of pages: 13
Journal: Service Oriented Computing and Applications
Volume: 13
Issue number: 2
Early online date: 17 Jun 2019
DOI: 10.1007/s11761-019-00264-y
Publication status: Published - Jun 2019

Fingerprint

Bulletin boards
Sampling
Sentiment classification
Transfer learning
Emotion
China

Keywords

  • Generalization ability
  • Imbalanced sentiment classification
  • Instance immigration-based sampling
  • Interactive Chinese texts
  • Multi-class
  • Multi-domain

ASJC Scopus subject areas

  • Software
  • Management Information Systems
  • Information Systems
  • Hardware and Architecture

Cite this

Improving generalization ability of Instance-transfer Based Imbalanced Sentiment Classification of Turn-Level Interactive Chinese Texts. / Tian, Feng; Wu, Fan; Fei, Xiang; Shah, Nazaraf; Zheng, Qinhua; Wang, Yuanyuan.

In: Service Oriented Computing and Applications, Vol. 13, No. 2, 06.2019, p. 155-167.

@article{07bd4554272046c186eb11be3775b67a,
title = "Improving generalization ability of Instance-transfer Based Imbalanced Sentiment Classification of Turn-Level Interactive Chinese Texts",
abstract = "Generally, a classification model with better generalization ability performs better on future incoming data rather than on the historical dataset. Increasing the generalization ability of multi-domain, imbalanced multi-class emotion classification of turn-level interactive Chinese texts is challenging because the feature space is high-dimensional and its feature values are sparse. Moreover, the different feature spaces or diverse data distributions across the domains of the target dataset (T) and the source dataset (S) make it difficult to employ multi-class and multi-domain instance transfer. To address these challenges, we propose a data-level sampling approach, inspired by transfer learning, for multi-class and multi-domain instance transfer. To validate the proposed method, an imbalanced dataset is taken as the target dataset, and three datasets serve as source datasets: one collected from the Bulletin Board System of Xi'an Jiaotong University and two collected from the Chinese microblog platform Weibo. The experimental results show that the proposed approach outperforms classic algorithms by effectively alleviating the imbalance problem in interactive texts. Moreover, a classification model trained on the immigrated datasets produced by the proposed method achieves the best generalization ability.",
keywords = "Generalization ability, Imbalanced sentiment classification, Instance immigration-based sampling, Interactive Chinese texts, Multi-class, Multi-domain",
author = "Feng Tian and Fan Wu and Xiang Fei and Nazaraf Shah and Qinhua Zheng and Yuanyuan Wang",
year = "2019",
month = "6",
doi = "10.1007/s11761-019-00264-y",
language = "English",
volume = "13",
pages = "155--167",
journal = "Service Oriented Computing and Applications",
issn = "1863-2386",
publisher = "Springer Verlag",
number = "2",

}

TY - JOUR

T1 - Improving generalization ability of Instance-transfer Based Imbalanced Sentiment Classification of Turn-Level Interactive Chinese Texts

AU - Tian, Feng

AU - Wu, Fan

AU - Fei, Xiang

AU - Shah, Nazaraf

AU - Zheng, Qinhua

AU - Wang, Yuanyuan

PY - 2019/6

Y1 - 2019/6

N2 - Generally, a classification model with better generalization ability performs better on future incoming data rather than on the historical dataset. Increasing the generalization ability of multi-domain, imbalanced multi-class emotion classification of turn-level interactive Chinese texts is challenging because the feature space is high-dimensional and its feature values are sparse. Moreover, the different feature spaces or diverse data distributions across the domains of the target dataset (T) and the source dataset (S) make it difficult to employ multi-class and multi-domain instance transfer. To address these challenges, we propose a data-level sampling approach, inspired by transfer learning, for multi-class and multi-domain instance transfer. To validate the proposed method, an imbalanced dataset is taken as the target dataset, and three datasets serve as source datasets: one collected from the Bulletin Board System of Xi'an Jiaotong University and two collected from the Chinese microblog platform Weibo. The experimental results show that the proposed approach outperforms classic algorithms by effectively alleviating the imbalance problem in interactive texts. Moreover, a classification model trained on the immigrated datasets produced by the proposed method achieves the best generalization ability.

KW - Generalization ability

KW - Imbalanced sentiment classification

KW - Instance immigration-based sampling

KW - Interactive Chinese texts

KW - Multi-class

KW - Multi-domain

UR - http://www.scopus.com/inward/record.url?scp=85067874778&partnerID=8YFLogxK

U2 - 10.1007/s11761-019-00264-y

DO - 10.1007/s11761-019-00264-y

M3 - Article

VL - 13

SP - 155

EP - 167

JO - Service Oriented Computing and Applications

JF - Service Oriented Computing and Applications

SN - 1863-2386

IS - 2

ER -