A topic sentence-based instance transfer method for imbalanced sentiment classification of Chinese product reviews

Feng Tian, Fan Wu, Kuo-Ming Chao, Qinghua Zheng, Nazaraf Shah, Tian Lan, Jia Yue

Abstract

Interest in sentiment classification of product reviews is growing because of its potential to improve e-commerce services and product quality. In realistic e-commerce environments, however, review data are imbalanced, so a classification model tends to ignore minority-class information during training. To address this problem, we propose a topic sentence-based instance transfer method that processes imbalanced Chinese product reviews with the help of an auxiliary (source) dataset. The method uses a hybrid of rules and supervised learning to identify the topic sentence of each product review and adds the topic sentence's feature set to the feature space used for sentiment classification. Next, to measure the transferability of instances in the source dataset, a greedy algorithm based on the information gain of the top-N common features selects the common features. A cosine similarity computed over these common features, between instances of the source dataset and instances of the target dataset, is then used to select the transferable instances. Furthermore, a method based on the synthetic minority over-sampling technique (SMOTE) is adopted to overcome the feature-space inconsistency between the source and target datasets. Finally, the instances selected from the source dataset are migrated into the target dataset to form a new dataset for training the classification model. Two datasets, collected from Jingdong and Dangdang, serve as the target and source datasets, respectively. The experimental results show that, in terms of generalization ability, the proposed method helps a support vector machine (SVM) outperform other classification methods, such as J48, Naive Bayes, Random Forest and Random Committee, when applied to datasets produced by resampling and SMOTE.
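The instance-selection step described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact greedy procedure, scoring rule, or similarity threshold, so everything below is an assumption made for illustration: features are ranked by the sum of their information gain in both datasets (a plain ranking, not the authors' greedy algorithm), a source instance is kept when its best cosine similarity to any target minority instance reaches a hypothetical `threshold`, and the toy matrices are invented. The topic-sentence and SMOTE steps are not covered.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(X, y):
    """Information gain of each binary (present/absent) feature column of X."""
    base = entropy(y)
    gains = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        present = X[:, j] > 0
        w = present.mean()  # fraction of instances containing feature j
        gains[j] = base - w * entropy(y[present]) - (1 - w) * entropy(y[~present])
    return gains

def top_n_common_features(X_src, y_src, X_tgt, y_tgt, n):
    """Indices of the n features seen in both datasets with the highest summed gain.

    Assumption: summed information gain stands in for the paper's criterion.
    """
    shared = np.flatnonzero((X_src.sum(axis=0) > 0) & (X_tgt.sum(axis=0) > 0))
    score = information_gain(X_src, y_src) + information_gain(X_tgt, y_tgt)
    ranked = shared[np.argsort(-score[shared], kind="stable")]
    return ranked[:n]

def select_transferable(X_src, X_tgt_minority, features, threshold):
    """Rows of X_src whose best cosine similarity to a target minority row
    (measured on the common features only) is at least `threshold`."""
    a = X_src[:, features].astype(float)
    b = X_tgt_minority[:, features].astype(float)
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    # All-zero rows keep a zero vector, so their similarity is 0 rather than NaN.
    a_unit = np.divide(a, a_norm, out=np.zeros_like(a), where=a_norm > 0)
    b_unit = np.divide(b, b_norm, out=np.zeros_like(b), where=b_norm > 0)
    best = (a_unit @ b_unit.T).max(axis=1)
    return np.flatnonzero(best >= threshold)

# Invented toy data: rows are reviews, columns are binary term features,
# label 1 marks the minority class.
X_tgt = np.array([[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]])
y_tgt = np.array([1, 1, 0, 0])
X_src = np.array([[1, 0, 0, 1], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 1]])
y_src = np.array([1, 1, 0, 0])

features = top_n_common_features(X_src, y_src, X_tgt, y_tgt, n=2)
picked = select_transferable(X_src, X_tgt[y_tgt == 1], features, threshold=0.8)
print("common features:", features.tolist())
print("transferable source rows:", picked.tolist())
```

In this sketch the selected source rows would then be appended to the target training set before a classifier is fitted; how the paper reconciles the remaining feature-space differences is only summarized in the abstract and is not reproduced here.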

NOTICE: this is the author’s version of a work that was accepted for publication in Electronic Commerce Research and Applications. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Electronic Commerce Research and Applications, vol. 16, March–April 2016. DOI: 10.1016/j.elerap.2015.10.003

© 2016, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
Original language: English
Pages (from-to): 66–76
Journal: Electronic Commerce Research and Applications
Volume: 16
Issue number: March–April
Early online date: 30 Oct 2015
DOI: 10.1016/j.elerap.2015.10.003
State: Published - 2016

Keywords

  • Classification methods
  • Imbalanced sample classification
  • Instance transfer methods
  • Product reviews
  • Topic sentence analysis

Cite this

A topic sentence-based instance transfer method for imbalanced sentiment classification of Chinese product reviews. / Tian, Feng; Wu, Fan; Chao, Kuo-Ming; Zheng, Qinghua; Shah, Nazaraf; Lan, Tian; Yue, Jia.

In: Electronic Commerce Research and Applications, Vol. 16, No. March–April, 2016, p. 66–76.

Research output: Contribution to journal, article

@article{53eec8c6f6be4f83a9d31dea8f774b02,
title = "A topic sentence-based instance transfer method for imbalanced sentiment classification of Chinese product reviews",
keywords = "Classification methods, Imbalanced sample classification, Instance transfer methods, Product reviews, Topic sentence analysis",
author = "Feng Tian and Fan Wu and Kuo-Ming Chao and Qinghua Zheng and Nazaraf Shah and Tian Lan and Jia Yue",
note = "Due to publisher policy, the full text will not be available on the repository until 30th April 2017.",
year = "2016",
doi = "10.1016/j.elerap.2015.10.003",
volume = "16",
pages = "66--76",
journal = "Electronic Commerce Research and Applications",
issn = "1567-4223",
publisher = "Elsevier",
number = "March--April",
}

TY  - JOUR
T1  - A topic sentence-based instance transfer method for imbalanced sentiment classification of Chinese product reviews
AU  - Tian, Feng
AU  - Wu, Fan
AU  - Chao, Kuo-Ming
AU  - Zheng, Qinghua
AU  - Shah, Nazaraf
AU  - Lan, Tian
AU  - Yue, Jia
N1  - Due to publisher policy, the full text will not be available on the repository until 30th April 2017.
PY  - 2016
Y1  - 2016
KW  - Classification methods
KW  - Imbalanced sample classification
KW  - Instance transfer methods
KW  - Product reviews
KW  - Topic sentence analysis
U2  - 10.1016/j.elerap.2015.10.003
DO  - 10.1016/j.elerap.2015.10.003
M3  - Article
VL  - 16
SP  - 66
EP  - 76
JO  - Electronic Commerce Research and Applications
T2  - Electronic Commerce Research and Applications
JF  - Electronic Commerce Research and Applications
SN  - 1567-4223
SN  - 1873-7846
IS  - March–April
ER  -