Word Representation with Salient Features

Ming Zhang, Vasile Palade, Yan Wang, Zhicheng Ji

Research output: Contribution to journal › Article

Abstract

Motivated by the idea that the contexts in which a word occurs differ in significance, this paper proposes a method, called word representation with Salient Features (SaFe), that represents words using salient features selected from their context words. SaFe uses point-wise mutual information (PMI) with a scaled context window to measure the association between a target word and each of its contexts. Contexts having word associations are then selected as salient features, where the number of salient features for a given word is determined by the ratio between the number of its unique contexts and its total count of occurrences in the whole corpus. SaFe can be applied to the positive PMI (PPMI) matrix, in which each row represents a word, hence the name SaFe-PPMI. The SaFe-PPMI model can be further decomposed with truncated singular value decomposition (SVD) to obtain dense vectors. In addition to being efficient to compute, the new models achieve remarkable improvements on seven semantic relatedness tasks and show superior performance compared with state-of-the-art models.
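The pipeline the abstract outlines (PPMI weighting, per-word feature selection, truncated decomposition) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the rule used here for the number of salient features (k = ceil(scale × ratio × unique contexts)) and the names `ppmi_matrix`, `select_salient`, `dense_vectors`, and `scale` are assumptions made only for illustration. The scaled context window is also not reproduced; the co-occurrence counts are assumed to have been collected already.

```python
import numpy as np


def ppmi_matrix(counts):
    """Positive PMI for a (words x contexts) co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_wc = counts / total                    # joint probabilities
    p_w = p_wc.sum(axis=1, keepdims=True)    # word marginals
    p_c = p_wc.sum(axis=0, keepdims=True)    # context marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0             # unseen pairs get no association
    return np.maximum(pmi, 0.0)              # clip negative PMI to zero


def select_salient(ppmi, counts, scale=1.0):
    """Keep the k highest-PPMI contexts per word and zero out the rest.

    ASSUMPTION: k = ceil(scale * ratio * unique), where ratio is
    (unique contexts) / (total occurrences). The abstract only says that
    the number of features depends on this ratio; the exact rule is
    hypothetical.
    """
    counts = np.asarray(counts, dtype=float)
    out = np.zeros_like(ppmi)
    for i in range(ppmi.shape[0]):
        unique = np.count_nonzero(counts[i])
        total = counts[i].sum()
        if total == 0:
            continue                         # word never observed
        ratio = unique / total
        k = max(1, int(np.ceil(scale * ratio * unique)))
        top = np.argsort(ppmi[i])[::-1][:k]  # indices of the k largest values
        out[i, top] = ppmi[i, top]
    return out


def dense_vectors(matrix, dim):
    """Truncated SVD: keep the first `dim` singular directions."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * s[:dim]


# Toy counts: 3 target words x 3 context words (made-up numbers).
counts = np.array([[4, 0, 1],
                   [0, 3, 1],
                   [1, 1, 0]])
ppmi = ppmi_matrix(counts)
safe_ppmi = select_salient(ppmi, counts)
vectors = dense_vectors(safe_ppmi, dim=2)
```

On a real corpus the count matrix would be large and sparse, so a sparse matrix type and a sparse truncated SVD solver would be the more practical choice; the dense arrays here only keep the sketch short.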

Original language: English
Article number: 8611085
Pages (from-to): 30157-30173
Number of pages: 17
Journal: IEEE Access
Volume: 7
DOIs: https://doi.org/10.1109/ACCESS.2019.2892817
Publication status: Published - 14 Jan 2019

Bibliographical note

Open Access journal published under a CC BY 4.0 license

Keywords

  • Point-wise mutual information
  • Salient features
  • Singular vector decomposition
  • Word representation

ASJC Scopus subject areas

  • Engineering(all)
  • Materials Science(all)
  • Computer Science(all)

Cite this

Word Representation with Salient Features. / Zhang, Ming; Palade, Vasile; Wang, Yan; Ji, Zhicheng.

In: IEEE Access, Vol. 7, 8611085, 14.01.2019, p. 30157-30173.

Zhang, M, Palade, V, Wang, Y & Ji, Z 2019, 'Word Representation with Salient Features', IEEE Access, vol. 7, 8611085, pp. 30157-30173. https://doi.org/10.1109/ACCESS.2019.2892817
Zhang, Ming ; Palade, Vasile ; Wang, Yan ; Ji, Zhicheng. / Word Representation with Salient Features. In: IEEE Access. 2019 ; Vol. 7. pp. 30157-30173.
@article{fd3ea847bbf54e3d806cdbf3a1afa80d,
title = "Word Representation with Salient Features",
keywords = "Point-wise mutual information, Salient features, Singular vector decomposition, Word representation",
author = "Ming Zhang and Vasile Palade and Yan Wang and Zhicheng Ji",
note = "Open Access journal published under a CC BY 4.0 license",
year = "2019",
month = "1",
day = "14",
doi = "10.1109/ACCESS.2019.2892817",
language = "English",
volume = "7",
pages = "30157--30173",
journal = "IEEE Access",
issn = "2169-3536",
publisher = "IEEE",

}

TY - JOUR
T1 - Word Representation with Salient Features
AU - Zhang, Ming
AU - Palade, Vasile
AU - Wang, Yan
AU - Ji, Zhicheng
N1 - Open Access journal published under a CC BY 4.0 license
PY - 2019/1/14
Y1 - 2019/1/14
KW - Point-wise mutual information
KW - Salient features
KW - Singular vector decomposition
KW - Word representation
UR - http://www.scopus.com/inward/record.url?scp=85065241472&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2019.2892817
DO - 10.1109/ACCESS.2019.2892817
M3 - Article
VL - 7
SP - 30157
EP - 30173
JO - IEEE Access
JF - IEEE Access
SN - 2169-3536
M1 - 8611085
ER -