Word representation using refined contexts

Ming Zhang, Vasile Palade, Yan Wang, Zhicheng Ji

Research output: Contribution to journal › Article › peer-review

Abstract

In this paper, inspired by the idea that contextual distances and positions may have a substantial impact on distinguishing the relationships between words, a novel word association method with two weighting schemes for refining contexts, named Weighted Point-wise Mutual Information with Contextual Distances and Positions (PMIDP), is proposed to eliminate the noisy and redundant information hidden in an imbalanced corpus. The first weighting scheme, called PMIdist, revises the Point-wise Mutual Information (PMI) method by scaling the co-occurrence counts according to the distance between a word and its context. The second weighting scheme is a ratio that measures the variation of contextual positions within the window of a given word. The refined word association in PMIDP is then defined as the product of the two proposed weighting schemes, which aims to flexibly adjust word association when solving target-oriented similarity tasks. The proposed word association method has been applied to two widely known models: the positive PMI matrix with truncated Singular Value Decomposition (PPMI-SVD) model and the Global Vectors (GloVe) model. Experimental results demonstrate that PMIDP significantly improves the performance of both models on semantic and relational similarity tasks and shows advantages over other state-of-the-art models.
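The abstract does not give the exact formulas for PMIdist or for the position-variation ratio, so the following is only a minimal sketch of the general PPMI-SVD pipeline with distance-weighted counts, under stated assumptions. It assumes each co-occurrence at distance d contributes 1/d to the count (the harmonic decay described in the GloVe paper), and it represents the position-variation ratio only as an optional elementwise multiplier, `extra_weight`, because its definition is not specified here. All function names, the `window` size, and the singular-value exponent `p` are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def distance_weighted_counts(tokens, vocab, window=2):
    """Symmetric co-occurrence counts; a context at distance d adds 1/d (assumed weighting)."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[index[w], index[tokens[j]]] += 1.0 / abs(i - j)
    return counts


def weighted_ppmi(counts, extra_weight=None):
    """PMI from (weighted) counts, optionally multiplied by a second weight, clipped at 0.

    `extra_weight` is a hypothetical stand-in for a per-pair ratio such as a
    position-variation weight; its real definition is not given in the abstract.
    """
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # unseen pairs (log 0 or 0/0) contribute nothing
    if extra_weight is not None:
        pmi = pmi * extra_weight  # product of the two weighting schemes
    return np.maximum(pmi, 0.0)  # keep only positive associations (PPMI)


def svd_embeddings(matrix, k, p=0.5):
    """Truncated SVD: top-k left singular vectors scaled by singular values ** p."""
    u, s, _ = np.linalg.svd(matrix)
    return u[:, :k] * (s[:k] ** p)


tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(tokens))
C = distance_weighted_counts(tokens, vocab, window=2)
M = weighted_ppmi(C)
E = svd_embeddings(M, k=3)
print(E.shape)  # (7, 3)
```

Keeping the second weight as a separate multiplicative argument mirrors the abstract's description of PMIDP as a product of two schemes, and makes it easy to swap in a concrete ratio once its definition is known.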

Original language: English
Pages (from-to): (in press)
Journal: Applied Intelligence
Volume: (in press)
Early online date: 4 Feb 2022
Publication status: E-pub ahead of print, 4 Feb 2022

Bibliographical note

Funding Information:
This project was supported by the National Natural Science Foundation of China (Nos. 62072350 and 62106180).

Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Keywords

  • Contextual distance
  • Contextual position variation
  • Point-wise mutual information
  • Singular value decomposition
  • Word representation

ASJC Scopus subject areas

  • Artificial Intelligence
