Word representation using refined contexts

Ming Zhang, Vasile Palade, Yan Wang, Zhicheng Ji

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)
96 Downloads (Pure)

Abstract

Inspired by the idea that contextual distances and positions may substantially affect how well the relationships between words can be distinguished, this paper proposes a novel word association method with two weighting schemes for refining contexts, named Weighted Point-wise Mutual Information with Contextual Distances and Positions (PMIDP), to eliminate the noisy and redundant information hidden in an imbalanced corpus. The first weighting scheme, called PMIdist, revises the Point-wise Mutual Information (PMI) method by scaling the co-occurrence counts according to the distance between a word and its context. The second weighting scheme is a ratio that measures the contextual position variation within the window of a given word. The refined word association in PMIDP is then defined as the product of the two proposed weighting schemes, which allows the word association to be adjusted flexibly when solving target-oriented similarity tasks. The proposed method is applied to two widely known models: the positive PMI matrix with truncated Singular Value Decomposition (PPMI-SVD) model and the Global Vectors (GloVe) model. Experimental results demonstrate that the PMIDP method significantly improves the performance of both models on semantic and relational similarity tasks and shows advantages over other state-of-the-art models.
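
The distance-scaling idea behind PMIdist can be illustrated with a minimal sketch. The abstract does not give the exact weighting function, so the 1/d weight used here (the same decay GloVe applies to its co-occurrence counts) is an assumption, and the function names `distance_weighted_counts` and `ppmi` are hypothetical. The position-variation ratio is not specified in the abstract and is not sketched.

```python
import math
from collections import defaultdict


def distance_weighted_counts(tokens, window=2):
    """Accumulate symmetric co-occurrence counts, scaling each pair by 1/d.

    The 1/d weight is an assumed stand-in for the paper's distance scaling.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            weight = 1.0 / d  # closer context words contribute more
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return counts


def ppmi(counts):
    """Positive PMI computed from (possibly fractional) co-occurrence counts."""
    total = sum(counts.values())
    word_totals = defaultdict(float)
    context_totals = defaultdict(float)
    for (word, context), n in counts.items():
        word_totals[word] += n
        context_totals[context] += n
    scores = {}
    for (word, context), n in counts.items():
        # PMI = log( p(w, c) / (p(w) * p(c)) ), estimated from weighted counts
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        scores[(word, context)] = max(pmi, 0.0)  # clip negatives to zero
    return scores
```

Calling `ppmi(distance_weighted_counts(tokens, window=5))` on a tokenized corpus gives a sparse association table that could then be factorized with a truncated SVD, in the spirit of the PPMI-SVD model named in the abstract.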

Original language: English
Pages (from-to): 12347-12368
Number of pages: 22
Journal: Applied Intelligence
Volume: 52
Issue number: 11
Early online date: 4 Feb 2022
DOIs
Publication status: Published - Sept 2022

Bibliographical note

Copyright © and Moral Rights are retained by the author(s) and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

This document is the author’s post-print version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it.

Funder

This project was supported by the National Natural Science Foundation of China (Nos. 62072350 and 62106180).

Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.

Funding

Funder: National Natural Science Foundation of China
Funder numbers: 62072350, 62106180

    Keywords

    • Contextual distance
    • Contextual position variation
    • Point-wise mutual information
    • Singular vector decomposition
    • Word representation

    ASJC Scopus subject areas

    • Artificial Intelligence
