Abstract
Contextual distances and positions may substantially affect how well the relationships between words can be distinguished. Inspired by this idea, this paper proposes a novel word association method with two weighting schemes for refining contexts, named Weighted Point-wise Mutual Information with Contextual Distances and Positions (PMIDP), to eliminate the noisy and redundant information hidden in an imbalanced corpus. The first weighting scheme, called PMIdist, revises the Point-wise Mutual Information (PMI) method by scaling the co-occurrence counts according to the distance between a word and its context. The second weighting scheme is a ratio that measures the contextual position variation within the window of a given word. The refined word association in PMIDP is then defined as the product of the two proposed weighting schemes, which aims to flexibly adjust the word association when solving target-oriented similarity tasks. The proposed method has been applied to two widely known models: the positive PMI matrix with truncated Singular Value Decomposition (PPMI-SVD) model and the Global Vectors (GloVe) model. Experimental results demonstrate that the PMIDP method can significantly improve the performance of both models on semantic and relational similarity tasks and shows advantages over other state-of-the-art models.
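The abstract does not state the formulas, so the following is only a minimal sketch of the general idea behind a distance-scaled PMI, not the paper's PMIdist definition. It assumes a harmonic weight of 1/d for a context word at distance d (the weighting GloVe uses when building its co-occurrence counts) and a symmetric window; the function names `distance_weighted_counts` and `ppmi` are hypothetical. The position-variation ratio is omitted because the abstract gives no detail on how it is computed.

```python
import math
from collections import defaultdict


def distance_weighted_counts(tokens, window=2):
    """Symmetric co-occurrence counts, each scaled by an assumed 1/d weight."""
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            # Assumption: harmonic decay with distance, not the paper's formula.
            weight = 1.0 / d
            counts[(word, tokens[j])] += weight
            counts[(tokens[j], word)] += weight
    return counts


def ppmi(counts):
    """Positive PMI computed from (possibly weighted) pair counts."""
    total = sum(counts.values())
    row = defaultdict(float)  # marginal mass per target word
    col = defaultdict(float)  # marginal mass per context word
    for (w, c), n in counts.items():
        row[w] += n
        col[c] += n
    scores = {}
    for (w, c), n in counts.items():
        pmi = math.log((n * total) / (row[w] * col[c]))
        scores[(w, c)] = max(pmi, 0.0)  # clip negative associations to zero
    return scores


tokens = "the cat sat on the mat".split()
counts = distance_weighted_counts(tokens, window=2)
scores = ppmi(counts)
```

In this toy sentence, adjacent pairs such as ("the", "cat") contribute a full count while pairs two positions apart such as ("cat", "on") contribute half, so nearer contexts carry more weight in the resulting PPMI scores. A matrix of such scores is the kind of input that a truncated SVD would factorize in a PPMI-SVD style model.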
Original language | English |
---|---|
Pages (from-to) | 12347-12368 |
Number of pages | 22 |
Journal | Applied Intelligence |
Volume | 52 |
Issue number | 11 |
Early online date | 4 Feb 2022 |
DOIs | |
Publication status | Published - Sept 2022 |
Bibliographical note
Copyright © and Moral Rights are retained by the author(s) and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. This document is the author’s post-print version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it.
Funder
This project was supported by the National Natural Science Foundation of China (Nos. 62072350 and 62106180).
Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Funding
Funders | Funder number |
---|---|
National Natural Science Foundation of China | 62072350, 62106180 |
Keywords
- Contextual distance
- Contextual position variation
- Point-wise mutual information
- Singular value decomposition
- Word representation
ASJC Scopus subject areas
- Artificial Intelligence