Contextual distances and positions may have a substantial impact on distinguishing the relationships between words. Inspired by this idea, this paper proposes a novel word association method with two weighting schemes for refining contexts, named Weighted Point-wise Mutual Information with Contextual Distances and Positions (PMIDP), which aims to eliminate the noisy and redundant information hidden in an imbalanced corpus. The first weighting scheme, called PMIdist, revises the Point-wise Mutual Information (PMI) method by scaling the co-occurrence counts according to the distance between a word and its context. The second weighting scheme is a ratio that measures the contextual position variation within the window of a given word. The refined word association in PMIDP is then defined as the product of the two proposed weighting schemes, so that the word association can be adjusted flexibly when solving target-oriented similarity tasks. The proposed word association method has been applied to two widely known models: the positive PMI matrix with truncated Singular Value Decomposition (PPMI-SVD) model and the Global Vectors (GloVe) model. Experimental results demonstrate that the PMIDP method significantly improves the performance of both models on semantic and relational similarity tasks and shows advantages over other state-of-the-art models.
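The idea of scaling co-occurrence counts by contextual distance before computing PMI can be illustrated with a minimal sketch. The abstract does not give the exact PMIdist formula, so the 1/d weight below is an assumption chosen for illustration (it resembles the distance weighting commonly used when building GloVe co-occurrence counts), and the function names `distance_weighted_counts` and `ppmi` are hypothetical. The position-variation ratio is not sketched because its definition is not stated here.

```python
import math
from collections import defaultdict


def distance_weighted_counts(tokens, window=2):
    """Accumulate co-occurrence mass for (word, context) pairs.

    Each pair found at distance d contributes 1/d instead of 1.
    ASSUMPTION: the 1/d weight is illustrative only; it is not
    the PMIdist weighting defined in the paper.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            for j in (i - d, i + d):  # left and right context positions
                if 0 <= j < len(tokens):
                    counts[(word, tokens[j])] += 1.0 / d
    return counts


def ppmi(counts):
    """Positive PMI computed from (possibly fractional) co-occurrence mass."""
    total = sum(counts.values())
    word_mass = defaultdict(float)
    ctx_mass = defaultdict(float)
    for (w, c), v in counts.items():
        word_mass[w] += v
        ctx_mass[c] += v
    scores = {}
    for (w, c), v in counts.items():
        # PMI = log( P(w, c) / (P(w) * P(c)) ), with probabilities
        # estimated from the weighted mass rather than raw counts.
        pmi = math.log((v * total) / (word_mass[w] * ctx_mass[c]))
        scores[(w, c)] = max(pmi, 0.0)  # clip negative values to zero
    return scores


tokens = "the cat sat on the mat".split()
counts = distance_weighted_counts(tokens, window=2)
scores = ppmi(counts)
```

In this toy sentence, adjacent pairs such as ("cat", "sat") receive full weight, while pairs two positions apart such as ("cat", "on") receive half weight, so nearer contexts contribute more to the resulting association scores. A PPMI-SVD style model would then factorize the matrix of these scores.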
Bibliographical note
Funding Information:
This project was supported by the National Natural Science Foundation of China (Nos. 62072350 and 62106180).
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
Keywords
- Contextual distance
- Contextual position variation
- Point-wise mutual information
- Singular value decomposition
- Word representation
ASJC Scopus subject areas
- Artificial Intelligence