Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure

Chaobo He, Xiang Fei, Hanchao Li, Yong Tang, Hai Liu, Shuangyin Liu

Research output: Contribution to journalArticle

3 Citations (Scopus)
7 Downloads (Pure)

Abstract

Nonnegative Matrix Factorization (NMF) has become a powerful model for community discovery in complex networks. Existing NMF-based methods for community discovery often factorize the corresponding adjacent matrix of complex networks to obtain its community indicator matrix. However, the adjacent matrix cannot represent the global structure feature of complex networks very well, and this leads to the performance degradation of community discovery. Besides, most of existing methods are not robust and scalable enough, so they are not effective to deal with complex networks with noises and large-scales. Aiming at these problems above, in this paper we propose a method for community discovery using distributed robust NMF with SimRank similarity measure. This method selects SimRank measure to construct the feature matrix, which can more accurately represent the global structure feature of complex networks. To improve the robustness, we select ℓ2;1 norm instead of the widely used Frobenius norm to construct its NMFbased community discovery model. In addition, to improve the scalability, we implement its key components by using MapReduce distributed computing framework, including computing SimRank feature matrix and iteratively solving the NMF-based model for community discovery. We conduct extensive experiments on several typical complex networks. The results show that our method has better performance and robustness than other representative NMF-based methods for community discovery. Moreover, our method presents good scalability, and hence can be used to discover communities in the largescale complex networks.
Original languageEnglish
Pages (from-to)5601-5624
Number of pages24
JournalThe Journal of Supercomputing
Volume74
Issue number10
Early online date26 Jul 2018
DOIs
Publication statusPublished - Oct 2018

Fingerprint

Non-negative Matrix Factorization
Factorization
Similarity Measure
Complex networks
Complex Networks
Scalability
Adjacent
Robustness
Community
Factorise
Frobenius norm
MapReduce
Distributed Computing
Distributed computer systems
Degradation
Model
Norm
Computing

Bibliographical note

The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-018-2500-9

Copyright © and Moral Rights are retained by the author(s) and/ or other copyright
owners. A copy can be downloaded for personal non-commercial research or study,
without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

Keywords

  • Community discovery
  • Robust nonnegative matrix factorization
  • SimRank
  • MapReduce
  • Complex networks

Cite this

Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure. / He, Chaobo; Fei, Xiang; Li, Hanchao; Tang, Yong; Liu, Hai; Liu, Shuangyin.

In: The Journal of Supercomputing, Vol. 74, No. 10, 10.2018, p. 5601-5624.

Research output: Contribution to journalArticle

He, Chaobo ; Fei, Xiang ; Li, Hanchao ; Tang, Yong ; Liu, Hai ; Liu, Shuangyin. / Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure. In: The Journal of Supercomputing. 2018 ; Vol. 74, No. 10. pp. 5601-5624.
@article{99fc78cbb49840dda6aaee403c715b13,
title = "Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure",
abstract = "Nonnegative Matrix Factorization (NMF) has become a powerful model for community discovery in complex networks. Existing NMF-based methods for community discovery often factorize the corresponding adjacent matrix of complex networks to obtain its community indicator matrix. However, the adjacent matrix cannot represent the global structure feature of complex networks very well, and this leads to the performance degradation of community discovery. Besides, most of existing methods are not robust and scalable enough, so they are not effective to deal with complex networks with noises and large-scales. Aiming at these problems above, in this paper we propose a method for community discovery using distributed robust NMF with SimRank similarity measure. This method selects SimRank measure to construct the feature matrix, which can more accurately represent the global structure feature of complex networks. To improve the robustness, we select ℓ2;1 norm instead of the widely used Frobenius norm to construct its NMFbased community discovery model. In addition, to improve the scalability, we implement its key components by using MapReduce distributed computing framework, including computing SimRank feature matrix and iteratively solving the NMF-based model for community discovery. We conduct extensive experiments on several typical complex networks. The results show that our method has better performance and robustness than other representative NMF-based methods for community discovery. Moreover, our method presents good scalability, and hence can be used to discover communities in the largescale complex networks.",
keywords = "Community discovery, Robust nonnegative matrix factorization, SimRank, MapReduce, Complex networks",
author = "Chaobo He and Xiang Fei and Hanchao Li and Yong Tang and Hai Liu and Shuangyin Liu",
note = "The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-018-2500-9 Copyright {\circledC} and Moral Rights are retained by the author(s) and/ or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.",
year = "2018",
month = "10",
doi = "10.1007/s11227-018-2500-9",
language = "English",
volume = "74",
pages = "5601--5624",
journal = "Journal of Supercomputing",
issn = "0920-8542",
publisher = "Springer Verlag",
number = "10",

}

TY - JOUR

T1 - Improving NMF-based community discovery using distributed robust nonnegative matrix factorization with SimRank similarity measure

AU - He, Chaobo

AU - Fei, Xiang

AU - Li, Hanchao

AU - Tang, Yong

AU - Liu, Hai

AU - Liu, Shuangyin

N1 - The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-018-2500-9 Copyright © and Moral Rights are retained by the author(s) and/ or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

PY - 2018/10

Y1 - 2018/10

N2 - Nonnegative Matrix Factorization (NMF) has become a powerful model for community discovery in complex networks. Existing NMF-based methods for community discovery often factorize the corresponding adjacent matrix of complex networks to obtain its community indicator matrix. However, the adjacent matrix cannot represent the global structure feature of complex networks very well, and this leads to the performance degradation of community discovery. Besides, most of existing methods are not robust and scalable enough, so they are not effective to deal with complex networks with noises and large-scales. Aiming at these problems above, in this paper we propose a method for community discovery using distributed robust NMF with SimRank similarity measure. This method selects SimRank measure to construct the feature matrix, which can more accurately represent the global structure feature of complex networks. To improve the robustness, we select ℓ2;1 norm instead of the widely used Frobenius norm to construct its NMFbased community discovery model. In addition, to improve the scalability, we implement its key components by using MapReduce distributed computing framework, including computing SimRank feature matrix and iteratively solving the NMF-based model for community discovery. We conduct extensive experiments on several typical complex networks. The results show that our method has better performance and robustness than other representative NMF-based methods for community discovery. Moreover, our method presents good scalability, and hence can be used to discover communities in the largescale complex networks.

AB - Nonnegative Matrix Factorization (NMF) has become a powerful model for community discovery in complex networks. Existing NMF-based methods for community discovery often factorize the corresponding adjacent matrix of complex networks to obtain its community indicator matrix. However, the adjacent matrix cannot represent the global structure feature of complex networks very well, and this leads to the performance degradation of community discovery. Besides, most of existing methods are not robust and scalable enough, so they are not effective to deal with complex networks with noises and large-scales. Aiming at these problems above, in this paper we propose a method for community discovery using distributed robust NMF with SimRank similarity measure. This method selects SimRank measure to construct the feature matrix, which can more accurately represent the global structure feature of complex networks. To improve the robustness, we select ℓ2;1 norm instead of the widely used Frobenius norm to construct its NMFbased community discovery model. In addition, to improve the scalability, we implement its key components by using MapReduce distributed computing framework, including computing SimRank feature matrix and iteratively solving the NMF-based model for community discovery. We conduct extensive experiments on several typical complex networks. The results show that our method has better performance and robustness than other representative NMF-based methods for community discovery. Moreover, our method presents good scalability, and hence can be used to discover communities in the largescale complex networks.

KW - Community discovery

KW - Robust nonnegative matrix factorization

KW - SimRank

KW - MapReduce

KW - Complex networks

U2 - 10.1007/s11227-018-2500-9

DO - 10.1007/s11227-018-2500-9

M3 - Article

VL - 74

SP - 5601

EP - 5624

JO - Journal of Supercomputing

JF - Journal of Supercomputing

SN - 0920-8542

IS - 10

ER -