Abstract
In the age of Web 2.0, a substantial amount of unstructured content is distributed through multiple text streams in an asynchronous fashion, which makes it increasingly difficult to glean and distill useful information. An effective way to explore the information in text streams is topic modelling, which can further facilitate other applications such as search, information browsing, and pattern mining. In this paper, we propose a semantic-graph-based topic modelling approach for structuring asynchronous text streams. Our approach integrates topic mining and time synchronization, the two core modules for addressing this problem, into a unified model. Specifically, to handle the lexical gap issue, we use a global semantic graph for each timestamp to capture the hidden interactions among entities across all text streams. To deal with source asynchronism, local semantic graphs are employed to discover similar topics of different entities that may be separated by time gaps. Our experiments on two real-world datasets show that the proposed model significantly outperforms existing models.
Field | Value |
---|---|
Original language | English |
Title of host publication | WWW '17: Proceedings of the 26th International Conference on World Wide Web |
Publisher | ACM |
Pages | 1201-1209 |
Number of pages | 9 |
ISBN (Print) | 9781450349130 |
DOIs | |
Publication status | Published - 3 Apr 2017 |
Externally published | Yes |
Event | 26th International Conference on World Wide Web, Perth, Australia (3 Apr 2017 → 7 Apr 2017) |
Conference
Field | Value |
---|---|
Conference | 26th International Conference on World Wide Web |
Abbreviated title | WWW'17 |
Country/Territory | Australia |
City | Perth |
Period | 3/04/17 → 7/04/17 |