Term Based Semantic Clusters for Very Short Text Classification.

Jasper Paalman, Shantanu Mullick, Kalliopi Zervanou, Yingqian Zhang

Research output: Chapter in Book/Report/Conference proceeding; Conference proceeding; peer-review

2 Citations (Scopus)


Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, in very short texts, the sparsity of these texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach, the Term Based Semantic Clusters (TBSeC), that employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function, which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated in an invoice classification task. Compared to well-known content representation methods, the proposed method performs competitively.
Original language: English
Title of host publication: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Editors: Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova
Publisher: Incoma Ltd
Number of pages: 10
ISBN (Electronic): 9789544520557
Publication status: Published - Sep 2019
Externally published: Yes
Event: International Conference on Recent Advances in Natural Language Processing - Varna, Bulgaria
Duration: 2 Sep 2019 to 4 Sep 2019
Conference number: 12


Conference: International Conference on Recent Advances in Natural Language Processing
Abbreviated title: RANLP 2019

Bibliographical note

DBLP's bibliographic metadata records provided through http://dblp.org/search/publ/api are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication. Although the bibliographic metadata records are provided consistent with CC0 1.0 Dedication, the content described by the metadata records is not. Content may be subject to copyright, rights of privacy, rights of publicity and other restrictions.

