Abstract
Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, in very short texts, the sparsity of these texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence, but also handles sparseness well. In this work, we introduce such an approach, the Term Based Semantic Clusters (TBSeC), that employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function, which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated in an invoice classification task. Compared to well-known content representation methods, the proposed method performs competitively.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) |
| Editors | Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova |
| Publisher | Incoma Ltd |
| Pages | 878-887 |
| Number of pages | 10 |
| ISBN (Electronic) | 9789544520557 |
| Publication status | Published - Sept 2019 |
| Externally published | Yes |
| Event | International Conference on Recent Advances in Natural Language Processing, Varna, Bulgaria; Duration: 2 Sept 2019 → 4 Sept 2019; Conference number: 12; http://ranlp.org/archive/ranlp2019/start.php |
Conference
| Conference | International Conference on Recent Advances in Natural Language Processing |
|---|---|
| Abbreviated title | RANLP 2019 |
| Country/Territory | Bulgaria |
| City | Varna |
| Period | 2/09/19 → 4/09/19 |