A fast and efficient semantic short text similarity metric

David Croft, Simon Coupland, Jethro Shell, Stephen Brown

    Research output: Chapter in Book/Report/Conference proceeding > Conference proceeding > peer-review

    32 Citations (Scopus)

    Abstract

    The semantic comparison of short sections of text is an emerging aspect of Natural Language Processing (NLP). In this paper we present a novel Short Text Semantic Similarity (STSS) method, Lightweight Semantic Similarity (LSS), to address the issues that arise with sparse text representation. The proposed approach captures the semantic information contained when comparing text to process the similarity. The methodology combines semantic term similarities with a vector similarity method used within statistical analysis. A modification of the term vectors using synset similarity values addresses issues that are encountered with sparse text. LSS is shown to be comparable to current semantic similarity approaches, LSA and STASIS, whilst having a lower computational footprint.
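The idea in the abstract of adjusting sparse term vectors with term-level semantic similarity before a vector comparison can be illustrated with a minimal sketch. This is not the LSS algorithm from the paper. The names `TOY_TERM_SIM`, `term_sim`, `soft_vector` and `sketch_similarity` are hypothetical, the similarity table is hand-made and stands in for synset similarity values, and cosine similarity is assumed as the vector comparison.

```python
import math

# Hypothetical, hand-made term similarities standing in for synset
# similarity values; a real system might derive these from WordNet.
TOY_TERM_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def term_sim(a, b):
    """Similarity of two terms in [0, 1]; identical terms score 1.0."""
    if a == b:
        return 1.0
    return TOY_TERM_SIM.get(frozenset((a, b)), 0.0)


def soft_vector(tokens, vocab):
    """Weight each vocabulary term by its best match among the text's tokens.

    A plain binary vector would hold 0 for every term the text lacks;
    here a missing term instead receives the similarity of its closest token.
    """
    return [max((term_sim(term, t) for t in tokens), default=0.0) for term in vocab]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def sketch_similarity(text_a, text_b):
    """Compare two short texts over their joint vocabulary."""
    a = text_a.lower().split()
    b = text_b.lower().split()
    vocab = sorted(set(a) | set(b))
    return cosine(soft_vector(a, vocab), soft_vector(b, vocab))


if __name__ == "__main__":
    print(sketch_similarity("fast car", "quick automobile"))
    print(sketch_similarity("fast car", "green apple"))
```

With plain binary vectors, "fast car" and "quick automobile" share no terms and would score 0. In this sketch the adjusted vectors overlap, so the pair scores close to 1, while the unrelated pair still scores 0.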
    Original language: English
    Title of host publication: 13th UK Workshop on Computational Intelligence (UKCI), 2013
    Publisher: IEEE
    Pages: 221-227
    Number of pages: 7
    ISBN (Print): 978-1-4799-1568-2
    Publication status: Published - 2013
    Event: 13th UK Workshop on Computational Intelligence (UKCI) 2013 - University of Surrey, Guildford, United Kingdom
    Duration: 9 Sept 2013 to 11 Sept 2013
    Conference number: 13
    http://ukci2013.cs.surrey.ac.uk/

    Workshop

    Workshop: 13th UK Workshop on Computational Intelligence (UKCI) 2013
    Abbreviated title: UKCI 2013
    Country/Territory: United Kingdom
    City: Guildford
    Period: 9/09/13 to 11/09/13

    Keywords

    • Vectors
    • Semantics
    • Measurement
    • Natural language processing
    • Educational institutions
    • Media
    • Electronic mail
