RELISH-DB: a large expert-curated database for benchmarking biomedical literature search

Peter Brown, RELISH Consortium, Yaoqi Zhou

    Research output: Contribution to journal › Article

    Abstract

    Document recommendation systems for searching relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved, and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium of more than 1,500 scientists in 84 countries, who have collectively annotated the relevance of over 180,000 PubMed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data covers 76% of all unique PubMed MeSH descriptors. No systematic biases were observed across different experience levels, research fields, or even time spent on annotations. More importantly, the same document pairs annotated by different researchers are highly consistent with each other. We further show that three representative baseline methods [Okapi Best Matching 25 (BM25), Term Frequency–Inverse Document Frequency (TF-IDF), and PubMed Related Articles (PMRA)] have similar overall performance. The database server located at https://relishdb.ict.griffith.edu.au is freely available for data download and blind testing of new methods.
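
    The abstract names three classical baselines; as a point of reference, the sketch below is a minimal Python implementation of one of them, Okapi BM25, scoring candidate documents against the terms of a seed article. The corpus and query here are illustrative placeholders rather than RELISH data, the tokenization is deliberately naive, and k1 = 1.5 and b = 0.75 are common defaults, not values prescribed by the paper.

import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every document against the distinct query terms."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # Smoothed inverse document frequency for each distinct query term.
    idf = {}
    for t in set(query_tokens):
        n = sum(1 for d in docs_tokens if t in d)  # document frequency
        idf[t] = math.log((N - n + 0.5) / (n + 0.5) + 1)
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
        scores.append(sum(idf[t] * tf[t] * (k1 + 1) / (tf[t] + norm)
                          for t in set(query_tokens) if tf[t]))
    return scores

# Toy stand-ins for a seed article's terms and candidate PubMed abstracts.
seed = "biomedical literature search benchmark".split()
docs = [
    "benchmarking document relevance in biomedical literature".split(),
    "protein structure prediction with deep learning".split(),
    "a benchmark for biomedical literature search methods".split(),
]
for i, s in sorted(enumerate(bm25_scores(seed, docs)), key=lambda x: -x[1]):
    print(f"doc {i}: BM25 = {s:.3f}")

    TF-IDF with cosine similarity could be substituted for the scoring function in the same loop; PMRA rankings, by contrast, are usually obtained from PubMed itself (for example through the E-utilities ELink service) rather than reimplemented.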
    Original language: English
    Pages (from-to): 1-61
    Number of pages: 61
    Journal: Scientific Data
    Publication status: Submitted - 31 Jan 2019

    Cite this

    APA

    Brown, P., RELISH Consortium, & Zhou, Y. (2019). RELISH-DB: a large expert-curated database for benchmarking biomedical literature search. Manuscript submitted for publication.

    BibTeX

    @article{04045a357a1c4e29a5cb7c3fd29f2a97,
    title = "RELISH-DB: a large expert-curated database for benchmarking biomedical literature search",
    abstract = "Document recommendation systems for searching relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved, and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium of more than 1,500 scientists in 84 countries, who have collectively annotated the relevance of over 180,000 PubMed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data covers 76\% of all unique PubMed MeSH descriptors. No systematic biases were observed across different experience levels, research fields, or even time spent on annotations. More importantly, the same document pairs annotated by different researchers are highly consistent with each other. We further show that three representative baseline methods [Okapi Best Matching 25 (BM25), Term Frequency–Inverse Document Frequency (TF-IDF), and PubMed Related Articles (PMRA)] have similar overall performance. The database server located at https://relishdb.ict.griffith.edu.au is freely available for data download and blind testing of new methods.",
    author = "Peter Brown and {RELISH Consortium} and Yaoqi Zhou and Kim Bul",
    year = "2019",
    month = "1",
    day = "31",
    language = "English",
    pages = "1--61",
    journal = "Scientific Data",
    issn = "2052-4463",
    publisher = "Nature Publishing Group",
    }

    RIS

    TY - JOUR

    T1 - RELISH-DB: a large expert-curated database for benchmarking biomedical literature search

    AU - Brown, Peter

    AU - RELISH Consortium

    AU - Zhou, Yaoqi

    AU - Bul, Kim

    PY - 2019/1/31

    Y1 - 2019/1/31

    N2 - Document recommendation systems for searching relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved, and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium of more than 1,500 scientists in 84 countries, who have collectively annotated the relevance of over 180,000 PubMed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data covers 76% of all unique PubMed MeSH descriptors. No systematic biases were observed across different experience levels, research fields, or even time spent on annotations. More importantly, the same document pairs annotated by different researchers are highly consistent with each other. We further show that three representative baseline methods [Okapi Best Matching 25 (BM25), Term Frequency–Inverse Document Frequency (TF-IDF), and PubMed Related Articles (PMRA)] have similar overall performance. The database server located at https://relishdb.ict.griffith.edu.au is freely available for data download and blind testing of new methods.

    AB - Document recommendation systems for searching relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that covers a variety of research fields, such that newly developed literature search techniques can be compared, improved, and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH (RELISH) consortium of more than 1,500 scientists in 84 countries, who have collectively annotated the relevance of over 180,000 PubMed articles with regard to their respective seed (input) article(s). The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data covers 76% of all unique PubMed MeSH descriptors. No systematic biases were observed across different experience levels, research fields, or even time spent on annotations. More importantly, the same document pairs annotated by different researchers are highly consistent with each other. We further show that three representative baseline methods [Okapi Best Matching 25 (BM25), Term Frequency–Inverse Document Frequency (TF-IDF), and PubMed Related Articles (PMRA)] have similar overall performance. The database server located at https://relishdb.ict.griffith.edu.au is freely available for data download and blind testing of new methods.

    M3 - Article

    SP - 1

    EP - 61

    JO - Scientific Data

    JF - Scientific Data

    SN - 2052-4463

    ER -