Deep learning for mining information from figures in biomedical literature

  • Ibrahim Almakky

    Student thesis: Doctoral Thesis, Doctor of Philosophy


    The biomedical field has witnessed exponential growth over the past two decades, driven by technological leaps such as next-generation sequencing and by rising concerns over healthcare and food security. Findings in biomedical research are mostly published in research articles, which are stored online in open-access databases. In such articles, researchers tend to use figures to illustrate and summarise some of the most important information concerning experimental settings and results. This information is often not retrievable from the body text of the articles, so methods are needed to extract it from the figures. The extracted information can later be used to retrieve the figures themselves or the articles that contain them.

    This thesis explores the development of deep learning algorithms to facilitate information extraction from biomedical figures. More specifically, the thesis focuses on the visual aspects of biomedical figures and on what information can be extracted from the figure image. With this goal in mind, this research investigates different aspects of representation learning and deep neural networks.

    The thesis presents novel contributions, starting with a supervised deep representation learning method for classification. This method aims to automatically extract features that enhance the classification performance of deep neural networks in general, and on biomedical figures in particular. Following that, a variety of deep learning approaches are developed to automatically extract visual features from biomedical figures for classification. Finally, a novel deep convolutional neural network is proposed that recasts the text localisation problem as a reconstruction one. Given the promising text localisation results, text within biomedical figures can be extracted from the detected text regions and employed for figure indexing.
    Date of Award: May 2020
    Original language: English
    Awarding Institution: Coventry University
    Supervisors: Yih-Ling Hedley & Vasile Palade
