Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs that play significant roles in several biological processes. Accurate identification and sub-classification of lncRNAs are crucial for exploring their characteristic functions in the genome. However, most coding potential computation (CPC) tools fail to accurately identify and classify lncRNAs, or to predict their biological functions, in plant species. In this study, a novel computational framework, the LncRNA identification and function prediction tool (LIFT), has been developed. LIFT implements least absolute shrinkage and selection operator (LASSO) optimisation and iterative random forests classification to select optimal features, a novel position-based classification (PBC) method to sub-classify lncRNAs into different classes, and a Bayesian-based function prediction approach to annotate lncRNA transcripts. Using LASSO, LIFT selected 31 optimal features and achieved a 15–30% improvement in prediction accuracy on plant species compared with state-of-the-art CPC tools. Using PBC, LIFT identified intergenic and antisense transcripts with greater accuracy in the A. thaliana and Z. mays datasets.
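The abstract does not spell out the PBC rules. As a rough illustration only, the sketch below shows one common way to sub-classify a transcript by its genomic position relative to protein-coding genes. It assumes half-open coordinates and three hypothetical classes (intergenic, antisense, sense-overlapping). The names `Feature`, `overlaps` and `classify_lncrna` are hypothetical and are not taken from LIFT; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """A genomic interval. Hypothetical: half-open [start, end) coordinates."""
    chrom: str
    start: int
    end: int
    strand: str  # '+' or '-'


def overlaps(a: Feature, b: Feature) -> bool:
    """True if both intervals lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_lncrna(tx: Feature, genes: List[Feature]) -> str:
    """Assign a position-based class to a transcript.

    Hypothetical rules, not LIFT's published criteria:
      - no overlap with any coding gene            -> "intergenic"
      - overlaps a coding gene on the other strand -> "antisense"
      - otherwise (same-strand overlap only)       -> "sense-overlapping"
    """
    hits = [g for g in genes if overlaps(tx, g)]
    if not hits:
        return "intergenic"
    # Design choice (assumed): any opposite-strand overlap takes priority.
    if any(g.strand != tx.strand for g in hits):
        return "antisense"
    return "sense-overlapping"


if __name__ == "__main__":
    genes = [Feature("chr1", 1000, 5000, "+")]
    print(classify_lncrna(Feature("chr1", 7000, 8000, "+"), genes))  # -> intergenic
    print(classify_lncrna(Feature("chr1", 4000, 6000, "-"), genes))  # -> antisense
    print(classify_lncrna(Feature("chr1", 2000, 3000, "+"), genes))  # -> sense-overlapping
```

Real annotation pipelines usually add further classes (for example intronic or bidirectional transcripts), which would need exon structure and distance thresholds that this sketch deliberately omits.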
|Journal|International Journal of Bioinformatics Research and Applications
|Early online date|17 Jan 2022
|Publication status|E-pub ahead of print - 17 Jan 2022
Bibliographical note
Copyright © and Moral Rights are retained by the author(s) and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.
This document is the author’s post-print version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it.
Keywords
- Bayesian Markov random fields
- Function prediction
- Iterative random forests
- Least absolute shrinkage and selection operator
- Long non-coding RNAs
- Position-based classification