ORCID ID
kshah2@uno.edu
Date of Award
12-2022
Degree Type
Thesis-Restricted
Degree Name
M.S.
Degree Program
Computer Science
Department
Computer Science
Major Professor
Md Tamjidul Hoque
Abstract
Noncoding RNAs (ncRNAs) play a significant role in several fundamental biological processes by binding to RNA-binding proteins (RBPs); hence, it is necessary to study ncRNA-protein interaction (RPI). Several classical machine learning and deep learning models have been proposed to predict RPI. These models first require features of the RNA and protein, such as physicochemical properties and secondary and tertiary structure, to be collected before they are fed into the model. More recently, with advances in high-throughput sequencing and Natural Language Processing (NLP), transformer models such as BERT-RBP and Evolutionary Scale Modeling (ESM) can be trained to automatically extract feature representations, containing both low- and high-level information, directly from RNA and protein sequences. This approach could make manual feature collection optional. Hence, in this study, we compare the performance of such language-based features against manually created features in predicting the interaction probability between a protein and an RNA.
Recommended Citation
Shah, Krishna, "ncRNA-protein Interaction Prediction using Language-based Features" (2022). University of New Orleans Theses and Dissertations. 3028.
https://scholarworks.uno.edu/td/3028
Rights
The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.