Date of Award
8-2022
Degree Type
Thesis
Degree Name
M.S.
Degree Program
Computer Science
Department
Computer Science
Major Professor
Dr. Md Tamjidul Hoque
Second Advisor
Dr. Christopher M Summa
Third Advisor
Dr. Atriya Sen
Abstract
Protein-protein interactions in a cell are essential to the characterization and performance of various fundamental biological processes. Because experimental methods for detecting these interactions are tedious, resource-intensive, and time-consuming, computational prediction of protein pair interactions has emerged as an active research area in bioinformatics. This research seeks to develop an innovative machine learning-based technique that predicts whether a protein pair interacts, based on carefully selected input features and rich evolutionary information. We developed a protein-protein interaction predictor, PPILS, that leverages the evolutionary knowledge captured by a protein language model. We examined several distinct neural network architectures (CNN+LSTM, Transformer, Encoder-Decoder, and FNN) and found that the encoder-decoder architecture with light attention performs best. The method is straightforward; there are only four learnable weight matrices. The model receives protein representations from the language model, applies one convolution to them to obtain attention coefficients, and normalizes these along the length dimension using the softmax function to generate attention. A second convolution is applied to the input features to create values. The element-wise product of attention and values is then summed over the length dimension, which yields a fixed-size protein representation. This representation is concatenated with the maximum taken over the length dimension of the data and fed to the decoder. The decoder is our classification engine, which predicts protein interactions. We found that PPILS outperformed other cutting-edge techniques for PPI prediction. We believe the proposed method could serve as an essential tool in protein-protein interaction prediction, further accelerating the protein drug discovery process.
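The attention-pooling steps described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the PPILS implementation: the function names, kernel size, dimensions, and random weights below are assumptions made for illustration, the maximum is assumed to be taken over the values, and the decoder and the way the two proteins of a pair are combined are omitted because the abstract does not specify them.

```python
import math
import random


def conv1d_same(x, weight, bias):
    """1D convolution over the length axis with zero padding.

    x: [L][d_in], weight: [k][d_in][d_out], bias: [d_out] -> [L][d_out]
    """
    length, k = len(x), len(weight)
    d_in, d_out = len(x[0]), len(bias)
    pad = k // 2
    out = []
    for t in range(length):
        row = list(bias)
        for j in range(k):
            s = t + j - pad
            if 0 <= s < length:  # positions outside the sequence count as zero
                for i in range(d_in):
                    for o in range(d_out):
                        row[o] += x[s][i] * weight[j][i][o]
        out.append(row)
    return out


def softmax_over_length(scores):
    """Normalize each feature channel across sequence positions."""
    length, d = len(scores), len(scores[0])
    out = [[0.0] * d for _ in range(length)]
    for o in range(d):
        col = [scores[t][o] for t in range(length)]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(c - m) for c in col]
        z = sum(exps)
        for t in range(length):
            out[t][o] = exps[t] / z
    return out


def light_attention_pool(x, params):
    """Map a variable-length embedding [L][d_in] to a fixed vector of size 2*d_out."""
    scores = conv1d_same(x, params["w_att"], params["b_att"])  # attention coefficients
    values = conv1d_same(x, params["w_val"], params["b_val"])  # values
    attention = softmax_over_length(scores)
    length, d = len(x), len(values[0])
    # Attention-weighted sum over the length dimension.
    summed = [sum(attention[t][o] * values[t][o] for t in range(length)) for o in range(d)]
    # Maximum over the length dimension (assumed here to be taken over the values).
    maxed = [max(values[t][o] for t in range(length)) for o in range(d)]
    return summed + maxed


def random_params(d_in, d_out, k, seed=0):
    """Hypothetical random weights standing in for trained parameters."""
    rng = random.Random(seed)

    def w():
        return [[[rng.gauss(0.0, 0.1) for _ in range(d_out)]
                 for _ in range(d_in)] for _ in range(k)]

    return {"w_att": w(), "b_att": [0.0] * d_out,
            "w_val": w(), "b_val": [0.0] * d_out}


# Usage: a toy "protein" of 7 residues with 4-dimensional embeddings.
rng = random.Random(42)
toy_embedding = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(7)]
toy_params = random_params(d_in=4, d_out=3, k=3)
print(len(light_attention_pool(toy_embedding, toy_params)))  # 2 * d_out = 6
```

Because both the weighted sum and the maximum collapse the length dimension, proteins of different lengths map to vectors of the same size, which is what allows a fixed-input classifier to follow.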
Recommended Citation
Howladar, Nayan, "Protein-Protein Interaction Prediction from Language of Biological Coding" (2022). University of New Orleans Theses and Dissertations. 3013.
https://scholarworks.uno.edu/td/3013
Included in
Amino Acids, Peptides, and Proteins Commons, Artificial Intelligence and Robotics Commons
Rights
The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.