Date of Award

8-2022

Degree Type

Thesis

Degree Name

M.S.

Degree Program

Computer Science

Department

Computer Science

Major Professor

Dr. Md Tamjidul Hoque

Second Advisor

Dr. Christopher M Summa

Third Advisor

Dr. Atriya Sen

Abstract

Protein-protein interactions in a cell are essential to the characterization and performance of various fundamental biological processes. Because experimental methods for identifying these interactions are tedious, resource-intensive, and time-consuming, computational techniques for predicting protein pair interactions have emerged as an active research area in bioinformatics. This research seeks to develop an innovative machine learning-based technique that predicts the interaction of a protein pair from carefully selected input features and exploits rich evolutionary information. We developed a protein-protein interaction predictor, PPILS, that leverages the evolutionary knowledge captured by a protein language model. We examined several distinct neural network architectures (CNN+LSTM, Transformer, Encoder-Decoder, and FNN) and found that the encoder-decoder architecture with light attention performs best. The method is straightforward; there are only four learnable weight matrices. The model receives protein representations from the language model, performs one convolution on them to obtain attention coefficients, and then normalizes these coefficients along the length dimension using the softmax function to generate attention. A second convolution is applied to the input features to create values. The element-wise product of attention and values is then taken to construct a representation of the protein. Summing over the length dimension yields a fixed-size protein representation. This is then concatenated with the maximum over the length dimension and fed to the decoder. The decoder is our classification engine, which predicts protein interactions. We found that PPILS outperformed other state-of-the-art techniques for PPI prediction. We believe the proposed method could serve as an essential tool in protein-protein interaction prediction, further accelerating the protein drug discovery process.
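
The light-attention pooling step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the thesis code: the function names, the kernel size, the feature dimensions, and the random weights are all hypothetical, and the maximum is assumed to be taken over the convolved values. The sketch only shows how two convolutions, a softmax over the length dimension, a weighted sum, and a max can turn a variable-length sequence of per-residue embeddings into a fixed-size vector.

```python
import math
import random


def conv1d(x, w, b):
    """Zero-padded 1D convolution that preserves sequence length.

    x: [L][d_in] features, w: [k][d_in][d_out] kernel, b: [d_out] bias.
    Returns [L][d_out].
    """
    length, k = len(x), len(w)
    d_in, d_out = len(w[0]), len(w[0][0])
    pad = k // 2
    out = []
    for i in range(length):
        row = list(b)
        for t in range(k):
            j = i + t - pad
            if 0 <= j < length:  # positions outside the sequence count as zero
                for c in range(d_in):
                    for o in range(d_out):
                        row[o] += x[j][c] * w[t][c][o]
        out.append(row)
    return out


def softmax_over_length(e):
    """Softmax applied per channel along the length dimension."""
    length, d = len(e), len(e[0])
    out = [[0.0] * d for _ in range(length)]
    for o in range(d):
        m = max(e[i][o] for i in range(length))  # subtract max for stability
        exps = [math.exp(e[i][o] - m) for i in range(length)]
        total = sum(exps)
        for i in range(length):
            out[i][o] = exps[i] / total
    return out


def light_attention_pool(x, w_att, b_att, w_val, b_val):
    """Return a fixed-size vector of size 2 * d_out for any sequence length."""
    attention = softmax_over_length(conv1d(x, w_att, b_att))
    values = conv1d(x, w_val, b_val)
    length, d = len(values), len(values[0])
    # Attention-weighted sum over the length dimension.
    summed = [sum(attention[i][o] * values[i][o] for i in range(length))
              for o in range(d)]
    # Assumption: the max is taken over the values along the length dimension.
    maxed = [max(values[i][o] for i in range(length)) for o in range(d)]
    return summed + maxed


def random_weights(k, d_in, d_out, rng):
    """Hypothetical initializer; a trained model would learn these weights."""
    w = [[[rng.uniform(-0.1, 0.1) for _ in range(d_out)]
          for _ in range(d_in)] for _ in range(k)]
    return w, [0.0] * d_out


if __name__ == "__main__":
    rng = random.Random(0)
    w_att, b_att = random_weights(3, 4, 2, rng)
    w_val, b_val = random_weights(3, 4, 2, rng)
    for length in (5, 11):  # two "proteins" of different lengths
        x = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(length)]
        print(length, len(light_attention_pool(x, w_att, b_att, w_val, b_val)))
```

Both sequences map to vectors of the same size, which is what allows a downstream classifier to consume proteins of any length. How the two pooled vectors of a protein pair are combined before the decoder is not specified in the abstract, so that step is left out here.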

Rights

The University of New Orleans and its agents retain the non-exclusive license to archive and make accessible this dissertation or thesis in whole or in part in all forms of media, now or hereafter known. The author retains all other ownership rights to the copyright of the thesis or dissertation.
