Date of Award
Dr. Md Tamjidul Hoque
Dr. Christopher M Summa
Dr. Atriya Sen
Protein-protein interactions (PPIs) in a cell are essential to the characterization and performance of various fundamental biological processes. Because experimental methods for identifying them are tedious, resource-intensive, and time-consuming, computational prediction of protein pair interactions has emerged as an active research area in bioinformatics. This research develops a machine learning-based technique that predicts whether a protein pair interacts, using carefully selected input features that carry rich evolutionary information. We developed a protein-protein interaction predictor, PPILS, that leverages the evolutionary knowledge encoded in a protein language model. We examined several distinct neural network architectures (CNN+LSTM, Transformer, Encoder-Decoder, and FNN) and found that the encoder-decoder architecture with light attention performs best. The method is straightforward, with only four learnable weight matrices. The model receives protein representations from the language model and applies one convolution to them to obtain attention coefficients, which are normalized along the length dimension with the softmax function to produce attention weights. A second convolution is applied to the input features to create values. The element-wise product of the attention weights and the values is then summed over the length dimension, yielding a fixed-size protein representation. This representation is concatenated with the maximum of the data taken over the length dimension and fed to the decoder, which serves as the classification engine that predicts protein interactions. We found that PPILS outperformed other state-of-the-art techniques for PPI prediction. We believe the proposed method could serve as an essential tool in protein-protein interaction prediction, further accelerating the protein drug discovery process.
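The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the PPILS implementation. The function names, the kernel size of 3, the zero padding, the random weights, and the toy dimensions are all assumptions made for illustration. The sketch also assumes that the maximum is taken over the value tensor, which the abstract does not state explicitly, and it omits the decoder and the way the two proteins of a pair are combined.

```python
import numpy as np


def conv1d_same(x, weight, bias):
    """Cross-correlation over the length axis with zero padding.

    x: (d_in, L), weight: (d_out, d_in, k) with odd k, bias: (d_out,).
    Returns an array of shape (d_out, L).
    """
    d_out, _, k = weight.shape
    pad = k // 2
    x_padded = np.pad(x, ((0, 0), (pad, pad)))
    length = x.shape[1]
    out = np.empty((d_out, length))
    for i in range(length):
        window = x_padded[:, i:i + k]  # (d_in, k) slice centred on position i
        out[:, i] = np.tensordot(weight, window, axes=([1, 2], [0, 1])) + bias
    return out


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def light_attention_pool(x, w_att, b_att, w_val, b_val):
    """Map a (d, L) per-residue embedding to a fixed-size vector of length 2 * d_out."""
    scores = conv1d_same(x, w_att, b_att)      # first convolution: attention coefficients
    attention = softmax(scores, axis=1)        # normalize along the length dimension
    values = conv1d_same(x, w_val, b_val)      # second convolution: values
    summed = (attention * values).sum(axis=1)  # element-wise product, summed over length
    maxed = values.max(axis=1)                 # assumption: max is taken over the values
    return np.concatenate([summed, maxed]), attention


# Toy example: a hypothetical 8-dimensional embedding of a 5-residue protein.
rng = np.random.default_rng(0)
d, L, k = 8, 5, 3
x = rng.normal(size=(d, L))
w_att, b_att = rng.normal(size=(d, d, k)), np.zeros(d)
w_val, b_val = rng.normal(size=(d, d, k)), np.zeros(d)

pooled, attention = light_attention_pool(x, w_att, b_att, w_val, b_val)
print(pooled.shape)  # (16,)
```

Because the attention weights sum to one along the length axis, the pooled vector has the same size for any protein length, which is what allows a fixed-input classifier to follow it.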
Howladar, Nayan, "Protein-Protein Interaction Prediction from Language of Biological Coding" (2022). University of New Orleans Theses and Dissertations. 3013.