Event Title

Prediction of Intrinsically Disordered Protein Regions using Machine Learning Methods

Presenter Information

Hoang Nguyen

College(s)

College of Sciences

Submission Type

Oral Presentation

Description

Many biologically active proteins and protein regions do not fold into a stable three-dimensional structure, yet they still carry out biological functions. Such proteins are called intrinsically disordered proteins (IDPs), and such regions are called intrinsically disordered regions (IDRs). They play vital roles in various biological processes and have significant implications for accurate functional annotation and for drug design for critical diseases. IDRs are structurally and functionally very different from ordered proteins and therefore require specialized experimental and computational tools for identification and analysis. As a result, identifying IDRs is a time-consuming task. This research aims to develop a machine learning method to predict the IDRs of proteins. Structural properties of proteins, namely secondary structure information, backbone angles, half-sphere exposure, contact numbers, and solvent accessible surface area (ASA), provide useful information about disordered proteins. We also incorporate other features, namely the Position-Specific Scoring Matrix (PSSM) and Close Neighbor Correlation Coefficients, to enrich the feature set. To select the best machine learning method, we explored several well-known classification algorithms: Light Gradient Boosting Machine, Logistic Regression, Extra Tree Classifier, and Extreme Gradient Boosting. We evaluate the proposed method on a training dataset using 10-fold cross-validation and test the model on two independent test datasets. The results obtained are comparable with those of existing methods.
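
The model-selection step described above (comparing several classifiers with 10-fold cross-validation) might look roughly like the following sketch. It is not the presenter's implementation. It assumes scikit-learn is available, uses randomly generated stand-in data instead of real per-residue features such as PSSM or ASA values, and substitutes scikit-learn's GradientBoostingClassifier for the LightGBM and XGBoost models so that the example has no extra dependencies. The function names, feature count, and the ROC AUC scoring choice are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_residues(n_residues=600, n_features=30, seed=0):
    """Build stand-in data: one row per residue, one column per feature.

    Real inputs would be structural and evolutionary features; these are
    random numbers used only to make the sketch runnable.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_residues, n_features))
    # Hypothetical labels (1 = disordered, 0 = ordered), loosely tied to
    # the first three features so that there is some signal to learn.
    signal = X[:, :3].sum(axis=1) - 1.0
    y = (signal + rng.normal(size=n_residues) > 0).astype(int)
    return X, y


def compare_classifiers(X, y, n_splits=10, seed=0):
    """Return the mean cross-validated ROC AUC for each candidate model."""
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=seed),
        # Stand-in for the LightGBM / XGBoost models named in the abstract.
        "gradient_boosting": GradientBoostingClassifier(
            n_estimators=50, random_state=seed
        ),
    }
    # Stratified folds keep the ordered/disordered ratio similar per fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {}
    for name, model in candidates.items():
        fold_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        scores[name] = float(fold_scores.mean())
    return scores


X, y = make_synthetic_residues()
scores = compare_classifiers(X, y)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

One design caveat: residues from the same protein are correlated, so a real evaluation would probably split folds by protein (for example with scikit-learn's GroupKFold) rather than by residue as done here.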

Comments

Honorable Mention, Undergraduate Presentation

This document is currently not available here.
