Event Title
Prediction of Intrinsically Disordered Protein Regions using Machine Learning Methods
Faculty Sponsor
Md Tamjidul Hoque
College(s)
College of Sciences
Submission Type
Oral Presentation
Description
Many biologically active proteins and protein regions do not form a stable three-dimensional structure, yet they still carry out biological functions. Such proteins are called intrinsically disordered proteins (IDPs), and such regions are called intrinsically disordered regions (IDRs). They play vital roles in various biological processes and have significant implications for accurate function annotation and for drug design targeting critical diseases. IDRs differ structurally and functionally from ordered proteins and therefore require specialized experimental and computational tools for identification and analysis. As a result, identifying IDRs is a time-consuming task. This research aims to develop a machine learning method to predict the IDRs of proteins. Structural properties of proteins, namely secondary structure information, backbone angles, half-sphere exposure, contact numbers, and solvent accessible surface area (ASA), provide useful information about disordered proteins. We further enrich the feature set with additional features, namely the Position-Specific Scoring Matrix (PSSM) and Close Neighbor Correlation Coefficients. To select the best machine learning method, we explored several well-known classification algorithms: Light Gradient Boosting Machine, Logistic Regression, Extra Tree Classifier, and Extreme Gradient Boosting. We evaluate the proposed method on a training dataset using 10-fold cross-validation and test the model on two independent test datasets. The results obtained are comparable with those of existing methods.
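The 10-fold cross-validation step mentioned in the description can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration and is not taken from the presented work: the nearest-centroid classifier is a toy stand-in for the gradient boosting and regression models named above, balanced accuracy is one plausible metric among several, and `make_toy_data` produces synthetic two-feature samples rather than real per-residue features.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that partition the samples into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity (labels: 1 = disordered, 0 = ordered)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    sensitivity = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    return (sensitivity + specificity) / 2


def fit_nearest_centroid(X, y):
    """Toy stand-in classifier (assumption): pick the class with the closer centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [mean(col) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1 if d1 < d0 else 0

    return predict


def cross_validate(fit, X, y, k=10, seed=0):
    """Return the mean balanced accuracy of `fit` over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(balanced_accuracy([y[i] for i in test_idx], y_pred))
    return mean(scores)


def make_toy_data(n=200, seed=1):
    """Synthetic, well-separated two-feature samples (hypothetical, not real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 0.5), rng.gauss(-2.0 * label, 0.5)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("mean balanced accuracy:", cross_validate(fit_nearest_centroid, X, y))
```

To compare several candidate classifiers in the same way, each one would be wrapped as a `fit` function and passed to `cross_validate` with the same `seed`, so that every candidate is scored on identical folds. With real per-residue data, splitting by protein rather than by residue may be preferable, since residues from the same protein are not independent.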
Comments
Honorable Mention, Undergraduate Presentation