Predicting Protein Thermostability through Deep Learning Leveraging 3D Structural Information
In protein engineering, improving thermostability is essential for many industrial and pharmaceutical applications. However, identifying stabilizing mutations experimentally is time-consuming because the search space is enormous. With the increasing availability of protein structural and thermostability data, computational approaches that use deep learning to identify thermostable candidates are gaining popularity. In this work, we present and benchmark a novel graph neural network, ProtGCN, that incorporates geometric and energetic details of proteins to predict changes in Gibbs free energy (ΔG), a key indicator of thermostability, upon single point mutations. Unlike conventional methods that rely on sequence or structural features, our model uses protein graphs with rich node features, carefully preprocessed from a comprehensive dataset of approximately 4149 mutated sequences across 117 protein families. In addition, ProtGCN is enhanced by incorporating embeddings from the Evolutionary Scale Modeling (ESM) protein language model into the protein graphs. With this integration, ProtGCN (ESM) performs competitively with XGBoost and a protein language model-based multi-layer perceptron on all evaluation metrics and outperforms all comparison models in further analyses. A particular strength of ProtGCN (ESM) is its ability to correctly identify and predict stabilizing and destabilizing mutations with extreme effects, which are typically underrepresented in thermostability datasets. These results suggest a promising direction for future computational protein engineering research.
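To make the idea of a protein graph with language-model-enriched node features more concrete, the following is a minimal sketch and not the ProtGCN implementation. It assumes that residues are nodes, that an edge joins two residues whose representative atoms lie within a distance cutoff, and that per-residue embeddings are simply concatenated to the structural features. The function names, the 8 Å cutoff, and the toy values are all hypothetical and do not come from the work described above.

```python
import math


def build_contact_graph(coords, cutoff=8.0):
    """Return undirected edges (i, j) with i < j for residue pairs within `cutoff`.

    `coords` holds one (x, y, z) position per residue, in angstroms.
    The 8.0 default is an illustrative assumption, not a value from the source.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def build_node_features(structural, embeddings):
    """Concatenate per-residue structural features with per-residue embeddings."""
    if len(structural) != len(embeddings):
        raise ValueError("one embedding is required per residue")
    return [list(s) + list(e) for s, e in zip(structural, embeddings)]


# Toy four-residue chain; the last residue is far from the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
structural = [[0.1], [0.2], [0.3], [0.4]]   # placeholder geometric/energetic features
embeddings = [[1.0, 2.0]] * 4               # placeholder for language-model embeddings

edges = build_contact_graph(coords)
features = build_node_features(structural, embeddings)
```

In this toy chain only the first three residues are connected, and each node carries a three-dimensional feature vector. A graph neural network would then pass messages along these edges to produce a per-mutation prediction.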
- Publication type
- Conference contribution
- Title
- Predicting Protein Thermostability through Deep Learning Leveraging 3D Structural Information
- Media
- Biological Materials Science - A workshop on biogenic, bioinspired, biomimetic and biohybrid materials for innovative optical, photonics and optoelectronics applications
- Volume
- 2024
- Authors
- Ashima Khanna, Florian Haselbeck, Dominik Grimm
- Publication date
- 06.06.2024
- Citation
- Khanna, Ashima; Haselbeck, Florian; Grimm, Dominik (2024): Predicting Protein Thermostability through Deep Learning Leveraging 3D Structural Information. Biological Materials Science - A workshop on biogenic, bioinspired, biomimetic and biohybrid materials for innovative optical, photonics and optoelectronics applications 2024.