EMOCPD: Protein Design Based on Amino Acid Microenvironment and Deep Learning
First published: 2024-02-05
Abstract: Computational protein design, the use of computational methods to design proteins, remains a challenging task despite significant progress in recent years. Traditional methods that design sequences with energy functions and heuristic search algorithms are inefficient, do not meet the needs of the biomolecular big data era, and are limited in accuracy by the energy function and search algorithm they rely on. Existing deep learning-based methods are limited by the learning capacity of their networks and cannot extract effective information from sparse protein structures, which leads to low design accuracy. To address these shortcomings, this paper constructs a deep neural network model named EMOCPD. The model predicts the class of each amino acid in a protein by analyzing the three-dimensional atomic environment around that amino acid, and then optimizes the protein using the candidate amino acid classes predicted with high probability, thereby achieving protein design. Experimental results show that EMOCPD achieves more than 80% accuracy on the training set and 68.33% and 62.32% accuracy on two independent test sets, exceeding the best compared method by more than 10%. For protein design, the top-3 predictions of EMOCPD include key mutations used to design the FAST-PETase enzyme, such as WT PETase S121E, ThermoPETase R224Q, and N233K, which supports the potential of EMOCPD for designing high-performing proteins. The proposed method adds to the available protein design tools, and the model is expected to improve the capability and efficiency of protein design.
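The top-3 selection step described in the abstract can be illustrated with a minimal sketch. It assumes that a trained model has already produced, for every position, a probability for each amino acid class. The function names (`top_k_residues`, `suggest_mutations`), the data layout, and the toy probabilities are hypothetical and are not taken from the paper; the paper's actual optimization procedure may differ.

```python
from typing import Dict, List, Tuple


def top_k_residues(probs: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k amino acid classes with the highest predicted probability."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]


def suggest_mutations(
    sequence: str,
    per_position_probs: List[Dict[str, float]],
    k: int = 3,
    first_index: int = 1,
) -> List[Tuple[str, float]]:
    """List candidate substitutions found among the top-k predictions.

    A candidate is reported whenever a top-k class differs from the
    wild-type residue at that position. Candidates are labeled in the
    usual <wild type><position><new residue> form and sorted by
    predicted probability, highest first.
    """
    suggestions = []
    for offset, (wild_type, probs) in enumerate(zip(sequence, per_position_probs)):
        for residue, p in top_k_residues(probs, k):
            if residue != wild_type:
                label = f"{wild_type}{first_index + offset}{residue}"
                suggestions.append((label, p))
    return sorted(suggestions, key=lambda s: s[1], reverse=True)


# Toy input: a two-residue sequence with made-up probabilities
# (illustrative only, not model output).
toy_sequence = "SR"
toy_probs = [
    {"S": 0.2, "E": 0.7, "A": 0.1},
    {"R": 0.6, "Q": 0.3, "K": 0.1},
]
candidates = suggest_mutations(toy_sequence, toy_probs, k=2)
print(candidates)
```

In this sketch a position contributes candidates even when the wild-type residue is itself the top prediction, since the abstract reports key mutations appearing anywhere in the top 3 rather than only at rank 1.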