PepSeek: Universal Functional Peptide Discovery with Cooperation Between Specialized Deep Learning Models and Large Language Model
Gong, H.; Wang, Y.; Kong, Q.; Li, X.; Li, L.; Wan, B.; Zhao, Y.; Chen, G.; Chen, J.; Zhang, J.; Yu, Y.; Yang, X.; Zuo, X.; Li, Y.
Abstract
Recent advances in computational foundation models have revitalized the pipeline for scientific discovery. However, because wet-lab-validated data are scarce, developing large foundation models tailored for functional peptide discovery remains costly and challenging. Conventional specialized deep learning models, in contrast, require labor-intensive data collection and generalize poorly to unseen data or tasks. Here, we introduce PepSeek, a universal approach for peptide discovery that integrates a state-of-the-art large language model (LLM) with specialized small models. PepSeek harnesses the reasoning and generalization capabilities of the LLM while leveraging the high predictive accuracy of specialized models trained for tasks such as antimicrobial activity prediction and functional peptide generation. The framework thereby combines the flexibility and broad applicability of LLMs with the task-specific precision of specialized models. We devised multiple collaborative strategies and task-specific modules that demonstrate leading performance in peptide identification and generation. Notably, PepSeek achieves remarkable zero-shot prediction accuracy for peptides with diverse functionalities. Using PepSeek, we identified a group of broad-spectrum antimicrobial peptides that exhibit low toxicity and high activity against drug-resistant bacteria, with the best candidate surpassing all peptides currently undergoing clinical trials. Our framework also establishes a new pipeline for scientific discovery that pairs LLMs with specialized models.