Prediction of Antibody Non-Specificity using Protein Language Models and Biophysical Parameters
Sakhnini, L. I.; Beltrame, L.; Fulle, S.; Sormanni, P.; Henriksen, A.; Lorenzen, N.; Vendruscolo, M.; Granata, D.
Abstract
The development of therapeutic antibodies requires optimizing target binding affinity and pharmacodynamics while ensuring high developability potential, which includes minimizing non-specific binding. In this study, we address this problem by predicting antibody non-specificity using two complementary approaches: (i) antibody sequence embeddings from protein language models (PLMs), and (ii) a comprehensive set of sequence-based biophysical descriptors. The models were trained on human and mouse antibody data from Boughter et al. (2020) and tested on three public datasets: Jain et al. (2017), Shehata et al. (2019) and Harvey et al. (2022). We show that non-specificity is best predicted from the heavy variable domain and the heavy-chain complementarity-determining regions (CDRs). The top-performing PLM-based model, a logistic regression classifier trained on ESM-1v embeddings of the heavy variable domain, achieved a 10-fold cross-validation accuracy of up to 71%. Our analysis based on biophysical descriptors identified the isoelectric point as a key driver of non-specificity. These findings underscore the importance of biophysical properties in predicting antibody non-specificity and highlight the potential of protein language models for the development of antibody-based therapeutics. To illustrate how the approach can support the selection of lead candidates with high developability potential, we show that it can be extended to therapeutic antibodies and nanobodies.
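Because the isoelectric point (pI) emerges as a key sequence-based descriptor, the sketch below shows one common way to estimate it: compute the Henderson-Hasselbalch net charge of the chain as a function of pH and bisect for the pH at which that charge is zero. This is an illustrative assumption, not the descriptor pipeline used in the study. The pKa table is an assumed EMBOSS-like set (other tools use different values and will give somewhat different pI estimates), and the function names `net_charge` and `isoelectric_point` are hypothetical.

```python
# Assumed pKa values (EMBOSS-like set); the study's own tool may use another table.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a single chain at the given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = 1  # one free amino terminus per chain
    counts["Cterm"] = 1  # one free carboxyl terminus per chain
    positive = sum(counts[g] / (1.0 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1.0 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on pH in [0, 14]; net charge decreases monotonically with pH."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive: the pI lies at a higher pH
        else:
            hi = mid  # net negative or zero: the pI lies at or below mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical short fragments, not sequences from the datasets above.
    print(round(isoelectric_point("KKKKRRRR"), 2))  # strongly basic
    print(round(isoelectric_point("DDDDEEEE"), 2))  # strongly acidic
```

In a descriptor-based model, a value like this would be computed per sequence (for example, per heavy variable domain) and used as one input feature alongside the other biophysical descriptors.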