FoldToken4: Consistent & Hierarchical Fold Language

This paper is a preprint and has not been certified by peer review.

Authors

Gao, Z.; Tan, C.; Li, S. Z.

Abstract

Creating a protein structure language has attracted increasing attention as a way to unify the modalities of protein sequence and structure. While recent works such as FoldToken1, FoldToken2, and FoldToken3 have made great progress in this direction, the relationship between languages created by different models at different scales is still unclear. Moreover, models at multiple scales (different code-space sizes, e.g., 2^5, 2^6, ..., 2^12) need to be trained separately, leading to redundant effort. We raise the question: could a single model create multiscale fold languages? In this paper, we propose FoldToken4 to learn the consistency and hierarchy of multiscale fold languages. By introducing multiscale code adapters and a code-mixing technique, FoldToken4 can generate multiscale languages from the same model and discover hierarchical token-mapping relationships across scales. To the best of our knowledge, FoldToken4 is the first effort to learn multiscale token consistency and hierarchy in vector-quantization (VQ) research, and it is likewise novel in protein structure language learning.
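To make the multiscale idea concrete, the following is a minimal, hypothetical sketch of nested vector quantization: one shared codebook of size 2^12, with smaller code spaces (2^5, 2^6, ...) taken as prefixes of it, so the same encoder output can be tokenized at every scale. This is an illustrative assumption for exposition only; FoldToken4's actual multiscale code adapters and code-mixing scheme are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (all sizes are assumptions, not the paper's):
D = 16                                    # embedding dimension
MAX_BITS = 12                             # largest code space: 2^12
codebook = rng.normal(size=(2**MAX_BITS, D))  # one shared codebook

def quantize(x, bits):
    """Map each row of x to its nearest code among the first 2**bits
    entries of the shared codebook (nested sub-codebook)."""
    sub = codebook[: 2**bits]
    dists = ((x[:, None, :] - sub[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)           # token ids at this scale

# The same continuous embeddings yield a token sequence at every scale:
x = rng.normal(size=(4, D))
coarse_tokens = quantize(x, 5)            # 2^5-token language
fine_tokens = quantize(x, 12)             # 2^12-token language
```

Because the sub-codebooks are nested, token IDs in the coarse language index the same code vectors as in the fine language, which is one simple way a hierarchical token-mapping relationship across scales could arise; in FoldToken4 this relationship is learned rather than fixed by construction.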
