FoldToken4: Consistent & Hierarchical Fold Language

This paper is a preprint and has not been certified by peer review.

Authors

Gao, Z.; Tan, C.; Li, S. Z.

Abstract

Creating a protein structure language has attracted increasing attention as a way to unify the modalities of protein sequence and structure. While recent works such as FoldToken1, FoldToken2, and FoldToken3 have made great progress in this direction, the relationship between languages created by different models at different scales is still unclear. Moreover, models at multiple scales (different code-space sizes, e.g., 2^5, 2^6, ..., 2^12) need to be trained separately, leading to redundant effort. We raise the question: could a single model create multiscale fold languages? In this paper, we propose FoldToken4 to learn the consistency and hierarchy of multiscale fold languages. By introducing multiscale code adapters and a code-mixing technique, FoldToken4 can generate multiscale languages from the same model and discover hierarchical token-mapping relationships across scales. To the best of our knowledge, FoldToken4 is the first effort to learn multiscale token consistency and hierarchy in vector-quantization (VQ) research, and it is likewise novel in protein structure language learning.
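To make the multiscale idea concrete, the following is a minimal, hypothetical sketch of nested vector quantization: one shared codebook of size 2^12, with smaller code spaces (2^5, 2^6, ...) taken as prefixes of it, so the same encoder output can be tokenized at every scale. This is an illustrative assumption for exposition only; FoldToken4's actual multiscale code adapters and code-mixing scheme are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (all sizes are assumptions, not the paper's):
D = 16                                    # embedding dimension
MAX_BITS = 12                             # largest code space: 2^12
codebook = rng.normal(size=(2**MAX_BITS, D))  # one shared codebook

def quantize(x, bits):
    """Map each row of x to its nearest code among the first 2**bits
    entries of the shared codebook (nested sub-codebook)."""
    sub = codebook[: 2**bits]
    dists = ((x[:, None, :] - sub[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)           # token ids at this scale

# The same continuous embeddings yield a token sequence at every scale:
x = rng.normal(size=(4, D))
coarse_tokens = quantize(x, 5)            # 2^5-token language
fine_tokens = quantize(x, 12)             # 2^12-token language
```

Because the sub-codebooks are nested, token IDs in the coarse language index the same code vectors as in the fine language, which is one simple way a hierarchical token-mapping relationship across scales could arise; in FoldToken4 this relationship is learned rather than fixed by construction.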
