Science Cast

Emergent Biological Realism in RL-Trained DNA Language Models

librarian, March 27, 2026 4:04am

Voice is AI-generated

Connected to paper. This paper is a preprint and has not been certified by peer review.

Emergent Biological Realism in RL-Trained DNA Language Models

bioRxiv, March 26, 2026 12:00am

Authors

Thiel, M.; Cunningham, A.; Barnes, C. P.

Abstract

Reinforcement learning has driven the mass adoption of large language models by unlocking unexpected capabilities, yet this approach remains underexplored for generative DNA models. We investigate whether similar post-training techniques can induce emergent biological realism in DNA language models, using plasmid generation as a testbed because plasmids are relatively simple, have well-characterized functional constraints, and are ubiquitous in biotechnology. Using Group Relative Policy Optimization (GRPO) with a reward function based on constraints from engineered biology, our model achieves a 77% quality control pass rate, compared to 5% for the pretrained baseline. Beyond the explicitly optimized features, generated sequences also match natural plasmids in thermodynamic stability, codon usage patterns, and ORF length distributions, none of which the reward function targets directly. These results suggest that RL post-training can steer DNA language models toward biologically coherent regions of sequence space, analogous to how such techniques unlock unexpected capabilities in natural language models, particularly in verifiable domains.
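
For readers unfamiliar with Group Relative Policy Optimization, the following is a minimal sketch of its central step: rewards for a group of sequences sampled from the same prompt are normalized against that group's own mean and standard deviation to give per-sample advantages. The function name, the example reward values, the use of the population standard deviation, and the `eps` stabilizer are all illustrative assumptions; the abstract does not describe the paper's reward function or training code, and nothing below is taken from it.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples from the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation and eps are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four generated sequences in one group,
# e.g. 1.0 if a sequence passes some check and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group in which every sample gets the same reward carries no learning signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Samples that score above their group's average receive positive advantages and are reinforced, while below-average samples are penalized, so no separate value network is needed to provide a baseline.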
