Genomic Foundation Models Reveal Chromatin-Domain-Scale Transposable Element Impacts on Rice Genome Architecture

This paper is a preprint and has not been certified by peer review.

Authors

Fan, J.; Zhao, H.; Lv, Q.; Wang, X.; Man, R.; Xie, N.; Zhao, Z.

Abstract

Alignment-based detection of transposable element (TE) insertion polymorphisms suffers from reference bias and multi-mapping errors in repetitive genomic regions, creating a fundamental validation bottleneck for population-scale structural variant catalogs. Here, we demonstrate that the OneGenome-Rice (OGR) genomic foundation model (GFM), a 1.25-billion-parameter Mixtral architecture trained on 422 rice genomes without TE annotations, provides an orthogonal, alignment-free approach that resolves TE-mediated structural divergence at chromatin-domain resolution. At the CTB4a cold-tolerance locus on chromosome 4, OGR embeddings revealed that the aus subpopulation (NONA_BOKRA) carries 2.2-fold higher structural divergence from indica than japonica does, consistent with its 728 subpopulation-exclusive cold-protective TE insertions. Sliding-window analysis across 4.4 megabases identified a 25.6-fold divergence enhancement at TE clusters relative to the conserved CTB4a gene body. Critically, the minimal effective resolution was established at approximately 20 kilobases, corresponding to the median size of topologically associating domains (TADs) in the rice genome, whereas individual TE sites at 500 base pairs were undetectable (P = 0.94). Non-neural baselines confirmed that the signal derives from learned representations of genomic context rather than from simple nucleotide statistics. These findings establish GFMs as orthogonal validation tools for population-scale TE genotyping and provide computational evidence that TE functional effects are organized at the chromatin-domain level, with direct implications for prioritizing functional TE variants in crop breeding.
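
To make the sliding-window comparison concrete, the following is a minimal sketch of how a per-window divergence and a fold enhancement could be computed from window-level embeddings. It is not the authors' pipeline: the abstract does not state the divergence metric, so cosine distance is an assumption, the embeddings are synthetic random vectors rather than OGR outputs, and the names `cosine_distance`, `window_divergence`, `fold_enhancement`, `te_windows`, and `gene_windows` are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def window_divergence(emb_a, emb_b):
    """Divergence per window between two accessions.

    emb_a and emb_b each hold one pooled embedding vector per genomic
    window, in the same window order (an assumed data layout).
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("both accessions need the same number of windows")
    return [cosine_distance(a, b) for a, b in zip(emb_a, emb_b)]


def fold_enhancement(divergence, target_idx, baseline_idx):
    """Mean divergence in target windows divided by mean in baseline windows."""
    target = sum(divergence[i] for i in target_idx) / len(target_idx)
    baseline = sum(divergence[i] for i in baseline_idx) / len(baseline_idx)
    return target / baseline


# Synthetic stand-in data: 10 windows, 16-dimensional embeddings.
rng = random.Random(0)
n_windows, dim = 10, 16
ref = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_windows)]

te_windows = [3, 4]    # hypothetical windows overlapping a TE cluster
gene_windows = [7, 8]  # hypothetical windows over a conserved gene body

# The second accession is a perturbed copy of the first, with a much
# larger perturbation in the TE-cluster windows.
alt = []
for i, vec in enumerate(ref):
    scale = 1.0 if i in te_windows else 0.05
    alt.append([v + rng.gauss(0, scale) for v in vec])

div = window_divergence(ref, alt)
fold = fold_enhancement(div, te_windows, gene_windows)
print(f"fold enhancement at TE windows: {fold:.1f}")
```

In this toy setting the ratio is large only because the perturbation was deliberately concentrated in the TE windows. With real embeddings, the window size would be the key parameter, since the abstract reports that the signal is detectable at roughly 20 kilobases but not at 500 base pairs.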
