Scaling and Generalization of Discrete Diffusion Models for Tumor Phylogenies

This paper is a preprint and has not been certified by peer review.

Authors

Sabata, S.; Schwartz, R.

Abstract

Tumor phylogenies, rooted trees encoding clonal ancestry and mutation acquisition, are central to understanding cancer evolution, yet generating realistic phylogenies remains challenging. We investigate whether discrete graph diffusion can learn the structural constraints of tumor phylogenies directly from data. Working with approximately 12,500 synthetic phylogenies across twelve evolutionary regimes, we train graph transformer models that denoise typed graphs through a learned reverse diffusion process. Scaling experiments reveal a non-monotonic capacity-performance relationship: a mid-scale model achieves high structural validity and close distributional match to held-out data, while a deeper model fails under fixed optimization hyperparameters. Low-data cross-regime experiments show that diverse training produces more transferable representations than single-regime specialization. These results establish that phylogenetic structural constraints can be learned implicitly through unconditional discrete diffusion, suggesting a viable path toward generative models of tumor evolution.
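
The abstract does not spell out how structural validity is measured. As a rough illustration only, the sketch below checks the most basic constraint implied by the term "rooted tree": a generated graph should have exactly one root, one parent for every other node, and no node unreachable from the root. The function name `is_valid_rooted_tree` and the edge-list representation are assumptions made for this example, not the authors' evaluation code, and the paper's validity criteria may include further checks (for example on node or edge types).

```python
from collections import defaultdict, deque


def is_valid_rooted_tree(num_nodes, edges):
    """Return True if the directed (parent, child) edges form one rooted tree.

    Hypothetical helper for illustration; nodes are assumed to be
    labelled 0 .. num_nodes - 1.
    """
    # A tree on n nodes has exactly n - 1 edges.
    if num_nodes == 0 or len(edges) != num_nodes - 1:
        return False

    in_degree = [0] * num_nodes
    children = defaultdict(list)
    for parent, child in edges:
        # Reject out-of-range labels and self-loops.
        if not (0 <= parent < num_nodes and 0 <= child < num_nodes):
            return False
        if parent == child:
            return False
        in_degree[child] += 1
        children[parent].append(child)

    # Exactly one node without a parent, and no node with two parents.
    roots = [v for v in range(num_nodes) if in_degree[v] == 0]
    if len(roots) != 1 or any(d > 1 for d in in_degree):
        return False

    # Every node must be reachable from the root (rules out detached cycles).
    seen = {roots[0]}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) == num_nodes
```

Under these assumptions, a validity rate for a batch of sampled graphs would be the fraction for which the function returns `True`.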
