Scaling and Generalization of Discrete Diffusion Models for Tumor Phylogenies
Sabata, S.; Schwartz, R.
Abstract
Tumor phylogenies (rooted trees encoding clonal ancestry and mutation acquisition) are central to understanding cancer evolution, yet generating realistic phylogenies remains challenging. We investigate whether discrete graph diffusion can learn the structural constraints of tumor phylogenies directly from data. Working with approximately 12,500 synthetic phylogenies across twelve evolutionary regimes, we train graph transformer models that denoise typed graphs through a learned reverse diffusion process. Scaling experiments reveal a non-monotonic capacity-performance relationship: a mid-scale model achieves high structural validity and close distributional match to held-out data, while a deeper model fails under fixed optimization hyperparameters. Low-data cross-regime experiments show that diverse training produces more transferable representations than single-regime specialization. These results establish that phylogenetic structural constraints can be learned implicitly through unconditional discrete diffusion, suggesting a viable path toward generative models of tumor evolution.