A data-driven approach for star formation parameterization using symbolic regression
Diane M. Salim, Matthew E. Orr, Blakesley Burkhart, Rachel S. Somerville, Miles Cranmer
Abstract
Star formation (SF) in the interstellar medium (ISM) is fundamental to understanding galaxy evolution and planet formation. However, developing closed-form analytic expressions that link SF to the key physical variables that influence it, such as gas density and turbulence, remains challenging. In this work, we leverage recent advances in machine learning (ML) and use symbolic regression (SR) to produce the first data-driven, ML-discovered analytic expressions for SF, using the publicly available FIRE-2 simulation suites. Our pipeline trains the genetic algorithm of the open-source SR package PySR with a custom loss function and applies a model selection technique that compares candidate equations against existing analytic descriptions of SF. With it, we produce symbolic representations of a predictive model for the star formation rate surface density ($\Sigma_\mathrm{SFR}$), averaged over both 10 Myr and 100 Myr, based on eight variables extracted from FIRE-2 galaxies. On both averaging timescales, the model that PySR finds to best describe SF features equations that incorporate the gas surface density, $\Sigma_\mathrm{gas}$, the gas velocity dispersion, $\sigma_{\mathrm{gas,~z}}$, and the stellar surface density, $\Sigma_\mathrm{*}$. Furthermore, the equations found for the longer (100 Myr) SFR timescale all converge to a scaling-relation-like form and closely capture the intrinsic physical scatter of the data in the Kennicutt-Schmidt (KS) plane. This convergence to physically interpretable scaling relations at the longer SFR timescale demonstrates that our method identifies robust physical relationships rather than fitting stochastic fluctuations.
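To illustrate what a scaling-relation-like form in the KS plane looks like, the sketch below fits a single power law, $\Sigma_\mathrm{SFR} = A\,\Sigma_\mathrm{gas}^{N}$, which is a straight line in log space. It is only a minimal illustration under stated assumptions: the data are synthetic rather than drawn from FIRE-2, ordinary least squares stands in for the symbolic regression search, and the helper names (`fit_power_law`, `make_synthetic`) as well as the amplitude, index, and scatter values are hypothetical choices made for this example.

```python
import math
import random


def fit_power_law(sigma_gas, sigma_sfr):
    """Least-squares fit of log10(sigma_sfr) = log10(A) + N * log10(sigma_gas).

    Returns the amplitude A and the power-law index N.
    """
    x = [math.log10(v) for v in sigma_gas]
    y = [math.log10(v) for v in sigma_sfr]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, slope


def make_synthetic(n=200, amplitude=2.5e-4, index=1.4, scatter_dex=0.1, seed=0):
    """Generate synthetic (gas, SFR) surface densities with log-normal scatter.

    All parameter values are illustrative assumptions, not FIRE-2 measurements.
    """
    rng = random.Random(seed)
    gas = [10 ** rng.uniform(0.0, 2.5) for _ in range(n)]
    sfr = [amplitude * g ** index * 10 ** rng.gauss(0.0, scatter_dex) for g in gas]
    return gas, sfr


gas, sfr = make_synthetic()
A_fit, N_fit = fit_power_law(gas, sfr)
print(f"A = {A_fit:.2e}, N = {N_fit:.2f}")
```

The fit recovers the injected index to within the scatter. A symbolic regression search differs in that it does not assume the power-law form in advance; it searches over candidate expressions and may include further variables such as $\sigma_{\mathrm{gas,~z}}$ and $\Sigma_\mathrm{*}$.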