BABAPPA: An Automated Pipeline for Codeml Mediated Selection Analysis Integrating PRANK and IQ-TREE2
Sinha, K.
Abstract
Detection of episodic positive selection is fundamental to understanding adaptive evolution, yet conventional codeml-based workflows require extensive manual configuration and offer limited parallelization. To overcome these limitations, we developed BABAPPA (BAsh-Based Automated Parallel Positive selection Analysis), a fully automated, end-to-end pipeline. It integrates sequence quality control, codon-aware alignment with PRANK, phylogenetic inference with IQ-TREE2, foreground branch designation, codeml site, branch, and branch-site model analyses, likelihood ratio testing with Benjamini-Hochberg correction, and result summarization. Validation on three Brassicaceae orthogroups showed that BABAPPA yields positive-selection calls identical to those of a standard codeml workflow while reducing mean wall-clock time from 2,578.78 s to 1,430.22 s (a 44.6% reduction, corresponding to an 80.31% increase in processing speed; p = 8.227e-5). This combination of robust automation and substantial efficiency gains makes BABAPPA well suited to large-scale genomic surveys and publication-grade selection analyses.
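The likelihood ratio testing and Benjamini-Hochberg steps mentioned above can be illustrated with a small sketch. It is not BABAPPA's own code. It assumes that each test compares codeml log-likelihoods (lnL) of a null and an alternative model, and that the statistic 2ΔlnL is referred to a chi-square distribution with the stated degrees of freedom (for example, df = 2 for M1a vs. M2a or M7 vs. M8, and df = 1 for the branch-site test, where plain χ²₁ is the conservative choice). The function names and the log-likelihood values in the usage section are hypothetical. Closed-form chi-square tails for df = 1 and df = 2 keep the example dependent on the standard library only.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed forms for df = 1 or 2 only."""
    if x <= 0.0:
        return 1.0
    if df == 1:
        # P(chi2_1 > x) = erfc(sqrt(x / 2))
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        # P(chi2_2 > x) = exp(-x / 2)
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only implements df = 1 or df = 2")


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2*deltaLnL, p-value); negative statistics from optimizer noise are clamped to 0."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical (lnL_null, lnL_alt, df) triples, for illustration only.
    tests = {
        "OG_1 M7 vs M8": (-2510.42, -2503.17, 2),
        "OG_2 branch-site": (-1890.55, -1888.10, 1),
        "OG_3 M1a vs M2a": (-3102.88, -3102.61, 2),
    }
    names = list(tests)
    raw = [lrt_pvalue(*tests[n])[1] for n in names]
    for name, p, q in zip(names, raw, benjamini_hochberg(raw)):
        print(f"{name}: p = {p:.3g}, BH q = {q:.3g}")
```

Adjusting all p-values from one run together, rather than per model pair, is one reasonable design choice. Whether BABAPPA groups its tests this way is not stated here.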