Short-Read Sequencing Benchmarking with Donor-Specific Assemblies
McGee, S. R.; Smith, J. D.; Frazar, C. D.; Ryke, E.; Vollger, M. R.; Kwon, Y.; Bennett, J. T.; Eichler, E. E.; Stergachis, A.; Wei, C.-L.
Abstract

Background: High-throughput short-read sequencing has become a core technology for genomics, but the rapid expansion of available platforms has made it increasingly important to benchmark them under standardized conditions. A major challenge is that conventional reference-based comparisons confound true sequencing errors with inherited variation and reference bias, making it difficult to isolate platform-intrinsic performance.

Results: We benchmarked nine short-read chemistries across seven DNA sequencers using two highly characterized benchmark samples, HG002 and COLO829BL, together with donor-specific assemblies to measure sequencing errors against sample-matched genomic references. This strategy separated authentic platform errors from biological divergence and revealed substantial differences in substitution, indel, read-position, and sequence-context error profiles. Element AVITI UltraQ and Roche SBX-D showed the lowest substitution error rates, whereas Ultima and Roche chemistries exhibited the strongest indel-associated biases. We also found pronounced platform-specific effects in low-complexity regions and trinucleotide contexts, including homopolymer-associated errors and context-dependent substitution skews that are directly relevant to rare-variant detection. In addition, we show that donor-specific references are essential for unbiased base-quality recalibration because they minimize reference bias and more faithfully support cross-platform comparison and low-frequency variant-calling thresholds.

Conclusions: Donor-specific assembly-based benchmarking provides a robust framework for measuring true short-read sequencing errors and comparing platforms on a common, sample-matched basis. Our results establish a comprehensive reference for the community and show that authentic error profiles can guide platform selection, quality filtering, and improved detection of rare somatic variation.
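To make the measured quantities concrete, the following is a minimal sketch of how substitution, indel, and trinucleotide-context errors might be tallied from a single read aligned to a sample-matched reference. It is an illustration only and not the pipeline used in this study: the function names `tally_errors` and `substitution_rate`, the gapped-string input format, and the toy sequences are all assumptions introduced here.

```python
from collections import Counter


def tally_errors(ref_aln: str, read_aln: str):
    """Count mismatches and indel bases in one gapped pairwise alignment.

    Hypothetical helper: ref_aln and read_aln are equal-length strings in
    which '-' marks a gap (an assumed input format, not the study's).
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")

    counts = Counter()   # totals per error class
    context = Counter()  # substitutions keyed by reference trinucleotide
    ref_bases = ref_aln.replace("-", "")  # ungapped reference for context lookup
    ref_pos = 0  # index into ref_bases

    for r, q in zip(ref_aln, read_aln):
        if r == "-":
            # Read base with no reference counterpart: inserted base.
            counts["insertion_bases"] += 1
            continue
        if q == "-":
            # Reference base missing from the read: deleted base.
            counts["deletion_bases"] += 1
        else:
            counts["aligned_bases"] += 1
            if r != q:
                counts["substitutions"] += 1
                # Record context only when both flanking bases exist.
                if 0 < ref_pos < len(ref_bases) - 1:
                    tri = ref_bases[ref_pos - 1:ref_pos + 2]
                    context[f"{tri}>{q}"] += 1
        ref_pos += 1

    return counts, context


def substitution_rate(counts) -> float:
    """Substitutions per aligned (non-gap) base; 0.0 if nothing aligned."""
    aligned = counts["aligned_bases"]
    return counts["substitutions"] / aligned if aligned else 0.0


# Toy alignment: one substitution (G>C), one inserted T, one deleted G.
counts, context = tally_errors("ACGTT-ACGA", "ACCTTTAC-A")
print(dict(counts), dict(context), substitution_rate(counts))
```

Because the reference here is assumed to be the donor's own assembly, every mismatch counted this way would be attributed to the platform. Against a generic reference, the same tally would also absorb inherited variants, which is the confounding described above.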