SquiDBase: a community resource of raw nanopore data from microbes
Cuypers, W. L.; Ceylan, H.; Turcksin, E.; Raes, L.; de Vrij, N.; Michiels, J.; Coppens, S.; de Block, T.; Jansen, D.; Arien, K. K.; Selhorst, P.; Vercauteren, K.; Gauglitz, J. M.; Bittremieux, W.; Laukens, K.
Abstract
Experimental data-driven research relies on raw data, the unprocessed outputs of experiments, whereas derived data are transformed through a series of processing steps to reveal specific insights. Such processing, however, can introduce biases or information loss, compromising transparency and reproducibility. In nucleic acid sequencing, nucleotide sequences stored in the FASTQ format are widely shared, but FASTQ files are generated from raw data outputs whose format depends on the sequencing platform used. The raw data produced by Oxford Nanopore Technologies (ONT) sequencing devices contain valuable biological information and are also useful for improving data processing methods, including basecaller optimisation and modification detection. Increasing attention is being paid to these raw signals for developing algorithms that could improve ONT device portability and enhance target enrichment efficiency through adaptive sampling. Despite these benefits, the storage and sharing of raw nanopore data remain limited because of technical constraints and the lack of appropriate, standardised and centralised infrastructure. To address this challenge, we developed SquiDBase (https://squidbase.org), a dedicated repository for raw microbial nanopore sequencing data. To maximise the utility of SquiDBase from its inception, we built SquiDPipe, a Nextflow pipeline for the automated removal of human or other unwanted reads from raw nanopore data. Additionally, we sequenced 24 clinically relevant viruses and incorporated the resulting data into SquiDBase, substantially expanding the diversity of publicly available reference datasets. By offering a centralised, open-access platform for raw data, SquiDBase facilitates data sharing, enhances reproducibility, and supports the development and benchmarking of novel computational tools, reinforcing open science in nanopore sequencing research.
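The abstract does not describe how SquiDPipe identifies unwanted reads. The Python sketch below only illustrates one common way such host-read removal can be done, under assumptions that are ours and not the authors': the reads have already been basecalled and aligned against a human reference, producing SAM records, and any read with a mapped alignment (SAM flag bit 0x4 unset) is treated as human. Raw reads are modelled as a hypothetical in-memory dict from read ID to signal samples; reading and writing real ONT raw formats is not shown. The function names `mapped_read_ids` and `remove_reads` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def mapped_read_ids(sam_lines: Iterable[str]) -> Set[str]:
    """Collect read IDs (QNAME) of SAM records that aligned to the reference.

    Assumption: the SAM comes from aligning basecalled reads to a human
    reference, so any mapped record marks that read as human.
    """
    ids: Set[str] = set()
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        # Flag bit 0x4 means "segment unmapped"; keep only mapped reads.
        if not flag & 0x4:
            ids.add(qname)
    return ids


def remove_reads(
    raw_reads: Dict[str, List[int]], unwanted: Set[str]
) -> Dict[str, List[int]]:
    """Return a copy of the raw reads without the unwanted read IDs.

    Hypothetical representation: read ID -> raw signal samples.
    """
    return {rid: sig for rid, sig in raw_reads.items() if rid not in unwanted}
```

For example, if a SAM file reports `r1` as mapped and `r2` as unmapped, calling `remove_reads(raw, mapped_read_ids(sam))` drops `r1` and keeps `r2` together with any read that was never aligned. Filtering by read ID rather than by signal content keeps the raw signals of retained reads untouched.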