GraphHDBSCAN*: Graph-based Hierarchical Clustering on High Dimensional Single-cell RNA Sequencing Data
Ghoreishi, S. A.; Szmigiel, A. W.; Nagai, J. S.; Gesteira Costa Filho, I.; Zimek, A.; Campello, R. J. G. B.
Abstract

Single-cell RNA sequencing (scRNA-seq) is widely used to resolve cellular heterogeneity across thousands to millions of cells. A major challenge is to identify biologically meaningful cell populations while preserving their hierarchical organization, because broad cell types frequently split into more specialized subtypes. However, state-of-the-art approaches mostly focus on flat partitions and ignore the hierarchical structure of single-cell data. Here we introduce GraphHDBSCAN*, a graph-based, hyperparameter-free extension of HDBSCAN* that performs hierarchical density-based clustering on a graph representation of the data, enabling robust recovery of both single-level and hierarchical relationships in high-dimensional and sparse datasets. We evaluate GraphHDBSCAN* across multiple scRNA-seq datasets and show that it recovers biologically meaningful hierarchies that reveal fine-grained structure in complex data, including monocyte subpopulations. In addition, the method yields high-quality flat partitions that outperform widely used community-detection methods.