Automated bioacoustic analysis aids understanding and protection of both
marine and terrestrial animals and their habitats across extensive
spatiotemporal scales, and typically involves analyzing vast collections of
acoustic data. With the advent of deep learning models, classification of
important signals from these datasets has markedly improved. These models power
critical data analyses for research and decision-making in biodiversity
monitoring, animal behaviour studies, and natural resource management. However,
deep learning models are often data-hungry and require a significant amount of
labeled training data to perform well. While sufficient training data is
available for certain taxonomic groups (e.g., common bird species), many
classes (such as rare and endangered species, many non-bird taxa, and specific
call types) lack enough data to train a robust model from scratch. This study
investigates the utility of feature embeddings extracted from large-scale audio
classification models to identify bioacoustic classes other than the ones these
models were originally trained on. We evaluate models on diverse datasets,
including different bird call types and dialects, bat calls, marine mammal
calls, and amphibian calls. Embeddings extracted from models trained on bird
vocalization data consistently enabled higher-quality classification than
embeddings from models trained on general audio datasets. The results of this
study indicate that high-quality feature embeddings from large-scale acoustic
bird classifiers can be harnessed for few-shot transfer learning, enabling the
learning of new classes from a limited quantity of training data. Our findings
reveal the potential for efficient analyses of novel bioacoustic tasks, even in
scenarios where available training data is limited to a few samples.
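As an illustration of the few-shot transfer-learning setup described above, the sketch below trains a simple linear probe on fixed feature embeddings. It is a minimal example under stated assumptions, not the study's exact pipeline: the embedding arrays, class counts, and embedding dimensionality are hypothetical placeholders, and in practice the embeddings would be computed by applying a pretrained large-scale audio classifier to labeled audio clips.

```python
# Minimal sketch: few-shot transfer learning via a linear probe on frozen
# feature embeddings. All arrays below are hypothetical placeholders; real
# embeddings would come from a pretrained audio classification model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical few-shot training set: 5 labeled clips for each of 3 novel
# classes, each clip represented by a 1280-dimensional embedding vector.
n_per_class, n_classes, dim = 5, 3, 1280
train_embeddings = rng.normal(size=(n_per_class * n_classes, dim))
train_labels = np.repeat(np.arange(n_classes), n_per_class)

# Linear probe: a lightweight classifier fit on the frozen embeddings,
# avoiding any fine-tuning of the underlying embedding model.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_embeddings, train_labels)

# Classify new, unlabeled clips from their embeddings.
new_embeddings = rng.normal(size=(10, dim))
predicted = probe.predict(new_embeddings)
print(predicted)
```

Because only the small probe is trained while the embedding model stays fixed, this kind of setup can be fit with very few labeled examples per class, which is the regime the study targets.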