An Overview of ASVspoof Challenge - A Community Led Effort towards Voice Spoofing Detection Research
Posted: Jun 23, 2022
Dr. Bhusan Chettri earned his PhD in AI and Speech Technology from Queen Mary University of London. His research focussed on the analysis and design of voice spoofing detection systems using machine learning and AI. In this article, Bhusan Chettri gives an overview of ASVspoof, drawing on his experience as a participant in two consecutive ASVspoof challenges, the 2017 and 2019 editions. His related publications and PhD thesis are listed at the end of the article.
It is well acknowledged that today’s Automatic Speaker Verification (ASV) systems, although trained on vast amounts of speech data using complex deep learning algorithms, are vulnerable to spoofing attacks. To address the issue, the ASV community set out to promote research in spoofing detection by providing common evaluation protocols (which enable fair comparison of research findings) and free spoofing datasets through a biennial research challenge: the Automatic Speaker Verification Spoofing and Countermeasures (ASVspoof) challenge. Further details are available on the official ASVspoof website.
ASVspoof is an ASV community-driven effort promoting research into anti-spoofing algorithms for secure voice biometrics. Before the ASVspoof series began in 2015, a number of independent research studies had already confirmed the vulnerability of voice biometrics to spoofing attacks. However, these studies were mostly performed on small in-house datasets comprising a limited number of speakers and spoofing attack conditions. As a result, research results were hard to reproduce, and it was difficult to understand how well the reported anti-spoofing solutions generalised to unseen attack conditions. The main motivation of the ASVspoof series was to overcome these issues by organising open spoofing challenge evaluations, promoting awareness of the problem, making spoofing corpora with sufficiently varied attack conditions and standard evaluation protocols publicly available, and ensuring transparent research that leads to reproducible results.
The first ASVspoof challenge, held in 2015, focussed on the detection of artificial speech generated using either speech synthesis (text-to-speech, TTS) or voice conversion (VC) algorithms in a text-independent setting. Clean speech recorded using high-quality microphones was used as bonafide speech, and seven VC and three TTS algorithms were used to produce spoofed speech. The second edition, held in 2017, focussed on text-dependent replay spoofing attack detection. The 2019 edition, ASVspoof 2019, covered TTS, VC and replay attacks together, using advanced state-of-the-art spoofing algorithms and methods to generate spoofed speech samples. The most recent edition, ASVspoof 2021, again covered both logical access (LA: TTS and VC) and physical access (PA: replay) attacks, and added a new track on speech deepfake (DF) detection. In the 2021 edition, no new training or development data were provided to the challenge participants. They were required to use the ASVspoof 2019 training and development datasets to train and tune their anti-spoofing systems. Only a new evaluation set was provided, for evaluating the models.
One key observation from the first three ASVspoof challenges (2015, 2017 and 2019) is the paradigm shift in the modelling approaches used for spoofing detection. Gaussian mixture models (GMMs), a class of generative models, were popular during the first ASVspoof challenge in 2015, as is evident from the winning system of that challenge, which was GMM-based. The 2017 and 2019 challenges, however, were mostly dominated by data-driven, discriminatively trained deep models. The main task in all three editions was the same: to build a standalone countermeasure model (an anti-spoofing algorithm) that determines whether a given speech recording is bonafide or spoofed. For performance evaluation, the equal error rate (EER) was the primary metric in the 2015 and 2017 editions. In the 2019 edition, the recently introduced tandem detection cost function (t-DCF) [Kinnunen et al., 2018] was the primary metric and EER the secondary metric.
Thanks to the ASV community, researchers don't have to spend their own money on spoofing datasets. These are public and can be downloaded at no cost. The ASVspoof 2017 dataset is the first publicly available replay spoofing dataset, created by playing back bonafide audio utterances and re-recording them in real ‘wild’ acoustic conditions. It has been widely used by researchers around the globe. It exists in two versions: 1.0 and 2.0. Version 1.0 was used during the ASVspoof 2017 evaluation. After the evaluation, biases were found in the dataset, and the ASVspoof organisers released a corrected version.
Datasets and metrics: Speaking about replay attack anti-spoofing datasets, Bhusan Chettri explains that the ASVspoof 2017 dataset is the first publicly available replay spoofing dataset, created by playing back bonafide audio utterances and re-recording them in real ‘wild’ acoustic conditions. It has been used extensively in research since its release in the 2017 edition of the ASVspoof series. The bonafide utterances were taken from a subset of the RedDots dataset, a speaker verification dataset collected under varied, ‘wild’ acoustic conditions.
The ASVspoof 2017 dataset has two versions: version 1.0 and version 2.0. Version 1.0 was used during the official challenge evaluation in 2017. After the evaluation, data anomalies that biased model decisions were identified, which eventually led to the release of the version 2.0 dataset. The 2019 edition covered both replay spoofing attacks (physical access, PA) and text-to-speech and voice conversion attacks (logical access, LA), and released a separate dataset for each. After the 2019 challenge evaluation, a small set of real replayed speech utterances was also made publicly available to support research on replay spoofing attacks. During the latest edition, ASVspoof 2021, no training data were released to the challenge participants. The participants were required to use the ASVspoof 2019 training and development datasets to train and tune their anti-spoofing models. Only a new evaluation set was released to the participants. For more details, see the ASVspoof challenge website.
The equal error rate (EER) was the primary, and only, metric used to evaluate anti-spoofing performance during the 2015 and 2017 ASVspoof evaluations. In the 2019 edition, however, EER became the secondary metric. The primary metric was a new one, the tandem detection cost function (t-DCF), which jointly assesses the performance of the ASV system and the anti-spoofing system and was used to rank the challenge participants' models.
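To make the EER concrete, here is a minimal sketch in Python of how it could be estimated from countermeasure scores. It assumes that higher scores mean "more likely bonafide" and simply picks the observed score threshold at which the miss rate (bonafide rejected) and the false alarm rate (spoof accepted) are closest. The function name `compute_eer` and the toy scores are hypothetical, and the official ASVspoof scoring tools may interpolate between thresholds differently.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate of a spoofing countermeasure.

    Assumption: a higher score means the trial looks more bonafide.
    Returns (eer, threshold) at the observed score where the miss rate
    and the false alarm rate are closest to each other.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap, eer_estimate, threshold)
    # Use every observed score as a candidate decision threshold.
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # Miss rate: bonafide trials scored below the threshold (rejected).
        p_miss = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False alarm rate: spoofed trials scored at or above it (accepted).
        p_fa = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2, t)
    return best[1], best[2]


# Toy scores, made up for illustration only.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(bonafide, spoof)
print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.6
```

In this toy example one of four bonafide trials is rejected and one of four spoofed trials is accepted at the chosen threshold, so both error rates equal 25%. A lower EER indicates a better countermeasure; the t-DCF additionally takes the behaviour of the ASV system into account, which this sketch does not attempt to model.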
In the next article, Bhusan Chettri will discuss in more detail the different corpora and evaluation metrics used in voice anti-spoofing research.
- Bhusan Chettri scholar
- M. Sahidullah et al. Introduction to Voice Presentation Attack Detection and Recent Advances, 2019.
- Bhusan Chettri. Voice biometric system security: Design and analysis of countermeasures for replay attacks. PhD thesis, Queen Mary University of London, August 2020.
- ASVspoof: The automatic speaker verification spoofing and countermeasures challenge website.