Publications
2023
1.
Carofilis-Vasco, Andrés; Alegre, Enrique; Fidalgo, Eduardo; Fernández-Robles, Laura
Improvement of accent classification models through Grad-Transfer from Spectrograms and Gradient-weighted Class Activation Mapping. Journal Article
In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, (Publisher: IEEE).
Tags: Accent Classification, deep learning, Grad-Transfer, machine learning
@article{carofilis-vasco_improvement_2023,
title = {Improvement of accent classification models through Grad-Transfer from Spectrograms and Gradient-weighted Class Activation Mapping},
author = {Andrés Carofilis-Vasco and Enrique Alegre and Eduardo Fidalgo and Laura Fernández-Robles},
url = {https://ieeexplore.ieee.org/abstract/document/10190103},
year = {2023},
date = {2023-01-01},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
abstract = {This article introduces a new method for accent classification using a descriptor called Grad-Transfer, which is extracted using Gradient-weighted Class Activation Mapping (Grad-CAM) based on convolutional neural network (CNN) interpretability. The proposed methodology transfers the knowledge gained by CNNs to classical machine learning algorithms. The study shows that Grad-CAM highlights key regions of spectrograms important for accent prediction, and the generated Grad-Transfer descriptors effectively distinguish different accents. Experiments on the Voice Cloning Toolkit dataset demonstrate an improvement in accent classification accuracy and recall when using Grad-Transfer, outperforming models trained directly on spectrograms.},
note = {Publisher: IEEE},
keywords = {Accent Classification, deep learning, Grad-Transfer, machine learning},
pubstate = {published},
tppubtype = {article}
}
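The abstract describes extracting a descriptor from Grad-CAM heatmaps and passing it to classical machine learning algorithms. As a rough illustration of the Grad-CAM step only, the sketch below computes a heatmap from the feature maps of a convolutional layer and the gradients of a class score with respect to them, using the standard Grad-CAM formulation (channel weights are the spatially averaged gradients, followed by a ReLU over the weighted sum). The function names `grad_cam` and `heatmap_to_descriptor` are hypothetical, and flattening the heatmap into a vector is only an assumed stand-in for the Grad-Transfer descriptor, whose exact construction is not given in this entry.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Standard Grad-CAM heatmap for one input and one target class.

    activations: feature maps of a conv layer, shape (K, H, W).
    gradients:   d(class score)/d(activations), same shape (K, H, W).
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel importance: global average of the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))                  # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)       # shape (H, W)
    # ReLU keeps only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def heatmap_to_descriptor(cam: np.ndarray) -> np.ndarray:
    """ASSUMPTION: flatten the heatmap into a feature vector.

    This is a placeholder for illustration, not the article's
    Grad-Transfer definition.
    """
    return cam.ravel()


if __name__ == "__main__":
    # Synthetic stand-ins for real CNN activations and gradients.
    rng = np.random.default_rng(0)
    acts = rng.random((8, 16, 32))
    grads = rng.standard_normal((8, 16, 32))
    descriptor = heatmap_to_descriptor(grad_cam(acts, grads))
    print(descriptor.shape)
```

A vector produced this way could then be used as input to a classical classifier in place of the raw spectrogram, which is the general idea of transferring what the CNN has learned.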