Result details

Audio Enhancing With DNN Autoencoder For Speaker Recognition

PLCHOT, O.; BURGET, L.; ARONOWITZ, H.; MATĚJKA, P. Audio Enhancing With DNN Autoencoder For Speaker Recognition. In Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016. Shanghai: IEEE Signal Processing Society, 2016. p. 5090-5094. ISBN: 978-1-4799-9988-0.

Type

conference proceedings paper

Language

English

Authors

Plchot Oldřich, Ing., Ph.D., UPGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., UPGM (FIT)
Aronowitz Hagai
Matějka Pavel, Ing., Ph.D., UPGM (FIT)

Abstract

In this paper we present a design of a DNN-based autoencoder for speech enhancement and its use for speaker recognition systems for distant microphones and noisy data. We started with augmenting the Fisher database with artificially noised and reverberated data and trained the autoencoder to map noisy and reverberated speech to its clean version. We use the autoencoder as a preprocessing step in the later stage of modelling in state-of-the-art text-dependent and text-independent speaker recognition systems. We report relative improvements of up to 50% for the text-dependent system and up to 48% for the text-independent one. With the text-independent system, we present a more detailed analysis on various conditions of NIST SRE 2010 and PRISM, suggesting that the proposed preprocessing is a promising and efficient way to build a robust speaker recognition system for distant microphone and noisy data.
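The data-augmentation step mentioned in the abstract (artificially adding noise and reverberation to clean speech so that an autoencoder can learn the mapping back to the clean version) can be illustrated with a minimal NumPy sketch. This is not the authors' procedure, which this record does not describe; the function names, the use of a single impulse response, and the choice to measure the SNR against the reverberated signal are all assumptions made for illustration.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the result has the requested SNR in dB.

    Hypothetical helper: assumes `noise` is at least as long as `signal`.
    """
    noise = noise[: len(signal)]
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(signal_power / scaled_noise_power)
    target_noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    scale = np.sqrt(target_noise_power / noise_power)
    return signal + scale * noise


def reverberate(signal, impulse_response):
    """Simulate reverberation by convolving with a room impulse response.

    The output is truncated to the input length so it stays aligned with
    the clean target.
    """
    return np.convolve(signal, impulse_response)[: len(signal)]


def make_training_pair(clean, noise, impulse_response, snr_db):
    """Return (degraded_input, clean_target) for training a denoising model.

    Assumption: reverberation is applied first, then noise is added at
    `snr_db` relative to the reverberated signal.
    """
    reverberated = reverberate(clean, impulse_response)
    degraded = add_noise_at_snr(reverberated, noise, snr_db)
    return degraded, clean


# Toy usage with synthetic data standing in for real recordings.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)             # stand-in for 1 s of speech at 16 kHz
noise = rng.standard_normal(16000)             # stand-in for a noise recording
impulse_response = np.exp(-np.arange(800) / 100.0)  # synthetic decaying response
degraded, target = make_training_pair(clean, noise, impulse_response, snr_db=10.0)
```

In a real system the degraded and clean signals would be converted to acoustic features before training, and the noise recordings and impulse responses would come from measured or simulated collections; those details are outside what this record states.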

Keywords

speaker recognition, denoising, de-reverberation, neural networks, DNN

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2016/plchot… PDF

Year

2016

Pages

5090–5094

Proceedings

Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016

Conference

41st IEEE International Conference on Acoustics, Speech and Signal Processing

ISBN

978-1-4799-9988-0

Publisher

IEEE Signal Processing Society

Place

Shanghai

DOI

10.1109/ICASSP.2016.7472647

UT WoS

000388373405048

EID Scopus

2-s2.0-84973277824

BibTeX

@inproceedings{BUT130961,
  author="Oldřich {Plchot} and Lukáš {Burget} and Hagai {Aronowitz} and Pavel {Matějka}",
  title="Audio Enhancing With DNN Autoencoder For Speaker Recognition",
  booktitle="Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016",
  year="2016",
  pages="5090--5094",
  publisher="IEEE Signal Processing Society",
  address="Shanghai",
  doi="10.1109/ICASSP.2016.7472647",
  isbn="978-1-4799-9988-0",
  url="https://www.fit.vut.cz/research/publication/11139/"
}

Files

pdf plchot_icassp2016_0005090.pdf 245 kB

Projects

Big Speech Data Analytics for Contact Centers, EU, Horizon 2020, start: 2015-01-01, end: 2017-12-31, completed
DARPA - Robust Automatic Transcription of Speech (RATS) - RATS Patrol II, BBN, start: 2015-02-23, end: 2017-03-31, completed
Information mining from speech acquired by distant microphones (Dolování infoRmAcí z řeči Pořízené vzdÁlenými miKrofony), Ministry of the Interior (MV), Security Research of the Czech Republic 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)