Publication Details

Robust Speaker Recognition Over Varying Channels

BURGET, L.; BRÜMMER, N.; REYNOLDS, D.; KENNY, P.; PELECANOS, J.; VOGT, R.; CASTALDO, F.; DEHAK, N.; DEHAK, R.; GLEMBEK, O.; KARAM, Z.; NOECKER, J.; NA, H.; COSTIN, C.; HUBEIKA, V.; KAJAREKAR, S.; SCHEFFER, N.; ČERNOCKÝ, J. Robust Speaker Recognition Over Varying Channels. Baltimore: Johns Hopkins University, 2008.

Czech title

Robustní rozpoznávání mluvčího v různých přenosových kanálech

Type

report

Language

English

Authors

Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Brümmer Niko
Reynolds Douglas
Kenny Patrick
Pelecanos Jason
Vogt Robbie
Castaldo Fabio
Dehak Najim
Dehak Reda
Glembek Ondřej, Ing., Ph.D.
Karam Zahi
Noecker John Jr.
Na Hye Young
Costin Ciprian
Hubeika Valiantsina, Ing.
Kajarekar Sachin, Msc.
Scheffer Nicolas
Černocký Jan, prof. Dr. Ing. (DCGM)

URL

Keywords

speaker recognition

Abstract

This report covers robust speaker recognition over varying channels.

Annotation

Speaker recognition is now relatively mature in its basic scheme, in which a speaker model is trained using speech from the target speaker and speech from a large number of non-target speakers. However, the non-target speech is typically used only to estimate a general speech distribution (e.g. a UBM, universal background model); it is not used to find the "directions" important for discriminating between speakers. This scheme is reliable when the training and test data come from the same channel. All current speaker recognition systems are, however, prone to errors when the channel changes (for example, from IP telephone to mobile). In speaker recognition, "channel" variability can also include the linguistic content of the message, emotions, etc.; a speaker recognition system should disregard all of these factors. Several techniques, such as feature mapping, eigenchannel adaptation and NAP (nuisance attribute projection), have been devised in recent years to overcome channel variability. These techniques use large amounts of data from many speakers to find and ignore directions with high within-speaker variability. However, they still do not use the data to search directly for directions important for discriminating between speakers.

To overcome this problem, the research will concentrate on using the large amount of training data currently available to the research community to derive the information that can help discriminate among speakers and to discard the information that cannot. We propose direct identification of the directions in model parameter space that are most important for discriminating between speakers. Based on our experience in speech and language recognition, the use of discriminative training should significantly improve the performance of an acoustic speaker identification (SID) system. We also expect that discriminative training will make explicit modeling of channel variability unnecessary.

The research will build on an excellent baseline: the STBU system for the NIST 2006 SRE evaluation (NIST rules prohibit us from disclosing the exact position of the system in the evaluation).

The data used during the workshop will include NIST SRE data (telephone), but we will not ignore requests from the security/defense community and will also evaluate the investigated techniques on other data sources (meetings, web radio, etc.) as well as under cross-channel conditions.

Published

2008

Pages

81

Publisher

Johns Hopkins University

Place

Baltimore

BibTeX

@techreport{BUT91211,
  author="Lukáš {Burget} and Niko {Brümmer} and Douglas {Reynolds} and Patrick {Kenny} and Jason {Pelecanos} and Robbie {Vogt} and Fabio {Castaldo} and Najim {Dehak} and Reda {Dehak} and Ondřej {Glembek} and Zahi {Karam} and John Jr. {Noecker} and Hye Young {Na} and Ciprian {Costin} and Valiantsina {Hubeika} and Sachin {Kajarekar} and Nicolas {Scheffer} and Jan {Černocký}",
  title="Robust Speaker Recognition Over Varying Channels",
  year="2008",
  publisher="Johns Hopkins University",
  address="Baltimore",
  pages="81",
  url="https://www.fit.vut.cz/research/publication/8893/"
}