Result Details

Voice-print transformation for migration between automatic speaker identification systems

GLEMBEK, O.; MATĚJKA, P.; BURGET, L.; SCHWARZ, P.; PEŠÁN, J.; PLCHOT, O. Voice-print transformation for migration between automatic speaker identification systems. Abstract book of the 7th European Academy of Forensic Science Conference. Praha: Criminal Police Department Prague, 2015. p. 345-345. ISBN: 978-80-260-8659-8.

Type

abstract

Language

English

Authors

Glembek Ondřej, Ing., Ph.D., DCGM (FIT)
Matějka Pavel, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Schwarz Petr, Ing., Ph.D., DCGM (FIT)
Pešán Jan, Ing., FIT (FIT), DCGM (FIT)
Plchot Oldřich, Ing., Ph.D., DCGM (FIT)

Keywords

speaker recognition, i-vector transformation

Annotation

This presentation discusses the scenario of migrating from one forensic automatic speaker identification system (FASIS) to another. In a FASIS, an audio recording of a reference speaker is used to train a speaker model. This model is then compared with the model of the tested speaker, and a comparison score (in the form of a log-likelihood ratio) is computed. System migration is usually motivated by the need to improve recognition accuracy, typically through a technological upgrade, or by the need to process a new kind of data. Unfortunately, such migration usually makes the old and new speaker models incompatible, so the two models can no longer be compared. The solution would be to re-train the speaker models and rebuild the model database; however, the original audio files will most likely be unavailable, e.g. due to legal issues. This work introduces a technique for transforming the original speaker models so that, with a slight loss in accuracy, they are compatible with the new FASIS models. We present results on the NIST SRE 2010 evaluation tasks. Our system is based on the i-vector framework, which converts an arbitrarily long audio waveform into a fixed-length low-dimensional vector that serves as a speaker model. In this context, the i-vector is sometimes referred to as a voice-print. We use artificial neural networks to restore the original speaker models by mapping them to the new domain. We show that there is an approximately 20% relative increase in error rates when substituting the new test speaker models with the restored ones. Normally, the incompatibility of the original speaker models, combined with the unavailability of the audio files, would make such a task impossible.
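
The mapping idea in the annotation can be illustrated with a minimal sketch. It is not the authors' system: the data are synthetic, the two "extractors" are hypothetical random projections of shared speaker factors, the network is a single hidden layer trained by plain gradient descent, and cosine similarity stands in for the log-likelihood-ratio scoring of a real FASIS. All names, dimensions and hyperparameters below are assumptions made for illustration.

```python
import numpy as np


def make_paired_ivectors(n, dim_old=20, dim_new=16, latent=8, seed=0):
    """Synthetic stand-in for voice-prints of the same recordings from two systems."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, latent))          # hidden speaker factors
    a = rng.standard_normal((latent, dim_old))    # hypothetical old extractor
    b = rng.standard_normal((latent, dim_new))    # hypothetical new extractor
    old = z @ a + 0.05 * rng.standard_normal((n, dim_old))
    new = np.tanh(z @ b / np.sqrt(latent)) + 0.05 * rng.standard_normal((n, dim_new))
    return old, new


def train_mapping(x, y, hidden=32, epochs=1500, lr=0.02, seed=1):
    """Fit a one-hidden-layer network old -> new by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    mean, std = x.mean(axis=0), x.std(axis=0)
    xs = (x - mean) / std                         # standardise the old voice-prints
    n = xs.shape[0]
    w1 = rng.standard_normal((xs.shape[1], hidden)) / np.sqrt(xs.shape[1])
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, y.shape[1])) / np.sqrt(hidden)
    b2 = np.zeros(y.shape[1])
    losses = []
    for _ in range(epochs):
        h = np.tanh(xs @ w1 + b1)
        err = h @ w2 + b2 - y
        # Loss: half the mean (over samples) of the summed squared error.
        losses.append(0.5 * float(np.mean(np.sum(err ** 2, axis=1))))
        g_out = err / n                           # gradient w.r.t. the outputs
        g_pre = (g_out @ w2.T) * (1.0 - h ** 2)   # back-propagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (xs.T @ g_pre)
        b1 -= lr * g_pre.sum(axis=0)
    model = {"mean": mean, "std": std, "w1": w1, "b1": b1, "w2": w2, "b2": b2}
    return model, losses


def transform(model, x):
    """Map old-system voice-prints into the new system's space."""
    xs = (x - model["mean"]) / model["std"]
    return np.tanh(xs @ model["w1"] + model["b1"]) @ model["w2"] + model["b2"]


def cosine_scores(a, b):
    """Row-wise cosine similarity, used here as a simple comparison score."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


old, new = make_paired_ivectors(1200)
model, losses = train_mapping(old[:1000], new[:1000])   # paired training data
restored = transform(model, old[1000:])                 # held-out old voice-prints
matched = cosine_scores(restored, new[1000:])
mismatched = cosine_scores(restored, np.roll(new[1000:], 1, axis=0))
print(f"training loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"mean cosine, matched: {matched.mean():.3f}, mismatched: {mismatched.mean():.3f}")
```

The sketch assumes that some recordings can be processed by both systems to obtain paired training voice-prints; the stored models whose audio is no longer available are then passed through `transform` only.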

Published

2015

Pages

345–345

Book

Abstract book of the 7th European Academy of Forensic Science Conference

Conference

7th European Academy of Forensic Science Conference

ISBN

978-80-260-8659-8

Publisher

Criminal Police Department Prague

Place

Praha

BibTeX

@misc{BUT168557,
  author="Ondřej {Glembek} and Pavel {Matějka} and Lukáš {Burget} and Petr {Schwarz} and Jan {Pešán} and Oldřich {Plchot}",
  title="Voice-print transformation for migration between automatic speaker identification systems",
  booktitle="Abstract book of the 7th European Academy of Forensic Science Conference",
  year="2015",
  pages="345--345",
  publisher="Criminal Police Department Prague",
  address="Praha",
  isbn="978-80-260-8659-8",
  url="https://www.fit.vut.cz/research/publication/10976/",
  note="Abstract"
}

Projects

Big speech data analytics for contact centers, EU, Horizon 2020, start: 2015-01-01, end: 2017-12-31, completed
IT4Innovations Centre of Excellence, MŠMT, Operational Programme Research and Development for Innovations, ED1.1.00/02.0070, start: 2011-01-01, end: 2015-12-31, completed
Enabling automatic speaker verification to broad spectrum of users in the security domain, MV, Security Research Programme of the Czech Republic 2010-2015, VG20132015129, start: 2013-04-01, end: 2015-09-30, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)