Result Details

HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification

ZEINALI, H.; SAMETI, H.; BURGET, L. HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification. IEEE-ACM Transactions on Audio Speech and Language Processing, 2017, vol. 25, no. 7, p. 1421-1435. ISSN: 2329-9290.

Type

journal article

Language

English

Authors

Zeinali Hossein, Ph.D.
Sameti Hossein, FIT (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)

Abstract

This article describes a new HMM structure for text-dependent speaker verification, which enabled the authors to use the potential of the HMM to model time sequences along with the established i-vector technique.

Keywords

Bottleneck features, DNN, hidden Markov model (HMM), i-vector, text-dependent speaker verification.

Annotation

The low-dimensional i-vector representation of speech segments is used in state-of-the-art text-independent speaker verification systems. However, i-vectors were deemed unsuitable for the text-dependent task, where simpler and older speaker recognition approaches were found more effective. In this work, we propose a straightforward hidden Markov model (HMM) based extension of the i-vector approach, which allows i-vectors to be successfully applied to text-dependent speaker verification. In our approach, the Universal Background Model (UBM) for training the phrase-independent i-vector extractor is based on a set of monophone HMMs instead of the standard Gaussian Mixture Model (GMM). To compensate for the channel variability, we propose to precondition i-vectors using a regularized variant of within-class covariance normalization, which can be robustly estimated in a phrase-dependent fashion on the small datasets available for the text-dependent task. The verification scores are cosine similarities between the i-vectors, normalized using phrase-dependent s-norm. The experimental results on the RSR2015 and RedDots databases confirm the effectiveness of the proposed approach, especially in rejecting test utterances with a wrong phrase. A simple MFCC-based i-vector/HMM system performs competitively when compared to very computationally expensive DNN-based approaches or the conventional relevance MAP GMM-UBM, which does not allow for compact speaker representations. To our knowledge, this paper presents the best published results obtained with a single system on both the RSR2015 and RedDots datasets.
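
The scoring step mentioned in the annotation (cosine similarity followed by s-norm) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy three-dimensional vectors, and the use of the population standard deviation are all assumptions made for illustration, and the sketch assumes the i-vectors have already been extracted and channel-compensated. The normalization is phrase-dependent only in the sense that `cohort` is assumed to contain i-vectors of other speakers uttering the same phrase.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def s_norm(enroll, test, cohort):
    """Symmetric normalization (s-norm) of a cosine score.

    The raw score is standardized twice, once with the statistics of the
    enrollment vector scored against the cohort and once with those of
    the test vector, and the two results are averaged.
    Assumption: `cohort` holds i-vectors from the same phrase.
    """
    raw = cosine(enroll, test)
    enroll_scores = [cosine(enroll, c) for c in cohort]
    test_scores = [cosine(test, c) for c in cohort]
    z_enroll = (raw - mean(enroll_scores)) / pstdev(enroll_scores)
    z_test = (raw - mean(test_scores)) / pstdev(test_scores)
    return 0.5 * (z_enroll + z_test)


# Hypothetical toy vectors standing in for real i-vectors.
enroll = [1.0, 0.2, 0.1]
target_test = [0.9, 0.3, 0.1]
impostor_test = [0.1, 0.2, 1.0]
cohort = [[0.2, 1.0, 0.1], [0.1, 0.3, 1.0], [0.5, 0.5, 0.5]]

print("target score:  ", s_norm(enroll, target_test, cohort))
print("impostor score:", s_norm(enroll, impostor_test, cohort))
```

Because both sides of the trial are normalized against the same cohort, the resulting score is symmetric in the enrollment and test vectors, so a single threshold can be applied regardless of which utterance is treated as enrollment.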

Published

2017

Pages

1421–1435

Journal

IEEE-ACM Transactions on Audio Speech and Language Processing, vol. 25, no. 7, ISSN 2329-9290

DOI

10.1109/TASLP.2017.2694708

UT WoS

000403311100002

EID Scopus

2-s2.0-85019876767

BibTeX

@article{BUT144447,
  author="Hossein {Zeinali} and Hossein {Sameti} and Lukáš {Burget}",
  title="HMM-Based Phrase-Independent i-Vector Extractor for Text-Dependent Speaker Verification",
  journal="IEEE-ACM Transactions on Audio Speech and Language Processing",
  year="2017",
  volume="25",
  number="7",
  pages="1421--1435",
  doi="10.1109/TASLP.2017.2694708",
  issn="2329-9290",
  url="http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7902120"
}

Files

pdf zeinali_ieee_acm transactions2017_07902120.pdf 795 kB

Projects

Big speech data analytics for contact centers, EU, Horizon 2020, start: 2015-01-01, end: 2017-12-31, completed
Information mining in speech acquired by distant microphones, MV, Security Research of the Czech Republic 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Processing, visualization and analysis of multimedia and 3D data, BUT, BUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)