Result Details

iVector-Based Discriminative Adaptation for Automatic Speech Recognition

KARAFIÁT, M.; BURGET, L.; MATĚJKA, P.; GLEMBEK, O.; ČERNOCKÝ, J. iVector-Based Discriminative Adaptation for Automatic Speech Recognition. Proceedings of ASRU 2011. Hilton Waikoloa Village, Big Island, Hawaii: IEEE Signal Processing Society, 2011. p. 152-157. ISBN: 978-1-4673-0366-8.

Type

conference paper

Language

English

Authors

Karafiát Martin, Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Matějka Pavel, Ing., Ph.D., DCGM (FIT)
Glembek Ondřej, Ing., Ph.D., FIT (FIT), DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)

Abstract

The iVector is a low-dimensional fixed-length representation of information about speaker and acoustic environment. To utilize iVectors for adaptation, region dependent linear transforms (RDLT) are discriminatively trained using the MPE criterion on large amounts of annotated data to extract the relevant information from iVectors and to compensate speech features. The approach was tested on standard CTS data. We found it to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation we reached a 0.8% additive absolute WER improvement.

Keywords

Automatic speech recognition, I-vector, Discriminative adaptation

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2011/karafiat…

Annotation

This work describes a novel technique for discriminative feature-level adaptation for automatic speech recognition. The concept of iVectors, popular in speaker recognition, is used to extract information about a speaker or acoustic environment from a speech segment. The iVector is a low-dimensional fixed-length representation of such information. To utilize iVectors for adaptation, region dependent linear transforms (RDLT) are discriminatively trained using the MPE criterion on large amounts of annotated data to extract the relevant information from iVectors and to compensate speech features. The approach was tested on standard CTS data. We found it to be complementary to common adaptation techniques. On a well-tuned RDLT system with standard CMLLR adaptation we reached a 0.8% additive absolute WER improvement.
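
As a rough illustration of the feature compensation step described above, the sketch below applies a posterior-weighted sum of region-specific affine transforms to each frame, with the segment-level iVector appended to the frame's features. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `rdlt_adapt`, the choice to append the iVector to every frame, and the use of precomputed region posteriors are all assumptions made for illustration, and the discriminative (MPE) training of the transforms is not shown.

```python
def rdlt_adapt(features, ivector, posteriors, transforms, biases):
    """Compensate frame features with region-dependent affine transforms.

    Hypothetical interface (not taken from the paper):
      features   -- list of T frames, each a list of D floats
      ivector    -- list of K floats, one vector for the whole segment
      posteriors -- list of T rows, each with R region weights summing to 1
      transforms -- list of R matrices, each D rows by (D + K) columns
      biases     -- list of R bias vectors, each of length D
    """
    adapted = []
    for frame, weights in zip(features, posteriors):
        # Assumption: the same iVector is appended to every frame.
        extended = list(frame) + list(ivector)
        output = [0.0] * len(frame)
        for weight, matrix, bias in zip(weights, transforms, biases):
            for d in range(len(frame)):
                # Affine transform of the extended vector for this region.
                affine = sum(a * e for a, e in zip(matrix[d], extended)) + bias[d]
                # Accumulate the region output, weighted by its posterior.
                output[d] += weight * affine
        adapted.append(output)
    return adapted
```

In this sketch the columns of each matrix that multiply the iVector act as a per-region offset on the features, which is one way the iVector information could be used to compensate them. In the paper the transforms are trained discriminatively; here they are simply passed in as placeholders.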

Published

2011

Pages

152–157

Proceedings

Proceedings of ASRU 2011

Conference

IEEE 2011 Workshop on Automatic Speech Recognition and Understanding

ISBN

978-1-4673-0366-8

Publisher

IEEE Signal Processing Society

Place

Hilton Waikoloa Village, Big Island, Hawaii

BibTeX

@inproceedings{BUT76442,
  author="Martin {Karafiát} and Lukáš {Burget} and Pavel {Matějka} and Ondřej {Glembek} and Jan {Černocký}",
  title="iVector-Based Discriminative Adaptation for Automatic Speech Recognition",
  booktitle="Proceedings of ASRU 2011",
  year="2011",
  pages="152--157",
  publisher="IEEE Signal Processing Society",
  address="Hilton Waikoloa Village, Big Island, Hawaii",
  isbn="978-1-4673-0366-8",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2011/karafiat_asru2011_00152.pdf"
}

Projects

Multilingual recognition and search in speech for electronic dictionaries, MPO, TIP, FR-TI1/034, start: 2009-09-01, end: 2013-08-31, completed
Security-Oriented Research in Information Technology, MŠMT, Institutional funds from the Czech state budget (e.g. research plans, research centers), MSM0021630528, start: 2007-01-01, end: 2013-12-31, running
Speech Recognition under Real-World Conditions, GACR, Standard projects, GA102/08/0707, start: 2008-01-01, end: 2011-12-31, completed
Technologies of speech processing for efficient human-machine communication, TAČR, ALFA Programme of applied research and experimental development, TA01011328, start: 2011-01-01, end: 2014-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)