Subspace Gaussian mixture models for speech recognition

POVEY, D.; BURGET, L.; AGARWAL, M.; AKYAZI, P.; FENG, K.; GHOSHAL, A.; GLEMBEK, O.; GOEL, N.; KARAFIÁT, M.; RASTROW, A.; ROSE, R.; SCHWARZ, P.; THOMAS, S. Subspace Gaussian mixture models for speech recognition. Proc. International Conference on Acoustics, Speech, and Signal Processing. Dallas: IEEE Signal Processing Society, 2010. no. 3, p. 4330-4333. ISBN: 978-1-4244-4296-6. ISSN: 1520-6149.

Type

conference paper

Language

English

Authors

Povey Daniel
Burget Lukáš, doc. Ing., Ph.D., DCGM (FIT)
Agarwal Mohit
Akyazi Pinar
Feng Kai
Ghoshal Arnab
Glembek Ondřej, Ing., Ph.D., DCGM (FIT)
Goel Nagendra
Karafiát Martin, Ing., Ph.D., DCGM (FIT)
Rastrow Ariya
Rose Richard
Schwarz Petr, Ing., Ph.D., DCGM (FIT)
Thomas Samuel

Abstract

This paper presents subspace Gaussian mixture models (SGMMs) for speech recognition: an acoustic modeling approach in which all phonetic states share a common Gaussian mixture model (GMM) structure.

Keywords

Speech Recognition, Hidden Markov Models, Gaussian Mixture Models

URL

https://www.fit.vut.cz/research/group/speech/public/publi/2010/povey_icassp2010…

Annotation

We describe an acoustic modeling approach in which all phonetic states share a common Gaussian Mixture Model structure, and the means and mixture weights vary in a subspace of the total parameter space. We call this a Subspace Gaussian Mixture Model (SGMM). Globally shared parameters define the subspace. This style of acoustic model allows for a much more compact representation and gives better results than a conventional modeling approach, particularly with smaller amounts of training data.
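
The idea that means and mixture weights "vary in a subspace" can be illustrated with a small sketch. It assumes the simplest form of the model: each state j has a single low-dimensional vector v_j, the mean of shared Gaussian i in that state is M_i v_j, and the mixture weights are a softmax over w_i . v_j. The symbols M_i, w_i and v_j, the function name and the toy dimensions are assumptions chosen for illustration and do not appear in this record. The shared covariances, and any sub-state or speaker-dependent extensions, are left out.

```python
import math
import random


def sgmm_state_params(M, w, v):
    """Derive one state's GMM means and mixture weights from shared parameters.

    M: list of I projection matrices (each D x S, nested lists), shared by all states
    w: list of I weight-projection vectors (each of length S), shared by all states
    v: the state's own subspace vector (length S)
    """
    S = len(v)
    # Mean of Gaussian i in this state: mu_i = M_i v
    means = [[sum(row[s] * v[s] for s in range(S)) for row in M_i] for M_i in M]
    # Mixture weights: softmax over i of the dot products w_i . v
    logits = [sum(w_i[s] * v[s] for s in range(S)) for w_i in w]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights


# Toy sizes (hypothetical): I shared Gaussians, D feature dims, S subspace dims
random.seed(0)
I, D, S = 4, 3, 2
M = [[[random.gauss(0, 1) for _ in range(S)] for _ in range(D)] for _ in range(I)]
w = [[random.gauss(0, 1) for _ in range(S)] for _ in range(I)]
v = [random.gauss(0, 1) for _ in range(S)]

means, weights = sgmm_state_params(M, w, v)
```

In this sketch the only state-specific quantity is the S-dimensional vector v, while M and w are shared across all states. This is the sense in which the globally shared parameters define the subspace and the per-state representation stays compact.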

Published

2010

Pages

4330–4333

Journal

Proc. International Conference on Acoustics, Speech, and Signal Processing, vol. 2010, no. 3, ISSN 1520-6149

Proceedings

Proc. International Conference on Acoustics, Speech, and Signal Processing

Conference

International Conference on Acoustics, Speech, and Signal Processing 2010

ISBN

978-1-4244-4296-6

Publisher

IEEE Signal Processing Society

Place

Dallas

BibTeX

@inproceedings{BUT37026,
  author="Daniel {Povey} and Lukáš {Burget} and Mohit {Agarwal} and Pinar {Akyazi} and Kai {Feng} and Arnab {Ghoshal} and Ondřej {Glembek} and Nagendra {Goel} and Martin {Karafiát} and Ariya {Rastrow} and Richard {Rose} and Petr {Schwarz} and Samuel {Thomas}",
  title="Subspace Gaussian mixture models for speech recognition",
  booktitle="Proc. International Conference on Acoustics, Speech, and Signal Processing",
  year="2010",
  volume="2010",
  number="3",
  pages="4330--4333",
  publisher="IEEE Signal Processing Society",
  address="Dallas",
  isbn="978-1-4244-4296-6",
  issn="1520-6149",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/povey_icassp2010_4330.pdf"
}

Projects

Multilingual recognition and search in speech for electronic dictionaries, MPO, TIP, FR-TI1/034, start: 2009-09-01, end: 2013-08-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)