Publication Details

The Kaldi Speech Recognition Toolkit

POVEY, D.; GHOSHAL, A.; BOULIANNE, G.; BURGET, L.; GLEMBEK, O.; GOEL, N.; HANNEMANN, M.; MOTLÍČEK, P.; QIAN, Y.; SCHWARZ, P.; SILOVSKÝ, J.; STEMMER, G.; VESELÝ, K. The Kaldi Speech Recognition Toolkit. Proceedings of ASRU 2011. Hilton Waikoloa Village Resort, Hawaii: IEEE Signal Processing Society, 2011. p. 1-4. ISBN: 978-1-4673-0366-8.

Czech title

KALDI Toolkit pro rozpoznávání řeči

Type

conference paper

Language

English

Authors

Povey Daniel
Ghoshal Arnab
Boulianne Gilles
Burget Lukáš, doc. Ing., Ph.D. (DCGM)
Glembek Ondřej, Ing., Ph.D.
Goel Nagendra
Hannemann Mirko, Ph.D.
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
Qian Yanmin
Schwarz Petr, Ing., Ph.D. (DCGM)
Silovský Jan
Stemmer Georg
Veselý Karel, Ing., Ph.D. (DCGM)

URL

http://www.fit.vutbr.cz/research/groups/speech/publi/2011/povey_asru2011_Kaldi%20toolkit.pdf

Keywords

speech recognition, toolkit

Abstract

We describe the design of Kaldi, a free and open-source speech recognition toolkit. The toolkit currently supports modelling of context-dependent phones of arbitrary context lengths, and all commonly used techniques that can be estimated using maximum likelihood. It also supports the recently proposed SGMMs. Development of Kaldi is continuing and we are working on using large language models in the FST framework, lattice generation and discriminative training.

Annotation

We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state transducers (using the freely available OpenFst), together with detailed documentation and scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.

Published

2011

Pages

1–4

Proceedings

Proceedings of ASRU 2011

Conference

IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, Hilton Waikoloa Village Resort, Big Island, Hawaii, US

ISBN

978-1-4673-0366-8

Publisher

IEEE Signal Processing Society

Place

Hilton Waikoloa Village Resort, Hawaii

BibTeX

@inproceedings{BUT127200,
  author="Daniel {Povey} and Arnab {Ghoshal} and Gilles {Boulianne} and Lukáš {Burget} and Ondřej {Glembek} and Nagendra {Goel} and Mirko {Hannemann} and Petr {Motlíček} and Yanmin {Qian} and Petr {Schwarz} and Jan {Silovský} and Georg {Stemmer} and Karel {Veselý}",
  title="The Kaldi Speech Recognition Toolkit",
  booktitle="Proceedings of ASRU 2011",
  year="2011",
  pages="1--4",
  publisher="IEEE Signal Processing Society",
  address="Hilton Waikoloa Village Resort, Hawaii",
  isbn="978-1-4673-0366-8",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2011/povey_asru2011_Kaldi%20toolkit.pdf"
}