Faculty of Information Technology, BUT

Course details

Modern Methods of Speech Processing

MZD, academic year 2003/2004, summer semester

From simple systems to stochastic modelling. Hidden Markov models. Large vocabulary continuous speech recognition. Language models. Speech production, speech perception: time and frequency. Data-driven methods for feature extraction. Speech databases. Excitation in speech coding, CELP. Speaker identification.


Language of instruction



Examination (written)

Time span

39 hrs lectures

Assessment points


Subject specific learning outcomes and competences

This course enables students to implement simple speech processing applications, such as voice control of a process. Above all, however, it prepares them to join the development of complex speech recognition and speech coding systems that use modern methods, in both academic and industrial environments.

Learning objectives

Study literature

  • Moore, B. C. J.: An Introduction to the Psychology of Hearing, Academic Press, 1989
  • Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998
  • Fukunaga, K.: Introduction to Statistical Pattern Recognition, Academic Press, 1990
  • Vapnik, V. N.: Statistical Learning Theory, Wiley-Interscience, 1998
  • Dutoit, T.: An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997

Fundamental literature

  • Psutka, J.: Komunikace s počítačem mluvenou řečí (in Czech), Academia, Praha, 1995
  • Gold, B., Morgan, N.: Speech and Audio Signal Processing, John Wiley & Sons, 2000
  • Texts from http://www.fit.vutbr.cz/~cernocky/speech/

Syllabus of lectures

  • Review of notions: signal vectors and parameter matrices, basic statistics.
  • Stochastic modeling of parameters, modeling of time by state sequences.
  • Hidden Markov models: basic structure, training.
  • Recognition of speech using HMMs: Viterbi search, token passing.
  • Pronunciation dictionaries and language models.
  • Speech production and derived parameters: LPC, log area ratios, line spectral pairs.
  • Speech perception and derived parameters: Mel-frequency cepstral coefficients, Perceptual linear prediction.
  • Temporal properties of hearing: RASTA filtering.
  • Training the feature extractor on data: linear discriminant analysis.
  • Speech databases: standards, contents, speakers, annotations.
  • Vocoders and modeling of the excitation: multi-pulse and stochastic excitations (GSM coding).
  • CELP coding: long-term predictor, codebooks. Very low bit-rate coders.
  • Current methods of speaker identification and verification.