Course details

Speech Processing Systems

SRE, academic year 2010/2011, winter semester, 5 credits

Guarantor

Language of instruction

Czech

Completion

Examination

Time span

  • 39 hrs lectures
  • 13 hrs projects

Department

Study literature

  • Psutka, J.: Komunikace s počítačem mluvenou řečí. Academia, Praha, 1995, ISBN 80-200-0203-0.
  • Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000, ISBN 0-471-35154-7.

Fundamental literature

  • Gussenhoven, C. and Jacobs, H.: Understanding Phonology, Oxford University Press, 1998, ISBN 0-340-69218-9.
  • Psutka, J.: Komunikace s počítačem mluvenou řečí. Academia, Praha, 1995, ISBN 80-200-0203-0.
  • Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000, ISBN 0-471-35154-7.
  • Moore, B.C.J.: An introduction to the psychology of hearing, Academic Press, 1989, ISBN 0-12-505627-3.
  • Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998, ISBN 0-262-10066-5.
  • Manning, C. and Schütze, H.: Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, 1999.

Syllabus of lectures

  1. Phonetics and phonology - syllable structure, phonological processes and distinctive features.
  2. Statistical pattern classification I. - Bayesian framework, Maximum likelihood learning, Gaussian mixture models. Features for GMM modeling.
  3. Statistical pattern classification II. - Artificial Neural Networks, Support vector machines. Sequence modeling - Hidden Markov models. 
  4. HMM training and adaptation - MLLR, MAP, discriminative training.
  5. HMM recognition - pronunciation dictionaries and networks, language modeling, decoding, lattices.
  6. Phoneme recognition. Keyword spotting and search - LVCSR, acoustic and phonetic lattices. Figure of Merit.
  7. Speaker identification and verification - GMM, SVM. Channel normalization and compensation - feature mapping, eigenvoices and nuisance attribute projection (NAP). Evaluation of speaker verification: DET curves, EER, cost function.
  8. Language identification - acoustic vs. phonotactic, evaluation.
  9. Speech coding - CELP framework - adaptive and stochastic codebooks, GSM standards.
  10. Language modeling I. - n-gram models, class-based models.
  11. Language modeling II. - language-specific features, factored language models.
  12. Psycholinguistics - word recognition models, word associations.
  13. Probabilistic parsing - inside-outside algorithm, dependency parsing.

Course inclusion in study plans
