Course details

Audio and Speech Processing by Humans and Machines

ASD Acad. year 2023/2024 Winter semester

3-day intensive course
Interaction between humans and machines could be greatly enhanced through communication using human sensory signals such as speech. Knowledge of human information processing is critical in the design of such human-machine interfaces.

State doctoral exam - topics:

Which property of human hearing is used in almost all existing techniques for speech recognition?
Describe the structure of the human ear.
How is frequency analysis of sound done in the ear?
How is information from the ear communicated to the human brain?
What is the general tendency of the frequency resolution of human hearing? How does it differ from the frequency resolution of Fourier analysis?
What is auditory masking? What can it be good for, and why?
What is simultaneous and forward masking in human hearing?
What does loudness of sound depend on?
At which frequencies do we hear best?
Describe some speech analysis techniques that use more advanced knowledge of human hearing.

Guarantor

Heřmanský Hynek, prof. Ing., Dr. Eng. (DCGM)

Language of instruction

Czech, English

Completion

Examination (oral)

Time span

39 hrs lectures

Assessment points

100 pts final exam

Department

Department of Computer Graphics and Multimedia (UPGM)

Learning objectives

The course covers the concept of a signal as a carrier of information and the basic principles of processing cognitive signals, and introduces selected phenomena in auditory and visual perception.
Students learn how to interpret empirical data, how to incorporate these data in models, and how to apply these models to engineering problems. Emphasis is on active research in auditory modelling that exploits special properties of speech.

Study literature

Ben Gold, Nelson Morgan, Dan Ellis: Speech and Audio Signal Processing: Processing and Perception of Speech and Music, Wiley-Interscience; 2nd Edition, 2011.
Brian Moore: An Introduction to the Psychology of Hearing, 6th Edition, BRILL 2013.
Simon Haykin: Neural Networks And Learning Machines, Pearson Education; Third edition, 2016.

Syllabus of lectures

Day 1
Introduction to processing of information-bearing sensory signals such as speech. Fundamentals of information theory and of pattern classification. Fundamentals of speech production. Conventional techniques for speech analysis (concept of short-term analysis, band-pass filtering, Fourier-like transforms, cepstrum, linear prediction).
Day 2
Fundamentals of human auditory perception. Perception of pitch and loudness. Spectral and temporal resolution of hearing. Masking in frequency and in time. Some important speech perception phenomena.
Day 3
Introduction to auditory-like speech analysis techniques. Linear discriminant analysis and its use for deriving an optimized spectral basis. Temporal domain for speech analysis. Dynamic features of speech and the RASTA technique. Multi-stream speech recognition. Recognition from temporal patterns and nonlinear discriminant mapping approaches.

Progress assessment

Oral exam.

Course inclusion in study plans

Programme DIT, any year of study, Compulsory-Elective group O
Programme DIT-EN (in English), any year of study, Compulsory-Elective group O
Programme VTI-DR-4, field DVI4, any year of study, Elective
Programme VTI-DR-4 (in English), field DVI4, any year of study, Elective