Course details

Audio and Speech Processing by Humans and Machines

ASD Acad. year 2025/2026 Winter semester

To introduce engineering students to the principles of audio and visual signal processing by human listeners and machines, with the aim of applying this knowledge to the design of technical systems for audio signal processing. Students will become aware of the possibilities of applying knowledge of human audiovisual perception in the design of engineering systems for signal processing in artificial intelligence.

State doctoral exam - topics:

Which property of human hearing is used in almost all existing techniques for speech recognition?
Describe the structure of the human ear.
How is frequency analysis of sound done in the ear?
How is information from the ear communicated to the human brain?
What is the general tendency of the frequency resolution of human hearing? How does it differ from the frequency resolution of Fourier analysis?
What is auditory masking? What can it be good for, and why?
What is simultaneous and forward masking in human hearing?
What does the loudness of a sound depend on?
At which frequencies do we hear best?
Describe some speech analysis techniques that use more advanced knowledge of human hearing.

Guarantor

Heřmanský Hynek, prof. Ing., Dr. Eng. (DCGM)

Language of instruction

Czech, English

Completion

Examination (oral)

Time span

39 hrs lectures

Assessment points

100 pts final exam

Department

Department of Computer Graphics and Multimedia (UPGM)

Learning objectives

To introduce engineering students to the principles of audio and visual signal processing by human listeners and machines, with the aim of applying this knowledge in the design of technical systems for audio signal processing. Students will become aware of the possibilities of applying knowledge about human audiovisual perception in the design of engineering systems for signal processing in artificial intelligence.

Study literature

Ben Gold, Nelson Morgan, Dan Ellis: Speech and Audio Signal Processing: Processing and Perception of Speech and Music, 2nd edition, Wiley-Interscience, 2011.
Brian Moore: An Introduction to the Psychology of Hearing, 6th edition, BRILL, 2013.
Simon Haykin: Neural Networks and Learning Machines, 3rd edition, Pearson Education, 2016.

Syllabus of lectures

Introduction
Linking speech and hearing

Information in written and spoken language
Measurement of information
Channel capacity
Transmission of information through a communication channel
Information in printed text
Information in speech signals and in speech messages

Basic properties of hearing
Simultaneous and temporal masking
Critical bands of hearing
Pitch perception
Time in the perception of acoustic signals
Perception of signal modulations
Physiology of the auditory periphery
Physiology of higher hearing stages
Feedback and its consequences

Basic principles of speech production
Linear model of speech production
Propagation of sound in air
Quarter-wave resonator
Half-wave resonators
Consequences of narrowing the acoustic tract (introduction of redundancy in frequency)

Speech dynamics
Vocal tract movements
Correlation between vocal tract movements and dynamics of speech envelopes
Speech modulation spectrum
Speech intelligibility with modified dynamics
Coarticulation (introduction of temporal redundancies into speech)

Short-term spectral analysis
Overview of Fourier transform
Sampling and quantization
Short-term Fourier analysis
Uncertainty principle in spectral analysis
Cepstral analysis
Linear predictive analysis
Approximating the spectral envelope using LP
LP spectral transform
Perceptual techniques for estimating the spectral envelope
Using spectral dynamics (RASTA filters)

Data processing
Linear discriminant analysis and design of spectral projections
Linear discriminant analysis and design of temporal RASTA filters
Linear discriminant analysis and design of 2D spectro-temporal filters
Relations between speech and hearing

History of speech recognition
Newton, Radio Rex, Spectrogram, the first recognizers and the first lessons
Feature template comparison
Principles of stochastic recognition
Training and recognition using hidden Markov models
Artificial neural networks
Deriving posterior probabilities of speech sounds (DNN/HMM hybrid method)
Alternative uses of artificial neural networks (TANDEM)
Temporal pattern classifier (TRAPS)
Current techniques

Speech recognition by humans
Words in context and out of context (parallel context channel)
Recognition of high-pass and low-pass filtered syllables (Fletcher et al.)
Recognition accuracy and articulation index
Product of error probabilities in subbands
Possible implications in engineering

Progress assessment

Oral exam.

Course inclusion in study plans

Programme DIT, any year of study, Compulsory-Elective group O
Programme DIT-EN (in English), any year of study, Compulsory-Elective group O