Publication Details

Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

PRASAD, A.; ZULUAGA-GOMEZ, J.; MOTLÍČEK, P.; SARFJOO, S.; NIGMATULINA, I.; OHNEISER, O.; HELMKE, H. Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition. Proceedings of the 12th SESAR Innovation Days. Budapest: 2022. p. 1-9.

Czech title

Identifikace role mluvčího pro rozpoznávání řeči při řízení letového provozu na základě gramatiky

Type

conference paper

Language

English

Authors

Prasad Amrutha (DCGM)
ZULUAGA-GOMEZ, J.
Motlíček Petr, doc. Ing., Ph.D. (DCGM)
Sarfjoo Seyyed Saeed
NIGMATULINA, I.
OHNEISER, O.
HELMKE, H.

Keywords

assistant based speech recognition, air traffic management, multitask acoustic modeling, speaker role classification, Kaldi

Abstract

Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data into one set. This is motivated by the fact that pilots' voice communications are scarcer than ATCOs'. Due to this data imbalance and other reasons (e.g., varying acoustic conditions), speech from ATCOs is usually recognized more accurately than speech from pilots. Automatically identifying the speaker roles is a challenging task, especially in the case of noisy voice recordings collected using Very High Frequency (VHF) receivers or when the push-to-talk (PTT) signal is unavailable, i.e., both audio channels are mixed. In this work, we propose to (1) automatically segment the ATCO and pilot data based on an intuitive approach exploiting ASR transcripts and (2) subsequently treat the automatic recognition of ATCOs' and pilots' voices as two separate tasks. Our work is performed on VHF audio data with high noise levels, i.e., signal-to-noise ratios (SNRs) below 15 dB, as this data is recognized to be helpful for various speech-based machine-learning tasks. Specifically, for the speaker role identification task, the module is represented by a simple yet efficient knowledge-based system exploiting a grammar defined by the International Civil Aviation Organization (ICAO). The system accepts text as the input, either manually verified annotations or automatically generated transcripts. The developed approach provides an average accuracy in speaker role identification of about 83%. Finally, we show that training an acoustic model for ASR tasks separately (i.e., separate models for ATCOs and pilots) or using a multitask approach is well suited for the noisy data and outperforms the traditional ASR system where all data is pooled together.
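
The abstract describes a knowledge-based module that takes a transcript as text and assigns a speaker role. As a rough illustration only, the idea of a text-based rule classifier can be sketched as below. The cue words, the callsign-position heuristic, and the function name `identify_role` are all assumptions made for this sketch; they are not the ICAO-derived grammar or the implementation used in the paper.

```python
# Hypothetical cue lists, chosen for illustration only. They are NOT the
# grammar from the paper.
ATCO_CUES = frozenset({"cleared", "contact", "descend", "climb", "turn", "reduce"})
PILOT_CUES = frozenset({"request", "wilco", "descending", "climbing", "turning", "reducing"})


def identify_role(transcript: str, callsign: str = "") -> str:
    """Return "ATCO", "pilot", or "unknown" for one transcribed utterance.

    A positive score favours ATCO, a negative score favours pilot.
    """
    text = transcript.lower().strip()
    words = text.split()

    # Imperative command words suggest an ATCO instruction; readback-style
    # or request words suggest a pilot.
    score = sum(w in ATCO_CUES for w in words) - sum(w in PILOT_CUES for w in words)

    # Assumed heuristic: a callsign at the start hints at an ATCO addressing
    # an aircraft, a callsign at the end hints at a pilot readback.
    if callsign:
        cs = callsign.lower().strip()
        if text.startswith(cs):
            score += 1
        elif text.endswith(cs):
            score -= 1

    if score > 0:
        return "ATCO"
    if score < 0:
        return "pilot"
    return "unknown"
```

For example, `identify_role("lufthansa four five six descend flight level eight zero", "lufthansa four five six")` scores as ATCO under these toy rules, while the same content read back with the callsign at the end and "descending" in place of "descend" scores as pilot. Because the input is plain text, the same function could be applied to manual annotations or to ASR output.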

Published

2022

Pages

1–9

Proceedings

Proceedings of the 12th SESAR Innovation Days

Conference

12th SESAR Innovation Days, Budapest, HU

Place

Budapest

BibTeX

@inproceedings{BUT185195,
  author="PRASAD, A. and ZULUAGA-GOMEZ, J. and MOTLÍČEK, P. and SARFJOO, S. and NIGMATULINA, I. and OHNEISER, O. and HELMKE, H.",
  title="Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition",
  booktitle="Proceedings of the 12th SESAR Innovation Days",
  year="2022",
  pages="1--9",
  address="Budapest",
  url="https://arxiv.org/abs/2108.12175"
}

Files

pdf prasad_published_SID_paper_68.pdf 2 MB