Result detail

Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

PRASAD, A.; ZULUAGA-GOMEZ, J.; MOTLÍČEK, P.; SARFJOO, S.; NIGMATULINA, I.; OHNEISER, O.; HELMKE, H. Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition. Proceedings of the 12th SESAR Innovation Days. Budapest: 2022. p. 1-9.

Type

conference proceedings paper

Language

English

Authors

Prasad Amrutha
ZULUAGA-GOMEZ, J.
Motlíček Petr, doc. Ing., Ph.D., UPGM (FIT)
Sarfjoo Seyyed Saeed
NIGMATULINA, I.
OHNEISER, O.
HELMKE, H.

Abstract

Automatic Speech Recognition (ASR) for air traffic
control is generally trained by pooling Air Traffic Controller
(ATCO) and pilot data into one set. This is motivated by the
fact that pilots' voice communications are scarcer than
ATCOs'. Due to this data imbalance and other reasons (e.g.,
varying acoustic conditions), the speech from ATCOs is usually
recognized more accurately than from pilots. Automatically
identifying the speaker roles is a challenging task, especially
in the case of the noisy voice recordings collected using Very
High Frequency (VHF) receivers or due to the unavailability
of the push-to-talk (PTT) signal, i.e., both audio channels are
mixed. In this work, we propose to (1) automatically segment the
ATCO and pilot data based on an intuitive approach exploiting
ASR transcripts and (2) subsequently consider an automatic
recognition of ATCOs' and pilots' voice as two separate tasks.
Our work is performed on VHF audio data with high noise
levels, i.e., signal-to-noise ratios (SNR) below 15 dB, as this data
is recognized to be helpful for various speech-based machine-learning
tasks. Specifically, for the speaker role identification
task, the module is represented by a simple yet efficient
knowledge-based system exploiting a grammar defined by the
International Civil Aviation Organization (ICAO). The system
accepts text as the input, either manually verified annotations
or automatically generated transcripts. The developed approach
provides an average accuracy in speaker role identification of
about 83%. Finally, we show that training an acoustic model
for ASR tasks separately (i.e., separate models for ATCOs and
pilots) or using a multitask approach is well suited for the noisy
data and outperforms the traditional ASR system where all data
is pooled together.
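
The knowledge-based idea described in the abstract (classify the speaker role from text alone, using phraseology conventions) can be illustrated with a minimal sketch. The abstract does not give the actual grammar, so everything below is an assumption made for illustration: the cue lists, the airline designators, the scoring weights, and the function name `classify_role` are hypothetical and are not the authors' rules. The sketch relies on the common convention that controllers tend to open with the callsign and issue commands, while pilots tend to read back and close with the callsign.

```python
# Hypothetical cue words; NOT the ICAO-derived grammar used in the paper.
ATCO_CUES = ("cleared", "contact", "identified", "descend", "climb", "turn left", "turn right")
PILOT_CUES = ("descending", "climbing", "request", "wilco", "turning left", "turning right")

# Hypothetical airline designators used to spot where the callsign occurs.
AIRLINES = ("lufthansa", "ryanair", "swiss", "speedbird")


def classify_role(transcript: str) -> str:
    """Return 'ATCO', 'PILOT', or 'UNKNOWN' for one transcribed utterance."""
    words = transcript.lower().split()
    padded = " " + " ".join(words) + " "  # padding lets us match whole words only

    score = 0  # positive favours ATCO, negative favours pilot
    if words and words[0] in AIRLINES:
        score += 2  # callsign first: assumed typical of a controller instruction
    elif any(w in AIRLINES for w in words[1:]):
        score -= 2  # callsign later: assumed typical of a pilot readback

    score += sum(f" {cue} " in padded for cue in ATCO_CUES)
    score -= sum(f" {cue} " in padded for cue in PILOT_CUES)

    if score > 0:
        return "ATCO"
    if score < 0:
        return "PILOT"
    return "UNKNOWN"


def accuracy(labelled: list[tuple[str, str]]) -> float:
    """Fraction of (transcript, true_role) pairs classified correctly."""
    if not labelled:
        return 0.0
    hits = sum(classify_role(text) == role for text, role in labelled)
    return hits / len(labelled)
```

Because the function takes plain text, it could be applied either to manually verified annotations or to ASR output, as the abstract describes; with ASR output, recognition errors in the cue words would lower the accuracy of any such rule set.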

Keywords

assistant based speech recognition, air traffic management, multitask acoustic modeling, speaker role classification, Kaldi

URL

https://arxiv.org/abs/2108.12175

Year

2022

Pages

1–9

Proceedings

Proceedings of the 12th SESAR Innovation Days

Conference

12th SESAR Innovation Days

Place

Budapest

BibTeX

@inproceedings{BUT185195,
  author="PRASAD, A. and ZULUAGA-GOMEZ, J. and MOTLÍČEK, P. and SARFJOO, S. and NIGMATULINA, I. and OHNEISER, O. and HELMKE, H.",
  title="Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition",
  booktitle="Proceedings of the 12th SESAR Innovation Days",
  year="2022",
  pages="1--9",
  address="Budapest",
  url="https://arxiv.org/abs/2108.12175"
}

Files

pdf prasad_published_SID_paper_68.pdf 2 MB

Projects

Automatic collection and processing of voice data from air-traffic communications, EU, Horizon 2020, start: 2019-11-01, end: 2022-02-28, completed
HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration, EU, Horizon 2020, H2020-SESAR-2019-2, start: 2020-06-01, end: 2022-11-30, completed

Research groups

Speech data mining research group BUT Speech@FIT (VZ SPEECH)

Department

Department of Computer Graphics and Multimedia (UPGM)