Result detail
MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data
Baskar Murali Karthick, Ing., Ph.D., UPGM (FIT)
Diez Sánchez Mireia, M.Sc., Ph.D., UPGM (FIT)
Beneš Karel, Ing., Ph.D., UPGM (FIT)
This paper presents a series of experiments we performed during our work on the MGB-3 evaluations. We describe both the submitted system and the post-evaluation analysis. Our initial BLSTM-HMM system was trained on 250 hours of MGB-2 data (Al-Jazeera) and adapted with 5 hours of Egyptian data (YouTube). We included techniques such as diarization, n-gram language model adaptation, speed perturbation of the adaptation data, and the use of all 4 correct references. The 4 references were either used for supervision with a confusion network, or we included each sentence 4x, with the transcripts from all the annotators. It was also helpful to blend the augmented MGB-3 adaptation data with 15 hours of MGB-2 data. Although our single system did not rank among the best teams in the evaluations, we believe that our analysis will be of interest not only to the other MGB-3 challenge participants.
MGB-3, ASR adaptation, low-resource ASR, Egyptian Arabic, diarization
@inproceedings{BUT144502,
author="Karel {Veselý} and Murali Karthick {Baskar} and Mireia {Diez Sánchez} and Karel {Beneš}",
title="MGB-3 {BUT} system: Low-resource ASR on Egyptian YouTube data",
booktitle="Proceedings of ASRU 2017",
year="2017",
pages="368--373",
publisher="IEEE Signal Processing Society",
address="Okinawa",
doi="10.1109/ASRU.2017.8268959",
isbn="978-1-5090-4788-8",
url="https://www.fit.vut.cz/research/publication/11595/"
}
Information mining from speech acquired by distant microphones, MV, Security Research of the Czech Republic 2015-2020, VI20152020025, start: 2015-10-01, end: 2020-09-30, completed
IT4Innovations excellence in science, MŠMT, National Sustainability Programme II, LQ1602, start: 2016-01-01, end: 2020-12-31, completed
Robust speaker diarization using Bayesian inference and deep learning, EU, Horizon 2020, start: 2017-03-01, end: 2019-02-28, completed
Processing, display and analysis of multimedia and 3D data, BUT, BUT internal projects, FIT-S-17-3984, start: 2017-03-01, end: 2020-02-29, completed