Result Details

Phoneme Recognition of Meetings using Audio-Visual Data

MOTLÍČEK, P.; BURGET, L.; ČERNOCKÝ, J. Phoneme Recognition of Meetings using Audio-Visual Data. AMI Workshop. Martigny: 2004. 6 p.

Type

abstract

Language

English

Authors

Motlíček Petr, doc. Ing., Ph.D., DCGM (FIT)
Burget Lukáš, doc. Ing., Ph.D., FIT (FIT), DCGM (FIT)
Černocký Jan, prof. Dr. Ing., DCGM (FIT)

Abstract

Phoneme Recognition of Meetings using Audio-Visual Data

Keywords

speech recognition, pattern recognition, feature extraction, audio, video, bimodal recognition

URL

https://www.fit.vut.cz/person/motlicek/public/publi/2004/ami2004.pdf

Annotation

The movements of speakers' faces are known to convey visual information that can improve speech intelligibility, especially when the data are corrupted or noisy. The availability of visual data could therefore be exploited to improve automatic speech recognition. This paper demonstrates the use of visual parameters extracted from video for automatic recognition of context-independent phoneme strings from meeting data. Encouraged by the good performance of audio-visual systems designed to work with "visually clean" data (limited variation in the speaker's frontal pose, lighting conditions, background, etc.), we investigate their efficiency in the non-ideal conditions introduced by the meeting audio-visual data used in our experiments. A major issue is how to combine the audio and visual data for phoneme recognition so that the best use is made of the two modalities together.

Published

2004

Book

AMI Workshop

Conference

Joint AMI/PASCAL/IM2/M4 workshop

Place

Martigny

BibTeX

@misc{BUT60054,
  author="Petr {Motlíček} and Lukáš {Burget} and Jan {Černocký}",
  title="Phoneme Recognition of Meetings using Audio-Visual Data",
  booktitle="AMI Workshop",
  year="2004",
  pages="6",
  address="Martigny",
  url="http://www.fit.vutbr.cz/~motlicek/publi/2004/ami2004.pdf",
  note="Abstract"
}

Projects

Augmented Multi-party Interaction, EU, Sixth Framework programme, 506811-AMI, start: 2004-01-01, end: 2006-12-31, completed
Data driven and anthropic coding and recognition of speech, GACR, Postdoctoral grants, GP102/02/D108, start: 2002-09-01, end: 2005-08-30, completed
Voice technologies for support of information society, GACR, Standard projects, GA102/02/0124, start: 2002-01-01, end: 2004-12-31, completed

Research groups

Speech Data Mining Research Group BUT Speech@FIT (RG SPEECH)

Departments

Department of Computer Graphics and Multimedia (DCGM)