Faculty of Information Technology, BUT

Publication Details

Brno University of Technology at TRECVid 2010

HRADIŠ Michal, BERAN Vítězslav, ŘEZNÍČEK Ivo, HEROUT Adam, BAŘINA David, VLČEK Adam and ZEMČÍK Pavel. Brno University of Technology at TRECVid 2010. In: TRECVID 2010: Participant Notebook Papers and Slides. Gaithersburg, MD: National Institute of Standards and Technology, 2010, p. 11.
Czech title
Brno University of Technology at TRECVid 2010
Type
conference paper
Language
English
Keywords
TRECVID, semantic indexing, Content-based Copy Detection, image classification
Abstract
This paper describes our approaches to semantic indexing and content-based copy detection, which were used in the TRECVID 2010 evaluation.

Semantic indexing

1.  The runs differ in the types of visual features used. All runs use several bag-of-words representations, each fed to a separate linear SVM; the SVM outputs are fused by logistic regression.

  • F_A_Brno_resource_4: Only the single best type of visual features (as measured on the training set) is used: dense image sampling with rgb-SIFT.
  • F_A_Brno_basic_3: This run uses dense sampling and the Harris-Laplace detector in combination with SIFT and rgb-SIFT descriptors.
  • F_A_Brno_color_2: This run extends F_A_Brno_basic_3 by adding dense sampling with rg-SIFT, Opponent-SIFT, Hue-SIFT, HSV-SIFT, C-SIFT and opponent histogram descriptors.
  • F_A_Brno_spacetime_1: This run extends F_A_Brno_color_2 by adding space-time visual features STIP and HESSTIP.

2. Combining multiple types of visual features improves results significantly. F_A_Brno_color_2 achieves results more than twice as good as those of F_A_Brno_resource_4. The space-time visual features did not improve results.

3. Combining multiple types of visual features is important. A linear SVM is inferior to a non-linear SVM in the context of semantic indexing.
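
The late fusion described in item 1 above (one classifier per feature type, combined by logistic regression) can be illustrated with a minimal sketch. It assumes the per-feature SVM decision values are already computed; the score values, the learning rate, the number of epochs and the function names `train_fusion` and `fuse` are hypothetical and not taken from the paper.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_fusion(scores, labels, lr=0.1, epochs=2000):
    """Fit logistic regression on per-feature classifier scores.

    scores: list of rows, one SVM decision value per feature channel.
    labels: list of 0/1 labels for the concept being detected.
    Plain batch gradient descent is used here for illustration only.
    """
    n_channels = len(scores[0])
    n = len(scores)
    weights = [0.0] * n_channels
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_channels
        grad_b = 0.0
        for row, y in zip(scores, labels):
            p = sigmoid(sum(w * s for w, s in zip(weights, row)) + bias)
            err = p - y
            for j, s in enumerate(row):
                grad_w[j] += err * s
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fuse(weights, bias, row):
    """Fused probability that the concept is present in one shot."""
    return sigmoid(sum(w * s for w, s in zip(weights, row)) + bias)


# Hypothetical decision values from two channels
# (for example dense rgb-SIFT and Harris-Laplace SIFT).
train_scores = [[1.2, 0.8], [0.9, 1.1], [0.4, 0.7],
                [-1.0, -0.6], [-0.7, -1.3], [-0.2, -0.9]]
train_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_fusion(train_scores, train_labels)
print(fuse(weights, bias, [1.0, 0.9]))    # shot scored high by both channels
print(fuse(weights, bias, [-0.8, -1.0]))  # shot scored low by both channels
```

Adding a feature type to a run, as in the step from F_A_Brno_basic_3 to F_A_Brno_color_2, corresponds in this sketch to adding one more column to each score row.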

Content-based Copy Detection

1.    Two runs were submitted with similar settings; they differ only in the amount of test data processed (40% and 60%).

  • brno.m.*.l3sl2: SURF features, bag-of-words (visual codebook of size 2k, 4 nearest neighbors used in soft assignment), inverted file index, geometry-based (homography) image similarity metric

2.    What, if any, significant differences (in terms of what measures) did you find among the runs?

  • Only one setting was used, so there are no differences.

3.    Based on the results, can you estimate the relative contribution of each component of your system/approach to its effectiveness?

  • Search in the reference dataset was slow due to an unsuitable configuration of the visual codebook.

4.    Overall, what did you learn about runs/approaches and the research question(s) that motivated them?

  • The way the video content is described needs to change: a frame-based (or key-frame-based) approach is not sufficient.
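
The indexing part of the brno.m.*.l3sl2 configuration (soft assignment to the 4 nearest codewords, followed by an inverted file index) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian weighting with `sigma`, the histogram-intersection scoring, the toy 2-D descriptors and the six-word codebook are all hypothetical, whereas the real system uses SURF descriptors, a 2k codebook and a homography-based similarity metric, which is omitted here.

```python
import math
from collections import defaultdict


def soft_assign(descriptor, codebook, k=4, sigma=1.0):
    """Spread one descriptor over its k nearest codewords.

    The Gaussian weighting is an assumption; the source only states
    that 4 nearest neighbors are used in the soft assignment.
    """
    nearest = sorted(
        (math.dist(descriptor, word), idx) for idx, word in enumerate(codebook)
    )[:k]
    raw = [(idx, math.exp(-d * d / (2.0 * sigma * sigma))) for d, idx in nearest]
    total = sum(w for _, w in raw)
    if total == 0.0:
        # All weights underflowed; fall back to a uniform split.
        return {idx: 1.0 / len(raw) for idx, _ in raw}
    return {idx: w / total for idx, w in raw}


def bow_vector(descriptors, codebook, k=4):
    """Sparse, L1-normalised bag-of-words vector for one frame."""
    hist = defaultdict(float)
    for desc in descriptors:
        for idx, w in soft_assign(desc, codebook, k).items():
            hist[idx] += w
    n = len(descriptors)
    return {idx: w / n for idx, w in hist.items()}


def build_inverted_index(frames):
    """Map each codeword to the (frame, weight) pairs that contain it."""
    index = defaultdict(list)
    for frame_id, vec in frames.items():
        for idx, w in vec.items():
            index[idx].append((frame_id, w))
    return index


def query(index, vec):
    """Rank reference frames by histogram intersection on shared codewords."""
    scores = defaultdict(float)
    for idx, qw in vec.items():
        for frame_id, rw in index.get(idx, ()):
            scores[frame_id] += min(qw, rw)
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Toy data standing in for SURF descriptors and a 2k-word codebook.
codebook = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5)]
reference = {
    "ref_a": bow_vector([(0.1, 0.1), (0.9, 0.2)], codebook),
    "ref_b": bow_vector([(5.2, 5.1), (5.9, 4.8)], codebook),
}
index = build_inverted_index(reference)
ranked = query(index, bow_vector([(0.2, 0.0), (1.0, 0.1)], codebook))
print(ranked[0][0])
```

With soft assignment each descriptor touches four posting lists instead of one, so lookup cost depends on how the codebook size and neighbor count are chosen; in a full pipeline the top-ranked candidates would then be passed to geometric verification.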
Published
2010
Pages
11
Proceedings
TRECVID 2010: Participant Notebook Papers and Slides
Conference
2010 TRECVID Workshop, Gaithersburg, US
Publisher
National Institute of Standards and Technology
Place
Gaithersburg, MD, US
BibTeX
@INPROCEEDINGS{FITPUB9444,
   author = "Michal Hradi\v{s} and V\'{i}t\v{e}zslav Beran and Ivo \v{R}ezn\'{i}\v{c}ek and Adam Herout and David Ba\v{r}ina and Adam Vl\v{c}ek and Pavel Zem\v{c}\'{i}k",
   title = "Brno University of Technology at TRECVid 2010",
   pages = 11,
   booktitle = "TRECVID 2010: Participant Notebook Papers and Slides",
   year = 2010,
   location = "Gaithersburg, MD, US",
   publisher = "National Institute of Standards and Technology",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/9444"
}