Result detail
Ego4D: Around the World in 3,600 Hours of Egocentric Video
Westbury Andrew
Byrne Eugene
Cartillier Vincent
Chavis Zachary
Furnari Antonino
Girdhar Rohit
Hamburger Jackson
Jiang Hao
Kukreja Devansh
Liu Miao
Liu Xingyu
Martin Miguel
Nagarajan Tushar
Radosavovic Ilija
Ramakrishnan Santhosh Kumar
Ryan Fiona
Sharma Jayant
Wray Michael
Xu Mengmeng
Xu Eric Zhongcong
Zhao Chen
Bansal Siddhant
Batra Dhruv
Crane Sean
Do Tien
Doulaty Morrie
Erapalli Akshay
Feichtenhofer Christoph
Fragomeni Adriano
Fu Qichen
Gebreselasie Abrham
Gonzalez Cristina
Hillis James
Huang Xuhua
Huang Yifei
Jia Wenqi
Khoo Weslie
Kolar Jachym
Kottur Satwik
Kumar Anurag
Landini Federico Nicolás
Li Chao
Li Yanghao
Li Zhenqiang
Mangalam Karttikeya
Modhugu Raghava
Munro Jonathan
Murrell Tullie
Nishiyasu Takumi
Price Will
Ruiz Puentes Paola
Ramazanova Merey
Sari Leda
Somasundaram Kiran
Southerland Audrey
Sugano Yusuke
Tao Ruijie
Vo Minh
Wang Yuchen
Wu Xindi
Yagi Takuma
Zhao Ziwei
Zhu Yunyi
Arbelaez Pablo
Crandall David
Damen Dima
Farinella Giovanni Maria
Fuegen Christian
Ghanem Bernard
Krishna Vamsi
Jawahar C. V.
Joo Hanbyul
Kitani Kris
Li Haizhou
Newcombe Richard
Oliva Aude
Park Hyun Soo
Rehg James M.
Sato Yoichi
Shi Jianbo
Zheng Shou Mike
Torralba Antonio
Torresani Lorenzo
Yan Mingfei
Malik Jitendra
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards, with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception.
Cameras, Benchmark testing, Three-dimensional displays, Task analysis, Annotations, Cultural differences, Computer vision, Video understanding, egocentric video, first-person vision, datasets and benchmarks
@article{BUT201375,
author="Andrew {Westbury} and Eugene {Byrne} and Vincent {Cartillier} and Zachary {Chavis} and Antonino {Furnari} and Rohit {Girdhar} and Jackson {Hamburger} and Hao {Jiang} and Devansh {Kukreja} and Miao {Liu} and Xingyu {Liu} and Miguel {Martin} and Tushar {Nagarajan} and Ilija {Radosavovic} and Santhosh Kumar {Ramakrishnan} and Fiona {Ryan} and Jayant {Sharma} and Michael {Wray} and Mengmeng {Xu} and Eric Zhongcong {Xu} and Chen {Zhao} and Siddhant {Bansal} and Dhruv {Batra} and Sean {Crane} and Tien {Do} and Morrie {Doulaty} and Akshay {Erapalli} and Christoph {Feichtenhofer} and Adriano {Fragomeni} and Qichen {Fu} and Abrham {Gebreselasie} and Cristina {Gonzalez} and James {Hillis} and Xuhua {Huang} and Yifei {Huang} and Wenqi {Jia} and Weslie {Khoo} and Jachym {Kolar} and Satwik {Kottur} and Anurag {Kumar} and Federico Nicolás {Landini} and Chao {Li} and Yanghao {Li} and Zhenqiang {Li} and Karttikeya {Mangalam} and Raghava {Modhugu} and Jonathan {Munro} and Tullie {Murrell} and Takumi {Nishiyasu} and Will {Price} and Paola {Ruiz Puentes} and Merey {Ramazanova} and Leda {Sari} and Kiran {Somasundaram} and Audrey {Southerland} and Yusuke {Sugano} and Ruijie {Tao} and Minh {Vo} and Yuchen {Wang} and Xindi {Wu} and Takuma {Yagi} and Ziwei {Zhao} and Yunyi {Zhu} and Pablo {Arbelaez} and David {Crandall} and Dima {Damen} and Giovanni Maria {Farinella} and Christian {Fuegen} and Bernard {Ghanem} and Vamsi {Krishna} and C. V. {Jawahar} and Hanbyul {Joo} and Kris {Kitani} and Haizhou {Li} and Richard {Newcombe} and Aude {Oliva} and Hyun Soo {Park} and James M. {Rehg} and Yoichi {Sato} and Jianbo {Shi} and Mike {Zheng Shou} and Antonio {Torralba} and Lorenzo {Torresani} and Mingfei {Yan} and Jitendra {Malik}",
title="Ego4D: Around the World in 3,600 Hours of Egocentric Video",
journal="IEEE Transactions on Pattern Analysis and Machine Intelligence",
year="2025",
volume="47",
number="11",
pages="9468--9509",
doi="10.1109/TPAMI.2024.3381075",
issn="0162-8828",
url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10611736&utm_source=scopus&getft_integrator=scopus&tag=1"
}