Challenge Description

Challenge Introduction

Building on the success of the previous ActivityNet Challenges and on your feedback, we have worked hard to make this round richer and more inclusive. We are proud to announce that this year's challenge will be a packed full-day workshop with many diverse challenges that aim to push the limits of semantic visual understanding of videos and to bridge visual content with human captions.
Two of these challenges are based on the ActivityNet dataset. They focus on tracing evidence of activities in time, in the form of class labels and captions. In this installment of the challenge, we will also host several guest tasks that enrich the understanding of visual information in videos. These guest tasks address complementary aspects of the large-scale video understanding problem and involve challenging, recently compiled datasets.


ActivityNet Temporal Action Localization

This task evaluates the ability of algorithms to temporally localize activities in untrimmed video sequences. Videos can contain more than one activity instance, and multiple activity categories can appear in the same video.
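Temporal localization tasks of this kind are commonly scored by the temporal intersection-over-union (tIoU) between predicted and ground-truth segments. The sketch below shows how tIoU is computed for a pair of [start, end] intervals; the function name and segment values are illustrative, not part of the official evaluation code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two [start, end] segments, given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction overlapping the ground truth by 5s, over a 15s union:
print(temporal_iou((5.0, 15.0), (10.0, 20.0)))  # -> 0.333...
```

A prediction typically counts as correct when its tIoU with a ground-truth instance of the same class exceeds a threshold, and performance is averaged over several thresholds.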

ActivityNet Event Dense-Captioning

This task involves both detecting and describing events in a video. Participants will use the ActivityNet Captions dataset, a large-scale benchmark for dense-captioning events.
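As an illustration, a dense-captioning output can be represented as a list of events per video, each pairing a temporal segment with a sentence. The video id, field names, and captions below are hypothetical and only sketch the idea, not the official submission format:

```python
# Hypothetical dense-captioning prediction: one entry per detected event,
# each with a temporal segment (seconds) and a natural-language caption.
prediction = {
    "v_example": [  # illustrative video id
        {"timestamp": [0.8, 12.4], "sentence": "A man stretches on a mat."},
        {"timestamp": [12.4, 30.1], "sentence": "He performs a series of push-ups."},
    ]
}

for event in prediction["v_example"]:
    start, end = event["timestamp"]
    print(f"{start:5.1f}s - {end:5.1f}s: {event['sentence']}")
```

Evaluation for this task must jointly reward accurate event localization and caption quality, which is what distinguishes it from plain video captioning.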

AVA-Kinetics & Active Speakers

This challenge addresses two fundamental problems in spatio-temporal video understanding: (i) localizing action extents in space and time, and (ii) densely detecting active speakers in video sequences.

ActEV Self-Reported Leaderboard (SRL)

The ActEV Self-Reported Leaderboard (SRL) Challenge is a self-reported (take-home) leaderboard evaluation: participants download an ActEV SRL test set, run their activity detection algorithms on it using their own hardware, and submit their system output to the evaluation server for scoring. We will invite the top two teams to give oral presentations at the CVPR’22 ActivityNet workshop.

SoccerNet Challenge

The SoccerNet challenges are back! We are proud to announce new soccer video challenges at CVPR 2022, including (i) Action Spotting and Replay Grounding, (ii) Camera Calibration and Field Localization, (iii) Player Re-Identification, and (iv) Ball and Player Tracking. Feel free to check out our presentation video for more details. The challenges end on the 30th of May 2022. Each challenge has a $1,000 prize sponsored by EVS Broadcast Equipment, SportRadar, and Baidu Research. The latest news, including leaderboards, will be shared throughout the competition on our discord server.


Tiny Action Recognition (TinyVIRAT)

This challenge focuses on recognizing tiny actions in videos. A video can contain multiple activities, and resolutions range from 10x10 to 128x128 pixels. The task runs on the TinyVIRAT benchmark dataset.


Home Action Genome Scene Graphs

This challenge leverages the Home Action Genome dataset, which contains multi-view videos of indoor daily activities. We use scene graphs to describe the relationships between a person and the objects used during the execution of an action. In this track, algorithms must predict per-frame scene graphs, including how they change as the video progresses.
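A per-frame scene graph of this kind can be sketched as a set of (subject, predicate, object) triples keyed by frame index. The entity and relationship names below are illustrative, not the dataset's actual vocabulary:

```python
# Hypothetical per-frame scene graphs: frame index -> list of
# (subject, predicate, object) triples describing person-object relations.
scene_graphs = {
    0:  [("person", "holding", "kettle"), ("person", "standing_on", "floor")],
    30: [("person", "holding", "kettle"), ("person", "pouring_into", "cup")],
}

def relations_at(frame, graphs):
    """Return the triples annotated for a given frame (empty list if none)."""
    return graphs.get(frame, [])

print(relations_at(30, scene_graphs))
```

Note how the triples evolve between frames 0 and 30; predicting these changes over time is what makes the track harder than static scene-graph generation.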