Challenge Description

Challenge Introduction

We are proud to announce that this year the challenge will host seven diverse tasks that aim to push the limits of semantic visual understanding of videos and to bridge visual content with human captions. Three of the seven tasks are based on the ActivityNet dataset, which was introduced at CVPR 2015 and is organized hierarchically in a semantic taxonomy. These tasks focus on tracing evidence of activities in time in the form of proposals, class labels, captions, and objects.

Guest tasks are to be announced...

ActivityNet Tasks

Task 1

5th trial

Temporal Action Localization (ActivityNet)

This task is intended to evaluate the ability of algorithms to temporally localize activities in untrimmed video sequences. Here, videos can contain more than one activity instance, and multiple activity categories can appear in the same video.
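To make "temporal localization" concrete, here is a minimal sketch of how a predicted segment might be compared with an annotated one using temporal intersection-over-union (tIoU). The use of tIoU, the `(start, end)` representation in seconds, and the function name `temporal_iou` are assumptions for illustration; the task description above does not specify the evaluation metric.

```python
def temporal_iou(pred, gt):
    """Return the temporal IoU of two (start, end) segments given in seconds.

    Assumption: segments are (start, end) tuples with start <= end.
    """
    pred_start, pred_end = pred
    gt_start, gt_end = gt
    # Overlap is zero when the segments do not intersect.
    inter = max(0.0, min(pred_end, gt_end) - max(pred_start, gt_start))
    union = (pred_end - pred_start) + (gt_end - gt_start) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical untrimmed video containing two activity instances.
predictions = [("long jump", (12.0, 20.0)), ("high jump", (40.0, 55.0))]
ground_truth = [("long jump", (10.0, 20.0)), ("high jump", (50.0, 70.0))]

for (label, pred_seg), (_, gt_seg) in zip(predictions, ground_truth):
    print(label, round(temporal_iou(pred_seg, gt_seg), 3))
```

A prediction would typically be counted as correct only if its class label matches and its tIoU exceeds a chosen threshold; the threshold itself is left open here because the text does not state one.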

Task 2

4th trial

Dense-Captioning Events in Videos (ActivityNet Captions)

This task involves both detecting and describing events in a video. For this task, participants will use the ActivityNet Captions dataset, a new large-scale benchmark for dense-captioning events.

Guest Tasks

Task A


ActivityNet Entities Object Localization

This task aims to evaluate how grounded or faithful a description (either generated or ground-truth) is to the video it describes. An object word is first identified in the description and then localized in the video in the form of a spatial bounding box. The prediction is compared against the human annotation to determine its correctness and the overall localization accuracy.
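The comparison between a predicted box and the human annotation can be sketched as follows. This is only an illustration under stated assumptions: boxes are `(x1, y1, x2, y2)` pixel coordinates, a prediction counts as correct when its intersection-over-union with the annotation reaches a threshold (0.5 is assumed here), and the helper names `box_iou` and `localization_accuracy` are hypothetical. The task description does not fix any of these details.

```python
def box_iou(a, b):
    """Return the IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(predicted, annotated, threshold=0.5):
    """Fraction of object words whose predicted box matches the annotation.

    Assumption: predicted[i] and annotated[i] refer to the same object word.
    """
    if not annotated:
        return 0.0
    correct = sum(box_iou(p, g) >= threshold for p, g in zip(predicted, annotated))
    return correct / len(annotated)


# Hypothetical boxes for two object words in one description.
predicted = [(0, 0, 10, 10), (0, 0, 10, 10)]
annotated = [(0, 0, 10, 10), (5, 0, 15, 10)]
print(localization_accuracy(predicted, annotated))
```

In this toy input the first box matches its annotation exactly, while the second overlaps its annotation by only one third, so it falls below the assumed threshold.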

Task B


Kinetics-AVA Crossover Challenge

This task is an umbrella for a crossover of the previous AVA and Kinetics tasks, where Kinetics has now been annotated with AVA labels (but AVA has not been annotated with Kinetics labels). There has always been some interaction between the two datasets; for example, many of the AVA methods are pre-trained on Kinetics. The new annotations should allow for improved performance on both tasks and also increase the diversity of the AVA evaluation set (which now also includes Kinetics clips).

Task C

2nd trial

Activity Detection in Extended Videos (ActEV-PC)

This task seeks to encourage the development of robust automatic activity detection algorithms for extended videos. Challenge participants will develop algorithms to detect and temporally localize instances of 18 different activities. Executive summary. [to be updated]

More guest tasks are to be announced...