To enter the competition, you need to create an account on the Evaluation Tab. With a registered account, you will be able to upload your results to the evaluation server and participate in the ActivityNet Challenge 2017. Please be advised that we have changed our submission policy this year: each participant is limited to 1 submission per task per week. The evaluation server will enforce a waiting time of 7 days between submissions to the same task, which gives each participant a total of 3 submissions per task before the evaluation server closes. In addition, each team is limited to 4 submissions in total per task. Only results that are submitted during the challenge period (before the deadline) and posted to the leaderboard will be considered valid. You will also need to upload a notebook paper that describes your method in detail. This challenge allows the use of external data to train models and tune algorithm parameters, and we are committed to keeping track of this practice. Therefore, each submission must explicitly state what kind of external data was used and which modules benefit from it (a sketch of such a declaration follows).
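For reference, here is a minimal sketch of how a submission file might declare its external data, assuming the JSON layout used by the ActivityNet evaluation toolkit (the `version`, `results`, and `external_data` fields); the exact schema for each task is specified on the evaluation server page, and all values below are illustrative:

```python
import json

# Minimal sketch of a classification-style submission file.
# The "external_data" block is where the policy above is satisfied:
# state whether external data was used and which modules benefit from it.
submission = {
    "version": "VERSION 1.3",
    "results": {
        # One entry per video id, with label/score pairs (illustrative values).
        "5n7NCViB5TU": [
            {"label": "Discus throw", "score": 0.83},
            {"label": "Shot put", "score": 0.11},
        ],
    },
    "external_data": {
        "used": True,  # set to False if no external data was used
        "details": "CNN features pretrained on ImageNet; only the "
                   "feature extractor benefits from external data.",
    },
}

with open("submission.json", "w") as f:
    json.dump(submission, f)
```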
The ActivityNet Challenge 2017 includes five different tasks, described below:
This task is intended to evaluate the ability of algorithms to predict activities in untrimmed video sequences. Here, videos can contain more than one activity, and large portions of a video are typically unrelated to any activity of interest. [Details]
This task is intended to evaluate the ability of algorithms to recognize activities in trimmed video sequences. Here, videos contain a single activity, and all the clips have a standard duration of ten seconds. For this task, participants will use the Kinetics dataset, a new large-scale benchmark for trimmed action classification. [Details]
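As an illustration of how trimmed classification is commonly scored, the sketch below computes top-k accuracy over a toy batch of clips; this assumes a simple top-k criterion, and the official Kinetics metric is documented with the task details:

```python
import numpy as np

def topk_accuracy(scores, labels, k=5):
    """Fraction of clips whose true label is among the k highest-scoring
    classes. scores: (num_clips, num_classes); labels: (num_clips,)."""
    topk = np.argsort(scores, axis=1)[:, -k:]     # indices of the k best classes
    hits = (topk == labels[:, None]).any(axis=1)  # is the true label in the top k?
    return hits.mean()

# Toy example: 3 clips, 4 classes.
scores = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.4, 0.2, 0.3, 0.1],
                   [0.2, 0.2, 0.5, 0.1]])
labels = np.array([1, 2, 2])
print(topk_accuracy(scores, labels, k=1))  # 2/3 of clips correct at top-1
```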
This task is intended to evaluate the ability of algorithms to generate high-quality action proposals. The goal is to produce a set of candidate temporal segments that are likely to contain a human action. [Details]
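Proposal quality is typically judged by how well candidate segments overlap the ground-truth annotations, with temporal intersection-over-union (tIoU) as the standard overlap measure. A minimal sketch (the thresholds used in the official evaluation are specified in the task details):

```python
def temporal_iou(pred, gt):
    """tIoU between two temporal segments given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

# A proposal covering [12.0, 26.0] vs. a ground-truth action at [10.0, 25.0]:
print(temporal_iou((12.0, 26.0), (10.0, 25.0)))  # 13 / 16 = 0.8125
```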
This task is intended to evaluate the ability of algorithms to temporally localize activities in untrimmed video sequences. Here, videos can contain more than one activity instance, and multiple activity categories can appear in the same video. [Details]
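For localization, each prediction carries an activity label, a temporal segment, and a confidence score. A hedged sketch of what the results entries for one video might look like, reusing the submission layout assumed above (values are illustrative):

```python
# Illustrative detection-style entries for one video: each prediction names
# an activity, a temporal segment in seconds, and a confidence score.
detection_results = {
    "5n7NCViB5TU": [
        {"label": "Discus throw", "segment": [24.25, 38.08], "score": 0.84},
        {"label": "Discus throw", "segment": [11.10, 19.50], "score": 0.62},
    ],
}
```

Predictions are typically matched to ground-truth instances at several tIoU thresholds, using an overlap measure like the one sketched above, and performance is summarized with mean average precision.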
This task involves both detecting and describing events in a video. For this task, participants will use the ActivityNet Captions dataset, a new large-scale benchmark for dense-captioning events. [Details]
This challenge allows participants to use external data to train their algorithms or tune parameters. Each submission should explicitly state the kind of external data used and which modules of the system benefit from it. Some popular forms of external data usage include (but are not limited to):
This academic challenge aims to highlight automated algorithms that understand the audio-visual content of videos. To serve this purpose and to allow for fair competition, we request that ALL participants:
This year, we will award a Panasonic Lumix DC-GH5 camera (with lens) and an NVIDIA graphics card to the participant with the most innovative solution. This solution does not necessarily have to win any task; a technical committee will make the decision based on the participants' submitted reports. So, please make sure to submit a detailed description of your solution on time.