Following the success of the previous ActivityNet Challenges and based on your feedback, we have worked hard to make this round richer and more inclusive. We are proud to announce that this year's challenge will be a packed full-day workshop with many diverse challenges, which aim to push the limits of semantic visual understanding of videos as well as to bridge visual content with human captions. Two of these challenges are based on the ActivityNet Dataset; they focus on tracing evidence of activities in time, in the form of class labels and captions. In this installment of the challenge, we will also host several guest tasks that enrich the understanding of visual information in videos. These tasks focus on complementary aspects of the video understanding problem at large scale and involve challenging, recently compiled datasets.
The ActEV Self-Reported Leaderboard (SRL) Challenge is a self-reported (take-home) leaderboard evaluation: participants download the ActEV SRL test set, run their activity detection algorithms on it using their own hardware platforms, and then submit their system output to the evaluation server for scoring. We will invite the top two teams to give oral presentations at the CVPR’22 ActivityNet workshop.
The SoccerNet challenges are back! We are proud to announce new soccer video challenges at CVPR 2022, including (i) Action Spotting and Replay Grounding, (ii) Camera Calibration and Field Localization, (iii) Player Re-Identification, and (iv) Ball and Player Tracking. Feel free to check out our presentation video for more details. The challenges end on 30 May 2022. Each challenge has a $1,000 prize sponsored by EVS Broadcast Equipment, SportRadar, and Baidu Research. The latest news, including leaderboards, will be shared throughout the competition on our Discord server.
This challenge leverages the Home Action Genome dataset, which contains multi-view videos of indoor daily activities. We use scene graphs to describe the relationship between a person and the object used during the execution of an action. In this track, algorithms must predict per-frame scene graphs, capturing how they change as the video progresses.
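To make the expected output concrete, here is a minimal sketch of how a per-frame scene graph could be represented as subject-relationship-object triplets. The class names, relationship labels, and the overall layout are hypothetical placeholders for illustration, not the official Home Action Genome annotation format.

from dataclasses import dataclass
from typing import List

@dataclass
class Triplet:
    subject: str        # e.g. "person"
    relationship: str   # hypothetical label, e.g. "holding"
    obj: str            # e.g. "cup"

@dataclass
class FrameSceneGraph:
    frame_index: int
    triplets: List[Triplet]

# Hypothetical per-frame predictions for a short "drinking from cup" clip;
# note how the graph evolves as the action progresses.
predictions = [
    FrameSceneGraph(0, [Triplet("person", "looking_at", "cup")]),
    FrameSceneGraph(1, [Triplet("person", "holding", "cup")]),
    FrameSceneGraph(2, [Triplet("person", "drinking_from", "cup")]),
]

for frame in predictions:
    for t in frame.triplets:
        print(frame.frame_index, t.subject, t.relationship, t.obj)

A submission for this track would then amount to producing one such graph per video frame, so that the sequence of graphs traces the evolution of the person-object interaction over time.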