ActivityNet Challenge

Download

C3D Features

Frame-based features were extracted with the publicly available pre-trained C3D model, which has a temporal resolution of 16 frames. The network was not fine-tuned on our data. Features were extracted every 8 frames, and PCA was used to reduce the activations of the second fully-connected layer (fc7) of the network from 4096 to 500 dimensions. The C3D features were stacked in an HDF5 file, which was zipped and then split into six files:

activitynet_v1-3.part-00

activitynet_v1-3.part-01

activitynet_v1-3.part-02

activitynet_v1-3.part-03

activitynet_v1-3.part-04

activitynet_v1-3.part-05

PCA_activitynet_v1-3
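The split parts have to be concatenated, in order, before the archive can be unzipped. The sketch below shows one way to do that, along with a helper that maps a C3D feature index to the frames it covers. The helper names, the output file name, and the assumption that the first 16-frame clip starts at frame 0 are illustrative and are not taken from the dataset documentation.

```python
import shutil
from pathlib import Path


def join_parts(part_paths, out_path):
    """Concatenate split archive parts, in the given order, into one file."""
    with open(out_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)


def clip_frame_range(clip_index, clip_length=16, stride=8):
    """Return the (first, last) frame of a clip, inclusive and 0-based.

    Assumes the first clip starts at frame 0 (an assumption, not a
    documented fact); clip_length and stride follow the text above.
    """
    first = clip_index * stride
    return first, first + clip_length - 1


if __name__ == "__main__":
    # Sorting by name puts part-00 ... part-05 in the right order.
    parts = sorted(Path(".").glob("activitynet_v1-3.part-*"))
    if parts:
        # The output file name is hypothetical.
        join_parts(parts, "activitynet_v1-3.zip")
```

Under these assumptions, consecutive features overlap by 8 frames: feature 0 covers frames 0 to 15 and feature 1 covers frames 8 to 23.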

ImagenetShuffle Features

CNN features taken from the pool5 layer of a Google Inception network (GoogLeNet), computed on frames sampled at two frames per second. The per-frame features were mean-pooled across frames and then L1-normalized.

Features

Video index

README
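The pooling step described for the ImagenetShuffle features can be illustrated with a minimal sketch: average the per-frame vectors, then divide by the sum of absolute values. The function name is hypothetical, and plain Python lists are used only to keep the example dependency-free; this is not the pipeline that produced the released features.

```python
def mean_pool_l1(frame_features):
    """Average per-frame vectors, then scale so absolute values sum to 1."""
    if not frame_features:
        raise ValueError("need at least one frame")
    n_frames = len(frame_features)
    dim = len(frame_features[0])
    # Mean-pool: one value per feature dimension, averaged over frames.
    pooled = [
        sum(frame[i] for frame in frame_features) / n_frames
        for i in range(dim)
    ]
    # L1-normalize; an all-zero vector is returned unchanged.
    l1_norm = sum(abs(v) for v in pooled)
    if l1_norm == 0:
        return pooled
    return [v / l1_norm for v in pooled]
```

For example, two frames with features `[1.0, 3.0]` and `[3.0, 5.0]` pool to `[2.0, 4.0]`, which normalizes to one third and two thirds.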

MBH Features

The MBH features were generated with the Improved Trajectories executable available from this page: IDT.

Features

Video index

Temporal Activity Proposals

Class-agnostic temporal proposals are provided to encourage participation in the Activity Detection challenge. These proposals can be applied as a preliminary stage to split the untrimmed videos into high-recall trimmed temporal segments. The temporal activity proposals are generated for every video in ActivityNet Version 1.3 and saved in an HDF5 file. Check out this Notebook for further information on how to read and use the temporal proposals.

Temporal Activity Proposals

README
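To make "high recall" concrete, proposal quality is commonly measured by the temporal intersection over union (tIoU) between proposed and annotated segments. The sketch below assumes segments are `(start, end)` pairs in seconds and uses a 0.5 threshold as an example; both function names are hypothetical, and this is not the challenge's official evaluation code.

```python
def temporal_iou(segment_a, segment_b):
    """Temporal IoU of two (start, end) segments."""
    start_a, end_a = segment_a
    start_b, end_b = segment_b
    intersection = max(0.0, min(end_a, end_b) - max(start_a, start_b))
    union = (end_a - start_a) + (end_b - start_b) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_tiou(proposals, ground_truth, threshold=0.5):
    """Fraction of ground-truth segments matched by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for gt in ground_truth
        if any(temporal_iou(p, gt) >= threshold for p in proposals)
    )
    return hits / len(ground_truth)
```

A proposal set with high recall at a given threshold covers most annotated activities, so a detector only has to classify and refine those segments instead of scanning the whole untrimmed video.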