The Multimodal IMDb (MM-IMDb) dataset

A multimodal dataset of around 26,000 movies, including poster images, plots, and other metadata

Description

The MM-IMDb dataset comprises 25,959 movies, each with its plot, poster, genres, and 50 additional metadata fields such as year, language, writer, director, and aspect ratio. Additional details can be found in the paper.

Source code

GitHub repository

Download Dataset

Raw dataset: mmimdb.tar.gz [8.1G]

Fuel format dataset: multimodal_imdb.hdf5 [15G], metadata.npy [62M]
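Because the raw archive is large, it can help to check what it contains before extracting it. The snippet below is a minimal sketch that counts the archive's files by extension using only the Python standard library. The function name `summarize_archive` is hypothetical, and nothing here assumes a particular directory layout inside mmimdb.tar.gz; it only reports whatever extensions it finds.

```python
import tarfile
from collections import Counter
from pathlib import Path


def summarize_archive(archive_path):
    """Count regular files in a .tar.gz archive by file extension.

    Hypothetical helper for illustration: it reads the archive's
    member headers and does not extract anything to disk.
    """
    counts = Counter()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            # Skip directories, links and other non-file entries.
            if member.isfile():
                suffix = Path(member.name).suffix.lower() or "<none>"
                counts[suffix] += 1
    return counts
```

Calling `summarize_archive("mmimdb.tar.gz")` returns a `Counter` mapping each extension to the number of files with that extension. Listing a gzip-compressed archive still requires decompressing the whole stream, so expect this to take a while on the full download.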

Reference

The following paper describes the dataset in much greater detail. Please cite it if you use this dataset.

Arevalo, J., Solorio, T., Montes-y-Gómez, M., & González, F. A. (2017). Gated multimodal units for information fusion. In: 5th International Conference on Learning Representations 2017 Workshop.

@inproceedings{arevalo2017gated,
    title={Gated Multimodal Units for Information Fusion},
    author={Arevalo, John and Solorio, Thamar and Montes-y-G{\'o}mez, Manuel and Gonz{\'a}lez, Fabio A},
    booktitle={5th International Conference on Learning Representations 2017 Workshop},
    year={2017}
}

Contact