The Multimodal IMDb (MM-IMDb) dataset

A multimodal dataset of around 26,000 movies, including poster images, plots, and other metadata.


The MM-IMDb dataset comprises 25,959 movies, each with its plot, poster, genres, and 50 additional metadata fields such as year, language, writer, director, and aspect ratio. Further details are given in the paper cited below.

Source code

GitHub repository

Download Dataset

Raw dataset: mmimdb.tar.gz [8.1 GB]
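Once downloaded, the archive can be inspected without unpacking all 8.1 GB to disk. The minimal Python sketch below uses only the standard library's `tarfile` module to count archive members by file extension and group them by file stem. It assumes, without confirmation from this page, that each movie is stored as a `<id>.json` metadata file next to a `<id>.jpeg` poster; `summarize_archive` and `paired_ids` are hypothetical helpers, so check the layout against the repository before relying on them.

```python
import tarfile
from collections import Counter, defaultdict
from pathlib import PurePosixPath


def summarize_archive(path):
    """Count regular files by extension and group extensions by file stem.

    Only the tar headers are inspected; nothing is written to disk.
    """
    extensions = Counter()
    by_stem = defaultdict(set)
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:  # iterating a TarFile yields TarInfo objects
            if not member.isfile():
                continue  # skip directories, links, etc.
            name = PurePosixPath(member.name)
            extensions[name.suffix] += 1
            by_stem[name.stem].add(name.suffix)
    return extensions, by_stem


def paired_ids(by_stem, required=(".json", ".jpeg")):
    """Return sorted stems that have every required extension.

    The (.json, .jpeg) pairing is an assumed layout, not a documented one.
    """
    needed = set(required)
    return sorted(stem for stem, exts in by_stem.items() if needed <= exts)
```

For example, `exts, stems = summarize_archive("mmimdb.tar.gz")` followed by `paired_ids(stems)` lists the IDs that have both a metadata file and a poster. Even listing a `.tar.gz` requires decompressing the whole stream, so expect this to take a while on the full archive.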

Fuel-format dataset: multimodal_imdb.hdf5 [15 GB], metadata.npy [62 MB].
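The Fuel-format files are ordinary HDF5 and NumPy files, so they can be explored without Fuel itself. The sketch below assumes the third-party packages `h5py` and NumPy are installed. `describe_hdf5` lists every dataset's shape and dtype without reading array data, which matters for a 15 GB file; `load_metadata` loads `metadata.npy`. Both are hypothetical helpers, not part of Fuel. The internal dataset names and the contents of `metadata.npy` are not documented here, so the code discovers the former and only assumes the latter may be a pickled Python object (hence `allow_pickle=True`, which should be used only on trusted files).

```python
import h5py
import numpy as np


def describe_hdf5(path):
    """Map each dataset path in an HDF5 file to (shape, dtype).

    Only metadata is touched; no array contents are loaded.
    """
    layout = {}

    def visit(name, obj):
        # visititems walks all groups recursively; keep only datasets
        if isinstance(obj, h5py.Dataset):
            layout[name] = (obj.shape, obj.dtype)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return layout


def load_metadata(path):
    """Load a .npy file that may hold a pickled Python object (assumption).

    allow_pickle=True runs the pickle loader, so use it only on trusted files.
    """
    arr = np.load(path, allow_pickle=True)
    # np.save on a dict produces a 0-d object array; unwrap it to the object
    if arr.dtype == object and arr.shape == ():
        return arr.item()
    return arr
```

Running `describe_hdf5("multimodal_imdb.hdf5")` first is a cheap way to learn the file's structure before deciding which arrays to load.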


The following paper describes the dataset in much greater detail. Please cite it if you use this dataset.

Arevalo, J., Solorio, T., Montes-y-Gómez, M., & González, F. A. (2017). Gated multimodal units for information fusion. In: 5th International Conference on Learning Representations (ICLR 2017) Workshop.

    title={Gated Multimodal Units for Information Fusion},
    author={Arevalo, John and Solorio, Thamar and Montes-y-G{\'o}mez, Manuel and Gonz{\'a}lez, Fabio A},
    booktitle={5th International conference on learning representations 2017 workshop},