Hackathon: Visualising the IMDb Top 1000

Sign up, join a team, gain experience, and compete for prizes in the Polyra + Bournemouth University hackathon!

You know what they call a Quarter Pounder with Cheese in Paris?

Data visualisation is a branch of descriptive statistics that can be considered both an art and a science. It allows information to be communicated with clarity and efficiency, ultimately aiding the viewer in their analysis by making the data digestible. Each visualisation has its strengths and weaknesses depending on the encoding (e.g. markers, lines, or bars) and arrangement. Visualisations can be used to compare values, observe frequencies, understand relationships, expose clusters, and reveal more about the data than was previously understood.

Although the primary goal of data visualisation is to communicate information, that doesn't mean it can't also be aesthetically pleasing, or even beautiful. In fact, we can almost always consider the secondary goal of data visualisation to be how engaging it is for the viewer. A carefully crafted data visualisation can both turn heads and paint the bigger picture.

With that in mind, let's unleash our creativity on a recent dump of the IMDb Top 1000.

The dataset

Working through the IMDb Top 1000, we've selected a few of its more interesting features, which can be seen in the following screenshot.

We can download the dataset here; it's in CSV format and fresh as of 27-JUN-2022. Let's use Python to take a closer look at what we're working with.

import pandas as pd

data = pd.read_csv("https://datacrayon.com/datasets/IMDb_top_1000_June.csv", index_col=0)
data.head()
| | name | img | rating | genre | gross | certificate | metascore | runtime | url | votes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | https://m.media-amazon.com/images/M/MV5BMDFkYT... | 9.3 | ['Drama'] | 28,341,469 | 15 | 81.0 | 142 min | https://www.imdb.com/title/tt0111161/ | 2602807 |
| 1 | The Godfather | https://m.media-amazon.com/images/M/MV5BM2MyNj... | 9.2 | ['Crime', ' Drama'] | 134,966,411 | X | 100.0 | 175 min | https://www.imdb.com/title/tt0068646/ | 1798235 |
| 2 | The Dark Knight | https://m.media-amazon.com/images/M/MV5BMTMxNT... | 9.0 | ['Action', ' Crime', ' Drama'] | 534,858,444 | 12A | 84.0 | 152 min | https://www.imdb.com/title/tt0468569/ | 2574258 |
| 3 | The Lord of the Rings: The Return of the King | https://m.media-amazon.com/images/M/MV5BNzA5ZD... | 9.0 | ['Action', ' Adventure', ' Drama'] | 377,845,905 | 12A | 94.0 | 201 min | https://www.imdb.com/title/tt0167260/ | 1787394 |
| 4 | Schindler's List | https://m.media-amazon.com/images/M/MV5BNDE4OT... | 9.0 | ['Biography', ' Drama', ' History'] | 96,898,818 | 15 | 94.0 | 195 min | https://www.imdb.com/title/tt0108052/ | 1323548 |

We can confirm all 1000 entries.

data.shape
(1000, 10)

We can also see which features we've selected for this challenge: name, img, rating, genre, gross, certificate, metascore, runtime, url, and votes.

Not every case in the dataset is complete, so we may need to clean it depending on the features we want to use.
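
As a rough illustration of that cleaning step, the sketch below parses three columns whose values appear as text in the preview above: gross ("28,341,469"), runtime ("142 min"), and genre ("['Crime', ' Drama']"). The helper names (parse_gross, parse_runtime, parse_genres) are hypothetical, and the sketch assumes every non-missing value follows the formats seen in those first five rows, which may not hold across all 1000 entries.

```python
import ast
import re


def parse_gross(value):
    """Convert a string such as '28,341,469' to an int; None if missing or malformed."""
    if not isinstance(value, str):
        return None  # pandas represents missing cells as float NaN, not str
    try:
        return int(value.replace(",", ""))
    except ValueError:
        return None


def parse_runtime(value):
    """Convert a string such as '142 min' to the int 142; None if no digits are found."""
    if not isinstance(value, str):
        return None
    match = re.search(r"\d+", value)
    return int(match.group()) if match else None


def parse_genres(value):
    """Convert "['Crime', ' Drama']" to ['Crime', 'Drama']; empty list if missing."""
    if not isinstance(value, str):
        return []
    # Assumption: the cell is a Python-style list literal, as in the preview.
    return [genre.strip() for genre in ast.literal_eval(value)]


# Values copied from the preview rows above.
print(parse_gross("134,966,411"))
print(parse_runtime("175 min"))
print(parse_genres("['Crime', ' Drama']"))
```

With the DataFrame loaded as above, each helper could be applied to its column, for example data["gross"].map(parse_gross). Returning None for anything unparseable keeps incomplete cases visible, so we can decide later whether to drop or fill them.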

The challenge

Use this dataset to create a data visualisation or infographic!

  • Highlight something interesting in the dataset based on a selection of features.
  • We don't have to code to take part; we could use a spreadsheet to wrangle the data and obtain our insights.

The context, movies, is likely to be familiar to most of you. Likewise, the features are not esoteric.

The submission can be:

  • In any format.
  • Made with your software of choice.
  • Interactive or static.
  • Must be safe for work 👀.

Make use of the presentation time to highlight what's interesting about the submission!

An example

Click here to see a visualisation created with this dataset. In this example, Plotapi Chord was used to visualise the co-occurrence of genres in the IMDb "Top 1000".
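
The counting behind a co-occurrence chart like this can be sketched without any plotting library. The example below is not Plotapi's API and not necessarily how the linked visualisation was built; it only counts how often each pair of genres appears on the same film, using the genre lists from the five rows previewed earlier. The function name genre_cooccurrence is hypothetical.

```python
from collections import Counter
from itertools import combinations


def genre_cooccurrence(genre_lists):
    """Count how many films each unordered pair of genres shares."""
    counts = Counter()
    for genres in genre_lists:
        # Sort so that ('Crime', 'Drama') and ('Drama', 'Crime') are one key,
        # and de-duplicate in case a genre is listed twice for a film.
        unique = sorted(set(genres))
        counts.update(combinations(unique, 2))
    return counts


# Genres of the first five films in the preview, already stripped of spaces.
sample_genres = [
    ["Drama"],
    ["Crime", "Drama"],
    ["Action", "Crime", "Drama"],
    ["Action", "Adventure", "Drama"],
    ["Biography", "Drama", "History"],
]

pairs = genre_cooccurrence(sample_genres)
for (first, second), count in pairs.most_common():
    print(f"{first} + {second}: {count}")
```

A film with a single genre contributes no pairs, which is why the first row adds nothing to the counts. The resulting pair counts are the kind of input a chord diagram or heatmap needs, whichever tool we choose to draw it.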
