You know what they call a Quarter Pounder with Cheese in Paris?
Data visualisation is a branch of descriptive statistics that can be considered both an art and a science. It allows information to be communicated with clarity and efficiency, ultimately aiding the viewer in their analysis by making the data digestible. Each visualisation has its strengths and weaknesses depending on the encoding (e.g. markers, lines, or bars) and arrangement. Visualisations can be used to compare values, observe frequencies, understand relationships, expose clusters, and reveal more about the data than was previously understood.
Although the primary goal of data visualisation is to communicate information, it doesn't mean that it can't be aesthetically pleasing, or beautiful. In fact, we can almost always consider the secondary goal of data visualisation to be how engaging a visualisation is for the viewer. A carefully crafted data visualisation can both turn heads and paint the bigger picture.
With that in mind, let's unleash our creativity on a recent dump of the IMDb Top 1000.
From the IMDb Top 1000, we've selected a few interesting features, which can be seen in the following screenshot.
We can download the dataset here. It's in CSV format and fresh as of 27-JUN-2022. Let's use Python to take a closer look at what we're working with.
import pandas as pd

data = pd.read_csv("https://datacrayon.com/datasets/IMDb_top_1000_June.csv", index_col=0)  # use the first CSV column as the row index
data.head()
| | name | img | rating | genre | gross | certificate | metascore | runtime | url | votes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | The Shawshank Redemption | https://m.media-amazon.com/images/M/MV5BMDFkYT... | 9.3 | ['Drama'] | 28,341,469 | 15 | 81.0 | 142 min | https://www.imdb.com/title/tt0111161/ | 2602807 |
| 1 | The Godfather | https://m.media-amazon.com/images/M/MV5BM2MyNj... | 9.2 | ['Crime', ' Drama'] | 134,966,411 | X | 100.0 | 175 min | https://www.imdb.com/title/tt0068646/ | 1798235 |
| 2 | The Dark Knight | https://m.media-amazon.com/images/M/MV5BMTMxNT... | 9.0 | ['Action', ' Crime', ' Drama'] | 534,858,444 | 12A | 84.0 | 152 min | https://www.imdb.com/title/tt0468569/ | 2574258 |
| 3 | The Lord of the Rings: The Return of the King | https://m.media-amazon.com/images/M/MV5BNzA5ZD... | 9.0 | ['Action', ' Adventure', ' Drama'] | 377,845,905 | 12A | 94.0 | 201 min | https://www.imdb.com/title/tt0167260/ | 1787394 |
| 4 | Schindler's List | https://m.media-amazon.com/images/M/MV5BNDE4OT... | 9.0 | ['Biography', ' Drama', ' History'] | 96,898,818 | 15 | 94.0 | 195 min | https://www.imdb.com/title/tt0108052/ | 1323548 |
We can confirm all 1000 entries.
We can also see which features we've selected for this challenge.
['name', 'img', 'rating', 'genre', 'gross', 'certificate', 'metascore', 'runtime', 'url', 'votes']
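As a rough sketch, the two checks above (row count and feature names) could be run like this. To keep the snippet runnable offline, it assumes a two-row stand-in CSV built from the rows shown earlier, with the `img`, `genre`, `gross`, and `url` columns left out for brevity. With the real file loaded into `data`, the same two lines apply unchanged.

```python
import io

import pandas as pd

# Two-row stand-in for the real CSV (an assumption, so the snippet runs
# offline). Values are copied from the table above; some columns are omitted.
csv_text = """,name,rating,certificate,metascore,runtime,votes
0,The Shawshank Redemption,9.3,15,81.0,142 min,2602807
1,The Godfather,9.2,X,100.0,175 min,1798235
"""

data = pd.read_csv(io.StringIO(csv_text), index_col=0)

n_rows = len(data)               # number of entries in the frame
columns = data.columns.tolist()  # feature names, excluding the index column

print(n_rows)
print(columns)
```

On the full dataset, `n_rows` is the value to compare against 1000.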
Not every case in the dataset is complete, so we may need to clean the dataset depending on the features we want to use.
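Here is a minimal cleaning sketch, based only on the formats visible in the preview: `gross` is a string with thousands separators, `runtime` carries a `min` suffix, and `genre` is a stringified list with stray leading spaces. The sample frame and the `clean` helper are hypothetical, and the third row is invented purely to illustrate a missing value. The real dataset may need different or additional steps.

```python
import ast

import pandas as pd

# Stand-in sample mirroring the preview rows. The last row is hypothetical
# and exists only to show how a missing gross value is handled.
sample = pd.DataFrame(
    {
        "name": ["The Godfather", "The Dark Knight", "Hypothetical Film"],
        "genre": ["['Crime', ' Drama']", "['Action', ' Crime', ' Drama']", "['Drama']"],
        "gross": ["134,966,411", "534,858,444", None],
        "runtime": ["175 min", "152 min", "90 min"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with gross, runtime, and genre in analysis-friendly types."""
    out = df.copy()

    # "134,966,411" -> 134966411; missing or unparseable values become NaN.
    out["gross"] = pd.to_numeric(
        out["gross"].str.replace(",", "", regex=False), errors="coerce"
    )

    # "175 min" -> 175, by pulling out the first run of digits.
    out["runtime"] = pd.to_numeric(
        out["runtime"].str.extract(r"(\d+)", expand=False), errors="coerce"
    )

    # "['Crime', ' Drama']" -> ["Crime", "Drama"]; non-strings become [].
    out["genre"] = out["genre"].apply(
        lambda s: [g.strip() for g in ast.literal_eval(s)] if isinstance(s, str) else []
    )
    return out


cleaned = clean(sample)
missing = cleaned.isna().sum()  # per-column count of missing values

print(cleaned)
print(missing)
```

Whether to drop incomplete rows or keep them depends on the features a given visualisation uses, so the sketch only counts missing values and does not remove anything.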
Use this dataset to create a data visualisation or infographic!
- Highlight something interesting in the dataset based on a selection of features.
- We don't have to code to take part. We could use spreadsheet software to wrangle the data and obtain our insights.
The context, movies, is likely to be familiar to most of you. Likewise, the features are not esoteric.
The submission can be:
- In any format.
- Made with your software of choice.
- Interactive or static.

It must, however, be safe for work 👀.
Make use of the presentation time to highlight what's interesting about the submission!
Click here to see a visualisation created with this dataset. In this example, Plotapi Chord was used to visualise the co-occurrence of genres in the IMDb "Top 1000".
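A chord diagram of this kind is typically driven by a symmetric co-occurrence matrix. The sketch below shows one way such a matrix might be built from the `genre` column. It is not necessarily how the linked example was produced, and it does not call Plotapi at all. The `genre_cooccurrence` helper is a hypothetical name, and the input is the five genre strings from the preview rather than the full dataset.

```python
import ast
from itertools import combinations

import pandas as pd

# Genre strings copied from the five preview rows, in the dataset's format.
genre_strings = [
    "['Drama']",
    "['Crime', ' Drama']",
    "['Action', ' Crime', ' Drama']",
    "['Action', ' Adventure', ' Drama']",
    "['Biography', ' Drama', ' History']",
]


def genre_cooccurrence(strings):
    """Count how often each pair of genres appears on the same film."""
    # Parse each stringified list and strip the stray leading spaces.
    parsed = [sorted({g.strip() for g in ast.literal_eval(s)}) for s in strings]
    labels = sorted({g for genres in parsed for g in genres})

    # Square matrix of zeros, indexed by genre on both axes.
    matrix = pd.DataFrame(0, index=labels, columns=labels)
    for genres in parsed:
        for a, b in combinations(genres, 2):
            matrix.loc[a, b] += 1
            matrix.loc[b, a] += 1  # keep the matrix symmetric
    return matrix


cooccurrence = genre_cooccurrence(genre_strings)
print(cooccurrence)
```

The diagonal stays at zero because a genre is not paired with itself. Single-genre films therefore contribute nothing, which may or may not suit a given chart.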