Variation in Hue across Movie Posters
My latest visualisation was a blast to build – a real tour de force of many technologies and techniques. I want to share with you some of the ideas and methods behind it. But first, here’s the visualisation (click to interact):
I find it remarkable where this viz started; way back before the Tableau Conference in a conversation with Rob Radburn (Tableau Zen Master no less) on visualisations about Vegas.
Rob went on to reference this site: http://www.nytimes.com/newsgraphics/2014/02/14/fashion-week-editors-picks/
From there I went away and that evening built this visualisation (again click to interact and download but don’t consider it “finished”):
I was unhappy with this visualisation. While it was pretty, the lack of data and the lack of a good story were not setting my pulse racing. I never release anything I don’t believe in, so this visualisation doesn’t appear on my Tableau Public profile – I show it here to show where ideas can lead. I like to think the road to a good visualisation is paved with bad visualisations like this…
What started here, though, left a dent in my mind: I wasn’t satisfied with an unfinished viz. I loved the idea but I wasn’t happy with the implementation. Post-Vegas I vowed to take it further…
Getting data is a challenge for anyone in data visualisation. I needed many more movie posters than were available on Wikipedia, and I also wanted to look at genres (which weren’t available on Wikipedia). IMDb has a block preventing web scraping (I tried, believe me), and several other sites prevented me from downloading from them. In the end I turned to www.TMDB.org and used Kimono Labs to scrape this page and its subsequent children: https://www.themoviedb.org/movie?page=1
The subsequent API gave me access to a webpage with a link to the posters. Alteryx (using the download tool) enabled me to download those 4,840 posters.
I then needed to convert these posters into data. I turned to the “convert” function in ImageMagick – a free piece of software – using a batch file:
It turns out that 200% in the above command resized it to 200px not 200% – which was great – as when I saw the data I realised what a mammoth task it was. Imagine this data for 200px x 300px x 4840 posters (that’s a lot):
Parsing this and then turning it into data was a nightmare – thankfully Alteryx was up to it.
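For a flavour of the parsing involved, here is a minimal Python sketch. It assumes the posters were dumped in ImageMagick’s plain-text pixel enumeration format, where each pixel line looks like `x,y: (r,g,b)  #RRGGBB  name`; this post doesn’t record the exact output, and the real parsing was done in Alteryx, not Python. The function name `parse_pixel_line` and the sample lines are purely illustrative.

```python
import re

# Matches pixel lines such as "12,34: (255,128,0)  #FF8000  srgb(255,128,0)".
# Assumption: 8-bit integer channel values in the first bracketed group.
PIXEL_RE = re.compile(r"^(\d+),(\d+):\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)")


def parse_pixel_line(line):
    """Return (x, y, r, g, b) for a pixel line, or None for headers and other lines."""
    match = PIXEL_RE.match(line.strip())
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Made-up sample: one header line followed by two pixels.
sample = [
    "# ImageMagick pixel enumeration: 2,1,255,srgb",
    "0,0: (255,128,0)  #FF8000  srgb(255,128,0)",
    "1,0: (0,0,0)  #000000  black",
]
pixels = [p for p in map(parse_pixel_line, sample) if p is not None]
print(pixels)
```

One row per pixel is exactly why the volumes explode: a 200px x 300px poster alone yields 60,000 lines.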
If you look at the above picture and think I have bags of room on my laptop (153 GB and ~700 million rows) then you’re wrong. Alteryx may have had 153 GB of data in memory, but it barely touched the 20 GB of spare space I had on my PC. Why? That’s the magic of Alteryx – I’d love to explain it but I can’t.
I went through several Alteryx iterations. In the end the simplest approach was enough for what I needed: a wildcard input of the data, a join to the original data to obtain genre, and then a summarize. Thankfully Alteryx took less than an hour to process the data, so I was able to play with different options and visualisations.
I’d love to spend longer on the role Alteryx played in this Viz – the above paragraph doesn’t do it justice – but that’s Alteryx, it’s quiet, unassuming and it doesn’t hog the limelight. It just does its job. To say I love having Alteryx in my toolkit is a massive understatement – I couldn’t do my job without it.
What didn’t work
Colour over time – not a good enough story there. Also with most movies being more recent it was hard to justify.
Absolute Colour by Genre – generally movie posters are relatively similar – the differences are subtle but not enough for a viz IMO.
Colour by film – I’d love to show a film-by-film breakdown but the data volumes were too large; even a percentile breakdown proved too difficult.
By now an idea was forming – a hue colour wheel as a visualisation, broken down by genre. I started to imagine this might work; in my mind I also had a lightness “meter” underneath… But how to show the data in an interesting way? And how to convert RGB to HSL (Hue, Saturation and Lightness)?
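The RGB-to-HSL conversion itself is standard, and can be sketched in Python with the built-in `colorsys` module. This is not necessarily the formula I used in the viz, just an illustration; the wrapper name `rgb_to_hsl` and the choice to report hue in degrees and the other two as percentages are my own assumptions. Note that `colorsys` returns the values in H, L, S order.

```python
import colorsys


def rgb_to_hsl(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation %, lightness %)."""
    # colorsys works on 0..1 floats and returns (hue, lightness, saturation).
    h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
    return round(h * 360) % 360, round(s * 100), round(l * 100)


print(rgb_to_hsl(255, 0, 0))      # pure red
print(rgb_to_hsl(255, 255, 255))  # white: no saturation, full lightness
```

Hue is the angle around the colour wheel, which is what makes a wheel-shaped chart such a natural fit.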
That was easy – but how to compare them…
Community Steal #2: Maybe it was Matt Chambers’ Iron Viz, but Z-scores seemed an obvious way to compare hues. However, each genre had a different number of films, so I calculated the percentage contribution of each hue to the posters in the genre. Effectively this normalised the hue and gave a standard measure for that hue when compared across genres.
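As a rough illustration of that normalise-then-score idea, here is a small Python sketch. The counts are made up, the hue buckets are simplified to three names, and using the population standard deviation is an assumption; the real calculation lived in Tableau table calculations.

```python
from statistics import mean, pstdev

# Hypothetical pixel counts per genre and hue bucket (made-up numbers).
counts = {
    "Horror": {"red": 60, "blue": 20, "green": 20},
    "Comedy": {"red": 30, "blue": 30, "green": 40},
    "Sci-Fi": {"red": 30, "blue": 100, "green": 70},
}


def hue_shares(counts):
    """Convert raw counts to each hue's share of its genre's total."""
    shares = {}
    for genre, hues in counts.items():
        total = sum(hues.values())
        shares[genre] = {hue: n / total for hue, n in hues.items()}
    return shares


def hue_z_scores(shares):
    """Z-score each genre's share of a hue against all genres' shares of that hue."""
    z = {genre: {} for genre in shares}
    hues = next(iter(shares.values())).keys()
    for hue in hues:
        values = [shares[genre][hue] for genre in shares]
        mu, sigma = mean(values), pstdev(values)
        for genre in shares:
            z[genre][hue] = (shares[genre][hue] - mu) / sigma if sigma else 0.0
    return z


shares = hue_shares(counts)
z_scores = hue_z_scores(shares)
for genre, scores in z_scores.items():
    print(genre, {hue: round(score, 2) for hue, score in scores.items()})
```

Working with shares rather than raw counts means a genre with many films doesn’t dominate simply by having more pixels.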
There’s more to write on Z-Scores, I’ll save that for another post (in the meantime here’s the basic premise: http://kb.tableau.com/articles/knowledgebase/z-scores). Suffice to say a few table calculations later I had the beginnings of my viz. [For those interested I tried LOD calcs but found performance worse than TCs]
So here’s where I go, well, controversial. I’m unhappy with the above visualisation. As a Tableau Public author I see the job of my visualisation as being to entertain as much as to inform. If the above bar chart’s sole purpose was to inform a manager about sales then I’d leave it as a bar chart. However, that isn’t my job. My interest here is in capturing the public’s imagination as much as informing. So let’s move towards the artful data visualisation. Yes, we need to stay functional, but we can entertain as well…
Community Steal #3: Radial Bar Charts http://interworks.co.uk/blog/radial-bar-chart/
Ha ha – if only it were that simple. I was already using table calculations, and here I’m being asked to nest them again. Well, here we go…
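The geometry behind a radial bar chart is just a little trigonometry: each bar gets an angle around the circle, and its length becomes a distance out from an inner ring. Here is a minimal Python sketch of that idea. It is not the Tableau calculation from the linked post; the function name `radial_bar`, the inner radius and the choice to start at 12 o’clock and run clockwise are all assumptions for illustration.

```python
import math


def radial_bar(index, n_bars, value, inner_radius=1.0):
    """Return the (x, y) start and end points of one bar on a radial chart.

    Bars start at 12 o'clock and proceed clockwise. A negative value
    (such as a negative Z-score) draws the bar inwards from the ring.
    """
    angle = 2 * math.pi * index / n_bars
    outer_radius = inner_radius + value
    start = (inner_radius * math.sin(angle), inner_radius * math.cos(angle))
    end = (outer_radius * math.sin(angle), outer_radius * math.cos(angle))
    return start, end


# Four bars of length 2 around the circle.
for i in range(4):
    start, end = radial_bar(i, 4, 2.0)
    print(i, [round(c, 2) for c in start], [round(c, 2) for c in end])
```

In Tableau each of these pieces has to be expressed as a calculation over the partition, which is where the nesting comes in.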
Community Steal #4: Trellis Charts http://www.theinformationlab.co.uk/2014/10/06/dynamic-visualisations-size-index/ (no I’m not sure you can steal from yourself either)
Bringing it all together into multiple tiles required MORE table calcs. I swear things got messy here, yes I swore, yes I cheered when I got it right. I’m not going to suggest this was straightforward, but it was fun.
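The core of a trellis layout is turning a running index into a row and a column. A minimal Python sketch of that arithmetic follows; the genre list and the four-column grid are made up for illustration, and in Tableau the same idea is expressed with index-based table calculations rather than code like this.

```python
def trellis_position(index, n_columns):
    """Map a zero-based tile index to a (row, column) pair in a grid."""
    row, column = divmod(index, n_columns)
    return row, column


# Hypothetical genre list laid out on a grid four tiles wide.
genres = ["Action", "Comedy", "Drama", "Horror", "Romance", "Sci-Fi"]
for i, genre in enumerate(genres):
    print(genre, trellis_position(i, 4))
```

Each tile then gets its own small radial chart, offset by its row and column.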
Community Steal #5: Single Border Annotations https://www.tableau.com/about/blog/2015/10/viz-reigned-supreme-iron-viz-2015-45256
Shine rocked it, and I won’t pretend otherwise: when it came to doing annotations I copied him.
Interactivity: Every Viz needs interactivity – users have questions and I wasn’t going to leave them wanting. So I added extra charts. This is something everyone can and should do with Tableau – don’t leave your users wanting more.
Help: Always give users help. The toggle help here is something I need a separate blog to explain (and I will do it) but please consider users when showing visualisations.
Finally I got feedback – Rob Radburn, Matt Francis and Niccolo Cirone all contributed pre-release comments and made me tweak what I had. Again, you should always do this if you’re publishing visualisations – your own opinion isn’t enough to check your work.
Finally, you know when it’s ready. Yes, hold back, but sometimes you need to let your baby loose. Watching the results is fabulous…