Diamonds in the Rough: Digging out data for the Visual History of Golf
A couple of people have asked about the data preparation process behind my ‘Visual History of Golf’ chart from last month. This has been my best-received visualisation to date, earning me my first Viz of the Day, then Viz of the Week, and then a place in the Tableau Greatest Hits Gallery. I’m certainly getting value for money from this one. While I thought it would have more clout with the Twitter golf community, the vast majority of interactions came from within the dataviz community (I did, however, have one guy contact me to see if I was interested in purchasing some golf buggy parts).
Here is the visualisation (click to interact):
The actual visual output was relatively straightforward to produce. It is based on concepts used by John Burn-Murdoch to represent world tennis rankings; he, I think, in turn took inspiration from Doc Quang Nguyen. Although it took time and effort to render the graphic to a level I was happy with, time spent in Tableau was only the tip of the iceberg. A disproportionate chunk of time was actually spent behind the scenes wrangling with the data: getting it onto my laptop, and then getting it into a shape fit for visualisation. This turned out to be one of my biggest data wrangling challenges to date.
The problem? It required accessing and parsing ~1000 weekly ranking data tables: ⅓ web HTML tables, ⅓ PDF tables, ⅓ images of photocopies. The process was long and messy, involving a fair bit of trial and error, and there were a couple of times when I thought I might not make the cut. In this blog post I’ll pick out some of the key techniques, tricks and software tools that got this visualisation over the line.
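For the HTML third of the tables, the core parsing step can be illustrated in a few lines. The following is a minimal Python sketch, not the workflow I actually used (which was built in Alteryx). It assumes each weekly page holds a simple table whose first two columns are rank and player name, and that rank cells are plain integers; the function names (`parse_weekly_table`, `stack_weeks`) and the week labels are hypothetical. It stacks every week into one long list of (week, rank, player) records, a one-row-per-player-per-week shape that suits this kind of ranking chart.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being built
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_weekly_table(html, week):
    """Return (week, rank, player) records.

    Assumption: column 0 is the rank and column 1 is the player name.
    """
    parser = TableRowParser()
    parser.feed(html)
    records = []
    for row in parser.rows:
        # Skip header rows and any row whose first cell is not a plain integer.
        if len(row) < 2 or not row[0].isdigit():
            continue
        records.append((week, int(row[0]), row[1]))
    return records


def stack_weeks(pages):
    """Combine a {week_label: html} mapping into one long list,
    sorted by week label and then by rank."""
    combined = []
    for week, html in pages.items():
        combined.extend(parse_weekly_table(html, week))
    return sorted(combined)
```

For example, `stack_weeks({"week-02": page_b, "week-01": page_a})` returns the records for both weeks in a single list, ordered by week and then rank. The PDF and scanned-photocopy tables would need a different extraction step (such as a PDF table extractor or OCR) before their rows could be fed into the same stacking step; that part is not shown here.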
I’ve decided to present this write-up as an annotated flow chart. I’m currently drawing to the end of a 6-month placement at Jones Lang LaSalle (under the auspices of @cheeky_chappie), and thought I’d use this as practice for documenting some of the dashboards, tools, and analyses I’ve worked on during this stint.
Click below to view the workflow:
I hope you find some useful morsels in there.
I’ve tried to keep the write-up as software-agnostic as possible. Although I currently use Alteryx for much of my data wrangling, the concepts and approaches should hold regardless of the tools used.
(Also posted on my personal site pixelpoolviz.wordpress.com)