On average, considering variation leads to better decisions
Garrison Keillor describes the people of his fictional town, Lake Wobegon, by saying "All the women are strong, all the men are good-looking, and all the children are above average." While the last claim is impossible and the first two are unlikely, even for a fictional town, the statement illustrates our tendency to focus on aggregate statistics such as totals and averages when we summarise things.
Aggregate statistics are important, but if you examine only overall measures, without also checking the extent and nature of the underlying variation, you may miss important pieces of the puzzle you are trying to put together. That’s particularly the case when what you are describing is less homogeneous than the people of Lake Wobegon.
A recently released report on community engagement carried out by the New Zealand Transport Agency (NZTA) regarding proposed changes to how State Highway 1 travels through Wellington demonstrates the value of looking into the extent and nature of variation rather than just at aggregate statistics. It does that by breaking down the data in a variety of ways while using a consistent chart type, which makes it easy to compare different results and build a clear understanding of the overall situation. Whatever your opinion about the changes proposed for State Highway 1, the report is an example of good data communication.
Aggregate measures sometimes don’t show the full picture
The report summarises results of a survey that asked about five specific changes being considered to the portion of State Highway 1 that runs through Wellington City. Community members who took the opportunity to provide feedback were asked to indicate how they believe the changes overall, and each change individually, would affect them personally and the Wellington region more generally.
Results for each of those two measures for the full set of changes collectively are shown in aggregate in Figure 5-1.
Image reproduced for purposes of education, criticism and commentary.
Importantly though, the report also includes the results for each measure based on where survey participants live, as shown in Figure 5-2 below for how survey respondents think the whole programme of changes would affect them personally.
Image reproduced for purposes of education, criticism and commentary.
If we looked only at the first figure we would get the sense that the community is somewhat evenly split on whether the project overall would make things better (41%) or worse (49%) for them personally. The more granular geographic breakdown makes it clear, however, that the people who believe the changes would make things better disproportionately live outside Wellington City or in the northern and western suburbs, while the people who believe the changes would make things worse for them disproportionately live in the CBD and the southern and eastern suburbs.
This is important information for policy makers to consider given that the proposed changes would occur in the CBD and the southern and eastern suburbs. In other words, the people likely to be most directly affected by the changes during and after implementation were disproportionately likely to believe the changes would make things worse for them personally.
In this situation a lot of variation in attitudes toward the proposed changes can be explained by where people live, but characteristics such as age, gender and income may account for variation in other metrics. It’s always worth checking for such differences rather than just focussing on aggregate measures such as totals and overall averages when using data to make decisions.
Lesson: Aggregated results may hide a lot of variation across different characteristics. Check for such differences, and when they exist show and explain the variation as well as the aggregated results.
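To make the lesson concrete, here is a minimal sketch in Python of the same check: compute the overall split, then the split within each area. The area names and counts are invented for illustration and are not the NZTA survey results, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses as (area, response) pairs.
# These counts are invented for illustration; they are not the NZTA results.
RESPONSES = (
    [("CBD", "worse")] * 70 + [("CBD", "better")] * 30
    + [("Northern suburbs", "worse")] * 25 + [("Northern suburbs", "better")] * 75
)


def share_by_response(pairs):
    """Return {response: percentage} across all pairs, ignoring area."""
    counts = Counter(response for _, response in pairs)
    total = sum(counts.values())
    return {response: 100 * n / total for response, n in counts.items()}


def share_by_area(pairs):
    """Return {area: {response: percentage}} so each area sums to 100."""
    by_area = {}
    for area, response in pairs:
        by_area.setdefault(area, []).append((area, response))
    return {area: share_by_response(rows) for area, rows in by_area.items()}


overall = share_by_response(RESPONSES)
breakdown = share_by_area(RESPONSES)
print(overall)
print(breakdown)
```

With these made-up numbers the overall result looks close to an even split, while the two areas hold nearly opposite views. That is the pattern worth checking for before relying on the aggregate alone.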
Make comparison easy
Previous posts have shown examples of inconsistency in how data insights are communicated. This report shows the benefit of consistency when it comes to choices such as the type of chart, the order of data series, and the colour scheme.
Both Figures 5-1 and 5-2 use stacked bar charts, ordered to show responses from much worse to much better, and with the same colours used to represent each possible response option. Similar charts were used to show the perceived personal and Wellington-wide effects of each change considered. For example, Figure 5-5 below shows how responses varied for each change under consideration.
Image reproduced for purposes of education, criticism and commentary.
A stacked bar was a good choice for this data because the response options for each question are mutually exclusive and some labels for different projects and locations are long. Having selected an optimal chart type for the situation, keeping everything else constant makes it very easy to make comparisons between charts as well as within them. It means that someone reading the report can concentrate on how attitudes change depending on where a person lives or what part of the project is being considered, rather than wasting mental bandwidth reorienting themselves to each new chart because it’s designed slightly differently.
Lesson: Using charts of the same type and with the same order and colour scheme helps facilitate comparisons.
Unlike the fictional, reportedly highly homogeneous Lake Wobegon, variation is common in the real world and the data that represents it. Understanding that variation rather than focussing exclusively on aggregate metrics won’t make decisions such as whether to change a state highway easy, but it will, on average, result in better decisions.
What the screen industry can teach data communicators
People in the screen sector excel at telling stories, and a report about the New Zealand screen sector provides an opportunity to consider how we tell stories with and about data. The report provides an interesting overview of the sector, but also illustrates some common ways in which the use of charts is not quite as effective as it could be.
The film director David Fincher is quoted as saying: "My idea of professionalism is probably a lot of people's idea of obsessive." Attention to detail can elevate a data communication from serviceable to excellent just as it can elevate a film or a TV show.
Consider the metric you’re using when creating stacked bar and column charts
Figures 3 and 6 of the report focus on how the New Zealand screen sector breaks down into sub-sectors such as production and post-production. They do this based on a count of firms in Figure 3 and a count of people in Figure 6. That’s interesting and important information, but it’s shown in a way that makes it harder to digest than it needs to be.
Image reproduced for purposes of education, criticism and commentary.
Because both figures show the data as counts, or absolute values, rather than as percentages, it’s somewhat hard to discern to what extent a particular sub-sector is growing, because that can be masked by growth in the sector overall. For example, looking at Figure 3 we can reasonably conclude that, when it comes to firms, the production sub-sector is shrinking as a percentage of the overall sector, because the orange portion of each column has stayed around the same height while the columns have grown overall. We can also conclude that post-production is growing, because the dark teal portion appears to have grown as a percentage of the columns.
Beyond that though we don’t have a very good idea of the magnitude of the shift in those percentages, and we have even less idea of whether there have been any changes in the proportions of the smaller sub-sectors since those are represented by relatively small slices of relatively tall columns.
When using stacked columns or stacked bars, the story being told will generally be clearer if the data is shown as percentages rather than as counts or absolute values. That makes it easy to scan horizontally (for stacked columns) or vertically (for stacked bars) to see differences. For example, while not perfect for other reasons we will touch on shortly, Figure 19 of the same report uses stacked columns showing percentages to illustrate the breakdown of sub-sectors by region. From that we can easily see that people working in post-production are concentrated primarily in Wellington, whereas production represents a large proportion of people working in the screen sector in all regions.
Image reproduced for purposes of education, criticism and commentary.
Lesson: Stacked column (and bar) charts usually work best when they show percentages rather than absolute values.
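As a rough sketch of the conversion, the Python below turns counts into within-year percentages before anything is charted. The firm counts are invented for illustration and are not taken from the screen sector report, and `to_percentages` is a hypothetical helper name.

```python
# Hypothetical firm counts by sub-sector for two years (invented numbers,
# not taken from the screen sector report).
COUNTS = {
    2015: {"Production": 600, "Post-production": 200, "Other": 200},
    2020: {"Production": 600, "Post-production": 600, "Other": 300},
}


def to_percentages(counts_by_group):
    """Convert each group's counts into percentages of that group's total."""
    shares = {}
    for group, counts in counts_by_group.items():
        total = sum(counts.values())
        shares[group] = {name: 100 * n / total for name, n in counts.items()}
    return shares


SHARES = to_percentages(COUNTS)
for year, row in SHARES.items():
    print(year, {name: round(pct, 1) for name, pct in row.items()})
```

In this made-up example the production count is unchanged between the two years, yet its share of the total falls because the sector as a whole has grown. A column chart of counts would show a flat slice, whereas a chart of the percentages would show the shift directly.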
Continuity
While Figure 19 is good in the sense that it represents the data as percentages rather than counts or absolute values, it’s not as good as it could be, because the colours used to represent the different sub-sectors have changed from what they were earlier in the report, such as in Figure 3. That is a common problem in data communication, as we’ve seen in earlier posts. It can occur when different people work on the same output or even if the same person works on it at different times. It often occurs because people rely on software defaults, which are a function of the order the data is in and sometimes the particular theme or template a person has on their computer.
No matter how or why this shift in colour assignment happens, it’s as disruptive as it would be if the colours of the costumes worn by the characters in a TV show or movie you were watching changed part way through for no apparent reason. In our data communication, as in film or TV production, we should take care to avoid that.
That maintenance of consistency is called continuity in the screen industry. For example, besides noticing if a costume has changed colour from one scene to the next without explanation we would also be likely to notice if an object is in a different place. Similarly, in data communication the idea of continuity applies to order as well as colour. Once we have established a particular order for something, such as the sub-sectors in this report, maintaining it makes it easier for viewers to understand what they are looking at in a given chart and to make comparisons across charts.
For example, like Figure 3, Figure 6 shows the breakdown of sub-sectors, but this time by workers rather than by firms. It’s interesting to compare and contrast the two, but if you visually scan back and forth between Figures 3 and 6 you can see it’s not that easy to do. Part of that is because both use counts rather than percentages, as described previously, but it’s also because the order of the sub-sectors has changed. Maintaining continuity when it comes to the order of the sub-sectors across both charts would have improved the experience of the viewer.
Image reproduced for purposes of education, criticism and commentary.
Lesson: Once you’ve established a colour scheme or an order in which to show different groups, categories, etc., maintain it unless there is a very good reason not to.
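One way to avoid relying on software defaults is to define the order and colours once and reuse them for every chart. The Python below is a minimal sketch of that idea. The sub-sector names, hex colours and values are illustrative assumptions rather than the report's actual palette or data, and `styled_series` is a hypothetical helper.

```python
# One canonical order and colour per sub-sector, defined once and reused.
# The sub-sector names and hex colours here are illustrative assumptions.
SUBSECTOR_STYLE = {
    "Production": "#E69F00",
    "Post-production": "#00796B",
    "Distribution": "#56B4E9",
}


def styled_series(values):
    """Return (name, value, colour) tuples in canonical order.

    Sub-sectors missing from `values` are skipped; unknown names raise an
    error so a new category cannot silently pick up a default colour.
    """
    unknown = set(values) - set(SUBSECTOR_STYLE)
    if unknown:
        raise KeyError(f"No style defined for: {sorted(unknown)}")
    return [
        (name, values[name], colour)
        for name, colour in SUBSECTOR_STYLE.items()
        if name in values
    ]


# Two hypothetical datasets that arrive in different orders.
firms = {"Post-production": 40, "Production": 50, "Distribution": 10}
workers = {"Distribution": 5, "Production": 35, "Post-production": 60}
print(styled_series(firms))
print(styled_series(workers))
```

Whatever order the data arrives in, both series come out in the same order with the same colours, which can then be passed to whichever charting tool is in use.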
Two (or more) charts are often better than one
Just as filmmakers use different scenes to show us different insights into characters, we can use different charts to show different insights derived from data. The stacked column charts shown in the current versions of Figures 3 and 6 each show two different insights: 1) total growth in firms or people working in the New Zealand screen industry, and 2) changes to the proportional breakdown of firms or people by sub-sector.
There are many similar situations in data communication. For example, we might want to show how the number of customers or clients has changed and how that breaks down by region, age, income, etc.
In all of those situations it generally works better to use two charts than to try to do both at once, as happens in the current versions of Figures 3 and 6. Start with a chart with solid bars or columns to show the change in the absolute value or count of whatever we are focussed on, then follow that up with a stacked bar or column chart showing the proportional breakdown. The first chart establishes the overall change and then the second one shows whether that is being driven disproportionately by particular sub-groups. Additional stacked bar or column charts can be used to show additional breakdowns.
Lesson: If you are trying to communicate multiple insights, consider using multiple charts.
Those of us trying to communicate data-driven insights are like filmmakers and TV producers in that we are trying to create an engaging narrative. We can learn from them in taking care to ensure the story we tell is clear, maintains continuity when it comes to things such as colour and order, and is not unnecessarily complicated to follow. We should carefully attend to those details because in data communication, as in filmmaking, Fincher's 'obsessiveness' is really true professionalism.