On average, considering variation leads to better decisions
Garrison Keillor describes the people of his fictional town, Lake Wobegon, by saying "All the women are strong, all the men are good-looking, and all the children are above average." While the last claim is impossible and the first two are unlikely, even for a fictional town, the statement illustrates our tendency to focus on aggregate statistics such as totals and averages when we summarise things.
Aggregate statistics are important, but if you only examine overall measures without also checking the extent and nature of the underlying variation, you may miss important pieces of the puzzle you are trying to put together. That’s particularly true when what you are describing is less homogeneous than the people of Lake Wobegon.
A recently released report on community engagement carried out by the New Zealand Transport Agency (NZTA) regarding proposed changes to how State Highway 1 travels through Wellington demonstrates the value of looking into the extent and nature of variation rather than just at aggregate statistics. It does that by breaking the data down in a variety of ways, using a consistent chart type to make it easy to compare different results and get a clear understanding of the overall situation. Whatever your opinion about the changes proposed for State Highway 1, the report is an example of good data communication.
Aggregate measures sometimes don’t show the full picture
The report summarises results of a survey that asked about five specific changes being considered to the portion of State Highway 1 that runs through Wellington City. Community members who took the opportunity to provide feedback were asked to indicate how they believe the changes overall, and each change individually, would affect them personally and the Wellington region more generally.
Results for each of those two measures for the full set of changes collectively are shown in aggregate in Figure 5-1.
Image reproduced for purposes of education, criticism and commentary.
Importantly though, the report also includes the results for each measure based on where survey participants live, as shown in Figure 5-2 below for how survey respondents think the whole programme of changes would affect them personally.
Image reproduced for purposes of education, criticism and commentary.
If we only looked at the first figure we would get the sense that the community is fairly evenly split on whether the project overall would make things better (41%) or worse (49%) for them personally. The more granular geographic breakdown, however, makes it clear that the people who believe the changes would make things better disproportionately live outside Wellington City or in the northern and western suburbs, while the people who believe the changes would make things worse for them live in the CBD and the southern and eastern suburbs.
This is important information for policy makers to consider, given that the proposed changes would occur in the CBD and the southern and eastern suburbs. In other words, the people most likely to be directly affected by the changes, during and after implementation, were disproportionately likely to believe the changes would make things worse for them personally.
In this situation much of the variation in attitudes toward the proposed changes can be explained by where people live, but characteristics such as age, gender, and income may account for variation in other metrics. It’s always worth checking for such differences rather than focussing only on aggregate measures such as totals and overall averages when using data to make decisions.
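As a sketch of that kind of check, the snippet below compares an aggregate share with the same share broken down by a subgroup. The areas, answers, and counts are invented for illustration; they are not the NZTA survey data.

```python
from collections import Counter, defaultdict

# Hypothetical survey responses as (area, answer) pairs -- invented for
# illustration, not the actual NZTA survey data.
responses = (
    [("CBD", "worse")] * 70 + [("CBD", "better")] * 30
    + [("Northern suburbs", "worse")] * 35 + [("Northern suburbs", "better")] * 65
)

def better_share(rows):
    """Proportion of rows whose answer is 'better'."""
    counts = Counter(answer for _, answer in rows)
    return counts["better"] / len(rows)

# The aggregate view looks roughly evenly split...
print(f"Overall: {better_share(responses):.1%} say 'better'")

# ...but breaking the same data down by area tells a different story.
by_area = defaultdict(list)
for area, answer in responses:
    by_area[area].append((area, answer))
for area, rows in sorted(by_area.items()):
    print(f"{area}: {better_share(rows):.1%} say 'better'")
```

The overall figure hides that one area is strongly negative and the other strongly positive, which is exactly the pattern the geographic breakdown in the report reveals.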
Lesson: Aggregated results may hide a lot of variation across different characteristics. Check for such differences, and when they exist show and explain the variation as well as the aggregated results.
Make comparison easy
Previous posts have shown examples of inconsistency in how data insights are communicated. This report shows the benefit of consistency when it comes to choices such as the type of chart, the order of data series, and the colour scheme.
Figures 5-1 and 5-2 both use stacked bar charts, ordered to show responses from much worse to much better, with the same colours used to represent each possible response option. Similar charts were used to show the perceived personal and Wellington-wide effects of each change considered. For example, Figure 5-5 below shows how responses varied for each change under consideration.
Image reproduced for purposes of education, criticism and commentary.
A stacked bar was a good choice for this data because the response options for each question are mutually exclusive and some labels for different projects and locations are long. Having selected an optimal chart type for the situation, the authors kept everything else constant, which makes it very easy to make comparisons between charts as well as within them. Someone reading the report can concentrate on how attitudes change depending on where a person lives or what part of the project is being considered, rather than wasting mental bandwidth reorienting themselves around each new chart because it’s designed slightly differently.
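One way to enforce that consistency in practice is to define the response order and colour palette once and derive every chart from them. The sketch below illustrates the idea; the category labels and hex colours are assumptions for illustration, not taken from the NZTA report.

```python
# Define the response order and colour mapping once, then reuse it for every
# chart so bars always stack in the same order with the same palette.
RESPONSE_ORDER = ["Much worse", "Worse", "Neutral", "Better", "Much better"]
COLOURS = {
    "Much worse": "#b2182b", "Worse": "#ef8a62", "Neutral": "#f7f7f7",
    "Better": "#67a9cf", "Much better": "#2166ac",
}

def stacked_segments(percentages):
    """Turn {response: percent} into ordered (label, start, width, colour)
    segments for one bar of a stacked bar chart."""
    segments, start = [], 0.0
    for label in RESPONSE_ORDER:
        width = percentages.get(label, 0)
        segments.append((label, start, width, COLOURS[label]))
        start += width
    return segments

# Every location's bar is built from the same shared definitions.
cbd = stacked_segments({"Much worse": 30, "Worse": 25, "Neutral": 10,
                        "Better": 20, "Much better": 15})
print(cbd[0])  # ('Much worse', 0.0, 30, '#b2182b')
```

Because every chart draws on the same `RESPONSE_ORDER` and `COLOURS`, a reader who has decoded one chart has decoded them all.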
Lesson: Using charts of the same type, with the same order and colour scheme, facilitates comparisons.
Unlike in the fictional, reportedly highly homogeneous Lake Wobegon, variation is common in the real world and in the data that represents it. Understanding that variation rather than focussing exclusively on aggregate metrics won’t make decisions such as whether to change a state highway easy, but it will, on average, result in better decisions.
Clearly communicating when conclusions may be complicated or contentious
Only five out of the eighty-nine countries and regions included in the World Mortality Dataset had fewer excess deaths than New Zealand over the pandemic period. That might seem unremarkable to people who lived through New Zealand’s lockdowns or to those elsewhere who read about them — except that, indexed across the full period from 2020 to 2023, New Zealand's Covid restrictions were actually less stringent than those of almost every country it is typically compared to, including Sweden, the United Kingdom, Australia, Italy, and the United States. Whether that finding surprises you or confirms what you already suspected, you don't have to take anyone's word for it. The data is right there for you to check yourself — and that is no accident.
It’s part of a recently released series of reports from the New Zealand Royal Commission examining lessons learned from the country’s experience with Covid-19. The pandemic and its aftermath are things that all adults experienced and nearly all have opinions about. It also influenced, and was influenced by, a complicated web of public health, economic, and legal policies and practices. In situations like this, where the data and related analysis are complex and many people have strong opinions about, or vested interests in, the conclusions, the stakes and the potential for a contentious reception are high.
In communicating the results of its investigation into this period of New Zealand’s history to provide lessons to guide responses to pandemics the country may face in the future, the Royal Commission also did two things that provide important lessons for those trying to communicate data-intensive insights in the future.
Divide and conquer
Somewhat unusually, the final outputs from the Commission’s work were delivered in not one, but three reports. The main report is largely text that systematically lays out conclusions and recommendations about different aspects of the pandemic response. Two separate reports summarise submissions made to the Commission by the public and provide a curated collection of relevant publicly available data.
The one summarising the submissions includes photos of, and direct quotes from, the people who made submissions, showing how the pandemic affected them personally. The data-oriented report includes many charts and graphs showing how New Zealand compared to other countries on a variety of measures, and also tracks changes to various measures before, during, and after the pandemic.
The information in those two separate reports informs the conclusions and recommendations in the main report, but each output is hundreds of pages on its own, so combining them without removing any content would have created a document of over one thousand pages. Even the most interested parties are unlikely to want to read that.
Faced with a situation like this, it helps to split things up. In some cases that might mean dividing content by topic; in this case, perhaps separating the health response from the economic response from the legal response. Those things were all intertwined, though, so it makes sense that that’s not the option the authors chose.
Another common way of dealing with this type of problem is to make different reports for different target audiences. In this situation that might have meant one report for policy makers and another for the general public. While that’s often a good solution, the pandemic is a rare situation: anyone reading about it is likely to have some direct personal experience with it, yet probably no one is an expert in all aspects of it. Health professionals don’t understand the nuances of the economic issues that had to be addressed, and vice versa, and everyone, regardless of their professional perspective, also had personal experiences.
Splitting the content the way the Commission did made it easy for politicians and policy makers to focus on the main document to consider possible lessons for the future (and to try to attribute blame for past decisions). It also demonstrated that the Royal Commission listened to and heard the many people who made submissions to it, which is particularly important given how intense, and often sceptical, feelings around the pandemic are. Finally, it allowed people who want to dive into the detailed and comparative data to easily find that. We will look more closely at some of that data ourselves next.
Lesson: If you have a lot of data-intensive information to communicate or you are trying to communicate data-intensive information to multiple audiences with different needs, consider using multiple outputs rather than trying to create one that is intended to be everything for everyone.
Anticipate assumptions, hypotheses and objections
Focusing now on the data-oriented ‘Covid by the Numbers’ supplementary report, we can see another smart decision. That was to make it easy for people to check their own assumptions, hypotheses and objections against the actual data.
People looking at any sort of data-oriented output often have their own views about the phenomena being examined. That’s certainly true of Covid. In some cases the views may be implicit assumptions, but in others they may be explicit hypotheses about how one thing affected another. In this and other situations, we can think about our audience’s assumptions and hypotheses and anticipate objections they might have to insights being presented. I think of these as the ‘yeah, but…’ thoughts that form in people’s minds as they watch a presentation or read a report.
We can leverage our understanding of an audience’s assumptions, hypotheses and objections in constructing data-oriented outputs. The idea is that almost as soon as the ‘yeah, but…’ thought forms in the minds of the viewer or reader the next slide or the next page of the report provides the information needed to answer that question or address that concern.
In the case of Covid, a common type of ‘yeah, but…’ thought is likely to relate to other countries. For example, ‘Yeah, but Sweden didn’t have so many rules, and not that many people died there.’ Or ‘Yeah, but people in the UK were much freer to live their lives.’ The ‘Covid by the Numbers’ supplementary report addresses concerns such as those with a series of charts providing comparative data across many countries and highlighting the exact countries that are most likely to feature in ‘yeah, but…’ thoughts.
For example, Figure 23 shows excess mortality by country (going from least to most excess mortality, and also explaining what excess mortality is in a note at the bottom) and Figure 46 shows the stringency of Covid policies (going from most to least stringent, though it might have been better to always have the most desirable end of the scale at the top). Both use highlighted bars along with arrows and larger labels to make it easy to find comparator countries most likely to feature in ‘yeah, but…’ thoughts and therefore make it easy for readers to test their own assumptions, hypotheses and objections against the actual data.
Image reproduced for purposes of education, criticism and commentary.
Image reproduced for purposes of education, criticism and commentary.
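The highlighting technique in those figures can be sketched as a simple rule that styles bars differently for the home country, the usual comparator countries, and everyone else. The colours and label sizes below are invented for illustration; only the comparator country names come from the report.

```python
# Countries most likely to feature in 'yeah, but...' thoughts, per the report.
COMPARATORS = {"Sweden", "United Kingdom", "Australia", "Italy", "United States"}

def bar_style(country, home="New Zealand"):
    """Pick a bar colour and label size so the home country and the usual
    comparators stand out while other countries fade into the background.
    (Specific colours and sizes are illustrative assumptions.)"""
    if country == home:
        return {"colour": "navy", "label_size": "large"}
    if country in COMPARATORS:
        return {"colour": "steelblue", "label_size": "large"}
    return {"colour": "lightgrey", "label_size": "small"}

print(bar_style("Sweden"))  # {'colour': 'steelblue', 'label_size': 'large'}
```

Applying one rule like this across every chart means a reader can find the same comparator countries in the same visual style in every figure.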
Lesson: Consider implicit assumptions, explicit hypotheses, or possible objections your audience is likely to have about the data you are trying to communicate and make it easy for them to test their assumptions, hypotheses and objections against the actual data.
Everyone had their own experience of the pandemic, and everyone is likely to have their own reaction to these reports — including to the findings shown in this post. Whether New Zealand's combination of near-best excess mortality and, indexed over the full pandemic period, relatively low policy stringency strikes you as expected, surprising, or still not the whole story, the Commission's choices mean that you are not left arguing from memory or anecdote. The data is there, clearly laid out, and deliberately designed to let you test your assumptions against the evidence. That is exactly what good data communication makes possible — and exactly what we can aim for when we need to communicate clearly about things that are complicated or contentious.
Tables may not be flashy, but they’re often very useful
When faced with data-intensive insights to communicate, common mistakes are to go for a cool new type of chart you’ve recently seen, to include a variety of different visualisations to ‘mix things up’, or to rely too heavily on a single type of visualisation.
It’s helpful to think of the best way to communicate insights derived from data the same way a tradesperson might think about their tools, and select the right one for the job at hand. While they may not be the flashiest tool in the data toolbox, tables are often a good visualisation option, and can be the best one.
This example from Tourism New Zealand helps illustrate why that’s the case. The table shows data on international visitor arrivals in New Zealand. It is part of an interactive online dashboard and includes many more countries and territories than can be seen in this screenshot (237 in total). Data for the other countries is visible if you view the table online and scroll, or if you download it.
This is good data communication because it aligns the visualisation type with audience needs.
Why tables work well when audience interests vary and there are many ways to aggregate and show the data
If the target audience was only interested in the total number of visitor arrivals, that could easily be shown with a different chart type, such as a line chart showing arrivals over time. However, it’s easy to imagine many reasons why people might want to see the disaggregated data for specific countries. Something like a line chart, scatter plot, or bubble chart showing each country or territory individually would be far too cluttered and would make it hard to discern precise values for specific countries.
Showing the data in this table format makes it possible for viewers to easily see the data for the countries that are of interest to them. That might vary depending on whether the user is a government policy analyst, someone working for an airline, the owner of a specific tourism-focussed business, etc.
Lesson: Consider what type of data communication will create the best user experience. When there are many possible cuts of the data and different audience members are interested in different aspects, tables often work better than charts.
Smart design choices reduce cognitive load
Good tables aren’t just about showing numbers in rows and columns. Notice the conditional formatting in this example: green shading shows year-over-year increases, with darker shades indicating larger gains. Australia’s substantial 148,116 increase appears in the darkest green. Red text flags decreases, such as the drop in visitor numbers from India and Fiji.
This formatting follows data visualisation conventions that most viewers intuitively understand: red typically signals decreases or concerning values, green indicates increases or positive values, and darker shades represent greater magnitude. These conventions reduce cognitive load: users don’t need to learn a new visual language for each dataset and can instead focus on the insights revealed by the data. Such formatting also lets users quickly spot patterns and outliers without having to manually compare numbers across rows.
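The conditional-formatting rule described above can be sketched as a small function that maps a year-over-year change to a display style. The thresholds and most of the figures below are invented for illustration (only Australia’s 148,116 increase comes from the table); they are not the dashboard’s actual rules.

```python
# Illustrative conditional-formatting rule: green shades for gains (darker =
# larger), red for declines. Threshold of half the largest gain is an
# assumption made for this sketch.
def yoy_format(change, max_gain):
    """Map a year-over-year change to a display style token."""
    if change < 0:
        return "red-text"
    if max_gain and change >= 0.5 * max_gain:
        return "dark-green"
    return "light-green"

rows = {"Australia": 148_116, "United States": 32_500, "India": -4_200}
max_gain = max(v for v in rows.values() if v > 0)
styles = {country: yoy_format(change, max_gain) for country, change in rows.items()}
print(styles)
# {'Australia': 'dark-green', 'United States': 'light-green', 'India': 'red-text'}
```

Encoding the convention once and applying it to every cell is what lets viewers scan for the darkest greens and the red text instead of reading every number.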
While it’s not obvious from the screenshot, clicking on the title of any column in the table lets you sort on that metric, and the dots in the upper right corner lead to more options, including filtering. Those features enable users to see the data in the way that’s most helpful for them with minimal effort. For example, a capacity analyst at an airline might want to sort the data, as they are here, by number of arrivals. A policy analyst documenting the effectiveness of Covid recovery policies might instead sort on the final column, and someone with a tourism business focussing on visitors from just a few countries could filter the data to show only those countries.
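Those sort-and-filter interactions amount to the following, sketched here over a small hypothetical table. The country names echo those mentioned above, but the arrivals figures (other than Australia’s 148,116 change) are invented for illustration.

```python
# A small hypothetical slice of an arrivals table.
table = [
    {"country": "Australia", "arrivals": 1_300_000, "yoy_change": 148_116},
    {"country": "United States", "arrivals": 370_000, "yoy_change": 32_500},
    {"country": "India", "arrivals": 70_000, "yoy_change": -4_200},
    {"country": "Fiji", "arrivals": 60_000, "yoy_change": -1_100},
]

# An airline capacity analyst might sort by total arrivals...
by_arrivals = sorted(table, key=lambda row: row["arrivals"], reverse=True)

# ...a policy analyst might sort by year-on-year change...
by_change = sorted(table, key=lambda row: row["yoy_change"], reverse=True)

# ...and a tour operator might filter to just the markets they serve.
my_markets = [row for row in table if row["country"] in {"Australia", "Fiji"}]

print([row["country"] for row in by_change])
# ['Australia', 'United States', 'Fiji', 'India']
```

The same underlying data supports all three views, which is why interactive sorting and filtering serve such varied audiences without requiring separate outputs.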
Lesson: Good design reduces the work viewers must do. Features like conditional formatting should help users spot patterns instantly, while sorting and filtering capabilities let users organise data for their specific questions rather than forcing them to search through irrelevant information.
When tables work - and when they don’t
For all their strengths in an interactive dashboard context like this one, tables are normally much less effective in a presentation setting. Imagine projecting this table in a conference room. The font would be too small to read if you showed all countries at once. If you enlarged the font and spread the data across multiple slides, audiences couldn’t easily find the countries of greatest interest to them unless the countries were arranged in alphabetical order. But alphabetical ordering would obscure the patterns revealed by metric-based sorting. The difference comes down to how people interact with the data. Dashboard users can sort, filter, scroll, and spend time with individual data points that matter to them. Presentation audiences are passive viewers who see whatever the presenter shows them, in whatever order, for however long the slide stays on screen. Different contexts demand different approaches.
If the data had to be communicated in a presentation, it would be better to focus on the more aggregated data and the overall trends and patterns in that, and then distribute a detailed table via something like a handout or a link. A link would make it possible to offer the types of interactivity described previously, but even in a handout the data could be shown in different orders (e.g., alphabetical, greatest to fewest arrivals, greatest year on year change, etc.) to aid usability.
Lesson: Context determines effectiveness. The same data often requires different visualisations for different viewing situations. Design for how and where your audience will actually view the data.
Choose the right tool for the job
As will be discussed in subsequent posts, there are many things that can be done to polish data visualisations and other communications, but the starting point should be choosing the right type of data visualisation for the audience, the data, and the delivery context. Tables may not be the flashiest choice, but there are many situations where they are the right tool for the job.