11.5. Adding Context

We have used text in our graphs throughout this chapter to provide meaningful axis labels that include units of measurement, tick mark labels for categories, and titles. This is good practice when sharing a visualization more broadly. A good goal is to include enough context in a plot so that it can stand alone: a reader should be able to get the gist of the plot without needing to search for explanation elsewhere. That said, every element of a statistical graph should have a purpose. Superfluous text or plot features, often referred to as “chartjunk”, should be eliminated. In this section, we provide a brief overview of ways we can add helpful context to our plots, followed by an example where we create a publication-ready plot by adding context.

Text context includes labels and captions. It is good practice to consistently use informative labels on tick marks and axes. For example, axis labels often benefit from including units of measurement. Our graphs should contain titles and legends when needed. Informative labels are especially important for plots that other people will see and interpret. However, even when we’re doing exploratory data analysis just for ourselves, we often want to include enough context so that when we return to an analysis we can easily figure out what we plotted.

Captions serve several purposes. They describe what has been plotted and orient the reader. Captions also point out important features of the plot and comment on their implications. It’s okay for the caption to repeat information found in the text. Readers often skim a publication and focus on section headings and visualizations, so plot captions should be self-contained.

Reference markers bring additional context to the plotting region. Reference points and lines that provide benchmarks, historical values, and other external information help form comparisons and interpretations. For example, we often add a reference line with slope 1 to a quantile-quantile plot. We might also add a vertical line on a time-series plot to mark a special event, like a natural disaster.

The following example demonstrates how to add these context elements to a plot.

11.5.1. Example: 100m Sprint Times

The figure below shows the race times in the men’s 100-meter sprint since 1968. These data include only races that were electronically timed and held outdoors in normal wind conditions, and the times included are only for those runners who came in under 10 seconds. The plot is a basic scatter plot showing race time against year. Beginning with this plot, we augment it to create a plot featured in a FiveThirtyEight article about the Olympic 100-meter sprint.

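import plotly.express as px

# Basic scatter plot of race time against year; `run` is the data frame of sprint times
fig = px.scatter(run, x='year', y='time', hover_name='name',
                 width=450, height=250)
fig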
[Figure: basic scatter plot of race time against year for the men’s 100-meter sprint]

When we want to prepare a plot for other people to read, we consider the plot’s takeaway message. In this case, our main message is two-fold: the best runners have been getting faster over the past 50 years, and Usain Bolt’s remarkable record time of 9.58 seconds set in 2009 remains untouched. (In fact, the second-best race time also belongs to Bolt.) We provide context to this plot by adding a title that directly states the main takeaway, units of measurement in the y-axis label, and annotations to key points in the scatter plot, including the two best race times that belong to Usain Bolt. In addition, we add a horizontal reference line at 10 seconds to clarify that only times below 10 seconds are plotted, and we use a special symbol for the world record time to draw the reader’s attention to this crucial point.

# Title that states the main takeaway, with a smaller subtitle inside <sup> tags.
# (to_plotly_text and margin are helper functions defined outside this section.)
title_100 = to_plotly_text('''No one is coming close to Usain Bolt's best times
<sup>All times under 10 seconds in the outdoor men's 100-meter sprint,
included are times from only electronic readings and under regular wind conditions</sup>''')

fig = px.scatter(run, x='year', y='time', hover_name='name',
                 width=550, height=450)
# Fix the axis ranges and label the axes, with units on the y-axis
fig.update_yaxes(range=[9.5, 10.05], title="time (seconds)")
fig.update_xaxes(range=[1966, 2026], title="race date")
margin(fig, t=70)
fig.update_layout(title=dict(text=title_100, y=0.95, x=0.1, xanchor='left'))

# Dashed horizontal reference line at the 10-second cutoff
fig.add_hline(y=10.0, line_width=2, line_dash="dash", line_color="black")

hines = to_plotly_text('''
At the 1968 Mexico City
Games, American
Jim Hines was
the first sprinter to break
the 10-second barrier
''')

# Styling shared by all three annotations
anno_params = dict(bgcolor='white',
                   borderpad=3,
                   arrowcolor='black',
                   showarrow=True,
                   arrowwidth=1.5)

fig.add_annotation(x=1968, y=9.95,
                   ax=60,
                   ay=50,
                   xshift=3,
                   yshift=-5,
                   yanchor='top',
                   text=hines,
                   align='left',
                   **anno_params)

fig.add_annotation(x=2009, y=9.58,
                   ax=-30,
                   ay=-40,
                   xshift=-3,
                   yshift=5,
                   xanchor='right',
                   text='Usain Bolt set the world<br>record in 2009...',
                   align='right',
                   **anno_params)

fig.add_annotation(x=2012, y=9.63,
                   ax=20,
                   ay=30,
                   xshift=4,
                   yshift=-4,
                   xanchor='left',
                   text='...and the<br>Olympic<br>record in 2012',
                   align='left',
                   **anno_params)

fig.show()
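[Figure: the scatter plot with a title, a dashed reference line at 10 seconds, and annotations for Jim Hines’s 1968 race and Usain Bolt’s 2009 and 2012 races]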

These bits of context describe what we have plotted, help readers see the main takeaway, and point out several interesting features in the data. The plot can now be a useful part of a slideshow, technical report, or social media post. In our experience, people who look at our data analyses remember our plots, not paragraphs of text or equations. It’s important to go the extra mile and add context to the plots we prepare for others.

In the next section, we move on to specifics on how to create plots using the plotly Python package.