10.5. Guidelines for Exploration

So far in this chapter, we have introduced the notion of feature types, seen how a feature's type helps determine which plot to make, and described how to read distributions and relationships in a visualization. EDA relies on building these skills and on flexibly developing your understanding of the data.

You have already seen EDA in action in Chapter 9, where we developed checks for data quality and transformed features to make them more useful in analysis. Below is a set of questions to guide you when making plots to explore the data.

  • How are the values of Feature X distributed?

  • How do Feature X and Feature Y relate to each other?

  • Is the distribution of Feature X the same across subgroups defined by Feature Z?

  • Are there any unusual observations in Feature X? In the combination (X, Y)? In X for a subgroup defined by Z?
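
As a rough companion to the four questions above, the sketch below pairs each one with a simple numeric summary that can sit alongside a plot. Everything in it is an assumption made for illustration: the toy records, the feature names x, y, and z, the helper function names, and the choice of the 1.5 × IQR convention for flagging unusual values. It uses only the Python standard library and is not the chapter's own method.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical toy data (not from the chapter): each record is (x, y, z),
# where x and y are numeric features and z is a categorical feature.
records = [
    (1.0, 2.0, "a"), (2.0, 4.1, "a"), (3.0, 5.9, "a"), (4.0, 8.2, "a"),
    (5.0, 9.8, "a"), (6.0, 12.1, "b"), (7.0, 13.9, "b"), (8.0, 16.0, "b"),
    (9.0, 18.2, "b"), (40.0, 19.0, "b"),
]
xs = [x for x, _, _ in records]
ys = [y for _, y, _ in records]


def five_number_summary(values):
    """Question 1: min, quartiles, and max describe how values are distributed."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


def pearson(a, b):
    """Question 2: Pearson correlation, one measure of linear association."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cross = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    spread_a = math.sqrt(sum((u - mean_a) ** 2 for u in a))
    spread_b = math.sqrt(sum((v - mean_b) ** 2 for v in b))
    return cross / (spread_a * spread_b)


def group_values(rows):
    """Question 3: collect the x values separately for each level of z."""
    groups = defaultdict(list)
    for x, _, z in rows:
        groups[z].append(x)
    return dict(groups)


def iqr_outliers(values):
    """Question 4: flag values more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


x_summary = five_number_summary(xs)
r_xy = pearson(xs, ys)
medians_by_z = {z: statistics.median(vals)
                for z, vals in group_values(records).items()}
unusual_x = iqr_outliers(xs)

print("Distribution of x:", x_summary)
print("Correlation of x and y:", round(r_xy, 3))
print("Median of x by z:", medians_by_z)
print("Unusual x values:", unusual_x)
```

Summaries like these do not replace the plots. A single correlation, for instance, can hide a nonlinear pattern that a scatter plot would show. The same outlier check could also be repeated within each level of z, although with very small groups the quartile-based fences become wide and may flag nothing.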

As you answer each of these questions, it is important to tie your answer back to the features measured and the context. It is also important to adopt an active, inquisitive approach to the investigation. To guide your explorations, ask yourself “what next” and “so what” questions, such as the following:

  • Do you have reason to expect that one group/observation might be different?

  • Why might your finding about shape matter?

  • What additional comparison might bring added value to the investigation?

  • Are there any potentially important features to compare with or against?

We put these guidelines into practice with a concrete example of EDA next.