1.3. Summary

The data science lifecycle provides an organizing structure for this book. We keep the lifecycle in mind as we work with many datasets from a wide range of sources, including science, medicine, politics, social media, and government. The first time we use a dataset, we provide the context in which the data were collected, the question of interest in examining the data, and descriptions needed to understand the data. In this way, we aim to practice good data science throughout the book.

Books often present the first stage of the lifecycle, asking a question, as a prompt to apply a technique and get a number, such as “What’s the p-value for this A/B test?” In practice, the question is often vague instead, like “Can we restore the American Dream?” Answering the first sort of question gives little practice in developing a research question. Answering the second is hard without guidance on how to turn a general area of interest into a question that can be answered with data. The interplay between asking a question and understanding the limitations of data to answer it is the topic of the next chapter.