The Data Science Lifecycle

Data science is a rapidly evolving field. At the time of this writing, people are still trying to pin down exactly what data science is, what data scientists do, and what skills data scientists should have. What we do know, though, is that data science uses a combination of methods and principles from statistics and computer science to work with and draw insights from data. Learning computer science and statistics in combination makes us better data scientists. We also know that any insights we glean need to be interpreted in the context of the problem that we are working on.

This book covers fundamental principles and skills that data scientists need to help make all sorts of important decisions. With both technical skills and conceptual understanding, we can work on data-centric problems to, say, assess whether a vaccine works, filter out fake news automatically, calibrate air quality sensors, and advise analysts on policy changes.

To help you keep track of the bigger picture, we’ve organized topics around a workflow that we call the data science lifecycle. In this chapter, we introduce this lifecycle. Unlike other data science books that tend to focus on one part of the lifecycle or address only computational or statistical topics, we cover the entire cycle from start to finish and consider both statistical and computational aspects together.