{ "cells": [ { "cell_type": "code", "execution_count": 120, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import sys\n", "import os\n", "if not any(path.endswith('textbook') for path in sys.path):\n", " sys.path.append(os.path.abspath('../../..'))\n", "from textbook_utils import *" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Transforming\n", "\n", "Data scientists transform dataframe columns when they need to change each value\n", "in a feature in the same way. For example, if a feature contains heights of\n", "people in feet, a data scientist might want to transform the heights to\n", "centimeters. In this section, we'll introduce *apply*, an operation that\n", "transforms columns of data using a user-defined function:" ] }, { "cell_type": "code", "execution_count": 121, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Name | Sex | Count | Year |
|---|---|---|---|---|
| 0 | Liam | M | 19659 | 2020 |
| 1 | Noah | M | 18252 | 2020 |
| 2 | Oliver | M | 14147 | 2020 |
| ... | ... | ... | ... | ... |
| 2020719 | Verona | F | 5 | 1880 |
| 2020720 | Vertie | F | 5 | 1880 |
| 2020721 | Wilma | F | 5 | 1880 |

2020722 rows × 4 columns
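
As a minimal sketch of the apply operation introduced above, the snippet below pulls the first letter out of each name. The three-row dataframe is a hypothetical sample copied from the top of the table, and the names `baby`, `first_letter`, and `letters` are assumptions made for illustration, not necessarily the ones used elsewhere in this chapter.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the full baby names table
baby = pd.DataFrame({
    'Name': ['Liam', 'Noah', 'Oliver'],
    'Sex': ['M', 'M', 'M'],
    'Count': [19659, 18252, 14147],
    'Year': [2020, 2020, 2020],
})


def first_letter(name):
    # Return the first character of a name string
    return name[0]


# Series.apply calls first_letter once for each value in the Name column
firsts = baby['Name'].apply(first_letter)

# assign returns a new dataframe with the extra column; baby is unchanged
letters = baby.assign(Firsts=firsts)
```

Because `apply` accepts any function of a single value, the same pattern should carry over to other per-value transformations, such as the feet-to-centimeters conversion mentioned earlier.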

| | Name | Sex | Count | Year | Firsts |
|---|---|---|---|---|---|
| 0 | Liam | M | 19659 | 2020 | L |
| 1 | Noah | M | 18252 | 2020 | N |
| 2 | Oliver | M | 14147 | 2020 | O |
| ... | ... | ... | ... | ... | ... |
| 2020719 | Verona | F | 5 | 1880 | V |
| 2020720 | Vertie | F | 5 | 1880 | V |
| 2020721 | Wilma | F | 5 | 1880 | W |

2020722 rows × 5 columns
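
Once each row carries its first letter, one plausible next step is to total the counts for each letter and year. The sketch below assumes a group-and-sum aggregation and uses a hypothetical four-row sample taken from the table above, so its totals will not match figures computed on the full data. The names `letters` and `totals` are illustrative.

```python
import pandas as pd

# Hypothetical four-row sample that already includes the Firsts column
letters = pd.DataFrame({
    'Name': ['Liam', 'Verona', 'Vertie', 'Wilma'],
    'Sex': ['M', 'F', 'F', 'F'],
    'Count': [19659, 5, 5, 5],
    'Year': [2020, 1880, 1880, 1880],
    'Firsts': ['L', 'V', 'V', 'W'],
})

# Group rows that share a first letter and year, then add up their counts.
# reset_index turns the group keys back into ordinary columns.
totals = (letters
          .groupby(['Firsts', 'Year'])['Count']
          .sum()
          .reset_index())
```

In this sample, the two names starting with V in 1880 collapse into a single row whose count is their sum.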

| | Firsts | Year | Count |
|---|---|---|---|
| 0 | A | 1880 | 16740 |
| 1 | A | 1881 | 16257 |
| 2 | A | 1882 | 18790 |
| ... | ... | ... | ... |
| 3638 | Z | 2018 | 55996 |
| 3639 | Z | 2019 | 55293 |
| 3640 | Z | 2020 | 54011 |

3641 rows × 3 columns
\n", "