{ "cells": [ { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "import sys\n", "import os\n", "if not any(path.endswith('textbook') for path in sys.path):\n", " sys.path.append(os.path.abspath('../../..'))\n", "from textbook_utils import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "(ch:pandas_aggregating)=\n", "# Aggregating\n", "\n", "This section introduces operations for aggregating rows in a dataframe. Data\n", "scientists aggregate rows together to make summaries of data. For instance, a\n", "dataset containing daily sales can be aggregated to show monthly sales instead.\n", "This section introduces *grouping* and *pivoting*, two common operations\n", "for aggregating data.\n", "\n", "We work with the baby names data, as introduced in the previous section:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Name | \n", "Sex | \n", "Count | \n", "Year | \n", "
---|---|---|---|---|
0 | \n", "Liam | \n", "M | \n", "19659 | \n", "2020 | \n", "
1 | \n", "Noah | \n", "M | \n", "18252 | \n", "2020 | \n", "
2 | \n", "Oliver | \n", "M | \n", "14147 | \n", "2020 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "... | \n", "
2020719 | \n", "Verona | \n", "F | \n", "5 | \n", "1880 | \n", "
2020720 | \n", "Vertie | \n", "F | \n", "5 | \n", "1880 | \n", "
2020721 | \n", "Wilma | \n", "F | \n", "5 | \n", "1880 | \n", "
2020722 rows × 4 columns
\n", "\n", " | name | \n", "
---|---|
0 | \n", "Eden | \n", "
1 | \n", "Sachit | \n", "
2 | \n", "Eden | \n", "
3 | \n", "Sachit | \n", "
4 | \n", "Sachit | \n", "
5 | \n", "Luke | \n", "
\n", " | \n", " | Count | \n", "
---|---|---|
Year | \n", "Sex | \n", "\n", " |
1880 | \n", "F | \n", "83929 | \n", "
M | \n", "110490 | \n", "|
1881 | \n", "F | \n", "85034 | \n", "
... | \n", "... | \n", "... | \n", "
2019 | \n", "M | \n", "1785527 | \n", "
2020 | \n", "F | \n", "1581301 | \n", "
M | \n", "1706423 | \n", "
282 rows × 1 columns
\n", "\n", " | Year | \n", "Sex | \n", "Count | \n", "
---|---|---|---|
0 | \n", "1880 | \n", "F | \n", "83929 | \n", "
1 | \n", "1880 | \n", "M | \n", "110490 | \n", "
2 | \n", "1881 | \n", "F | \n", "85034 | \n", "
... | \n", "... | \n", "... | \n", "... | \n", "
279 | \n", "2019 | \n", "M | \n", "1785527 | \n", "
280 | \n", "2020 | \n", "F | \n", "1581301 | \n", "
281 | \n", "2020 | \n", "M | \n", "1706423 | \n", "
282 rows × 3 columns
\n", "\n", " | \n", " | Count | \n", "
---|---|---|
Year | \n", "Sex | \n", "\n", " |
1880 | \n", "F | \n", "83929 | \n", "
M | \n", "110490 | \n", "|
1881 | \n", "F | \n", "85034 | \n", "
... | \n", "... | \n", "... | \n", "
2019 | \n", "M | \n", "1785527 | \n", "
2020 | \n", "F | \n", "1581301 | \n", "
M | \n", "1706423 | \n", "
282 rows × 1 columns
\n", "Sex | \n", "F | \n", "M | \n", "
---|---|---|
Year | \n", "\n", " | \n", " |
1880 | \n", "83929 | \n", "110490 | \n", "
1881 | \n", "85034 | \n", "100738 | \n", "
1882 | \n", "99699 | \n", "113686 | \n", "
... | \n", "... | \n", "... | \n", "
2018 | \n", "1676884 | \n", "1810309 | \n", "
2019 | \n", "1651911 | \n", "1785527 | \n", "
2020 | \n", "1581301 | \n", "1706423 | \n", "
141 rows × 2 columns
\n", "