{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "if not any(path.endswith('textbook') for path in sys.path):\n",
    "    sys.path.append(os.path.abspath('../../..'))\n",
    "from textbook_utils import *"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(sec:sql_transforming)=\n",
    "# Transforming and Common Table Expressions\n",
    "\n",
    "In this section, we show how to call functions to transform columns of data using built-in SQL functions. We also demonstrate how to use common table expressions to build up complex queries from simpler ones. As usual, we start by loading the database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up connection to database\n",
    "import sqlalchemy\n",
    "db = sqlalchemy.create_engine('sqlite:///babynames.db')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SQL Functions\n",
    "\n",
    "SQLite provides a variety of _scalar functions_, or functions that transform single data\n",
    "values. When called on a column of data, SQLite will apply these functions on\n",
    "each value in the column. In contrast, aggregation functions like `SUM` and\n",
    "`COUNT` take a column of values as input and compute a single value as output.\n",
    "\n",
    "SQLite provides a comprehensive list of the built-in scalar functions in [its online documentation][funcs]. For instance, to find the number of characters in each name, we use the\n",
    "`LENGTH` function:\n",
    "\n",
    "[funcs]: https://www.sqlite.org/lang_corefunc.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>LENGTH(Name)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Liam</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Noah</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Oliver</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Lucas</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Henry</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Alexander</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         Name  LENGTH(Name)\n",
       "0        Liam             4\n",
       "1        Noah             4\n",
       "2      Oliver             6\n",
       "..        ...           ...\n",
       "7       Lucas             5\n",
       "8       Henry             5\n",
       "9   Alexander             9\n",
       "\n",
       "[10 rows x 2 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT Name, LENGTH(Name)\n",
    "FROM baby\n",
    "LIMIT 10;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the `LENGTH` function is applied to each value within the `Name`\n",
    "column. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "\n",
    "Like aggregation functions, each implementation of SQL provides a different set\n",
    "of scalar functions. SQLite has a relatively minimal set of functions, while\n",
    "[PostgreSQL has many more][pg_funcs]. That said, almost all SQL implementations provide some\n",
    "equivalent to SQLite's `LENGTH`, `ROUND`, `SUBSTR`, and `LIKE` functions. \n",
    "\n",
    ":::\n",
    "\n",
    "[pg_funcs]: https://www.postgresql.org/docs/9.2/functions.html"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although scalar functions use the same syntax as an aggregation function, they behave differently. This\n",
    "can result in confusing output if the two are mixed together in a single query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>LENGTH(Name)</th>\n",
       "      <th>AVG(Count)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Liam</td>\n",
       "      <td>4</td>\n",
       "      <td>174.47</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Name  LENGTH(Name)  AVG(Count)\n",
       "0  Liam             4      174.47"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT Name, LENGTH(Name), AVG(Count)\n",
    "FROM baby\n",
    "LIMIT 10;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the `AVG(Name)` computes the average of the entire `Count` column, but the output is confusing---a reader could easily think the average is related to the name Liam. For this reason, we must be careful when scalar and aggregation functions\n",
    "appear together within a `SELECT` statement. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To extract the first letter of each name, we can use the `SUBSTR` function\n",
    "(short for _substring_). As described in the documentation, the `SUBSTR`\n",
    "function takes three arguments. The first is the input string, the second is\n",
    "the position to begin the substring (1-indexed), and the third is the\n",
    "length of the substring:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>SUBSTR(Name, 1, 1)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Liam</td>\n",
       "      <td>L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Noah</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Oliver</td>\n",
       "      <td>O</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Lucas</td>\n",
       "      <td>L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Henry</td>\n",
       "      <td>H</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Alexander</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         Name SUBSTR(Name, 1, 1)\n",
       "0        Liam                  L\n",
       "1        Noah                  N\n",
       "2      Oliver                  O\n",
       "..        ...                ...\n",
       "7       Lucas                  L\n",
       "8       Henry                  H\n",
       "9   Alexander                  A\n",
       "\n",
       "[10 rows x 2 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT Name, SUBSTR(Name, 1, 1)\n",
    "FROM baby\n",
    "LIMIT 10;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the `AS` keyword to rename the column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "      <th>Year</th>\n",
       "      <th>Firsts</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Liam</td>\n",
       "      <td>M</td>\n",
       "      <td>19659</td>\n",
       "      <td>2020</td>\n",
       "      <td>L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Noah</td>\n",
       "      <td>M</td>\n",
       "      <td>18252</td>\n",
       "      <td>2020</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Oliver</td>\n",
       "      <td>M</td>\n",
       "      <td>14147</td>\n",
       "      <td>2020</td>\n",
       "      <td>O</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Lucas</td>\n",
       "      <td>M</td>\n",
       "      <td>11281</td>\n",
       "      <td>2020</td>\n",
       "      <td>L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Henry</td>\n",
       "      <td>M</td>\n",
       "      <td>10705</td>\n",
       "      <td>2020</td>\n",
       "      <td>H</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Alexander</td>\n",
       "      <td>M</td>\n",
       "      <td>10151</td>\n",
       "      <td>2020</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         Name Sex  Count  Year Firsts\n",
       "0        Liam   M  19659  2020      L\n",
       "1        Noah   M  18252  2020      N\n",
       "2      Oliver   M  14147  2020      O\n",
       "..        ...  ..    ...   ...    ...\n",
       "7       Lucas   M  11281  2020      L\n",
       "8       Henry   M  10705  2020      H\n",
       "9   Alexander   M  10151  2020      A\n",
       "\n",
       "[10 rows x 5 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT *, SUBSTR(Name, 1, 1) AS Firsts\n",
    "FROM baby\n",
    "LIMIT 10;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After calculating the first letter of each name, our analysis aims to understand the popularity of first letters over time. To do this, we want to take the output of this SQL query and use it as a single step within a longer chain of operations."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SQL provides several options to break queries into smaller steps, which is helpful in more complex analyses like this one. The most common options for doing this are to create a new relation using a `CREATE TABLE` statement, create a new view using `CREATE VIEW`, or create a temporary relation using `WITH`. Each of these methods have different use-cases. For simplicity, we only describe the `WITH` statement in this section and suggest that readers look over the SQLite documentation for details."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multistep Queries Using a WITH Clause\n",
    "\n",
    "The `WITH` clause lets us assign a name to any `SELECT` query. Then we can\n",
    "treat that query as though it exists as a relation in the database just for the duration of the query.\n",
    "SQLite calls these temporary relations *common table expressions*.\n",
    "For\n",
    "instance, we can take the earlier query that calculates the first letter of each\n",
    "name and call it `letters`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "      <th>Year</th>\n",
       "      <th>Firsts</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Liam</td>\n",
       "      <td>M</td>\n",
       "      <td>19659</td>\n",
       "      <td>2020</td>\n",
       "      <td>L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Noah</td>\n",
       "      <td>M</td>\n",
       "      <td>18252</td>\n",
       "      <td>2020</td>\n",
       "      <td>N</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Oliver</td>\n",
       "      <td>M</td>\n",
       "      <td>14147</td>\n",
       "      <td>2020</td>\n",
       "      <td>O</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Lucas</td>\n",
       "      <td>M</td>\n",
       "      <td>11281</td>\n",
       "      <td>2020</td>\n",
       "      <td>L</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Henry</td>\n",
       "      <td>M</td>\n",
       "      <td>10705</td>\n",
       "      <td>2020</td>\n",
       "      <td>H</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Alexander</td>\n",
       "      <td>M</td>\n",
       "      <td>10151</td>\n",
       "      <td>2020</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         Name Sex  Count  Year Firsts\n",
       "0        Liam   M  19659  2020      L\n",
       "1        Noah   M  18252  2020      N\n",
       "2      Oliver   M  14147  2020      O\n",
       "..        ...  ..    ...   ...    ...\n",
       "7       Lucas   M  11281  2020      L\n",
       "8       Henry   M  10705  2020      H\n",
       "9   Alexander   M  10151  2020      A\n",
       "\n",
       "[10 rows x 5 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "-- Create a temporary relation called letters by calculating\n",
    "-- the first letter for each name in baby\n",
    "WITH letters AS (\n",
    "  SELECT *, SUBSTR(Name, 1, 1) AS Firsts\n",
    "  FROM baby\n",
    ")\n",
    "-- Then, select the first ten rows from letters\n",
    "SELECT *\n",
    "FROM letters\n",
    "LIMIT 10;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`WITH` statements are very useful since they can be chained together. We can\n",
    "create multiple temporary relations in a `WITH` statement that each perform a\n",
    "bit of work on the previous result, which lets us gradually build complicated\n",
    "queries a step at a time. "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Popularity of \"L\" Names"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use `WITH` statements to look at the popularity of names that start with the letter _L_ over time. We'll group the temporary `letters` relation by the\n",
    "first letter and year, then aggregate the `Count` column using a sum, then filter to get only names with the letter _L_:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Firsts</th>\n",
       "      <th>Year</th>\n",
       "      <th>Count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>L</td>\n",
       "      <td>1880</td>\n",
       "      <td>12799</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>L</td>\n",
       "      <td>1881</td>\n",
       "      <td>12770</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>L</td>\n",
       "      <td>1882</td>\n",
       "      <td>14923</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>138</th>\n",
       "      <td>L</td>\n",
       "      <td>2018</td>\n",
       "      <td>246251</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139</th>\n",
       "      <td>L</td>\n",
       "      <td>2019</td>\n",
       "      <td>249315</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>140</th>\n",
       "      <td>L</td>\n",
       "      <td>2020</td>\n",
       "      <td>239760</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>141 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    Firsts  Year   Count\n",
       "0        L  1880   12799\n",
       "1        L  1881   12770\n",
       "2        L  1882   14923\n",
       "..     ...   ...     ...\n",
       "138      L  2018  246251\n",
       "139      L  2019  249315\n",
       "140      L  2020  239760\n",
       "\n",
       "[141 rows x 3 columns]"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "WITH letters AS (\n",
    "  SELECT *, SUBSTR(Name, 1, 1) AS Firsts\n",
    "  FROM baby\n",
    ")\n",
    "SELECT Firsts, Year, SUM(Count) AS Count\n",
    "FROM letters\n",
    "WHERE Firsts = \"L\"\n",
    "GROUP BY Firsts, Year;\n",
    "'''\n",
    "\n",
    "letter_counts = pd.read_sql(query, db)\n",
    "letter_counts"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This relation contains the same data as the one from {numref}`Chapter %s <ch:pandas>`. In that chapter, we make a plot of the `Count` column over time, which we omit here for brevity."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we introduced data transformations.\n",
    "To transform values in a relation, we commonly use SQL functions like\n",
    "`LENGTH()` or `SUBSTR()`.\n",
    "We also explained how to build up complex queries using the `WITH` clause."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}