{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "if not any(path.endswith('textbook') for path in sys.path):\n",
    "    sys.path.append(os.path.abspath('../../..'))\n",
    "from textbook_utils import *"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(sec:sql_joining)=\n",
    "# Joining\n",
    "\n",
    "To connect records between two data tables, SQL relations can be joined together similar to dataframes. In this section, we introduce SQL joins to replicate our analysis of the baby names data. Recall that {numref}`Chapter %s <ch:pandas>` mentions a _New York Times_ article that talks about how certain name categories, like mythological and baby boomer names, have become more or less popular over time.\n",
    "\n",
    "We've taken the names and categories in the _NYT_ article and put them in a small relation named `nyt`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up connection to database\n",
    "import sqlalchemy\n",
    "db = sqlalchemy.create_engine('sqlite:///babynames.db')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>nyt_name</th>\n",
       "      <th>category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Lucifer</td>\n",
       "      <td>forbidden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Lilith</td>\n",
       "      <td>forbidden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Danger</td>\n",
       "      <td>forbidden</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Venus</td>\n",
       "      <td>celestial</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Celestia</td>\n",
       "      <td>celestial</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Skye</td>\n",
       "      <td>celestial</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>23 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    nyt_name   category\n",
       "0    Lucifer  forbidden\n",
       "1     Lilith  forbidden\n",
       "2     Danger  forbidden\n",
       "..       ...        ...\n",
       "20     Venus  celestial\n",
       "21  Celestia  celestial\n",
       "22      Skye  celestial\n",
       "\n",
       "[23 rows x 2 columns]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT *\n",
    "FROM nyt;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "\n",
    "Notice that the preceding code runs a query on `babynames.db`, the same database\n",
    "that contains the larger `baby` relation from the previous sections. SQL\n",
    "databases can hold more than one relation, making them very useful when we need\n",
    "to work with many data tables at once. CSV files, on the other hand, typically\n",
    "contain one data table each---if we perform a data analysis that uses 20\n",
    "data tables, we might need to keep track of the names, locations, and versions\n",
    "of 20 CSV files. Instead, it could be simpler to store all the data tables\n",
    "in a SQLite database stored in a single file.\n",
    "\n",
    ":::"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how popular the categories of names are, we join the `nyt` relation with\n",
    "the `baby` relation to get the name counts from `baby`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inner Joins"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As in Chapter 6, we've made smaller versions of the `baby` and `nyt` tables so that it's easier to\n",
    "see what happens when we join tables together. The relations are called\n",
    "`baby_small` and `nyt_small`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Noah</td>\n",
       "      <td>M</td>\n",
       "      <td>18252</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Julius</td>\n",
       "      <td>M</td>\n",
       "      <td>960</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Karen</td>\n",
       "      <td>M</td>\n",
       "      <td>6</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Karen</td>\n",
       "      <td>F</td>\n",
       "      <td>325</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Noah</td>\n",
       "      <td>F</td>\n",
       "      <td>305</td>\n",
       "      <td>2020</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name Sex  Count  Year\n",
       "0    Noah   M  18252  2020\n",
       "1  Julius   M    960  2020\n",
       "2   Karen   M      6  2020\n",
       "3   Karen   F    325  2020\n",
       "4    Noah   F    305  2020"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT *\n",
    "FROM baby_small;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>nyt_name</th>\n",
       "      <th>category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Karen</td>\n",
       "      <td>boomer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Julius</td>\n",
       "      <td>mythology</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Freya</td>\n",
       "      <td>mythology</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  nyt_name   category\n",
       "0    Karen     boomer\n",
       "1   Julius  mythology\n",
       "2    Freya  mythology"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT *\n",
    "FROM nyt_small;\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To join relations in SQL, we use the `INNER JOIN` clause to say which tables we want to join and the `ON` clause to specify a predicate for joining the tables. Here's an example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "      <th>Year</th>\n",
       "      <th>nyt_name</th>\n",
       "      <th>category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Julius</td>\n",
       "      <td>M</td>\n",
       "      <td>960</td>\n",
       "      <td>2020</td>\n",
       "      <td>Julius</td>\n",
       "      <td>mythology</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Karen</td>\n",
       "      <td>M</td>\n",
       "      <td>6</td>\n",
       "      <td>2020</td>\n",
       "      <td>Karen</td>\n",
       "      <td>boomer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Karen</td>\n",
       "      <td>F</td>\n",
       "      <td>325</td>\n",
       "      <td>2020</td>\n",
       "      <td>Karen</td>\n",
       "      <td>boomer</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name Sex  Count  Year nyt_name   category\n",
       "0  Julius   M    960  2020   Julius  mythology\n",
       "1   Karen   M      6  2020    Karen     boomer\n",
       "2   Karen   F    325  2020    Karen     boomer"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT *\n",
    "FROM baby_small INNER JOIN nyt_small\n",
    "  ON baby_small.Name = nyt_small.nyt_name\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that this result is the same as doing an inner join in `pandas`: the new table has the columns of both the `baby_small` and `nyt_small`\n",
    "tables. The rows with the name Noah are gone, and the remaining rows have their matching `category` from `nyt_small`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To join two tables together, we tell SQL the column(s) from each\n",
    "table that we want to use to  do the join using a predicate with the `ON` keyword.\n",
    "SQL matches rows together when the values in the joining columns fulfill the predicate, as\n",
    "shown in {numref}`fig:sql-inner-join`.\n",
    "\n",
    "```{figure} figures/sql-inner-join.svg\n",
    "---\n",
    "name: fig:sql-inner-join\n",
    "alt: sql-inner-join\n",
    "---\n",
    "Joining two tables together with SQL\n",
    "```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike `pandas`, SQL gives more flexibility on how rows are joined. The `pd.merge()` method can only join using simple equality, but the predicate in the `ON` clause can be arbitrarily complex. As an example, we take advantage of this extra versatility in {numref}`Section %s <sec:pa_collocated>`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Left and Right Joins"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like `pandas`, SQL also supports left joins. Instead of saying `INNER JOIN`, we use `LEFT JOIN`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "      <th>Year</th>\n",
       "      <th>nyt_name</th>\n",
       "      <th>category</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Noah</td>\n",
       "      <td>M</td>\n",
       "      <td>18252</td>\n",
       "      <td>2020</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Julius</td>\n",
       "      <td>M</td>\n",
       "      <td>960</td>\n",
       "      <td>2020</td>\n",
       "      <td>Julius</td>\n",
       "      <td>mythology</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Karen</td>\n",
       "      <td>M</td>\n",
       "      <td>6</td>\n",
       "      <td>2020</td>\n",
       "      <td>Karen</td>\n",
       "      <td>boomer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Karen</td>\n",
       "      <td>F</td>\n",
       "      <td>325</td>\n",
       "      <td>2020</td>\n",
       "      <td>Karen</td>\n",
       "      <td>boomer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Noah</td>\n",
       "      <td>F</td>\n",
       "      <td>305</td>\n",
       "      <td>2020</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     Name Sex  Count  Year nyt_name   category\n",
       "0    Noah   M  18252  2020     None       None\n",
       "1  Julius   M    960  2020   Julius  mythology\n",
       "2   Karen   M      6  2020    Karen     boomer\n",
       "3   Karen   F    325  2020    Karen     boomer\n",
       "4    Noah   F    305  2020     None       None"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT *\n",
    "FROM baby_small LEFT JOIN nyt_small\n",
    "  ON baby_small.Name = nyt_small.nyt_name\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we might expect, the \"left\" side of the join refers to the table that appears on the left side of the `LEFT JOIN` keyword.\n",
    "We can see the `Noah` rows are kept in the resulting relation even when they don't have a match in the righthand relation.\n",
    "\n",
    "Note that SQLite doesn't support\n",
    "right joins directly, but we can perform the same join by swapping the order of relations, then using `LEFT JOIN`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>nyt_name</th>\n",
       "      <th>category</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Count</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Karen</td>\n",
       "      <td>boomer</td>\n",
       "      <td>Karen</td>\n",
       "      <td>F</td>\n",
       "      <td>325.0</td>\n",
       "      <td>2020.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Karen</td>\n",
       "      <td>boomer</td>\n",
       "      <td>Karen</td>\n",
       "      <td>M</td>\n",
       "      <td>6.0</td>\n",
       "      <td>2020.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Julius</td>\n",
       "      <td>mythology</td>\n",
       "      <td>Julius</td>\n",
       "      <td>M</td>\n",
       "      <td>960.0</td>\n",
       "      <td>2020.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Freya</td>\n",
       "      <td>mythology</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  nyt_name   category    Name   Sex  Count    Year\n",
       "0    Karen     boomer   Karen     F  325.0  2020.0\n",
       "1    Karen     boomer   Karen     M    6.0  2020.0\n",
       "2   Julius  mythology  Julius     M  960.0  2020.0\n",
       "3    Freya  mythology    None  None    NaN     NaN"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT *\n",
    "FROM nyt_small LEFT JOIN baby_small\n",
    "  ON baby_small.Name = nyt_small.nyt_name\n",
    "'''\n",
    "\n",
    "pd.read_sql(query, db)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "SQLite doesn't have a built-in keyword for outer joins. In cases where\n",
    "an outer join is needed, we have to either use a different SQL engine or perform an outer join via `pandas`. However, in our (the authors') experience, outer joins are rarely used in practice compared to inner and left joins."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example: Popularity of NYT Name Categories"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's return to the full `baby` and `nyt` relations."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to know how the popularity of name categories in `nyt` has changed\n",
    "over time. To answer this question, we should:\n",
    "\n",
    "1. Inner join `baby` with `nyt`, matching rows where the names are equal.\n",
    "2. Group the table by `category` and `Year`.\n",
    "3. Aggregate the counts using a sum:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>category</th>\n",
       "      <th>Year</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>boomer</td>\n",
       "      <td>1880</td>\n",
       "      <td>292</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>boomer</td>\n",
       "      <td>1881</td>\n",
       "      <td>298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>boomer</td>\n",
       "      <td>1882</td>\n",
       "      <td>326</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>647</th>\n",
       "      <td>mythology</td>\n",
       "      <td>2018</td>\n",
       "      <td>2944</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>648</th>\n",
       "      <td>mythology</td>\n",
       "      <td>2019</td>\n",
       "      <td>3320</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>649</th>\n",
       "      <td>mythology</td>\n",
       "      <td>2020</td>\n",
       "      <td>3489</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>650 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      category  Year  count\n",
       "0       boomer  1880    292\n",
       "1       boomer  1881    298\n",
       "2       boomer  1882    326\n",
       "..         ...   ...    ...\n",
       "647  mythology  2018   2944\n",
       "648  mythology  2019   3320\n",
       "649  mythology  2020   3489\n",
       "\n",
       "[650 rows x 3 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = ''' \n",
    "SELECT\n",
    "  category,\n",
    "  Year,\n",
    "  SUM(Count) AS count           -- [3]\n",
    "FROM baby INNER JOIN nyt        -- [1]\n",
    "  ON baby.Name = nyt.nyt_name   -- [1]\n",
    "GROUP BY category, Year         -- [2]\n",
    "'''\n",
    "\n",
    "cate_counts = pd.read_sql(query, db)\n",
    "cate_counts"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The numbers in square brackets (`[1]`, `[2]`, `[3]`) in the preceding query show how each step in our plan maps to the parts of the SQL query. The code re-creates the dataframe from {numref}`Chapter %s <ch:pandas>`, where we created plots to verify the claims of the _New York Times_ article. For brevity, we omit duplicating the plots here."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{note}\n",
    "\n",
    "Notice that in the SQL code in this example, the numbers appear out of order---`[3]`, `[1]`, then `[2]`. As a rule of thumb for first-time SQL learners, we can often think of the `SELECT` statement as the *last* piece of the query to execute even though it appears first.\n",
    "\n",
    ":::"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we introduced joins for relations. When joining relations together, we match rows using the `INNER JOIN` or `LEFT JOIN` keyword and a boolean predicate. In the next section, we'll explain how to transform values in a relation."
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}