{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "\n",
    "# A little bit of SQL(ite) \n",
    "### and very little Pandas\n",
    "<br><br>\n",
    "<div style=\"background-color:rgba(128, 128, 0, 0.1); text-align:left; vertical-align: middle; padding:20px 0;\">\n",
    "<p style=\"font-size:134%;color:Deep Teal;\">SC 4125: Developing Data Products</p>\n",
    "    <p style=\"font-size:100%;color:Olive;\">Module-1: Introduction</p><br>\n",
    "\n",
    "<br> \n",
    "by <a href=\"https://personal.ntu.edu.sg/anwitaman/\" style=\"font-size:100%;color:Deep Teal;\">Anwitaman DATTA</a><br>\n",
    "School of Computer Science and Engineering, NTU Singapore.        \n",
    "</div>\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Teaching material\n",
    "- <a href=\"M1-Intro.pdf\">.pdf</a> deck of slides (complements the html slides)\n",
    "- <a href=\"M1-Intro.slides.html\">.html</a> deck of slides\n",
    "- <a href=\"M1-Intro.ipynb\">.ipynb</a> Jupyter notebook"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    " This is companion material for <a href=\"Module-01.pdf\">Module-1: Introduction</a> lecture for the <b>Developing Data Products</b> course.<br>\n",
    "    \n",
    "The executable source <a href=\"M1-Examples.ipynb\">.ipynb notebook</a> file for these HTML slides, and the <a href=\"CountriesDB.db\">CountriesDB.db</a> file that goes with it can be found through the links.  \n",
    "\n",
    "<img src=\"pics/ETL.png\" alt=\"SQLLite screenshot\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### A very non-systematic review/recap\n",
    "\n",
    "You are expected to have already taken introductory courses \n",
    "- SC 1015: Introduction to Data Science & AI\n",
    "- SC 2207: Introduction to Database Systems"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Some useful references for beginners/catching-up:\n",
    "\n",
    "https://www.sqlitetutorial.net/\n",
    "\n",
    "https://pandas.pydata.org/pandas-docs/stable/getting_started/tutorials.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### <b>SQLite Browser</b>\n",
    "\n",
    "https://sqlitebrowser.org/\n",
    "\n",
    "<img src=\"pics/SQLiteScreenShot.png\" alt=\"SQLLite screenshot\" width=\"600\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "This notebook is accompanied with <a href=\"CountriesDB.db\">CountriesDB.db</a> file, which comprises of three tables. The tables correspond to the <a href=\"https://www.kaggle.com/fernandol/countries-of-the-world\">Countries of the world</a> and <a href=\"https://www.kaggle.com/codingelements/gapminder\">gapminder</a> data obtained from <a href=\"https://www.kaggle.com/\">Kaggle</a>, and the <a href=\"https://en.wikipedia.org/wiki/2016_Summer_Olympics_medal_table\">Olympics 2016 medals data obtained from Wikipedia</a>. \n",
    "\n",
    "You may want to install and use the <b><a href=\"https://sqlitebrowser.org/\">SQLite Browser</a></b> to check out the data in the .db file (as shown in the previous panel). You may also want to check out the source files indicated above. \n",
    "\n",
    "It is easy to import the .csv (and other similar formats, such as .tsv) directly using the graphical interface of SQLite Browser. Nevertheless, we will next explore how to do so, and other actions such as manipulate and query the data from the Jupyter programming environment. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Import modules\n",
    "\n",
    "Let's start with importing necessary modules. We will also check the environment/version/etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Python version: 3.8.5 (default, Sep  3 2020, 21:29:08) [MSC v.1916 64 bit (AMD64)]\n",
      "Numpy version: 1.19.2\n",
      "Pandas version: 1.1.3\n",
      "SQLite version: 2.6.0\n"
     ]
    }
   ],
   "source": [
    "# Housekeeping: We will first carry out the imports and print version numbers\n",
    "import sys\n",
    "print(\"Python version: \" +str(sys.version))\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "print(\"Numpy version: \" +str(np.version.version))\n",
    "print(\"Pandas version: \" +str(pd.__version__))\n",
    "\n",
    "#import xml.etree.ElementTree as ET\n",
    "import sqlite3\n",
    "print(\"SQLite version: \" + str(sqlite3.version))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "jupyter core     : 4.6.3\n",
      "jupyter-notebook : 6.1.4\n",
      "qtconsole        : 4.7.7\n",
      "ipython          : 7.19.0\n",
      "ipykernel        : 5.3.4\n",
      "jupyter client   : 6.1.7\n",
      "jupyter lab      : 3.0.16\n",
      "nbconvert        : 6.0.7\n",
      "ipywidgets       : 7.5.1\n",
      "nbformat         : 5.0.8\n",
      "traitlets        : 5.0.5\n"
     ]
    }
   ],
   "source": [
    "! jupyter --version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "          country continent  year  lifeExp       pop   gdpPercap\n",
      "0     Afghanistan      Asia  1952   28.801   8425333  779.445314\n",
      "1     Afghanistan      Asia  1957   30.332   9240934  820.853030\n",
      "2     Afghanistan      Asia  1962   31.997  10267083  853.100710\n",
      "3     Afghanistan      Asia  1967   34.020  11537966  836.197138\n",
      "4     Afghanistan      Asia  1972   36.088  13079460  739.981106\n",
      "...           ...       ...   ...      ...       ...         ...\n",
      "1699     Zimbabwe    Africa  1987   62.351   9216418  706.157306\n",
      "1700     Zimbabwe    Africa  1992   60.377  10704340  693.420786\n",
      "1701     Zimbabwe    Africa  1997   46.809  11404948  792.449960\n",
      "1702     Zimbabwe    Africa  2002   39.989  11926563  672.038623\n",
      "1703     Zimbabwe    Africa  2007   43.487  12311143  469.709298\n",
      "\n",
      "[1704 rows x 6 columns]\n",
      "['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap']\n",
      "        country continent  year  lifeExp       pop   gdpPercap\n",
      "0   Afghanistan      Asia  1952   28.801   8425333  779.445314\n",
      "1   Afghanistan      Asia  1957   30.332   9240934  820.853030\n",
      "2   Afghanistan      Asia  1962   31.997  10267083  853.100710\n",
      "3   Afghanistan      Asia  1967   34.020  11537966  836.197138\n",
      "4   Afghanistan      Asia  1972   36.088  13079460  739.981106\n",
      "5   Afghanistan      Asia  1977   38.438  14880372  786.113360\n",
      "6   Afghanistan      Asia  1982   39.854  12881816  978.011439\n",
      "7   Afghanistan      Asia  1987   40.822  13867957  852.395945\n",
      "8   Afghanistan      Asia  1992   41.674  16317921  649.341395\n",
      "9   Afghanistan      Asia  1997   41.763  22227415  635.341351\n",
      "10  Afghanistan      Asia  2002   42.129  25268405  726.734055\n",
      "11  Afghanistan      Asia  2007   43.828  31889923  974.580338\n"
     ]
    }
   ],
   "source": [
    "# load data from .tsv file (as an alternate to loading from the DB file)\n",
    "\n",
    "gapminderdatapath ='data/gapminder/' # change this to adjust relative path\n",
    "gap_df = pd.read_csv(gapminderdatapath+'gapminder.tsv', sep='\\t')\n",
    "print(gap_df)\n",
    "print(gap_df.columns.to_list())\n",
    "print(gap_df[gap_df['country']=='Afghanistan'])\n",
    "country_names_series=gap_df['country'].drop_duplicates()\n",
    "country_names_array=gap_df['country'].unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We begin by connecting to the CountriesDB.db database, using the <b>sqlite3.connect()</b> function. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "conn = sqlite3.connect('CountriesDB.db')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We then create a cursor object using the <b>cursor()</b> function. We determine the tables stored in the database, we can query (using a SELECT statement) a special table called the <b>sqlite_schema</b>. It can also be referenced as sqlite_master. \n",
    "\n",
    "To retrieve data after executing a SELECT statement, one can either treat the cursor as an iterator, call the cursor’s <b>fetchone()</b> method to retrieve a single matching row, or call <b>fetchall()</b> to get a list of the matching rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['CountriesOfTheWorld', 'gapminder', 'Olymp2016']\n"
     ]
    }
   ],
   "source": [
    "cursor = conn.cursor()\n",
    "cursor.execute(\"SELECT name FROM sqlite_schema WHERE type='table';\")\n",
    "db_tables= [x[0] for x in cursor.fetchall()] # we are storing this information in a list\n",
    "print(db_tables)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "If we want to determine the schema of one of these tables, say, the gapminder table, we may achieve it as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('CREATE TABLE \"gapminder\" (\\n\\t\"country\"\\tTEXT,\\n\\t\"continent\"\\tTEXT,\\n\\t\"year\"\\tINTEGER,\\n\\t\"lifeExp\"\\tREAL,\\n\\t\"pop\"\\tINTEGER,\\n\\t\"gdpPercap\"\\tREAL\\n)',)]\n"
     ]
    }
   ],
   "source": [
    "cursor.execute(\"SELECT sql FROM sqlite_master WHERE name = 'gapminder';\")\n",
    "print(cursor.fetchall())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Alternatively, we may also use the description method to determine the schema instead. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['country', 'continent', 'year', 'lifeExp', 'pop', 'gdpPercap']\n"
     ]
    }
   ],
   "source": [
    "data=cursor.execute(\"SELECT * FROM 'gapminder'\")\n",
    "data_descr=[x[0] for x in data.description]\n",
    "print(data_descr)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['Country', 'Region', 'Population', 'Area(sq.mi.)', 'Pop.Density(persq.mi.)', 'Coastline(coast/arearatio)', 'Netmigration', 'Infantmortality(per1000births)', 'GDP($percapita)', 'Literacy(%)', 'Phones(per1000)', 'Arable(%)', 'Crops(%)', 'Other(%)', 'Climate', 'Birthrate', 'Deathrate', 'Agriculture', 'Industry', 'Service']\n"
     ]
    }
   ],
   "source": [
    "# If you do not want to type the names, you can ofcourse \n",
    "# refer to the db_tables and iterate as needed.\n",
    "data=cursor.execute('SELECT * FROM ' +str(db_tables[0]))\n",
    "data_descr=[x[0] for x in data.description]\n",
    "print(data_descr)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "How do we check the data itself? Depending on the purpose, we can obtain it in different manners. We have already seen the most basic option, of carrying out a cursor.execute() followed by fetchone()/fetchall(). \n",
    "\n",
    "One can also instead just carry out a pandas read_sql_query, and store the result in a dataframe. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Region</th>\n",
       "      <th>Population</th>\n",
       "      <th>Area(sq.mi.)</th>\n",
       "      <th>Pop.Density(persq.mi.)</th>\n",
       "      <th>Coastline(coast/arearatio)</th>\n",
       "      <th>Netmigration</th>\n",
       "      <th>Infantmortality(per1000births)</th>\n",
       "      <th>GDP($percapita)</th>\n",
       "      <th>Literacy(%)</th>\n",
       "      <th>Phones(per1000)</th>\n",
       "      <th>Arable(%)</th>\n",
       "      <th>Crops(%)</th>\n",
       "      <th>Other(%)</th>\n",
       "      <th>Climate</th>\n",
       "      <th>Birthrate</th>\n",
       "      <th>Deathrate</th>\n",
       "      <th>Agriculture</th>\n",
       "      <th>Industry</th>\n",
       "      <th>Service</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>ASIA (EX. NEAR EAST)</td>\n",
       "      <td>31056997</td>\n",
       "      <td>647500</td>\n",
       "      <td>48,0</td>\n",
       "      <td>0,00</td>\n",
       "      <td>23,06</td>\n",
       "      <td>163,07</td>\n",
       "      <td>700.0</td>\n",
       "      <td>36,0</td>\n",
       "      <td>3,2</td>\n",
       "      <td>12,13</td>\n",
       "      <td>0,22</td>\n",
       "      <td>87,65</td>\n",
       "      <td>1</td>\n",
       "      <td>46,6</td>\n",
       "      <td>20,34</td>\n",
       "      <td>0,38</td>\n",
       "      <td>0,24</td>\n",
       "      <td>0,38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Albania</td>\n",
       "      <td>EASTERN EUROPE</td>\n",
       "      <td>3581655</td>\n",
       "      <td>28748</td>\n",
       "      <td>124,6</td>\n",
       "      <td>1,26</td>\n",
       "      <td>-4,93</td>\n",
       "      <td>21,52</td>\n",
       "      <td>4500.0</td>\n",
       "      <td>86,5</td>\n",
       "      <td>71,2</td>\n",
       "      <td>21,09</td>\n",
       "      <td>4,42</td>\n",
       "      <td>74,49</td>\n",
       "      <td>3</td>\n",
       "      <td>15,11</td>\n",
       "      <td>5,22</td>\n",
       "      <td>0,232</td>\n",
       "      <td>0,188</td>\n",
       "      <td>0,579</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Algeria</td>\n",
       "      <td>NORTHERN AFRICA</td>\n",
       "      <td>32930091</td>\n",
       "      <td>2381740</td>\n",
       "      <td>13,8</td>\n",
       "      <td>0,04</td>\n",
       "      <td>-0,39</td>\n",
       "      <td>31</td>\n",
       "      <td>6000.0</td>\n",
       "      <td>70,0</td>\n",
       "      <td>78,1</td>\n",
       "      <td>3,22</td>\n",
       "      <td>0,25</td>\n",
       "      <td>96,53</td>\n",
       "      <td>1</td>\n",
       "      <td>17,14</td>\n",
       "      <td>4,61</td>\n",
       "      <td>0,101</td>\n",
       "      <td>0,6</td>\n",
       "      <td>0,298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>American Samoa</td>\n",
       "      <td>OCEANIA</td>\n",
       "      <td>57794</td>\n",
       "      <td>199</td>\n",
       "      <td>290,4</td>\n",
       "      <td>58,29</td>\n",
       "      <td>-20,71</td>\n",
       "      <td>9,27</td>\n",
       "      <td>8000.0</td>\n",
       "      <td>97,0</td>\n",
       "      <td>259,5</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "      <td>75</td>\n",
       "      <td>2</td>\n",
       "      <td>22,46</td>\n",
       "      <td>3,27</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Andorra</td>\n",
       "      <td>WESTERN EUROPE</td>\n",
       "      <td>71201</td>\n",
       "      <td>468</td>\n",
       "      <td>152,1</td>\n",
       "      <td>0,00</td>\n",
       "      <td>6,6</td>\n",
       "      <td>4,05</td>\n",
       "      <td>19000.0</td>\n",
       "      <td>100,0</td>\n",
       "      <td>497,2</td>\n",
       "      <td>2,22</td>\n",
       "      <td>0</td>\n",
       "      <td>97,78</td>\n",
       "      <td>3</td>\n",
       "      <td>8,71</td>\n",
       "      <td>6,25</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>222</th>\n",
       "      <td>West Bank</td>\n",
       "      <td>NEAR EAST</td>\n",
       "      <td>2460492</td>\n",
       "      <td>5860</td>\n",
       "      <td>419,9</td>\n",
       "      <td>0,00</td>\n",
       "      <td>2,98</td>\n",
       "      <td>19,62</td>\n",
       "      <td>800.0</td>\n",
       "      <td>None</td>\n",
       "      <td>145,2</td>\n",
       "      <td>16,9</td>\n",
       "      <td>18,97</td>\n",
       "      <td>64,13</td>\n",
       "      <td>3</td>\n",
       "      <td>31,67</td>\n",
       "      <td>3,92</td>\n",
       "      <td>0,09</td>\n",
       "      <td>0,28</td>\n",
       "      <td>0,63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>223</th>\n",
       "      <td>Western Sahara</td>\n",
       "      <td>NORTHERN AFRICA</td>\n",
       "      <td>273008</td>\n",
       "      <td>266000</td>\n",
       "      <td>1,0</td>\n",
       "      <td>0,42</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>0,02</td>\n",
       "      <td>0</td>\n",
       "      <td>99,98</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>0,4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>224</th>\n",
       "      <td>Yemen</td>\n",
       "      <td>NEAR EAST</td>\n",
       "      <td>21456188</td>\n",
       "      <td>527970</td>\n",
       "      <td>40,6</td>\n",
       "      <td>0,36</td>\n",
       "      <td>0</td>\n",
       "      <td>61,5</td>\n",
       "      <td>800.0</td>\n",
       "      <td>50,2</td>\n",
       "      <td>37,2</td>\n",
       "      <td>2,78</td>\n",
       "      <td>0,24</td>\n",
       "      <td>96,98</td>\n",
       "      <td>1</td>\n",
       "      <td>42,89</td>\n",
       "      <td>8,3</td>\n",
       "      <td>0,135</td>\n",
       "      <td>0,472</td>\n",
       "      <td>0,393</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>225</th>\n",
       "      <td>Zambia</td>\n",
       "      <td>SUB-SAHARAN AFRICA</td>\n",
       "      <td>11502010</td>\n",
       "      <td>752614</td>\n",
       "      <td>15,3</td>\n",
       "      <td>0,00</td>\n",
       "      <td>0</td>\n",
       "      <td>88,29</td>\n",
       "      <td>800.0</td>\n",
       "      <td>80,6</td>\n",
       "      <td>8,2</td>\n",
       "      <td>7,08</td>\n",
       "      <td>0,03</td>\n",
       "      <td>92,9</td>\n",
       "      <td>2</td>\n",
       "      <td>41</td>\n",
       "      <td>19,93</td>\n",
       "      <td>0,22</td>\n",
       "      <td>0,29</td>\n",
       "      <td>0,489</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>226</th>\n",
       "      <td>Zimbabwe</td>\n",
       "      <td>SUB-SAHARAN AFRICA</td>\n",
       "      <td>12236805</td>\n",
       "      <td>390580</td>\n",
       "      <td>31,3</td>\n",
       "      <td>0,00</td>\n",
       "      <td>0</td>\n",
       "      <td>67,69</td>\n",
       "      <td>1900.0</td>\n",
       "      <td>90,7</td>\n",
       "      <td>26,8</td>\n",
       "      <td>8,32</td>\n",
       "      <td>0,34</td>\n",
       "      <td>91,34</td>\n",
       "      <td>2</td>\n",
       "      <td>28,01</td>\n",
       "      <td>21,84</td>\n",
       "      <td>0,179</td>\n",
       "      <td>0,243</td>\n",
       "      <td>0,579</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>227 rows × 20 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Country                Region  Population  Area(sq.mi.)  \\\n",
       "0       Afghanistan  ASIA (EX. NEAR EAST)    31056997        647500   \n",
       "1           Albania        EASTERN EUROPE     3581655         28748   \n",
       "2           Algeria       NORTHERN AFRICA    32930091       2381740   \n",
       "3    American Samoa               OCEANIA       57794           199   \n",
       "4           Andorra        WESTERN EUROPE       71201           468   \n",
       "..              ...                   ...         ...           ...   \n",
       "222       West Bank             NEAR EAST     2460492          5860   \n",
       "223  Western Sahara       NORTHERN AFRICA      273008        266000   \n",
       "224           Yemen             NEAR EAST    21456188        527970   \n",
       "225          Zambia    SUB-SAHARAN AFRICA    11502010        752614   \n",
       "226        Zimbabwe    SUB-SAHARAN AFRICA    12236805        390580   \n",
       "\n",
       "    Pop.Density(persq.mi.) Coastline(coast/arearatio) Netmigration  \\\n",
       "0                     48,0                       0,00        23,06   \n",
       "1                    124,6                       1,26        -4,93   \n",
       "2                     13,8                       0,04        -0,39   \n",
       "3                    290,4                      58,29       -20,71   \n",
       "4                    152,1                       0,00          6,6   \n",
       "..                     ...                        ...          ...   \n",
       "222                  419,9                       0,00         2,98   \n",
       "223                    1,0                       0,42         None   \n",
       "224                   40,6                       0,36            0   \n",
       "225                   15,3                       0,00            0   \n",
       "226                   31,3                       0,00            0   \n",
       "\n",
       "    Infantmortality(per1000births)  GDP($percapita) Literacy(%)  \\\n",
       "0                           163,07            700.0        36,0   \n",
       "1                            21,52           4500.0        86,5   \n",
       "2                               31           6000.0        70,0   \n",
       "3                             9,27           8000.0        97,0   \n",
       "4                             4,05          19000.0       100,0   \n",
       "..                             ...              ...         ...   \n",
       "222                          19,62            800.0        None   \n",
       "223                           None              NaN        None   \n",
       "224                           61,5            800.0        50,2   \n",
       "225                          88,29            800.0        80,6   \n",
       "226                          67,69           1900.0        90,7   \n",
       "\n",
       "    Phones(per1000) Arable(%) Crops(%) Other(%) Climate Birthrate Deathrate  \\\n",
       "0               3,2     12,13     0,22    87,65       1      46,6     20,34   \n",
       "1              71,2     21,09     4,42    74,49       3     15,11      5,22   \n",
       "2              78,1      3,22     0,25    96,53       1     17,14      4,61   \n",
       "3             259,5        10       15       75       2     22,46      3,27   \n",
       "4             497,2      2,22        0    97,78       3      8,71      6,25   \n",
       "..              ...       ...      ...      ...     ...       ...       ...   \n",
       "222           145,2      16,9    18,97    64,13       3     31,67      3,92   \n",
       "223            None      0,02        0    99,98       1      None      None   \n",
       "224            37,2      2,78     0,24    96,98       1     42,89       8,3   \n",
       "225             8,2      7,08     0,03     92,9       2        41     19,93   \n",
       "226            26,8      8,32     0,34    91,34       2     28,01     21,84   \n",
       "\n",
       "    Agriculture Industry Service  \n",
       "0          0,38     0,24    0,38  \n",
       "1         0,232    0,188   0,579  \n",
       "2         0,101      0,6   0,298  \n",
       "3          None     None    None  \n",
       "4          None     None    None  \n",
       "..          ...      ...     ...  \n",
       "222        0,09     0,28    0,63  \n",
       "223        None     None     0,4  \n",
       "224       0,135    0,472   0,393  \n",
       "225        0,22     0,29   0,489  \n",
       "226       0,179    0,243   0,579  \n",
       "\n",
       "[227 rows x 20 columns]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Countries_df = pd.read_sql_query(\"SELECT * FROM CountriesOfTheWorld\", conn) # we need to provide the DB connection information\n",
    "Countries_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We will next create the Olymp2016 table using data from <a href=\"https://en.wikipedia.org/wiki/2016_Summer_Olympics\">Wikipedia</a> for the Rio 2016 Olympics.\n",
    "\n",
    "<img src=\"pics/olymp2016mascots.png\" alt=\"socially distanced mascots\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Let's start by reading the data from the Wikipedia page. The read_html() method works neatly (assuming that the HTML source file itself is constructed neatly, otherwise, it may get miserable!). The read_html() obtains several tables. After inspection of the results, it turned out that the table indexed '2' is the medal tally table for the specific Wiki-page (as on 9th August 2021). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rank</th>\n",
       "      <th>NOC</th>\n",
       "      <th>Gold</th>\n",
       "      <th>Silver</th>\n",
       "      <th>Bronze</th>\n",
       "      <th>Total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>United States (USA)</td>\n",
       "      <td>46</td>\n",
       "      <td>37</td>\n",
       "      <td>38</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Great Britain (GBR)</td>\n",
       "      <td>27</td>\n",
       "      <td>23</td>\n",
       "      <td>17</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>China (CHN)</td>\n",
       "      <td>26</td>\n",
       "      <td>18</td>\n",
       "      <td>26</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Russia (RUS)</td>\n",
       "      <td>19</td>\n",
       "      <td>17</td>\n",
       "      <td>20</td>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Germany (GER)</td>\n",
       "      <td>17</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "      <td>42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>78</td>\n",
       "      <td>Nigeria (NGR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>78</td>\n",
       "      <td>Portugal (POR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>78</td>\n",
       "      <td>Trinidad and Tobago (TTO)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>78</td>\n",
       "      <td>United Arab Emirates (UAE)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>Totals (86 NOCs)</td>\n",
       "      <td>Totals (86 NOCs)</td>\n",
       "      <td>307</td>\n",
       "      <td>307</td>\n",
       "      <td>359</td>\n",
       "      <td>973</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>87 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                Rank                         NOC  Gold  Silver  Bronze  Total\n",
       "0                  1         United States (USA)    46      37      38    121\n",
       "1                  2         Great Britain (GBR)    27      23      17     67\n",
       "2                  3                 China (CHN)    26      18      26     70\n",
       "3                  4                Russia (RUS)    19      17      20     56\n",
       "4                  5               Germany (GER)    17      10      15     42\n",
       "..               ...                         ...   ...     ...     ...    ...\n",
       "82                78               Nigeria (NGR)     0       0       1      1\n",
       "83                78              Portugal (POR)     0       0       1      1\n",
       "84                78   Trinidad and Tobago (TTO)     0       0       1      1\n",
       "85                78  United Arab Emirates (UAE)     0       0       1      1\n",
       "86  Totals (86 NOCs)            Totals (86 NOCs)   307     307     359    973\n",
       "\n",
       "[87 rows x 6 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "olymp_df=pd.read_html(r'https://en.wikipedia.org/wiki/2016_Summer_Olympics_medal_table')\n",
    "olymp2016medals=olymp_df[2]\n",
    "olymp2016medals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We do not want to populate the database with a derived information of Totals, and wish to eliminate the last line. Let's try that!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rank</th>\n",
       "      <th>NOC</th>\n",
       "      <th>Gold</th>\n",
       "      <th>Silver</th>\n",
       "      <th>Bronze</th>\n",
       "      <th>Total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>United States (USA)</td>\n",
       "      <td>46</td>\n",
       "      <td>37</td>\n",
       "      <td>38</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Great Britain (GBR)</td>\n",
       "      <td>27</td>\n",
       "      <td>23</td>\n",
       "      <td>17</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>China (CHN)</td>\n",
       "      <td>26</td>\n",
       "      <td>18</td>\n",
       "      <td>26</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Russia (RUS)</td>\n",
       "      <td>19</td>\n",
       "      <td>17</td>\n",
       "      <td>20</td>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Germany (GER)</td>\n",
       "      <td>17</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "      <td>42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>78</td>\n",
       "      <td>Morocco (MAR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>78</td>\n",
       "      <td>Nigeria (NGR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>78</td>\n",
       "      <td>Portugal (POR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>78</td>\n",
       "      <td>Trinidad and Tobago (TTO)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>78</td>\n",
       "      <td>United Arab Emirates (UAE)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>86 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Rank                         NOC  Gold  Silver  Bronze  Total\n",
       "0     1         United States (USA)    46      37      38    121\n",
       "1     2         Great Britain (GBR)    27      23      17     67\n",
       "2     3                 China (CHN)    26      18      26     70\n",
       "3     4                Russia (RUS)    19      17      20     56\n",
       "4     5               Germany (GER)    17      10      15     42\n",
       "..  ...                         ...   ...     ...     ...    ...\n",
       "81   78               Morocco (MAR)     0       0       1      1\n",
       "82   78               Nigeria (NGR)     0       0       1      1\n",
       "83   78              Portugal (POR)     0       0       1      1\n",
       "84   78   Trinidad and Tobago (TTO)     0       0       1      1\n",
       "85   78  United Arab Emirates (UAE)     0       0       1      1\n",
       "\n",
       "[86 rows x 6 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "olymp2016medals=olymp_df[2][:-1]\n",
    "olymp2016medals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "NOC, standing for <b>National Olympic Committees</b> may become difficult to recall later on. Let's rename the column as <b>Country</b> instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rank</th>\n",
       "      <th>Country</th>\n",
       "      <th>Gold</th>\n",
       "      <th>Silver</th>\n",
       "      <th>Bronze</th>\n",
       "      <th>Total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>United States (USA)</td>\n",
       "      <td>46</td>\n",
       "      <td>37</td>\n",
       "      <td>38</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Great Britain (GBR)</td>\n",
       "      <td>27</td>\n",
       "      <td>23</td>\n",
       "      <td>17</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>China (CHN)</td>\n",
       "      <td>26</td>\n",
       "      <td>18</td>\n",
       "      <td>26</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Russia (RUS)</td>\n",
       "      <td>19</td>\n",
       "      <td>17</td>\n",
       "      <td>20</td>\n",
       "      <td>56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Germany (GER)</td>\n",
       "      <td>17</td>\n",
       "      <td>10</td>\n",
       "      <td>15</td>\n",
       "      <td>42</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>78</td>\n",
       "      <td>Morocco (MAR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>78</td>\n",
       "      <td>Nigeria (NGR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>78</td>\n",
       "      <td>Portugal (POR)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>78</td>\n",
       "      <td>Trinidad and Tobago (TTO)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>78</td>\n",
       "      <td>United Arab Emirates (UAE)</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>86 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   Rank                     Country  Gold  Silver  Bronze  Total\n",
       "0     1         United States (USA)    46      37      38    121\n",
       "1     2         Great Britain (GBR)    27      23      17     67\n",
       "2     3                 China (CHN)    26      18      26     70\n",
       "3     4                Russia (RUS)    19      17      20     56\n",
       "4     5               Germany (GER)    17      10      15     42\n",
       "..  ...                         ...   ...     ...     ...    ...\n",
       "81   78               Morocco (MAR)     0       0       1      1\n",
       "82   78               Nigeria (NGR)     0       0       1      1\n",
       "83   78              Portugal (POR)     0       0       1      1\n",
       "84   78   Trinidad and Tobago (TTO)     0       0       1      1\n",
       "85   78  United Arab Emirates (UAE)     0       0       1      1\n",
       "\n",
       "[86 rows x 6 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "olymp2016medals = olymp2016medals.rename(columns={'NOC':'Country'})\n",
    "olymp2016medals"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "Let's check the data types."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Rank       object\n",
       "Country    object\n",
       "Gold        int64\n",
       "Silver      int64\n",
       "Bronze      int64\n",
       "Total       int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "olymp2016medals.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "How about we designate rank explicitly as integer? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Rank        int64\n",
       "Country    object\n",
       "Gold        int64\n",
       "Silver      int64\n",
       "Bronze      int64\n",
       "Total       int64\n",
       "dtype: object"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "olymp2016medals=olymp2016medals.astype({'Rank': 'int64'}, copy=True)\n",
    "olymp2016medals.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We are almost there. However, since I have populated the database with this data previously, and (may) reuse this teaching material, it is possible that the table already exists in the DB! As such, lets do a conditional deletion of the table, in case it exists, using a DROP operation. And now, you know how to DROP a table from your DB!\n",
    "\n",
    "<img src=\"pics/BobbyTables.png\" alt=\"SQLLite screenshot\" width=\"500\"/>\n",
    "\n",
    "Once that is done, let's add the Table using Panda's to_sql() function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "# Drop table from DB conditionally, if it already existed.\n",
    "if \"Olymp2016\" in db_tables:\n",
    "    cursor.execute('''DROP TABLE Olymp2016''')\n",
    "    \n",
    "# Add table to DB    \n",
    "olymp2016medals.to_sql('Olymp2016', conn, if_exists='replace', index = False)\n",
    "# With the use of if_exists option, we could in fact get rid of the explicit conditional DROP TABLE op. \n",
    "# other options for if_exists: fail, append (default: fail)\n",
    "conn.commit()  # Commit the changes to be sure.  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rank</th>\n",
       "      <th>Country</th>\n",
       "      <th>Gold</th>\n",
       "      <th>Silver</th>\n",
       "      <th>Bronze</th>\n",
       "      <th>Total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>81</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>83</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>84</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>86 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    Rank  Country  Gold  Silver  Bronze  Total\n",
       "0   True     True  True    True    True   True\n",
       "1   True     True  True    True    True   True\n",
       "2   True     True  True    True    True   True\n",
       "3   True     True  True    True    True   True\n",
       "4   True     True  True    True    True   True\n",
       "..   ...      ...   ...     ...     ...    ...\n",
       "81  True     True  True    True    True   True\n",
       "82  True     True  True    True    True   True\n",
       "83  True     True  True    True    True   True\n",
       "84  True     True  True    True    True   True\n",
       "85  True     True  True    True    True   True\n",
       "\n",
       "[86 rows x 6 columns]"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Sanity check code: Is the data the same?\n",
    "\n",
    "Olympics_df=pd.read_sql_query(\"SELECT * FROM Olymp2016\", conn)\n",
    "olymp2016medals == Olympics_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We are all good! The table entries have been added to the DB. We can now start querying the data. For example, if we want to know which all countries won more Gold medals than Silver and Bronze medals combined, we can get that easy-peasy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Rank</th>\n",
       "      <th>Country</th>\n",
       "      <th>Gold</th>\n",
       "      <th>Silver</th>\n",
       "      <th>Bronze</th>\n",
       "      <th>Total</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>12</td>\n",
       "      <td>Hungary (HUN)</td>\n",
       "      <td>8</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>16</td>\n",
       "      <td>Jamaica (JAM)</td>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>27</td>\n",
       "      <td>Argentina (ARG)</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>54</td>\n",
       "      <td>Fiji (FIJ)</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>54</td>\n",
       "      <td>Jordan (JOR)</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>54</td>\n",
       "      <td>Kosovo (KOS)</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>54</td>\n",
       "      <td>Puerto Rico (PUR)</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>54</td>\n",
       "      <td>Singapore (SIN)</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>54</td>\n",
       "      <td>Tajikistan (TJK)</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Rank            Country  Gold  Silver  Bronze  Total\n",
       "0    12      Hungary (HUN)     8       3       4     15\n",
       "1    16      Jamaica (JAM)     6       3       2     11\n",
       "2    27    Argentina (ARG)     3       1       0      4\n",
       "3    54         Fiji (FIJ)     1       0       0      1\n",
       "4    54       Jordan (JOR)     1       0       0      1\n",
       "5    54       Kosovo (KOS)     1       0       0      1\n",
       "6    54  Puerto Rico (PUR)     1       0       0      1\n",
       "7    54    Singapore (SIN)     1       0       0      1\n",
       "8    54   Tajikistan (TJK)     1       0       0      1"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "MoreGoldThanSilver_df= pd.read_sql_query(\"SELECT * FROM Olymp2016 WHERE Gold>Silver+Bronze\", conn)\n",
    "MoreGoldThanSilver_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "SQLite does not implement SQL fully, e.g., it does not support RIGHT JOIN and FULL OUTER JOIN (though it can be emulated, e.g., <a href=\"https://www.sqlitetutorial.net/sqlite-full-outer-join/\">see here for details</a>). But it does support a lot of features, including INNER JOIN. \n",
    "\n",
    "Here's an example of how we can create a summary of countries with their population as per the CountriesOfTheWorld table, choosing the countries where the life expectancy reported in 21st century has been less than 50 years, along with the year of the record, as per the gapminder table. \n",
    "\n",
    "To make the code readable (and reusable?), we have written the query separately first, before invoking the actual read_sql_query() function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Country</th>\n",
       "      <th>Population</th>\n",
       "      <th>year</th>\n",
       "      <th>lifeExp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>31056997</td>\n",
       "      <td>2002</td>\n",
       "      <td>42.129</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Afghanistan</td>\n",
       "      <td>31056997</td>\n",
       "      <td>2007</td>\n",
       "      <td>43.828</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Angola</td>\n",
       "      <td>12127071</td>\n",
       "      <td>2002</td>\n",
       "      <td>41.003</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Angola</td>\n",
       "      <td>12127071</td>\n",
       "      <td>2007</td>\n",
       "      <td>42.731</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Botswana</td>\n",
       "      <td>1639833</td>\n",
       "      <td>2002</td>\n",
       "      <td>46.634</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Burundi</td>\n",
       "      <td>8090068</td>\n",
       "      <td>2002</td>\n",
       "      <td>47.360</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Burundi</td>\n",
       "      <td>8090068</td>\n",
       "      <td>2007</td>\n",
       "      <td>49.580</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Cameroon</td>\n",
       "      <td>17340702</td>\n",
       "      <td>2002</td>\n",
       "      <td>49.856</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Congo, Dem. Rep.</td>\n",
       "      <td>62660551</td>\n",
       "      <td>2002</td>\n",
       "      <td>44.966</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Congo, Dem. Rep.</td>\n",
       "      <td>62660551</td>\n",
       "      <td>2007</td>\n",
       "      <td>46.462</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Cote d'Ivoire</td>\n",
       "      <td>17654843</td>\n",
       "      <td>2002</td>\n",
       "      <td>46.832</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Cote d'Ivoire</td>\n",
       "      <td>17654843</td>\n",
       "      <td>2007</td>\n",
       "      <td>48.328</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Equatorial Guinea</td>\n",
       "      <td>540109</td>\n",
       "      <td>2002</td>\n",
       "      <td>49.348</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Guinea-Bissau</td>\n",
       "      <td>1442029</td>\n",
       "      <td>2002</td>\n",
       "      <td>45.504</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Guinea-Bissau</td>\n",
       "      <td>1442029</td>\n",
       "      <td>2007</td>\n",
       "      <td>46.388</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Lesotho</td>\n",
       "      <td>2022331</td>\n",
       "      <td>2002</td>\n",
       "      <td>44.593</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Lesotho</td>\n",
       "      <td>2022331</td>\n",
       "      <td>2007</td>\n",
       "      <td>42.592</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Liberia</td>\n",
       "      <td>3042004</td>\n",
       "      <td>2002</td>\n",
       "      <td>43.753</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Liberia</td>\n",
       "      <td>3042004</td>\n",
       "      <td>2007</td>\n",
       "      <td>45.678</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Malawi</td>\n",
       "      <td>13013926</td>\n",
       "      <td>2002</td>\n",
       "      <td>45.009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Malawi</td>\n",
       "      <td>13013926</td>\n",
       "      <td>2007</td>\n",
       "      <td>48.303</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Mozambique</td>\n",
       "      <td>19686505</td>\n",
       "      <td>2002</td>\n",
       "      <td>44.026</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Mozambique</td>\n",
       "      <td>19686505</td>\n",
       "      <td>2007</td>\n",
       "      <td>42.082</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Nigeria</td>\n",
       "      <td>131859731</td>\n",
       "      <td>2002</td>\n",
       "      <td>46.608</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Nigeria</td>\n",
       "      <td>131859731</td>\n",
       "      <td>2007</td>\n",
       "      <td>46.859</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>8648248</td>\n",
       "      <td>2002</td>\n",
       "      <td>43.413</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Rwanda</td>\n",
       "      <td>8648248</td>\n",
       "      <td>2007</td>\n",
       "      <td>46.242</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Sierra Leone</td>\n",
       "      <td>6005250</td>\n",
       "      <td>2002</td>\n",
       "      <td>41.012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Sierra Leone</td>\n",
       "      <td>6005250</td>\n",
       "      <td>2007</td>\n",
       "      <td>42.568</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Somalia</td>\n",
       "      <td>8863338</td>\n",
       "      <td>2002</td>\n",
       "      <td>45.936</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Somalia</td>\n",
       "      <td>8863338</td>\n",
       "      <td>2007</td>\n",
       "      <td>48.159</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>South Africa</td>\n",
       "      <td>44187637</td>\n",
       "      <td>2007</td>\n",
       "      <td>49.339</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>Swaziland</td>\n",
       "      <td>1136334</td>\n",
       "      <td>2002</td>\n",
       "      <td>43.869</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>Swaziland</td>\n",
       "      <td>1136334</td>\n",
       "      <td>2007</td>\n",
       "      <td>39.613</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>Tanzania</td>\n",
       "      <td>37445392</td>\n",
       "      <td>2002</td>\n",
       "      <td>49.651</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>Uganda</td>\n",
       "      <td>28195754</td>\n",
       "      <td>2002</td>\n",
       "      <td>47.813</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>Zambia</td>\n",
       "      <td>11502010</td>\n",
       "      <td>2002</td>\n",
       "      <td>39.193</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>Zambia</td>\n",
       "      <td>11502010</td>\n",
       "      <td>2007</td>\n",
       "      <td>42.384</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>Zimbabwe</td>\n",
       "      <td>12236805</td>\n",
       "      <td>2002</td>\n",
       "      <td>39.989</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>Zimbabwe</td>\n",
       "      <td>12236805</td>\n",
       "      <td>2007</td>\n",
       "      <td>43.487</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              Country  Population  year  lifeExp\n",
       "0         Afghanistan    31056997  2002   42.129\n",
       "1         Afghanistan    31056997  2007   43.828\n",
       "2              Angola    12127071  2002   41.003\n",
       "3              Angola    12127071  2007   42.731\n",
       "4            Botswana     1639833  2002   46.634\n",
       "5             Burundi     8090068  2002   47.360\n",
       "6             Burundi     8090068  2007   49.580\n",
       "7            Cameroon    17340702  2002   49.856\n",
       "8    Congo, Dem. Rep.    62660551  2002   44.966\n",
       "9    Congo, Dem. Rep.    62660551  2007   46.462\n",
       "10      Cote d'Ivoire    17654843  2002   46.832\n",
       "11      Cote d'Ivoire    17654843  2007   48.328\n",
       "12  Equatorial Guinea      540109  2002   49.348\n",
       "13      Guinea-Bissau     1442029  2002   45.504\n",
       "14      Guinea-Bissau     1442029  2007   46.388\n",
       "15            Lesotho     2022331  2002   44.593\n",
       "16            Lesotho     2022331  2007   42.592\n",
       "17            Liberia     3042004  2002   43.753\n",
       "18            Liberia     3042004  2007   45.678\n",
       "19             Malawi    13013926  2002   45.009\n",
       "20             Malawi    13013926  2007   48.303\n",
       "21         Mozambique    19686505  2002   44.026\n",
       "22         Mozambique    19686505  2007   42.082\n",
       "23            Nigeria   131859731  2002   46.608\n",
       "24            Nigeria   131859731  2007   46.859\n",
       "25             Rwanda     8648248  2002   43.413\n",
       "26             Rwanda     8648248  2007   46.242\n",
       "27       Sierra Leone     6005250  2002   41.012\n",
       "28       Sierra Leone     6005250  2007   42.568\n",
       "29            Somalia     8863338  2002   45.936\n",
       "30            Somalia     8863338  2007   48.159\n",
       "31       South Africa    44187637  2007   49.339\n",
       "32          Swaziland     1136334  2002   43.869\n",
       "33          Swaziland     1136334  2007   39.613\n",
       "34           Tanzania    37445392  2002   49.651\n",
       "35             Uganda    28195754  2002   47.813\n",
       "36             Zambia    11502010  2002   39.193\n",
       "37             Zambia    11502010  2007   42.384\n",
       "38           Zimbabwe    12236805  2002   39.989\n",
       "39           Zimbabwe    12236805  2007   43.487"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "SQL_quer = \"\"\"\n",
    "    SELECT CountriesOfTheWorld.Country, CountriesOfTheWorld.Population, year, lifeExp\n",
    "    FROM CountriesOfTheWorld\n",
    "    INNER JOIN gapminder\n",
    "    ON CountriesOfTheWorld.Country =gapminder.country\n",
    "    WHERE year>1999 and lifeExp <50;\n",
    "\"\"\"\n",
    "\n",
    "Quer_res = pd.read_sql_query(SQL_quer, conn)\n",
    "Quer_res\n",
    "#conn.close() when you are done, and want to close the connection to the DB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Ungraded tasks\n",
    "Now, try a few things out yourselves. Following are some ideas to try. But think for yourself also, what other interesting information do you want to determine from these three tables!\n",
    "\n",
    "If your Python/Pandas skills are good, you will be tempted to import the data as Pandas dataframes and then operate on those dataframes. But, let's do that some other time (next session, in fact!). Instead, try to see what all you can do purely with SQL(ite). However, you may use Python to create scripts invoking the necessary SQL commands to carry out any necessary 'data cleaning or manipulation' operations.\n",
    "\n",
    "<b>Ungraded Task 1.1:</b> Determine all the countries with a population of more than 20 million as per the CountriesOfTheWorld table, that got no gold medals. \n",
    "\n",
    "<b>Ungraded Task 1.2:</b> Determine the ratio of the number of total medals to the per capita GDP for each country as per the CountriesOfTheWorld table.\n",
    "\n",
    "If and when you manage to solve these, share and compare your solution with others. Challenge each other with other such queries! Contemplate on what difficulties you encountered in answering these questions. \n",
    "\n",
    "Later in the course, we will delve into data wrangling, and use Python/Pandas to solve such questions. You could then compare the pros and cons with your experience. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "<p style=\"font-size:134%;color:Deep Teal;\">That's it folks!</p>\n",
    "\n",
    "<img src=\"pics/ETL.png\" alt=\"SQLLite screenshot\" width=\"500\"/>"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}