{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Some basic statistical concepts and tools \n", "

\n", "
\n", "

SC 4125: Developing Data Products

\n", "

Module-7: Statistical toolkit


\n", "\n", " \n", "
\n", "by Anwitaman DATTA
\n", "School of Computer Science and Engineering, NTU Singapore. \n", "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Teaching material\n", "- .html deck of slides\n", "- .ipynb Jupyter notebook" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Disclaimer/Caveat emptor\n", "\n", "- Non-systematic and non-exhaustive review\n", "- Illustrative approaches are not necessarily the most efficient or elegant, let alone unique" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Acknowledgement & Disclaimer\n", "\n", "> The main narrative of this module is based on the first three chapters of the book Practical Statistics for Data Scientists by Bruce et al. \n", ">\n", ">Data and code in this module are also copied & adapted from the github resources accompanying the book, following *fair use* permission provided by the authors and publisher as per in the book's preface. \n", ">\n", "> Few other online sources and images have also been used to prepare the material in this module. Original sources have been attributed and acknowledged to the best of my abilities. Should anything in the material need to be changed or redacted, the copyright owners are requested to contact me at anwitaman@ntu.edu.sg\n", "
\n", "\"Big
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Module outline\n", "\n", "> Bare basics\n", "\n", "> Sampling\n", "\n", "> Statistical experiments & significance" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "# Library imports & data directory path \n", "import pandas as pd\n", "import numpy as np\n", "from scipy.stats import trim_mean\n", "from statsmodels import robust\n", "#!pip install wquantiles\n", "import wquantiles\n", "\n", "import seaborn as sns\n", "import matplotlib.pylab as plt\n", "import random\n", "practicalstatspath ='data/practical-stats/' # change this to adjust relative path" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Bare basics\n", "\n", "
\"Big
\n", "\n", "Image source: XKCD" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Estimates of location\n", "\n", "location: Where is the data (in a possibly multi-dimensional space)? Its typical value, i.e., its central tendency " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "> mean" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "> weighted mean: $\\bar{x_w}=\\frac{\\sum_{i=1}^n{w_i x_i}}{\\sum_{i=1}^n{w_i}}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "> trimmed mean: $\\bar{x}=\\frac{\\sum_{i=p+1}^{n-p}{x_{(i)}}}{n-2p}$ where $x_{(i)}$ is the _i_-th largest value, _p_ is the trimming parameter. \n", ">> Robust estimate: eliminates influence of extreme values, i.e., outliers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "> median" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "> percentile" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Example: US states murder rates" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StatePopulationMurder.RateAbbreviation
0Alabama47797365.7AL
1Alaska7102315.6AK
2Arizona63920174.7AZ
3Arkansas29159185.6AR
4California372539564.4CA
\n", "
" ], "text/plain": [ " State Population Murder.Rate Abbreviation\n", "0 Alabama 4779736 5.7 AL\n", "1 Alaska 710231 5.6 AK\n", "2 Arizona 6392017 4.7 AZ\n", "3 Arkansas 2915918 5.6 AR\n", "4 California 37253956 4.4 CA" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "state_df = pd.read_csv(practicalstatspath+'state.csv')\n", "state_df.head()" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mean: 6162876.3\n", "Median: 4436369.5\n", "Trimmed Mean: 4783697.125\n" ] } ], "source": [ "print('Mean: '+str(state_df['Population'].mean()))\n", "print('Median: '+str(state_df['Population'].median()))\n", "print('Trimmed Mean: '+str(trim_mean(state_df['Population'], 0.1))) # from scipy.stats" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "How about national mean and median murder rates? Try to compute that yourselves!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Estimates of variability\n", "\n", "variability: Whether and how clustered or dispersed is the data?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- variance: $s^2=\\frac{\\sum_i^n(x_i-\\bar{x})^2}{n-1}$ where $\\bar{x}$ is the mean\n", " * division by n-1 to create a unbiased estimate (since there are n-1 degrees of freedom, given $\\bar{x}$)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- standard deviation: $s=\\sqrt{\\frac{\\sum_i^n(x_i-\\bar{x})^2}{n-1}}$\n", " * has the same scale as the original data" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- some other measures:\n", " * mean absolute deviation\n", " * mean absolute deviation from the median (MAD)\n", " * percentiles & interquartile range (IQR)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Std. Dev.: 6848235.347401142\n", "IQR: 4847308.0\n", "MAD: 3849876.1459979336\n" ] } ], "source": [ "print('Std. 
Dev.: '+str(state_df['Population'].std())) # standard deviation\n", "print('IQR: '+str(state_df['Population'].quantile(0.75) - state_df['Population'].quantile(0.25))) # IQR\n", "print('MAD: '+str(robust.scale.mad(state_df['Population']))) # MAD computed using a method from statsmodels library" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Visualizing the deviation/distribution of data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAANAAAAEYCAYAAAAtaHgZAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAU00lEQVR4nO3df5BdZX3H8feHJRoKCYgEGvmViEy7ySpoVtSyVYPVQauAKGim+INshU4lVWEw0Z2O0E5GAoiF1KqhCUY0W6mKIIJK2UW8VcFEfgVXBsUEo5gErCQoSBK+/eOchU3Ye+7ZPXv27tn7ec2cufc+e8+538zkO+c5z3nO81VEYGajs1ezAzCrMieQWQFOILMCnEBmBTiBzArYu9kB5HHQQQfFrFmzmh2GtbB169Y9EhEz9myvRALNmjWLtWvXNjsMa2GSNg7X7i6cWQFOILMCnEBmBTiBzApwApkV4AQyK8AJZFaAE2gS6+3tpaOjg7a2Njo6Oujt7W12SJNOJW6k2sj19vbS09PDypUr6erqolar0d3dDcCCBQuaHN0kEhETfps3b17YyMydOzf6+vp2a+vr64u5c+c2KaJqA9bGMP83FRV4IrWzszM8lWdk2traePLJJ5kyZcozbTt27GDq1Kns2rWriZFVk6R1EdG5Z7uvgSap9vZ2arXabm21Wo329vYmRTQ5OYEmqZ6eHrq7u+nv72fHjh309/fT3d1NT09Ps0ObVDyIMEkNDhQsWrSIgYEB2tvbWbp0qQcQxpivgcxy8DWQWQmcQGYFOIHMCnACmRXgBDIrwAlkVoATyKyA0hJI0lRJd0i6W9J9ki5M2y+Q9GtJd6XbW8qKwaxsZc5E+BNwQkQ8LmkKUJN0U/q3T0fEpSX+ttm4KC2B0ingj6cfp6TbxJ/2YDYCpV4DSWqTdBewBbg5Im5P/3SOpHskrZL0gjr7niVpraS1W7duLTNMs1ErNYEiYldEHAscBhwnqQP4LHAUcCzwMPCpOvuuiIjOiOicMeM5SxKbTQjjMgoXEb8HbgVOjIjNaWI9DVwJHDceMZiVocxRuBmSDkjf7wP8DfAzSTOHfO3twPqyYjArW5mjcDOB1ZLaSBL1moi4QdLVko4lGVDYAJxdYgxmpSpzFO4e4OXDtL+nrN80G2+eiWBWgBPIrAAnkFkBTiCzApxAZgU4gcwKcAKZFeAEMivACWRWgBPIrAAnkFkBTiCzApxAZgU4gcwKcAKZFeAEMivACWRWgBPIrAAnkFkBTiCzAnItKiLpr4BZQ78fEV8sKSazymiYQJKuJllJ9C5gV9ocgBPIWl6eM1AnMCddLN7MhshzDbQe+PORHjijPtCBkm6W9ED6Ouzi8mZVkOcMdBDwU0l3kNT8ASAiTmqwX736QKcCt0T
ERZKWAEuAxaML36y58iTQBaM5cEZ9oJOB16ftq0kWnXcCWSU17MJFxPeAnwHT0m0gbWuoTn2gQyLi4fTYDwMHjzJ2s6ZrmECSTgfuAE4DTgdul/TOPAevUx8oFxfYsirI04XrAV4ZEVsgKVsC/A/w1bw/EhG/l3QrcCKwWdLMiHg4LXWypc4+K4AVAJ2dnR4BtAkpzyjcXoPJk3o0z3716gMB1wPvS7/2PuC6kQRsNpHkOQN9W9J3gN7087uAG3PsV68+0A+BayR1Aw+RdA3NKqlhAkXE+ZLeARwPCFgREdfm2K9efaBHgTeMIlazCSfXXLiI+BrwtZJjMaucugkkqRYRXZK2k9y/eeZPJLd5ppcendkEVzeBIqIrfZ02fuGYVUue0bSr87SZtaI8w9hzh36QtDcwr5xwzKqlbgJJ+lh6/fMySdvSbTuwGd+7MQMyEigiPple/1wSEdPTbVpEvDAiPjaOMZpNWHmGsW+S9No9GyPithLiMauUPAl0/pD3U4HjgHXACaVEZFYheWYivG3oZ0mHAxeXFpFZhYxmWatNQO7HEswmszyr8izn2ZkIewHHAneXGJNZZeS5Blo75P1OoDci/rekeMwqJc810GpJzwP+kuRMdH/pUZlVRJ4u3FuAzwO/IJlIOlvS2RFxU9nBmU10ebpwlwHzI+LnAJKOAr4FOIGs5eUZhdsymDypB6mzjoFZq8l6HujU9O19km4EriG5BjoN+PE4xGY24WV14YbeQN0MvC59vxXwcrxmZD9Qd+Z4BmJWRVlduI9GxMV73Eh9RkT8U6mRmVVAVhduIH1dm/Eds5aW1YX7ZrqmW0dEnF/ve2atLHMYOyJ24ce3K6u3t5eOjg7a2tro6Oigt7e38U42InlupN4p6Xrgv4E/DDZGxNezdkofe/giSXGup0kWZLxc0gXAB0hG8wA+HhF5Vjq1Eejt7aWnp4eVK1fS1dVFrVaju7sbgAULFjQ5uslDjSo3SrpqmOaIiIUN9psJzIyIn0iaRvIQ3ikkFR4ej4hL8wbZ2dkZa9f6UmwkOjo6WL58OfPnz3+mrb+/n0WLFrF+/fomRlZNktZFROee7XnOQP+55+xrScc32imt/TNYB2i7pAHg0JzxWkEDAwN0dXXt1tbV1cXAwECdPWw08kzlWZ6zrS5Js0jWyb49bTpH0j2SVtWrker6QMW0t7dTq9V2a6vVarS3tzcpokkqIobdgNcA5wG/As4dsl0A3F1vv2GOsx9J9+3U9PMhwGDFhqXAqkbHmDdvXtjIrFmzJmbPnh19fX3x1FNPRV9fX8yePTvWrFnT7NAqCVgbw/zfzOrCPS/9z783SWnHQduAXBXq0uLCXwO+HOmgQ0RsHvL3K4Eb8hzLRmZwoGDRokUMDAzQ3t7O0qVLPYAwxvIMIhwZERvT93sB+0XEtoYHlkRSRPh3EfHhIe0zI62RKukjwKsi4t1Zx/IggjVbvUGEPNdAn5Q0XdK+wE+B+yXlubF6PPAe4ARJd6XbW4CLJd0r6R5gPvCREfw7zCaUPKNwcyJim6S/I6lMt5jkmuaSrJ0iokbyBOuefM/HJo08Z6Ap6bXMKcB1EbGDYSaXmrWiPAn0eWADsC9wm6QjSQYSzFpenlV5rgCuGNK0UdL8et83ayVZzwOdERFfknRuna9cVlJMZpWRdQbaN311iUezOrKeB/p8+nrh+IVjVi1ZXbgr6v0N/Ei3GWSPwq1Lt6nAK4AH0u1YYFfpkZlVQFYXbjWApPeTrEy6I/38OeC74xKd2QSX5z7Qi9h9IGG/tM2s5eWZynMRyWPd/enn15E80mDW8vLcSL1K0k3Aq9KmJRHx23LDMquGPGcg0oS5ruRYzCpnNDVSzSzlBDIrIFcXLl3440XAE8CGiHi61KjMKiJrJsL+wAeBBSTrI2wlual6iKQfAf8REf319jdrBVlnoK+SrCz61xHx+6F/kDQPeI+kF0fEyhLjM5vQsmYivDHjb4PTfMxaWsN
BBEnHpwuKIOkMSZelT6Watbw8o3CfBf4o6Rjgo8BGkq6dWcvLk0A705UZTwYuj4jL8UN2ZkC+Yeztkj4GnAG8Ni26NaXcsMyqIc8Z6F3An4DudErPoTRYE86sVTRMoIj4bURcFhHfTz8/FBENr4EkHS6pX9KApPskfShtP1DSzZIeSF+Hrc5gVgV5RuFOTf+zPyZpm6TtkvKsC7cTOC8i2oFXAx+UNAdYAtwSEUcDt6SfzSopTxfuYuCkiNg/IqZHxLSImN5op4h4OCJ+kr7fTlL1+1CSwYjV6ddWk6x4alZJeRJoc0QUKmu2R4GtQwarM6SvB9fZxwW2bMLLk0BrJX1F0oK0O3eqpFPz/oCk/UhqBH04T1mUQRGxIiI6I6JzxowZeXezIVylu3x5hrGnA38E3jSkLYDMKt0wfIEtYPNgjaC0EPGWEcZsObhK9zgZrmzdWGwkpU2+CPzbHu2XkDwWDskAwsWNjuUSjyM3d+7c6Ovr262tr68v5s6d26SIqo06JR7zVKg7jKSo8PEkZ54a8KGI2NRgvy7g+8C9wODzQx8nuQ66BjgCeAg4LSJ+l3UsV6gbuba2Np588kmmTHn2nveOHTuYOnUqu3Z5Wb+RKlLm/ipgDXBa+vmMtK3ubG3ILLAF8IYcv2sFtLe3c+GFF/KNb3zjmRqpp5xyiqt0j7E8gwgzIuKqiNiZbl8AfFU/wc2fP59ly5axcOFCtm/fzsKFC1m2bBnz57syzVjKk0CPpI8xtKXbGcCjZQdmxfT397N48WJWrVrFtGnTWLVqFYsXL6a/3w8Rj6U810BHAP8OvIbkGugHJNdAG8sPL+FroJHzNdDYGnWV7kjmvp0UETMi4uCIOGU8k8dGp729nVqttltbrVbzNdAYy1pU5KMRcbGk5QxTVDhc3mRC6+npobu7+zn3gZYuXdrs0CaVrFG4wek77jtV0ODN0kWLFj0zCrd06VLfRB1jDa+BJgJfA1mzjfg+kKRvMkzXbVBEnDRGsZlVVlYX7tJxi8KsorLWhfveeAZiVkVZXbh7ye7CvayUiMwqJKsL99Zxi8KsorK6cL5ZatZAVheuFhFdkraze1dOQESOdRHMJrusM1BX+upVSM3qGEmBrcOHfj/SFXfMWlnDBJL0r8D7gQd59snSAE4oLyyzashzBjodOCoinio7GLOqyfNA3XrggJLjMKukPGegTwJ3SlpPssg84LlwZpAvgVYDy9h9dR0zI18CPRIRV5QeiVkF5UmgdZI+CVzP7l04D2Nby8uTQC9PX189pK3hMLakVSTz6bZEREfadgHwAWBwtfiPR8SNIwnYbCJpmEARMdqFxL5AsprPnsW4Ph0RftbIJoW6w9jpWnBZfz8qXb53WBFxG5C5ZK9Z1WWdgV5IMny9DlhH0u2aCrwEeB3wCKOrLneOpPeSLFZyXkT833BfknQWcBbAEUccMYqfMStf5qIiaUXuE0gWlp8JPEGyWs9NEfFQw4MnhbVuGHINdAhJ4gXwr8DMiFjY6DheVMSabVSLy0fELuDmdCssIjYPCehK4IaxOK5Zs+SZyjNm0oJag95OMk3IrLJyPc4wGpJ6gdcDB0naBHwCeL2kY0m6cBuAs8v6fbPxUFoCRcRwS2CuLOv3zJohz/NAzwfeAcxi9wfq/qW8sMyqIc8Z6DrgMZKh7D81+K5ZS8mTQIdFxImlR2JWQXlG4X4g6aWlR2JWQXnOQF3A+yX9kqQLN7islVcmtZaXJ4HeXHoUZhWVp8TjRpI1Ed6Wbgd41VKzRMMEkvQh4MvAwen2JUmLyg7MrArydOG6gVdFxB8AJC0DfggsLzMwGzlJI96nChUKJ7I8o3AChtZF35W22QQTEcNuRy6+oe7frJg8Z6CrgNslXZt+PgVPyTED8j3SfZmkW0mGswWcGRF3lh2YWRVklTeZHhHbJB1IMnN6w5C/HRgRflzbWl7WGWgNyao66ximPhDw4hLjMquErPpAb01fZ49fOGbVkuc
+0C152sxaUdY10FTgz0ieKH0Bzw5dTwdeNA6xmU14WddAZwMfJkmWdTybQNuAz5Qbllk1ZF0DXQ5cLmlRRHjWgdkw8twHWi6pA5hDsrDiYPueS/aatZw8ayJ8gmR1nTnAjSSPN9R47prXZi0nz1y4dwJvAH4bEWcCxwDPLzUqs4rIk0BPRMTTwE5J04Et+CaqGZBvMulaSQcAV5KMxj0O3FFmUGZVkWcQ4R/Tt5+T9G1gekTc02i/OgW2DgS+QrLG3Abg9HrVGcyqIKv+zyv23IADgb3T9418AdhzOawlwC0RcTRwC6Mrj2I2YWSdgT6V8beGJR4j4ra0vMlQJ5OM6EFS/ftWYHFmhGYTWNaN1NGWdsxySEQ8nB7/YUkH1/uiC2xZFeS5D/Te4drLvpEaESuAFZAU2Crzt8xGK88o3CuHvJ9Kck/oJ4zuRupmSTPTs89MkiFxs8rKMwq32xJWkvYHrh7l710PvA+4KH29bpTHMZsQRlOh7o/A0Y2+lBbY+iHwF5I2SeomSZw3SnoAeGP62ayy8lwDfZNnH+luA9qBaxrtV6fAFiRdQLNJIc810KVD3u8ENkbEppLiMauUPGtjfw+4H9if5EbqzrKDMquKPGsi/D3J3LdTSWZm/0jSwrIDM6uCPF2484GXR8SjAJJeCPwAWFVmYGZVkGcUbhOwfcjn7cCvygnHrFrynIF+TbI29nUko3EnA3dIOheSpX9LjM9sQsuTQL9It0GDNz+njX04ZtWSZybChQCSpiUf4/HSozKriDyjcB2S7gTWA/dJWidpbvmhmU18eQYRVgDnRsSREXEkcB7J491mLS9PAu0bEf2DHyLiVmDf0iIyq5A8gwgPSvpnnp2BfQbwy/JCMquOPGeghcAM4OvpdhBwZplBmVVFo+oM/wC8BLgXOC8idoxXYGZVkHUGWg10kiTPm4FLxiUiswrJugaaExEvBZC0Ei+maPYcWQn0THctInZKyviqjadjLvwujz0xst70rCXfyv3d/feZwt2feNNIw2pJWQl0jKRt6XsB+6SfRTIjYXrp0dmwHntiBxsu+tvSjj+SZGt1WevCtY1nIGZVNJpFRcws5QQyK8AJZFaAE8isgDxz4cacpA0kj4bvAnZGRGcz4jArqikJlJofEY808ffNCnMXzqyAZiVQAN9Nn249a7gvSDpL0lpJa7du3TrO4Znl06wEOj4iXkEySfWDkl675xciYkVEdEZE54wZM8Y/QrMcmpJAEfGb9HULcC1wXDPiMCtq3BNI0r7pCj9I2hd4E8mCJWaV04xRuEOAa9PZ3XsDayLi202Io7KmtS/hpavLK3A+rR2gvMmqk8m4J1BEPAgcM96/O5lsH7jIs7EnCA9jmxXgBDIrwAlkVoATyKwAJ5BZAU4gswKcQGYFOIHMCnACmRXgBDIrwAlkVoATyKwAJ5BZAU4gswKcQGYFNHNZKyugzGd29t9nSmnHnmycQBU00ofpZi35VqkP4LUyd+HMCnACmRXgBDIrwAlkVoATyKwAJ5BZAR7GnkTSxSqH/9uy4dsjoqRoWkNTzkCSTpR0v6SfSypvic0WExEj3qyYZqyN3QZ8hqQywxxggaQ54x2H2VhoxhnoOODnEfFgRDwF/BdwchPiMCusGQl0KPCrIZ83pW27cYEtq4JmJNBwV7rP6Yy7wJZVQTMSaBNw+JDPhwG/aUIcZoU1I4F+DBwtabak5wHvBq5vQhxmhTWjPtBOSecA3wHagFURcd94x2E2FppyIzUibgRubMZvm40lT+UxK8AJZFaAqjCdQ9JWYGOz47CWdmREPOd+SiUSyGyichfOrAAnkFkBTiCzApxAZgU4gcwKcAKZFeAEMivACWRWgBPIrID/BwsdlHQtIratAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = (state_df['Population']/1000000).plot.box(figsize=(3, 4))\n", "# visualizing the distribution of the quartiles\n", "x_axis = ax.axes.get_xaxis()\n", "x_axis.set_visible(False)\n", "ax.set_ylabel('Population (millions) distribution')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAARgAAAEYCAYAAACHjumMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAoZElEQVR4nO3deXxdVbn/8c+TOWmTpk3SOZ1CB0rpRFpayiiUWYpX0SLKIIqgOF71cq9elZ969V79cZV7GewPEVS0yGhBZmQuhaSl85imKUnHpG3SNPPw/P5YO3AakuakPTv7nJPn/Xqd18nZZ+99vkmbJ3uvvdbaoqoYY4wfEoIOYIyJX1ZgjDG+sQJjjPGNFRhjjG+swBhjfJMUdIBIys3N1XHjxgUdw5i4tHLlyipVzevNNnFVYMaNG0dxcXHQMYyJSyKys7fb2CmSMcY3VmCMMb7xtcCIyMUiskVESkTkti7ev0ZE1nqP5SIyI+S9MhFZJyKrRcTOe4yJQb61wYhIInAXsBCoAIpEZJmqbgxZbQdwjqoeEpFLgCXA6SHvn6eqVX5lNMb4y88jmLlAiaqWqmozsBRYFLqCqi5X1UPeyxXAaB/zGGP6mJ8FZhRQHvK6wlvWnRuBZ0NeK/CCiKwUkZu620hEbhKRYhEprqysPKHAxpjI8vMytXSxrMuh2yJyHq7AnBmyeIGq7haRocCLIrJZVV//yA5Vl+BOrSgsLLSh4cZEET+PYCqA/JDXo4HdnVcSkenAfcAiVT3QsVxVd3vP+4EncKdcxpgY4meBKQImish4EUkBFgPLQlcQkTHA48DnVXVryPIBIpLZ8TVwIbDex6zGGB/4doqkqq0icivwPJAI3K+qG0TkZu/9e4EfAjnA3SIC0KqqhcAw4AlvWRLwZ1V9zq+sxhh/SDzNaFdYWKjRNFRg+/btEdlPQUFBRPZjzIkQkZXeAUDYrCevMcY3VmCMMb6xAmOM8Y0VGGOMb6zAGGN8YwXGGOMbKzDGGN9YgTHG+MYKjDHGN1ZgjDG+sQJjjPGNFRhjjG+swBhjfGMFxhjjGyswxhjfWIExxvjGCowxxjd+3lXAxCmbqc+Ey45gjDG+sQJjjPGNFRhjjG+swBhjfGMFxhjjGyswxhjfWIExxvjGCowxxjdWYIwxvrECY4zxjRUYY4xvrMAYY3xjBcYY4xsrMMYY31iBMcb4xgqMMcY3VmCMMb6xAmOM8Y0VGGOMb3wtMCJysYhsEZESEbmti/evEZG13mO5iMwId1tjTPTzrcCISCJwF3AJMBW4WkSmdlptB3COqk4HfgIs6cW2xpgo5+cRzFygRFVLVbUZWAosCl1BVZer6iHv5QpgdLjbGmOin58FZhRQHvK6wlvWnRuBZ3u7rYjcJCLFIlJcWVl5AnGNMZHmZ4GRLpZplyuKnIcrMP/S221VdYmqFqpqYV5e3nEFNcb4w88br1UA+SGvRwO7O68kI
tOB+4BLVPVAb7Y1xkQ3P49gioCJIjJeRFKAxcCy0BVEZAzwOPB5Vd3am22NMdHPtyMYVW0VkVuB54FE4H5V3SAiN3vv3wv8EMgB7hYRgFbvdKfLbf3Kaozxh6/3plbVZ4BnOi27N+TrLwJfDHdbY0xssZ68xhjfWIExxvjGCowxxjdWYIwxvrECY4zxjRUYY4xvrMAYY3xjBcYY4xsrMMYY3/jak9dEl+3btwcdwfQzdgRjjPGNFRhjjG+swBhjfGNtMAGT1kbS9q0k5fD7aEIizdkn0Zg3AxISg45mzAmzAhMUbWfQ5qUMXncfiU3VR73VmpZDzcmfpWbyYjQ5PZh8xkSAFZgASGsDw16/jQG73qB+xDxqTr6GxpypSHsraZWrySxZRs57/0PWtsfZf8aPoaAg6MjGHBcrMH2tvZXhr32X9N1vUznnexye/BmQD+c4rxu7kLqxC0nbt5Khb9/OyBduguZSOPt7kGBNZia22P/YPpaz6k4ydi+nct73OTxl8VHFJVTjsNMov+xhjky4DF79OTx6PTTX9W1YY06QFZg+lL7nHbI3/YmayZ+mduI/9bi+Jqez/4zb4cKfwsZl8PtL4PCePkjaNWltJOnIbhKaqkG7vIuMMUexU6Q+Iq0N5K34Gc2ZYzhw2rd6saHAGV+D3EnwyA1w3wVwzV9h2Cn+hQ2RXLODrJInSd/9NqnVJR8sb00bQv2oM6me+nlasq2NyHTNCkwfyd74J5KPVLBr4RI0MbX3O5h0EXzhWfjzZ+B3F8GnH4STzo98UE/KgY3krL6bjN3L0YQkGobO5uD0L9OaMZSE1gZSqzYwcOdLZJY+TfXU6zg48xZIsP9O5mj2P6IPJDTVMGjjHzmSfx6NwwuPf0cjZsAXX4Y/fxoeugouvwNOuz5iOQES6ysZ8t7/kln6NO2pgzg44xYOT/wkbelDPrJuQlM1OavuZPCG35N8uIz9Z/3H8RVPE7eswPSB7A1/IKGljkMzbjnxnQ0aBV94Dh65Hp76BhzcAef/6ISvMElrA9kb/0T2+t8j2kb1KddSPe0LtKdkdrtNe2o2lfN/SNPgSeQV/Re88W/sO+eXINa0ZxwrMD6Tlnqytj5C3dgLaB58UmR2mpoJVz8Mz34X3vo17NsAi/4XMof3fl+qDCx7jpxVd5JUv48jY87nwOxv0Jo5OuxdHJ6yGNE2cov/L81r7uXQzK/0PoeJS1ZgfJZZ+jSJLUeoPvmayO44MQkuuwOGToUXfgB3z4MLfwYzFoc3zECV9L3vMmT13aRVraNpyBT2nflTGoeddlxxaqZ8lpRD2xi87nc0jJxP49BZx7UfE1/sWNZP3nCAxpxTaMo9NfL7F4G5X4IvvwFDCuBvX4Hfng2r/ghNR7rcJKHpMJnbnmDUs59n5Eu3kFS3l/1n/JiKS/903MWlI0vVnO/ROmAEectvR1obj39fJm7YEYyP0vesIOVwGfsW/LTbDnURkTcJbnwRNjwOr/0nLLsV/v5tGDETcidCcjo0VDN611pSqksQbac5axz75/2A2gmXQ2JKRGJocgaV8/+dkS/dwqBND1F96o0R2a+JXVZgfJRV8jfaUrM5Mnah/x+WkACnfgqmfRLeXwFbnoGKIih9DVrqIC2btrRhHDr1RupGn0PzkJN9KXoNI07nSP55DN7wALUnfaLLq0+m/7AC45fGGjLKX6N24icgMbnvPlcExs53j0729NGUmQdnfY0BT11F9vrfcWDOd/vkM010sjYYv2z8GwntzdROuCzoJH2uZdA4aidcRta2J0hoPBR0HBMgKzB+WbOU5qyxNOX0TZf+aFN9ynVIWxODNv8l6CgmQFZg/FBTATvfonb8Zf427kaxlkHjqRtzHoO2PIy0NgQdxwTECowfNv8dgLpxFwYcJFg1U64msbmWgTtfDDqKCYgVGD9sfhryptCSNSboJIFqHDqb5qxxZG19LOgoJiBWYCKt/iCUvQVT+l/j7keIcHjSJ0mrW
kfKwS1BpzEBsAITadteAG2zAuOpnXA57QkpZG17POgoJgC+FhgRuVhEtohIiYjc1sX7U0TkbRFpEpHvdHqvTETWichqESn2M2dEbX4aMkfCCBuLA9CeOoi6/HNdO0x7S9BxTB/zrcCISCJwF3AJMBW4WkSmdlrtIPB14Ffd7OY8VZ2pqicwiUofammAkpdhyqU2QXeII+MvIbGpmow97wQdxfQxP38L5gIlqlqqqs3AUmBR6Aqqul9Vi4D4+NNW9ia01MOkS4JOElXqR55BW0oWA3c8F3QU08f8LDCjgPKQ1xXesnAp8IKIrBSRm7pbSURuEpFiESmurKw8zqgRsv0fkJQG4xYEmyPaJCZTN+Z8BpS/Yn1i+hk/C0xXPcx6MxX9AlWdjTvF+qqInN3VSqq6RFULVbUwLy/veHJGTsnLMPYMN3rZHKV2/MUktDaQsevNoKOYPuRngakA8kNejwZ2h7uxqu72nvcDT+BOuaJXTQVUbYGCjwWdJCo1Dp1FW2o2A8pfDTqK6UN+FpgiYKKIjBeRFGAxsCycDUVkgIhkdnwNXAis9y1pJGx/xT0X+DfTf0xLSKJu1FnuCMauJvUbYRUYEXlMRC4TCX82Z1VtBW4Fngc2AX9V1Q0icrOI3Oztd7iIVADfBn4gIhUikgUMA94UkTXAu8DfVTW6Wwi3vwyZI2DoyUEniVr1+eeQ2FxL+r73go5i+ki488HcA9wA3CkijwAPqOrmnjZS1WeAZzotuzfk6724U6fODgMzwswWvPY2KH0VJl/abwc3hqN+xHzaE1PJqHiNhhHRfcZrIiOsIxJVfUlVrwFmA2XAiyKyXERuEJE+nE0pSu1ZAw2HYMJ5QSeJapqcTsPwua4dxm492y+EfcojIjnA9cAXgfeA3+AKjg2VLfOujIzv8kKXCVGXfy7JdXtIqd4WdBTTB8Jtg3kceAPIAD6uqleo6sOq+jVgoJ8BY0LZm5AzETKHBZ0k6tWPPguAjIo3Ak5i+kK4bTD3ee0pHxCRVFVtiplu/H5pb4P333aTbZsetaXn0jR4Mhl7VgQdxfSBcE+RftrFsrcjGSRm7V0LTYdh3JlBJ4kZ9SPnkVa5Bppqg45ifHbMAuNdRj4NSBeRWSIy23uciztdMmVvueexNjwgXA0j5iPtrR/+7Ezc6ukU6SJcw+5o4I6Q5bXAv/mUKbaUvenuqpg1IugkMaNh6EzaE9NI2P4yTL446DjGR8csMKr6IPCgiHxSVW3ew87a2+D95TB1Uc/rmg8lptAwvJAB2/8RdBLjs2MWGBH5nKr+CRgnIt/u/L6q3tHFZv3HvvXQWANjrf2ltxpGzGNA8a/g0E4YPDboOMYnPTXyDvCeBwKZXTz6t442BJueodfqR3p3nrSjmLjW0ynSb73n2/smTox5/23IHgODuhrtYI6lJWucG7tV9iYU3hB0HOOTcDva/ZeIZIlIsoi8LCJVIvI5v8NFNVV3c/n804NOEptE3KX9sjds2EAcC7cfzIWqehi4HDfPyySgf9/V/PAuqN0Do+cEnSR2jTsTjuyDAyVBJzE+CbfAdAxovBT4i6oe9ClP7Kgocs9WYI7fODdsgB2vB5vD+CbcAvOUiGwGCoGXRSQPaPQvVgwoL3Lz7w6bFnSS2DVkgrvFS5lNoxmvwp2u4TZgPlCoqi1AHZ3uENDvVBTBiJmQlBJ0ktglAuPPcgXG2mHiUm+mzDwZ+IyIXAt8CjeNZf/U2gR7VkO+nR6dsHFnQt1+qNoadBLjg7BGU4vIH4ECYDXQ5i1W4A/+xIpye9dBW7O1v0RCxyDRHa9D3uRgs5iIC3e6hkJgqqodxwIhDbw27eMJGzweska7y9VzvxR0GhNh4Z4irQeG+xkkppS/634pbIDjiRNxPaF3vm3tMHEo3COYXGCjiLwLNHUsVNUrfEkV7SqKYXTfzbO1ffv2PvusQIyZB2sfhkM73JUlEzfCLTA/9jNETKndCzXvw7ybg04SP8Z445LeX
2EFJs6Ee5n6NdzdBJK9r4uAVT7mil7WwS7ycidD2iA3tsvElXDHIn0JeBT4rbdoFPCkT5miW0URJKbAiNi5bVPUS0iA/HnuCMbElXAbeb8KLMDdEA1V3QYM9StUVCsvguHTISk16CTxZcw81xem7kDQSUwEhVtgmlS1ueOFiCTh+sH0L20tsPs9Oz3yQ0c7TPk7weYwERVugXlNRP4NN/n3QuAR4Cn/YkWpfRugtcF68Pph5Cx36mntMHEl3AJzG1AJrAO+jLvf9A/8ChW1rIHXP8lprshYO0xcCesytaq2i8iTwJOqWulvpChWUQQDh8Og/KCTxKcx8+Dtu6GlAZLTg05jIqCn+yKJiPxYRKqAzcAWEakUkR/2TbwoU/6u62AnEnSS+DRmPrR77VwmLvR0ivRN3NWjOaqao6pDgNOBBSLyLb/DRZW6KtfT1E6P/NMx/ai1w8SNngrMtcDVqrqjY4GqlgKf897rPyqK3XO+DXD0TcYQ1+nO2mHiRk8FJllVqzov9NphkrtYP35VvAsJSW6SKeOfMfPcper29qCTmAjoqcA0H+d78aeiyE2PmWK35PbVmPnuZnaVm4JOYiKgpwIzQ0QOd/GoBU7ti4BRob0Ndq2y9pe+MMZrh7EOd3HhmAVGVRNVNauLR6aq9p9TpP2boPmIFZi+MHg8DBgK71uBiQe9mZO3/+roYGc9eP0n4o5iyq2hNx74WmBE5GIR2SIiJSJyWxfvTxGRt0WkSUS+05tt+1RFEWTkuL+uxn/5p8OhMqjdF3QSc4LCnXCq10QkEbgLWIi7G2SRiCxT1Y0hqx0Evg5ceRzb+qbzDHL5pW/RMngqe0tL++LjTf4891y+Aqb277vjxDo/j2DmAiWqWuqNxF5Kp3spqep+VS0CWnq7bV9JaKoh5XAZjXnTg/j4/mnEDHdTO2uHiXl+FphRQHnI6wpvWUS3FZGbRKRYRIorKyM/TCq1aj0ATbn956JZ4JJSYORsa4eJA34WmK4G7IQ7h0zY26rqElUtVNXCvLy8sMOFK61qHSoJNOaeEvF9m2MYczrsWQPN9UEnMSfAzwJTAYQOOx4N7O6DbSMqrXItzdkFaPKAID6+/8qfB+2tsLt/Tv0cL/wsMEXARBEZLyIpwGJgWR9sGznaTmrVehrt9KjvdYz5snFJMc23q0iq2ioitwLPA4nA/aq6QURu9t6/V0SGA8VAFtAuIt/E3UHycFfb+pW1O8k1ZSS2HKHJGnj7XsfAx/J3g05iToBvBQZAVZ/BzX4XuuzekK/34k5/wtq2r6VVrQWwI5igjDkdNi5zAx8TrE9oLLJ/tWNIq1xHW0oWLVljg47SP+XPg8Zqd7cBE5OswBxDatU6mnKngdiPKRAdE1DZ5eqYZb853ZDmI6RUb7fToyDlFEBGrnW4i2FWYLqRdmADgloP3iCJuKMYO4KJWVZgupFauQ7AnSKZ4Iw5HQ6WwpH9QScxx8EKTDfSKtfQPKiA9pTMoKP0bx8MfLTTpFhkBaYr2k5a1To7PYoGI2dCYqp1uItRVmC6kHy4jMTmwzTmWQNv4JJS3R0frcNdTLIC04W0Sq+DXd6MgJMYwBv4uBpaGoNOYnrJCkwXrINdlMmfB23NdsfHGGQFpgtplWtc+4t1sIsO1uEuZvk6FikmNVSTUlNK7biLg04S9zpPTXos+Vljadn8D/aOuOIj7xUUFEQylokg+xPdmXeLWBtBHV0a82a4tjENd84yEw2swHRW8a43g511sIsmjXkzSWyqJvlwWdBRTC9Ygems/B2asyeiyXaL2GjSONRd0UvbvybgJKY3rMCEam+DipXWwS4KtWSNoy01m7TK1UFHMb1gBSbU/k3QXGsd7KKRCI1500mrtCOYWGIFJlSF6y1qHeyiU2PeTFIO7ySh8VDQUUyYrMCEKi+CjFxaB3Y5i6cJ2AftMF5PaxP9rMCEKl/hZrOXrm7LZILWlDMVTUi2dpgYYgWmQ+1eN+/I2DOCTmK6o
YmpNOZMJX3fyqCjmDBZgemwc7l7tgIT1RqHFZJ6YBPSUhd0FBMGKzAddi6H5AEw3Bp4o1nD8EJE20jfbwMfY4EVmA47l7tpARJteFY0a8ybgSYkk763KOgoJgxWYADqD8L+DXZ6FAM0KY3G3FNJ21scdBQTBisw8OF0jGMXBJvDhKVheCGph7aQ0FwbdBTTAyswADvfcvO+jpwddBIThobhcxBtJ23fqqCjmB5YgQHX/jK6EJLTgk5iwtCYeyrtianWDhMDrMA01cKeNdb+EksSU2jMm076PmuHiXZWYMrfBW2zAhNjGobNIfXQVhKaqoOOYo7BCszOt0ASYfTcoJOYXmgcfhqA9eqNclZgSl+DUadB6sCgk5heaMydRnvyANJ320Tg0ax/F5jGGti9CiacG3QS01sJydQPn0vG7uU2T28U698FpuxN0HYrMDGqYeR8kuv2QNW2oKOYbvTvAlP6KiRnwOg5QScxx6F+xHz3RclLwQYx3bICM3YBJKUEncQch9bMUTRnjYPtLwcdxXSj/xaYml1QtRUmnBN0EnMC6kfOd6e6LQ1BRzFd8LXAiMjFIrJFREpE5LYu3hcRudN7f62IzA55r0xE1onIahGJfI+qHa+5Z2t/iWkNI8+A1kbX3cBEHd8KjIgkAncBlwBTgatFZGqn1S4BJnqPm4B7Or1/nqrOVNXCiAcsfQ0ycmHoKRHftek7DcNmu3FkJXaaFI38PIKZC5SoaqmqNgNLgUWd1lkE/EGdFUC2iIzwMZOj6tpfxp8NCf33LDEeaFI6jDsTtj5nl6ujkJ+/XaOA8pDXFd6ycNdR4AURWSkiN3X3ISJyk4gUi0hxZWVleMn2roUje2HiwvDWN9FtyqVuPuXKLUEnMZ34WWC6mpq/85+YY62zQFVn406jvioiZ3f1Iaq6RFULVbUwLy8vvGRbX3DPJ10Q3vomuk2+1D1vfjrYHOYj/CwwFUB+yOvRwO5w11HVjuf9wBO4U67I2Pa8m/tl4NCI7dIEKGukG+6x+e9BJzGd+FlgioCJIjJeRFKAxcCyTussA671ribNA2pUdY+IDBCRTAARGQBcCKyPSKq6A1BRDJMuisjuTJSYcpkb9lGzK+gkJoRvBUZVW4FbgeeBTcBfVXWDiNwsIjd7qz0DlAIlwP8DvuItHwa8KSJrgHeBv6vqcxEJVvISoNb+Em+mfNw9b3km2BzmKL5Ooa+qz+CKSOiye0O+VuCrXWxXCvhz/5Btz8OAoTBili+7NwHJmwQ5E2HTUzD3S0GnMZ7+dY22rcX1lzjpArs8HY9OuRLK3oDafUEnMZ7+9VtW9gY0VrvzdRN/Tr3KjY5f/1jQSYynfxWYjcvc3RtPOj/oJMYPeZNh+HRY99egkxhP/ykw7W2un8SkCyE5Peg0xi/TPw2734OqkqCTGPpTgXn/bairhJOvCDqJ8dO0TwIC6x4JOomhPxWYjcsgKQ0mXhh0EuOnrJEw/ixYuxTa24NO0+/1jwLT3u4uXxacb5N79wezroVDZbDj1aCT9Hv9o8DsfAtqd8O0fwo6iekLU6+AjBwo/n3QSfq9/lFg1iyFlMwPB8WZ+JaUCjM/68Ym1e4NOk2/Fv8FprkeNj4JpyyClIyg05i+ctoN7o6dKx8MOkm/Fv8FZvPfofkITF8cdBLTl3IKXIP+u0tsvt4AxX+BWbsUBuW7uweY/mXBN6C+ClY/FHSSfiu+C0zNLtj+D5j+GRt71B+NXQCjCmH5/0Bba9Bp+qX4/q1b+YCbp3X2tUEnMUEQgTO/6S5Z2/CBQMRvgWlthlUPuvPwwWODTmOCMvkyGDETXvkPaGkMOk2/E78FZvPTcGQfzPli0ElMkBISYOHtUFMORfcFnabfid8C8+4SyB5rI6eNu7lewcfg9V/CkTDvPGEiIj4LzPvvuMGN874CCYlBpzHR4OJfQEs9PPeRG4waH8VngXnr15A+BGZ/PugkJlrkTYaz/hnWPwpbnw86Tb8RfwVm/
yY38fPpX4aUAUGnMdHkzG9B3snwt6/C4T1Bp+kX4q/AvPpzSBkIc7u9GaTpr5JS4aoHoLkOHr3BzdFsfOXrXQX6XEs9bPwbnHMbZAwJOo2JRkOnwMfvhMe/CMu+Dovu+qAT5vbt2yPyEQUFBRHZT7TlOR7xVWAO73bD9Od/5E4oxnxo+lVwaAe88jM3P9DF/2k9vX0SXwWmqRbO/jmkZQWdxES7s78LjTXw9v9CXRVceU/QieJSfBWYpDTrWGfCIwIX/hQG5MFLP4IDJaQU/oDmwScFnSyuxNdxYXY+JCYHncLEio6xSov/Aod3M/qZz5Kz8r9JaDwUdLK4EV9HMCk23645DlMuhfy51D72TQZteoisLY9wZOxCagsup3HoTEiwP1rHK74KjDHHa0AulQtup3radWRvfIiBO18gq/Qp2pMyaBg6i+bBk2jOnkBL1lha03NpS8uxo+UwWIExJkTLoAlUzv93quZ8h4xdy0nf+w7p+94jY887iB49p0xbyiDakwfQnpyBJqW7r5PSYc1QV3wSU7xHyNcJSR9+nZTi2g2TUjs9p0NSKon1tbSlDY7pIygrMMZ0QZPSqRt7PnVjvcGy7S0kHy4nubacxMYDJDUcILHxAAktdUhLPQmtDSS01pPYUAW1pa4TX3sLtDW7r9uaob13k16N857bUgbRlp5Dy8CRtAwaT3PWWJqzC2geMgVNTI3o9x1pVmCMCUdCMi3ZE2jJntDjqt12bGtvP7rotDZBW5N7bm08+rmlnsqdm0lsPEiiV8ySa8tJ31tEQlsTAJqQRNPgKTTmnUrDiNNpGDYHjbLbIluBMaavJCRAQqo7FQrD4dRpH13Y3kZS3V5SD20htWodaZXryNr2BNmb/4ImJNMwdDb1oxZQN+ZjtA4cGeFvoPeswBgTSxISac0cRWvmKOrGfAwAaWsibf9qMnYvJ2PXcnJX3kHuyjtozJnGkbEXwJAvBDaroxUYY2KcJqa6U6QRp3PgtG+RVFvOwJ0vM2Dni+Su+jWs+jWMnA1TF7nHkPF9ls0KjDFxpjUzn+pp11M97XqSaisYW7caNjzpeiy/9CMYPt0rNldCrr89l63AGBPHWjNHw8xz3D2iDu2ETU+5GQf+8RP3GHqKu5f3SQth5MyIzwBpBcaY/mLwWDjjVveo2fVhsXn1F24epbRsGH82FJwH+fPcLIAnWHB8LTAicjHwGyARuE9Vf9HpffHevxSoB65X1VXhbGuMOQGDRsG8m92jrgpKX4Xtr0DpK7BpmVsnZSCMnOUeeVOO62N8KzAikgjcBSwEKoAiEVmmqhtDVrsEmOg9TgfuAU4Pc1tjTCQMyIVTP+UeqnCgBCqKYVcx7FoJK+5x/XeOg59HMHOBElUtBRCRpcAiILRILAL+oKoKrBCRbBEZgevE2NO2xphIE4Hcie4x82q3rK0VqnfC7b1vEPazwIwCykNeV+COUnpaZ1SY2wIgIjcBHRPwNonI+hPIHGm5QFXQIUJYnmOzPMc2ubcb+FlgpItlGuY64WzrFqouAZYAiEixqhb2JqSfLM+xWZ5ji8Y8vd3GzwJTAeSHvB4N7A5znZQwtjXGRDk/Z7QrAiaKyHgRSQEWA8s6rbMMuFaceUCNqu4Jc1tjTJTz7QhGVVtF5Fbgedyl5vtVdYOI3Oy9fy/wDO4SdQnuMvUNx9o2jI9dEvnv5IRYnmOzPMcW83nEXcAxxpjIi69Jv40xUcUKjDHGN3FRYETkYhHZIiIlInJbwFnyReQVEdkkIhtE5BtB5ukgIoki8p6IPB0FWbJF5FER2ez9nOYHnOdb3r/VehH5i4ikBZDhfhHZH9qPS0SGiMiLIrLNex4ccJ5fev9ma0XkCRHJ7mk/MV9gQoYVXAJMBa4WkakBRmoF/llVTwbmAV8NOE+HbwCbgg7h+Q3wnKpOAWYQYC4RGQV8HShU1Wm4iwqLA4jyAHBxp2W3AS+r6kTgZe91kHleBKap6nRgK/CvPe0k5gsMI
UMSVLUZ6BhWEAhV3dMxYFNVa3G/PKOCygMgIqOBy4D7gszhZckCzgZ+B6CqzapaHWgodzU1XUSSgAwC6HOlqq8DBzstXgQ86H39IHBlkHlU9QXVD26tsALXP+2Y4qHAdDfcIHAiMg6YBbwTcJRfA98D2gPOATABqAR+752y3SciA4IKo6q7gF8B7wN7cH2xXggqTyfDvH5heM9DA84T6gvAsz2tFA8FJuxhBX1JRAYCjwHfVNXDAea4HNivqiuDytBJEjAbuEdVZwF19O2h/1G8do1FwHhgJDBARD4XVJ5YICLfxzUFPNTTuvFQYMIZktCnRCQZV1weUtXHg8wCLACuEJEy3Onjx0TkTwHmqQAqVLXjqO5RXMEJygXADlWtVNUW4HHgjADzhNrnzS6A97w/4DyIyHXA5cA1GkYnungoMFE1rMCbROt3wCZVvSOoHB1U9V9VdbSqjsP9bP6hqoH9hVbVvUC5iHSMzD2fYKfheB+YJyIZ3r/d+URPY/gy4Drv6+uAvwWYpWMSuH8BrlDV+rA2UtWYf+CGG2wFtgPfDzjLmbhTtLXAau9xadA/Iy/bucDTUZBjJlDs/YyeBAYHnOd2YDOwHvgjkBpAhr/g2oBacEd5NwI5uKtH27znIQHnKcG1d3b8v763p/3YUAFjjG/i4RTJGBOlrMAYY3xjBcYY4xsrMMYY31iBMcb4xgpMAEREReSPIa+TRKTyREc6i8iPReQ7J7D9uSJS43Xh3ywivwpjmyuPZzCnt90Pjy9pj/v+mYiUi8iRTstTReRhb9T9O95Qjo73rvNGLW/zOpN1td9uRzeLyL96+90iIheFLD9NRNZ5793p9bVBRG4VkRsi/s1HGSswwagDpolIuvd6IbCrNztw0xjLCf37eYP7OntDXRf+WcDlIrKgh91ciRvF3lvfA+4+ju2O4o2m7+wp3CDYzm4EDqnqScB/A//p7WMI8CPcrXHmAj/qZmqELkc3ewV2MXAKbgTy3SG57sHdVqfjBoMdI5Tvx43ijmtWYILzLG6EM8DVuI5NwEePRLx5SsZ5j00icjewCsgXke97fzVfIuS+NSJSICLPichKEXlDRKZ4yx8QkTtE5BW8X7CuqGoDrjPVKG+7L4lIkYisEZHHvJ6vZwBXAL8UkdXeZ3b5uaFEZBLQpKpVIZnu9dbf6o2f6pjD5pfe564VkS97y88VN+fOn4F1XWRfod4gwU5CRyc/CpzvHVFcBLyoqgdV9RBuWoLOUxV03j50dPMiYKmqNqnqDlyHtLle9/4sVX1bXYezP3Rso64nbJmIdFUI44YVmOAsBRaLm9xoOuGPuJ6MuxvmLNyNuRbjjjb+CZgTst4S4GuqehrwHY4+WpgEXKCq/9zdh3h/wScCr3uLHlfVOaraMX/Ljaq6HNed/buqOlNVt/fwuR0W4ApkqHHAObiie6/3c7kRN7p5jve9fUlExnvrz8X12u7N0dMHI+/VTTtQg+stG+6I/O5GNx/rBoIVx9hvMXBWL/LHHD/vi2SOQVXXem0AV+PurhCunaq6wvv6LOAJ768hIrLMex6IG7D3iHfKD5Aaso9HVLWtm/2fJSJrcYXsF+rGDoE7pfspkA0MxN3x4ShhfG6HEbgpG0L9VVXbgW0iUgpMAS4EpovIp7x1BuGKXjPwrne00BsnfKO/CO93P+77jFtWYIK1DDcXybm4v6QdWjn66DJ0Cse6Tvvo6hchAahW1ZndfG7nfYR6Q1Uv905j3hSRJ1R1NW6GsytVdY2IXO9l7u3ndmjAFYtQnb+Pjl/Qr6nqUcVMRM7t4XvoTsfI+wqv/WkQblKlCo7+fkYDr3ax/T4RGaGqe+To0c3djeiv4OhJmTqP9E/D/Szilp0iBet+4P+oaud2hDK8KQxEZDZurpKuvA58QkTSRSQT+DiAuvlndojIVd4+RERm9CaYqm4Ffo4bPQuQCewRNxXFNSGr1nrv9eZzNwGd76R+lYgkiEgBblKqLbijpFu8z
0REJsmJTU4VOjr5U7iR5ep9zoUiMtg7NbzQW4aI/FxEPtHF9qGjm5fhTndTvVO4ibgjrD1ArYjM89p6ruXoEdGTcAMs45YVmACpaoWq/qaLtx4DhojIauAW3EjxrrZfBTyMa4x9DHgj5O1rgBtFZA2wgeObRvRe4Gzvl+bfce1EL+JGHndYCnxX3KXtgjA/93VgVsclW88W4DVc4/fNqtqIm+JzI7BK3OTTvyWMo24R+S8RqQAyRKRCRH7svfU7IEdESoBv410FUtWDwE9wU38U4Yp+x3SRpwIdp4m/ABaKyDbclb9feNtvAP7qZX0O+GrIKegt3vdRghvtHzoL3ALgpZ6+n1hmo6lNIETkN8BTqvqSiDyAm0bi0YBjfYSIPK+qF/W8Zq/3Owv4tqp+PtL7jiZ2BGOC8h+4Cbajmh/FxZOLOyqMa3YEY4zxjR3BGGN8YwXGGOMbKzDGGN9YgTHG+MYKjDHGN/8fYaSxHwe9SlMAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = state_df['Murder.Rate'].plot.hist(density=True, xlim=[0, 12], facecolor='gainsboro',\n", " bins=range(1,12), figsize=(4, 4))\n", "state_df['Murder.Rate'].plot.density(ax=ax)\n", "ax.set_xlabel('Murder Rate (per 100,000)')\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Correlation\n", "\n", "Pearson's correlation coefficient: $\\frac{\\sum_{i=1}^n (x_i-\\bar{x})(y_i-\\bar{y})}{(n-1)s_xs_y}$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "0.5837062198659806" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gapminderdatapath ='data/gapminder/' # change this to adjust relative path\n", "gap_df = pd.read_csv(gapminderdatapath+'gapminder.tsv', sep='\\t')\n", "gap_df['lifeExp'].corr(gap_df['gdpPercap'])" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lifeExppopgdpPercap
lifeExp1.0000000.0649550.583706
pop0.0649551.000000-0.025600
gdpPercap0.583706-0.0256001.000000
\n", "
" ], "text/plain": [ " lifeExp pop gdpPercap\n", "lifeExp 1.000000 0.064955 0.583706\n", "pop 0.064955 1.000000 -0.025600\n", "gdpPercap 0.583706 -0.025600 1.000000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gap_df[['lifeExp','pop','gdpPercap']].corr()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAVwAAAEYCAYAAAAQ305WAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAcT0lEQVR4nO3de/gdVX3v8fcnAeQuBAskXAQxVmMLqDTogaNQCE+gxUClQqDIpTRiibdW25xHq7H2nAfrqbQckBA0EkCI2kITJOUWQQQFEiDkIiIxRAmJRO5Xi0m+549ZPxg2+/fbM9l7T+a39+f1POvZc1kzs/bO7/my+M6aNYoIzMys+0Zs7gaYmfULB1wzs4o44JqZVcQB18ysIg64ZmYVccA1M6uIA66Z9SxJsyStk7RskP2SdL6kFZKWSHp3bt9ESQ+mfdM60R4HXDPrZZcCE4fYfzQwNpUpwEUAkkYCF6b944DJksa12xgHXDPrWRFxG/DkEFUmAZdF5k5gJ0mjgfHAiohYGREvA3NS3bZs0e4JWjls+gV+lK2Dvn3OyZu7CT3lijvu3dxN6Cl/f9yRauf4svHih1/6+EfJeqYDZkbEzBKn2AN4JLe+Om1rtv3gMm1rpusB18ysW1JwLRNgGzX7D0QMsb0tDrhmVhtSWx3kTbEa2Cu3viewBthqkO1tcQ7XzGpjhFSqdMA84CNptMJ7gWciYi2wEBgraV9JWwEnpbptcQ/XzGqj0x1cSVcBhwFvkrQa+CKwJUBEzADmA8cAK4AXgTPSvvWSpgI3ACOBWRGxvN32OOCaWW10qNf6ioiY3GJ/AOcMsm8+WUDuGAdcM6uNzZDDrZQDrpnVhgOumVlFRvR2vHXANbP6UNPhr73DAdfMamPkiN4eqeqAa2a10eMpXAdcM6uPTg8LqxsHXDOrDY9SMDOriAOumVlFPCzMzKwi7uGamVXEN83MzCriBx/MzCrS4x1cB1wzqw+nFMzMKuKbZmZmFXHANTOriMfhmplVxD1cM7OK+KaZmVlF3MM1M6tIb4db6O3p1c1sWBkhlSpFSJoo6UFJKyRNa7L/s5IWp7JM0gZJo9K+VZKWpn2L2v1+7uGaWW10OqUgaSRwITABWA0slDQvIn46UCcivgp8NdU/Fvh0RDyZO83hEfF4J9rjHq6Z1YakUqWA8cCKiFgZES8Dc4BJQ9SfDFzVga/SlAOumdXGCJUrkqZIWpQrUxpOuQfwSG59ddr2OpK2BSYC/5HbHMCNku5pcu7SnFIws9oom1KIiJnAzKFO2eywQeoeC9zRkE44JCLWSNoVuEnSzyLitlKNzHHANbPa6MJr0lcDe+XW9wTWDFL3JBrSCRGxJn2uk3QNWYpikwOuUwpmVhtSuVLAQmCspH0lbUUWVOe9/rp6I/ABYG5u23aSdhhYBo4ClrXz/dzDNbPa6PSTZhGxXtJU4AZgJDArIpZLOjvtn5GqHg/cGBEv5A7fDbgmpTm2AK6MiOvbaY8DrpnVRjfe+BAR84H5DdtmNKxfClzasG0lcEAn2+KAa2a14bk
UzMwq0uPx1gHXzOrDk9ckknYnGxIRwMKI+HXXWmVmfanXUwqFhoVJOgu4G/gz4ATgTklnDlH/lac/1txzR2daamY9rwuP9tZK0R7uZ4F3RcQTAJJ2AX4MzGpWOf/0x2HTLxjsqQ4zs9cYhjG0lKIBdzXwXG79OV77fLKZWdt6PaVQNOA+CtwlaS5ZDncScLekvwGIiK91qX1m1keGY5qgjKIB9xepDBh4/G2HzjbHzPrZiB5/50PRgPuViPhtfoOkN3VqUl4zM+j9HG7RyWvulvTegRVJHyK7aWZm1jEepZA5BZgl6VZgDLAL8MfdapSZ9SffNAMiYqmk/w1cTjZC4f0RsbqrLTOzvtPj8bZYwJX0TWA/YH/gbcC1ki6IiAu72Tgz6y/DMU1QRtGUwjLgrIgI4OGUz/VQMDPrqL5OKUjaMSKejYjz8tsj4hlJX+pu08ys3/R6D7fVKIVbBxYkLWjY95+dboyZ9beyb+0dblqlFPJfadQQ+8zM2tbrPdxWATcGWW62bmbWlm68YqdOWgXcXdN8Ccotk9Z/r6stM7O+M3I45glKaBVwL+HV+RLyywDf6EqLzKxv9fUohYjwSAQzq0yv53CLvvHhbZIWSFqW1veX9PnuNs3M+k035lKQNFHSg5JWSJrWZP9hkp6RtDiVLxQ9tqyik9dcAvwv4HcAEbEEOKndi5uZ5XV6WJikkcCFwNHAOGCypHFNqv4oIg5M5R9LHlv8+xWst21E3N2wbX07FzYza9SFHu54YEVErIyIl4E5ZC9Q6PaxTRUNuI9L2o80FEzSCcDadi5sZtZohFSq5F9Ym8qUhlPuwWtfB7Y6bWv0Pkn3S/ovSe8seWxhRedSOIfspZBvl/Qo8DDZlI1mZh1TdpRC/oW1g2h2wsZnCO4F3hwRz0s6huwp2rEFjy1lyB6upE+mxdERcSTZ2Nu3R8ShEfHLdi5sZtaoCymF1cBeufU9gTX5Cmm+mOfT8nxgS0lvKnJsWa1SCmekz/+XGvNCRDw3RH0zs03WhYC7EBgraV9JW5Hd7J/XcM3dlU4maTxZXHyiyLFltUopPCBpFfB7kpbk2whEROzfzsXNzPI6/aBZRKyXNBW4ARgJzIqI5ZLOTvtnACcAH5O0HngJOClNRdv02Hba0+rBh8mSdk8X/GA7FzIza6UbDz6kNMH8hm0zcssXABcUPbYdLW+aRcSvgQM6dUEzs8H09eQ1kr4bER+WtJTX3p1zSsHMOq6v51IABkYp/Gm3G2Jm1uPxtmUOd2369BAwM+u6vu7hSnqO5gN9B1IKO3alVWbWl3p9trBWPdwdhtpvZtZJfR1wzcyq1OMvfHDANbP6cA/XzKwiDrhmZhUZ0c8PPpiZVanHO7gOuGZWH309DtfMrEojRhR9Cc3w5IBrZrXhYWFt+vY5J3f7En3llAuv3NxN6CnzPjF5czfBcjxKwcysIs7hmplVxD1cM7OK9Hi8dcA1s/rwgw9mZhVxSsHMrCK9HnB7e5SxmQ0rI1SuFCFpoqQHJa2QNK3J/lMkLUnlx5IOyO1bJWmppMWSFrX7/dzDNbPa6HQPV9JI4EJgArAaWChpXkT8NFftYeADEfGUpKOBmcDBuf2HR8TjnWiPA66Z1UYXxuGOB1ZExEoASXOAScArATcifpyrfyewZ6cbMcApBTOrDUllyxRJi3JlSsMp9wAeya2vTtsG85fAf+XWA7hR0j1Nzl2ae7hmVhtlO7gRMZMsBTDoKZsd1vzaOpws4B6a23xIRKyRtCtwk6SfRcRt5Vr5Kvdwzaw2RkilSgGrgb1y63sCaxorSdof+AYwKSKeGNgeEWvS5zrgGrIUxSZzwDWz2iibUihgITBW0r6StgJOAuY1XHNv4Grg1Ij4eW77dpJ2GFgGjgKWtfP9nFIws9pQh580i4j1kqYCNwAjgVkRsVzS2Wn/DOALwC7A11MQXx8RBwG7AdekbVsAV0bE9e20xwHXzGqjG/PhRsR8YH7Dthm55bO
As5octxI4oHF7Oxxwzaw2ev1JMwdcM6sNz4drZlYR93DNzCrS4/HWAdfM6sMpBTOzioz0a9LNzKrR2/1bB1wzqxHfNDMzq4hzuGZmFXEP18ysIj0ebx1wzaw+nFIwM6uIUwpmZhVxD9fMrCI9Hm8dcM2sPpxSMDOryIgef9bMAdfMasM9XDOzinTjFTt14oBrZrXhHq6ZWUUccM3MKtLr43B7e7ZfMxtWpHKl2Dk1UdKDklZImtZkvySdn/YvkfTuoseW5YBrZrUhqVQpcL6RwIXA0cA4YLKkcQ3VjgbGpjIFuKjEsaUUCriS3iLpWkmPS1onaa6kt7RzYTOzRiOkUqWA8cCKiFgZES8Dc4BJDXUmAZdF5k5gJ0mjCx5b7vsVrHcl8F1gd2AM8D3gqsEqS5oiaZGkRVdcNrud9plZHxmhciUfa1KZ0nDKPYBHcuur07YidYocW0rRm2aKiMtz61dImjpY5YiYCcwEePQ3T0Yb7TOzPjIiyoWLfKwZRLNucONFBqtT5NhSigbcW1LCeE664InAdZJGAUTEk+00wswMgNjY6TOuBvbKre8JrClYZ6sCx5ZSNOCemD4/2rD9TLIA7HyumbUtNmzo9CkXAmMl7Qs8CpwEnNxQZx4wVdIc4GDgmYhYK+k3BY4tpVDAjYh927mImVkhJVMKrU8X61P68wZgJDArIpZLOjvtnwHMB44BVgAvAmcMdWw77SkUcCVtCXwMeH/adCtwcUT8rp2Lm5m9RudTCkTEfLKgmt82I7ccwDlFj21H0ZTCRcCWwNfT+qlp21mdaoiZWWzs7XvsRQPuH0XEAbn1H0i6vxsNMrM+1uGUQt0UDbgbJO0XEb+A7EEIoOPZbTPrb9GFlEKdFA24nyUbGrYyre9DSiybmXVMj/dwiz5pdgdwMbAxlYuBn3SrUWbWp2JjuTLMFO3hXgY8C3w5rU8GLgf+vBuNMrP+5Jtmmd9vuGl2i2+amVnHDcNeaxlFUwr3SXrvwIqkg8nSDGZmnRNRrgwzRXu4BwMfkfSrtL438ICkpWTjhvfvSuvMrK/EMAyiZRQNuBO72gozM4CNvZ1SKDqXwi+73RAzM/dwzcyq0uM3zRxwzaw+3MM1M6uGH+01M6uKH3wwM6uIe7hmZtXwKAUzs6o44JqZVcM3zczMquKbZmZm1YiNvf0iGQdcM6uPHk8pFJ2e0cys+zZGudIGSaMk3STpofS5c5M6e0m6RdIDkpZL+mRu33RJj0panMoxra7pgGtmtRERpUqbpgELImIssCCtN1oP/G1EvAN4L3COpHG5/edFxIGpzG91QQdcM6uPat9pNgmYnZZnA8e9rjkRayPi3rT8HPAAsMemXtAB18xqo2wPV9IUSYtyZUqJy+0WEWvTddcCuw5VWdI+wLuAu3Kbp0paImlWs5REI980M7P6KNlrjYiZwMzB9ku6Gdi9ya7PlbmOpO2B/wA+FRHPps0Xkb1YN9LnvwBnDnUeB1wzq48Oj8ONiCMH2yfpMUmjI2KtpNHAukHqbUkWbL8dEVfnzv1Yrs4lwPdbtccpBTOrjYiNpUqb5gGnpeXTgLmNFSQJ+CbwQER8rWHf6Nzq8cCyVhd0wDWz+qj2rb3nAhMkPQRMSOtIGiNpYMTBIcCpwB83Gf71z5KWSloCHA58utUFnVIws/qo8MGHiHgCOKLJ9jXAMWn5dkCDHH9q2Ws64JpZbYTnUjAzq4inZzQzq4anZzQzq4p7uO254o57u32JvjLvE5M3dxN6ygfPv2pzN6Gn3Dp9ansncA/XzKwavmlmZlYV93DNzCriHK6ZWTX8mnQzs6psdErBzKwS7uGamVXFN83MzKrh16SbmVXF43DNzCriHK6ZWTU8eY2ZWVXcwzUzq4Z7uGZmVfFNMzOziriHa2ZWDT9pZmZWlQp7uJJGAd8B9gFWAR+OiKea1FsFPAdsANZHxEFljs8b0anGm5m1bWOUK+2ZBiyIiLHAgrQ+mMMj4sCBYLsJxwM
OuGZWIxFRqrRpEjA7Lc8Gjuv28Q64ZlYfsbFUkTRF0qJcmVLiartFxFqA9LnrYK0CbpR0T8P5ix7/Cudwzaw2ouR8uBExE5g52H5JNwO7N9n1uRKXOSQi1kjaFbhJ0s8i4rZSDU0ccM2sPjo8SiEijhxsn6THJI2OiLWSRgPrBjnHmvS5TtI1wHjgNqDQ8XlOKZhZfZRMKbRpHnBaWj4NmNtYQdJ2knYYWAaOApYVPb6RA66Z1UbFN83OBSZIegiYkNaRNEbS/FRnN+B2SfcDdwPXRcT1Qx0/FKcUzKw+KnynWUQ8ARzRZPsa4Ji0vBI4oMzxQ3HANbPa8JNmZmZV8VwKZmYVcQ/XzKwa4ekZzcwq4pSCmVk1/Jp0M7OqOIdrZlaNsnMpDDcOuGZWH+7hmplVxDfNzMyq4SfNzMyq4nG4ZmYVcUrBzKwavZ5SKDQfrqS3SLpW0uOS1kmaK+kt3W6cmfWZaicgr1zRCcivBL5L9m6gMcD3gKsGq5x/sdtdN17XfivNrC9UPAF55YoGXEXE5RGxPpUryN5k2VREzIyIgyLioIOP+pPOtNTMet/GjeXKMFM0h3uLpGnAHLJAeyJwnaRRABHxZJfaZ2b9ZBj2WssoGnBPTJ8fbdh+JlkAdj7XzNo2HNMEZRQKuBGxb7cbYmY2HG+ElVF4WJikPwDGAVsPbIuIy7rRKDPrU37wASR9ETiMLODOB44GbgcccM2sY6LHe7hFRymcQPY64F9HxBlkrw1+Q9daZWb9KaJcaYOkUZJukvRQ+ty5SZ3fl7Q4V56V9Km0b7qkR3P7jml1zaIB96XI/tOzXtKOwDp8o8zMOixiY6nSpmnAgogYCyxI6w3tiQcj4sCIOBB4D/AicE2uynkD+yNifqsLFg24iyTtBFwC3APcC9xd8Fgzs2Iq7OECk4DZaXk2cFyL+kcAv4iIX27qBYuOUvjrtDhD0vXAjhGxZFMvambWVMmbZpKmAFNym2ZGxMyCh+8WEWsBImKtpF1b1D+J1z9hO1XSR4BFwN9GxFNDnaDoXArHS3pjatgq4FeSjityrJlZUWVTCvmnWlN5TbCVdLOkZU3KpDLtkrQV8EGyaQ0GXATsBxwIrAX+pdV5ig4L+2JEvJK3iIin08iF/yx4vJlZax1+8CEijhxsn6THJI1OvdvRZPemBnM0cG9EPJY79yvLki4Bvt+qPUVzuM3qeWpHM+uo2LihVGnTPOC0tHwaMHeIupNpSCekID3geGBZqwuWuWn2NUn7pakazyO7eWZm1jnVTl5zLjBB0kPAhLSOpDGSXhlxIGnbtP/qhuP/WdJSSUuAw4FPt7pg0V7qx4F/AL6T1m8EPl/wWDOzQqqcSyEiniAbedC4fQ1wTG79RWCXJvVOLXvNlgFX0khg7lC5EDOzjujxJ81aBtyI2CDpRUlvjIhnqmiUmfUpzxYGwG+BpZJuAl4Y2BgRn+hKq8ysL4UnrwHgulTMzLqn31MKABExW9I2wN4R8WCX22Rm/arHUwpFnzQ7FlgMXJ/WD5Q0r4vtMrM+VPHkNZUrOg53OjAeeBogIhYDfguEmXVWtZPXVK5oDnd9RDwjKb9t+H1bM6u1GIZv4i2jaMBdJulkYKSkscAngB93r1lm1peGYa+1jKIphY8D7wT+G7gSeAb4VJfaZGb9qp9TCpK2Bs4G3gosBd4XEeuraJiZ9Z/heCOsjFYphdnA74AfkU1P9g7cszWzbunzBx/GRcQfAkj6Jn6tjpl1U5/3cH83sBAR6xtGKZiZdVSVs4VtDq0C7gGSnk3LArZJ6wIiInbsauvMrL/0cw83IkZW1RAzs37v4b5C0ruBQ8keeLg9Iu7rWqvMrD/1+E2zonMpfIFsxMIuwJuASyX5jQ9m1lmxsVwZZor2cCcD74qI3wJIOhe4F/inbjXMzPqPUwqZVcDWZBORA7wB+EU3GmRmfWwY9lrLKBpw/xtYnt74EGRvsLxd0vngNz+YWYf0eA63aMC9JpU
Bt3a+KWbW72Ljhs3dhK4q/MaHbjfEzKzKuRQk/TnZXN/vAMZHxKJB6k0E/g0YCXwjIs5N20cB3wH2IUu7fjginhrqmkOOUpC0VNKSwUqpb2dm1kq1s4UtA/4MuG2wCpJGAheSzSUzDpgsaVzaPQ1YEBFjgQVpfUiterh/mj7PSZ+Xp89TgBdbndzMrJQKc7gR8QBAiykLxgMrImJlqjsHmAT8NH0elurNJku1/v1QJ1ORYRiS7oiIQ1ptG84kTYmImZu7Hb3Cv2dn+fdsTtIUYEpu08yyv5OkW4HPNEspSDoBmBgRZ6X1U4GDI2KqpKcjYqdc3aciYuehrlV0AvLtJB2aO/EhwHYFjx0uprSuYiX49+ws/55NRMTMiDgoV14TbCXdLGlZkzKp4CWadX83uRtedJTCmcC3JL0xXewZ4IxNvaiZWRUi4sg2T7Ea2Cu3viewJi0/Jml0RKyVNBpY1+pkRQPuYWQ5iu2BF8iC7rslRXqDr5lZL1oIjJW0L/AocBJwcto3DzgNODd9zm11sqIphYPIXrWzIzCG7H9vDgMukfR3JRpfZ86PdZZ/z87y79lhko6XtBp4H3CdpBvS9jGS5kM2DzgwFbgBeAD4bkQsT6c4F5gg6SGyh8HObXnNgjfNbgA+FBHPp/XtgX8HjgfuiYhxQx1vZmbFe7h7Ay/n1n8HvDkiXiJ77NfMzFoomsO9ErhT0kCO4ljgKknbkY1HMzOzFgqlFAAkvYdsAnKRTUDe9DG4zUXS8xGxvaQxwPkRcULafhXwTuBbEXHeIMdOB/4K+E1u82ER8XR3W239QNI+wPcj4g9a1NsALCXrCD0AnBYRfsCohxQOuHU3EHAbtu0O3BURb25x7HTg+Yj4v11sovWpEgH3lb9hSd8muz/ytQLn3yLd3LGaK5rDHTYk7SNpWVq9EdhV0mJJ/1PSfpKul3SPpB9JenuLc/2NpFlp+Q/TgOltJU2XdLmkH0h6SNJfdft71U36nX8maXaaW+Pf029zhKT70jwcsyS9IdVfJekrku5O5a2b+zt0iqR/SL/FTZKukvQZSe+RdL+kn/Dqo/FIOl3S3PR3+KCkLw5y2h8Bb5W0XfodF6bfdVLuPN+TdC1wo6TtJX0rN//Jh1K9iyQtkrRc0pdy7ejZf49ai4ieKGQ9VMhm7lnWuJzWFwBj0/LBwA/S8nSyMXaLU7klbR9BNrHF8cAi4JBc/fuBbcheOfQIMGZz/wYV/977kI3HHvhNZgGfT7/F29K2y4BPpeVVwOfS8kfIenyb/Xt04Hc4KP3NbAPsADwEfAZYAnwg1flq7m/ydGAt2euqtiGbQOWghr/hLcjGdH4M+D/AX6TtOwE/J3vK83SyQfmj0r6vAP+aa9fO6XNg/0iyZ/337+V/j7qXnuvhDiYNZfsfwPckLQYuBkbnqpwXEQemcjhAZHPFnU42ac8PI+KOXP25EfFSRDwO3EI2yUW/eST3m1wBHAE8HBE/T9tmA+/P1b8q9/m+aprYdYfy6t/Cc8C1ZAFxp4j4YapzecMxN0XEE5GN8rk6nQNgm/S3uQj4FfBN4ChgWtp+K9mbV/bOnefJtHwk2axWAMSr0wR+WNK9wH1k9zLyQzh78d+j1gq/tbcHjACejogDSx43Fnie7IGPvMbkd28kw8sp+51jkOXhrNmz9gNPYw5msL+dlxr/PiWJbAz8gw3bD07XybcjGursS9bb/qOIeErSpWQBu1k7euXfo9b6pocbEc8CDyubdBhlDhjqmDR3xL+R9dJ2UTZz0IBJkraWtAvZU3cLu9PyWttb0kDPaDJwM7BPLh94KvDDXP0Tc58/qaaJXXc7cGz6W9ge+JO0/Rm9OuHTKQ3HTJA0StI2wHHAHQzuBuDjKfAi6V2D1LuR7IkoUr2dyZ4MfSG1ZTeyOV3zevHfo9b6qYcL2R/+Rcpe8b4lMIcsFwvwaUl/kat7HPAF4OsR8XNJfwncImlgsuK7gevI/vfuyxGxhv7zAHC
apIvJcpefBO4kS9tsQfYfoRm5+m+QdBfZf+gnV93YboiIhZLmkf0d/ZIsHTAwudMsSS+SBc2828nSDG8Froyhh1h+GfhXYEkKuqt4dZ7qvH8CLkw3jDcAX4qIqyXdBywHVvL6wN5z/x511zPDwqrkYWTFhzrl6q8iuzn0eDfbtTlI2j4inpe0LdlN1ikRce8gdU8n+x2mNttflV7+96izfuvhmnXDTGWvXdkamD1YsDVzD9fMrCJ9c9PMzGxzc8A1M6uIA66ZWUUccM3MKuKAa2ZWkf8PWxwzygJ14sIAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(figsize=(5, 4))\n", "ax = sns.heatmap(gap_df[['lifeExp','pop','gdpPercap']].corr(), vmin=-1, vmax=1, \n", " cmap=sns.diverging_palette(20, 220, as_cmap=True),\n", " ax=ax)\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Alternatives\n", "\n", "- Instead of correlation of the values, if we wanted to work with ranks:\n", " * Spearman’s $\\rho$ \n", " * Kendall’s $\\tau$ \n", " \n", "Recall: We earlier encountered the idea of ranking based correlation in Module 2!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Sampling\n", "\n", "\"Sampling\"
\n", "\n", "- Underlying unknown distribution (left)\n", "- Empirical distribution of the available sample (right)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAChCAYAAABkr2xhAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAALpElEQVR4nO3d3Y9dVR3G8WdK3yjltVIoBVr9IRZCCEhpgWJVCgXFI8NAWwoNAREkpWgxRVREDdVIBGWiiVEw8cKXxGi4WfEKG0SiF5gY9cJo4vkDvDEhGvXCOF7sPbadTs+cc2bv/Vsv30/SG0r3fjrNPN39nb3WmpiZmREAoBtLvAMAQEkoXQDoEKULAB2idAGgQ5QuAHSI0gWADlG6ANAhShcAOkTpAkCHKF0A6BClCwAdonQBoENLvQPkIJitlHSNpLMk/bnX7//FNxGAWE2wy9j4gtlySU9KOqSqcGe9KemTvX7/Vx65gLYdmZ6ekaQdBw9OeGdJDeOFMQWzcyW9LulLOr5wJWmLpDeC2aGucwGIG6U7hmC2RtIvJF034H+bkPR8MPtcJ6EAJIHSHVEwWybpFUmXD/lLDgez+1qMBCAhlO7ovixp+4i/5qVgdlkbYQCkhdIdQTB7j6oPzUa1StL3gxlviwCFo3SHFMxWSHpZ1ax2HNdI+kRziQCkiNId3kFJ71rkNb4YzNY1kAVAoijdIQSztZKebuBSqyUdbuA6ABJF6Q7ns5JOb+haDwazTQ1dC0BiKN0FBLP1kh5t8JJLJH2hwesBSAilu7AnJa1o+Jp7eNoFykTpDhDM3ibpkRYuPSHpUy1cF0DkKN3B9ks6taVr7wtmF7R0bQCRonRPot6u8bEWb7FMVakDKAile3J7JK1t+R6P1uUOoBCU7skd6OAea1SVO4BCULrzCGabJW3u6HaMGICCsAHL/Jp8L3chW4LZVb1+/3cd3hNY0OzpELM4JaIZPOnOEcxOl3RPx7f9aMf3A+CE0j3RHkmndXzPfcGsrVfTAESE0j3Rgw73PFPSnQ73BdAxSvcYwexSSTc43f4Bp/sC6BCle7z7He99c725DoCMUbq1YLZE0j7HCBOSOMASyByle9Q2SRucM1C6QOYo3aPu9Q4g6cpgdoV3CADtoXQlBbPlknZ756jt9Q4AoD2UbuVmSed4h6jtDWas/AEyRelWul6BNsjbJV3rHQJAO4ov3XprxUnvHHOw8xiQqeJLV9Ktau6k36bsrl9hA5AZvrHj+QDtWBdK2uodAkDzii7derTQ885xEru8A6A8c7dzRPOKLl1JOxXfaGHW3bzFAOSn9NK9yzvAABepu9MrAHSk2NKtF0R82DvHAu72DgCgWcWWrqSbJJ3lHWIBU4wYgLyUXLpT3gGGcIkk9mIAMlJk6QazUyTd4Z1jSDHPnQGMqMjSVXU6xFrvEEPiGB804sj09MzcV8Lm+2/jXnux1yhFqaWbUpFdGcze4R0CQDOKK936g6lJ7xwjSukvCQADFFe6kq5UtZNXSia9AwBoxlLvAA4mvQOMYVswO6/X7//VOwjS0/W8dfZ+Ow4e5HXHeZT4pDvpHWAME4p3jwgAIyiqdIPZRklXOccYVyqvuAEYoKjSVdrFdUswW+0dAsDilDbTnfQOsAgrJN0m6afeQZC3k82Aj0xPzwya0/Ku7nCKedINZmskbffOsUgpP6kDUEGlK+l2pf/7vT2YLfMOAWB8qZfQKHJ4Sjxb0o3eIQ
CMr4iZbjA7VdU8NAeTkl7zDoH4dTFjHXSPY39unHd2F5ohp6qUJ90dklZ5h2jIHeyxC6SrlNLNYbQwa4OqpcwAEpT9eKHeOzf2Y3lGNSnp994hkK75tnj0ylKaEp50tyqdvXOHldOTO1CUEkp30jtAC64OZhu8QwAYXdalW3/glOtTYa6/LyBrWZeupE2SLvUO0RJKF//X1LE7aF/upTvpHaBF7w1m53iHADAaSjddp6ha2gwgIdmWbjBbL2mLd46WcXYakJic39MtYeZ5azBb1ev3/+kdBHFo+6icxc6N5/76HJf5LiTbJ12V8RS4StIt3iEADC/L0q0/YHq/d46OTHkHADC8LEtX0odUfdBUgh577ALpyLV0SxgtzDpb6Z+IARQju9INZqcpn71zh3WXdwAAw8mudFUV7krvEB2bDGY5/lkC2cnxG7XEp751kq73DoFujfv6lsdyYZYpH5VV6QazlZJ63jmc3O0dAMDCsipdSTslrfYO4WSKY3yA+OVWuiWOFmZdLOla7xAABsumdIPZcpWx9HcQRgxISolz3mxKV9Vy2DO9QzjbxYgBiFtOpbvbO0AENooRAxC1LHYZC2YrlPfeuaPYLelN7xDwldI/20vbeSyXJ93bJJ3hHSISu1koAcQrl2/OPd4BInKRWCgBRCv50q33Wij9rYW57vEOAGB+yZeuqhVoq7xDRGZXMMtiXo+jWEqbhxxK917vABE6T9JN3iEAnCjp0g1ma1TeNo7Dus87AIATJV26knZJ4tSE+U0FM8YuQGRSL9193gEitlp8wIjE5TjHTrZ0g5lJ2uadI3L3ewcAcLxkS1cUyjB2BrN13iEAHJXka0X1iitKd2FLVH2g9oJ3EIwvt39eL8bs1yLlpcKpPuluV7W5Cxb2IDuPAfFItXQf8g6QkMslbfEOAaCSXOkGs7PEZt2j4i8pIBLJla6qFWilHbG+WHuDWalnxyExw7wmlvKcO6nSrWeTD3vnSNBqsQkOEIWkSlfVqQhXeYdI1Me8AwBIr3T3ewdI2OZgttk7BFC6ZN7TrTe3YbPyxdkv6SPeIdCclGebo8jp95nSk+5D4gO0xdpb/+UFwEkSpVtvyM1oYfFWig8iAVdJlK6qk343eIfIxIFgxnaYgJNUSvcJ7wAZWS8WlwBuoi/dYHadpBu8c2TmEPsxAD6iL11JT3kHyNC7xRlqgIuoSzeYXa5qnovmfcY7QKlyPA0Bw4u6dCV92jtAxnYEs63eIYDSRFu6wewScaJt2z7vHQAoTbSlK+kZxZ0vBx/kaRfoVpTLgIPZZeKk364clrTTOwSGxzw4bbE+SR5WvNlyc0swe793CKAU0RVbMLte0l3eOQrz1fqwTwAti2q8UL+w/zXvHAXaLGmvpB96BynV3JHB7Gm3jBLyE9vTzV5J13uHKNRzwew07xBA7qIp3WB2uqTnvXMU7EJJT3uHAHIXTelKelbSBd4hCneoXgUIoCVRzHTrY2Q+7p0DWibppWC2vdfv/9c7TMmY5Y5u9ms2Ow+PlfuTbjBbIel7MWSBJGmbpAPeIYBcxVB0z0q6wjsEjvNcMNvkHQLIkWvpBrObJD3pmQHzOlXSj+p/hQBokNtMN5idr+q90KjnLwW7WtILkh73DpIL5rTdi3HO6/KkG8yWS/qxpPM97o+hHQhm7IEBNKjz0q1XnX1T0vau742xvMxOZEBzPJ50n5L0iMN9MZ6VkkK9vzEaxAkSzZj7dRz3a9rVn0WnpRvMHpb0lS7viUacK+nVYLbeOwiQus5KN5g9JOk7Xd0Pjdso6bVgdqF3ECBlnZRuMHtC0nfFmwqpe6ekN4LZpd5BgFRNzMy0N8YIZkslvShWOOXmb5Kmev3+695BUsHs1td8r4zNfZ3s2D+jNl8xa+1JN5itk/SqKNwcnSPp58HsifptFABDaqV0g9mUpD9Iel8b10cUlkr6uqSf8QEbMLxGV6QFsw2qvhGnmrwuovYBSX8MZk9L+nav3/
+PdyAgZo2UbjBbq2oPhcclsV6/PGeoWvDyWDB7RtIruW4NeWR6embuvC/GpaY4ubnz9a7n7Ysq3WB2haT9kh5QtUkKyrZJ0k8k/SmYfUPSD3r9/t+dMwFRGbl0g9lGSXeqOs/s2qYDIQubJH1L0gvB7BVVRfxqr9//l28swN/A0q0/mb5Y0lZJN0raIYnjXDCsVZL21T/+Hcx+Kek1Sb+W9Ntev/8Pz3CjGHaZ6clO9UX6mhojLfSke4aqNxx+U/94cTE3A46xRlIypQs0ZWDp9vr9tyS91VEWAMheDMf1AEAxojgNGPA034x2vqWh41wHcRh1HrvY7SEH3YcnXQDoEKULAB2idAGgQ61u7QgAOB5PugDQIUoXADpE6QJAhyhdAOgQpQsAHaJ0AaBD/wM0zDOpU7EyFwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from scipy import stats # https://docs.scipy.org/doc/scipy/reference/tutorial/stats.html\n", "np.random.seed(seed=5)\n", "x = np.linspace(-3, 3, 300) # Return evenly spaced numbers over the specified interval\n", "xsample = stats.norm.rvs(size=1000) # generate 1000 random variates for 'norm'al distribution\n", "fig, axes = plt.subplots(ncols=2, figsize=(6, 2.7))\n", "ax = axes[0]\n", "ax.fill(x, stats.norm.pdf(x),'firebrick') # Probability Density Function\n", "ax.set_axis_off()\n", "ax.set_xlim(-3, 3)\n", "ax = axes[1]\n", "ax.hist(xsample, bins=100,color='rosybrown') # Histogram of the random variate samples\n", "ax.set_axis_off()\n", "ax.set_xlim(-3, 3)\n", "ax.set_position\n", "# plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0, hspace=0)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Big data & sampling\n", "\n", "- Now that we have scalable Big data processing tools, why bother with samples? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* quality of the data collection may vary\n", " - e.g., in an IoT set-up, sensors may have different degrees of defectiveness" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* the data may not be representative\n", " - e.g., using social media data to gauge the nation's sentiment \n", " * are all sections of the society rightly represented on social media? 
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* time and effort spent on random sampling may help reduces bias and make it easier to explore\n", " - visualize and manually interpret information (including missing data/outliers) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### (Uniform) random sampling\n", "\n", "- each available member of the population being sampled has an equal chance of being chosen\n", " * with replacement: observations are put back in the population after each draw \n", " * for possible future reselection\n", " * without replacement\n", " * once selected, unavailable for future draws" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Stratified sampling: Dividing the population into strata and randomly sampling from each strata.\n", "- Without stratification\n", " * a.k.a. simple random sampling" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "2020 US presidential election | FiveThirtyEight projection\n", ":-------------------------:|:-------------------------:\n", "\"Sampling\" | \"Sampling\"\n", "Screenshots from https://projects.fivethirtyeight.com/2020-election-forecast/" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Bias \n", "\n", "- Not always obvious how to get a good random sample\n", " * e.g., customer behavior: time of the day, day of the week, period of the year\n", " * Anecdotes: The literary digest presidential poll of 1936\n", " * Polled ten million individuals, of whom 2.27 million responded\n", " * The magnitude of the magazine's error: 39.08% for the popular vote for Roosevelt v Landon\n", " * Gallup correctly predicted the results using a much smaller sample size of just 50,000\n", " * Contemporary claim to fame Nate Silver & 
FiveThirtyEight \n", " * \"balance out the polls with comparative demographic data\"" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- **Selection bias**: selectively choosing data, consciously or unconsciously, in a way that leads to a conclusion that is misleading or ephemeral \n", " * e.g., Regression to the mean: Is your coin really 'lucky'?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- **Data snooping**: extensive hunting through data in search of something interesting\n", " * e.g., Crowd-sourced coin tossing to find a 'lucky coin' \n", "- **Vast search effect**: Bias or nonreproducibility resulting from repeated data modeling, or modeling data with large numbers of predictor variables.\n", " * Beware the promise of Big Data: You can run many models and ask many questions. But is it really a needle that you find in the haystack?\n", " * Mitigations: Holdout set, Target shuffling" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Methodology check-point\n", "\n", "\"Statistical\n", "\n", "- Specify the hypothesis first, then design the experiment and collect data accordingly, following randomization principles, to avoid the traps of bias\n", "- Traps of bias resulting from the data collection/analysis process:\n", " * repeated running of models in data mining & data snooping\n", " * after-the-fact selection of interesting events" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Sampling distribution of a statistic\n", "\n", "**Sampling distribution**: *distribution of some sample statistic* over many samples drawn from the same population.\n", "\n", "**Standard error**: The variability (standard deviation) of a sample statistic over many samples (not to be confused with standard deviation, which by itself, refers to variability 
of\n", "individual data values)\n", "\n", "**Central Limit Theorem**: The tendency of the sampling distribution to take on a *normal shape* as the sample size rises." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# Central Limit Theorem in action - an example\n", "\n", "# read_csv's squeeze= argument was removed in pandas 2.0; squeeze('columns') turns the one-column DataFrame into a Series\n", "loans_income = pd.read_csv(practicalstatspath+'loans_income.csv').squeeze('columns')\n", "\n", "sample_data = pd.DataFrame({\n", " 'income': loans_income.sample(1000),\n", " 'type': 'Data',\n", "})\n", "\n", "sample_mean_05 = pd.DataFrame({\n", " 'income': [loans_income.sample(5).mean() for _ in range(1000)],\n", " 'type': 'Mean of 5',\n", "})\n", "\n", "sample_mean_10 = pd.DataFrame({\n", " 'income': [loans_income.sample(10).mean() for _ in range(1000)],\n", " 'type': 'Mean of 10',\n", "})\n", "\n", "sample_mean_20 = pd.DataFrame({\n", " 'income': [loans_income.sample(20).mean() for _ in range(1000)],\n", " 'type': 'Mean of 20',\n", "})\n", "\n", "results = pd.concat([sample_data, sample_mean_05, sample_mean_10, sample_mean_20])\n", "#print(results.sample(10))" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA1gAAADXCAYAAAAHkfA4AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAivElEQVR4nO3deZgsZXn38e+PRXAhLHJEZPHoEV8DJoLvEfeEXGpEjMEkihA0aEhIIsa4voImigsJMUYx7qiIGERwIYKiuOEWF0TCIgrKAZQjJyyKCFGJHO73j3pGmrFnpuec7pnume/nuuqa6qeeqrq7uu+Zubuqnk5VIUmSJEnaeJssdgCSJEmStFRYYEmSJEnSkFhgSZIkSdKQWGBJkiRJ0pBYYEmSJEnSkFhgSZIkSdKQWGBJWjKSXJxknxHvo5Lcr82/Pck/DLDOzUnuO8q4xlWSK5M8dgPW2+BjNsz3QZKjkvz7MLa1mJK8NMm7xiCOkeeoJC22zRY7AEkaRJKzgK9X1cunte8PvAPYuar2WMiYquqvB+x3t1HFkOT+wNHA7wGbA98HTgDeWFXrN2K7K4ErgM2r6taNj7TvPk4A1lbV309ftjHHrPd9kOQo4H5V9fQZYri55+FdgFuAqeP2Vxsaw7ipqn9c6H32e30XOkclaTF4BkvSpDgBeEaSTGt/BnDSqIqAcZZkFfB14Crgt6pqa+CpwGpgqwXY/8R/SFdVd5uagB8AT+ppO2mh4lgKx1KS1LHAkjQp/gPYDnj0VEOSbYE/AE5sj391OVqSvZOcm+SnSa5J8vrWvk+Stb0b7rPeV5P8JMm6JG9Ocqd+ASU5Iclr2vwZ7bK2qem2JM9sy3ovKzwhyVuSfDzJTUm+3gqlqW3+fpJLk9yY5K1JvpDkL2Y4Jq8EvlJVL6iqdQBVdWlV/WlV/aRt72FJvtKezwW9l2cl+XySVyf5zxbLp5Js3xZ/sf38SXs+D0/yzNb3DUl+DByVZFWSzyX5UZLrk5yUZJsZ4h1Yn2P21iSfaLH8Z5J7Jjk2yQ1JLkmyV8+6VyZ5bJJ9gZcCT2vrXbCB4dwpyYntGF2cZHXPvu6V5MNJrktyRZLn9izbosV4dZuOTbJFW7ZPkrVJXpLkv4H3JNkkyRFJ1rTjeWqS7Vr/le2YPCvJVe15/3WShyS5sL2+b57leP7qUseebR2S5AftdXvZLOs+Mcl/tVy6Kt1Zwd7lj+p5j13V3ieHAQcD/68d+zNa395cG+T4vDDJtely8VnzfeEkaTFYYEmaCFX1c+BU4M96mg8ALqmqfv84v5HuMrnfAFa1dQexHng+sD3wcOAxwLMHiO9JPWdCngL8N/DZGbofRFccbQtcRneJH624+RBwJHB34FLgEbPs9rGtf19JdgI+DryGrjh9EfDhJCt6uv0p8CzgHsCdWh+A32k/t2nP66vt8UOBy1v/o4EA/wTcC/hNYBfgqFli3lAHAH9P97rcAnwVOK89/hDw+ukrVNUngX8ETmnP4UEbuO8/BD4AbAOcDrwZIMkmwBnABcBOdO+V5yV5fFvvZcDDgD2BBwF7t+cw5Z50r8u9gcOA5wJPBn6X7njeALxlWiwPBXYDngYc2/bxWGAP4IAkvzuP5/Uo4P+0uF+e5Ddn6Pc/dHm3DfBE4G+SPBkgya7AJ4A3ASvacz2/qo4DTgJe2479k/psd5DjszXdsT0UeEu6D1UkaaxZYEmaJO8Fnprkzu3xn7W2fn4J3C/J9lV1c1V9bZAdVNU3q+prVXVrVV1Jd3/XwP+0prsn6kTgaVV11QzdPlJV57TLGk+i+wcTYD/g4qr6SFv2b3SF2kzuDqybZfnTgTOr6syquq2qPg2c2/Yz5T1V9d2eAnbPPtvpdXVVvakdn59X1WVV9emquqWqrqMrdObzT/6gTmuvzS+A04BfVNWJ7T6zU4C9Zl99o3y5HcP1wPvoigGAhwArqupVVfW/VXU58E7gwLb8YOBVVXVtOzavpLukdcp
twCvasfs53T1fL6uqtVV1C12h+pTc8fLBV1fVL6rqU3SFz8lt+z8EvsT8jsMr22t4AV2R2LcArarPV9VF7T10IXAyt7/GBwOfqaqTq+qXVfWjqjp/wP3PdXx+2Zb/sqrOBG6mKwglaax5zbekiVFVX05yHbB/knPo/sH94xm6Hwq8CrgkyRV0/0x+bK59tALp9XT3Md2F7vfkNweJL8nWwEeBf6iqL83Stbdo+hkwNaDDvejupwKgqirTLmec5kfAjrMsvzddQdp79mBz4OwBYpnJHYrGJPegKwQfTXff1yZ0Z16G7Zqe+Z/3eTyygUT49WO0ZSt67g3cK8lPepZvSlfoQPd6fr9n2fdb25TrWsE45d7AaUlu62lbD+zQ83iYx2Gg1z7JQ4FjgAfSneXcAvhgW7wLsGYe++w11/H50bR7Kwd5f0rSovMMlqRJcyLdmatnAJ+qqmv6daqq71XVQXSXsv0z8KEkd6X71P8uU/2SbEp3adOUtwGXALu1ywtfSncZ3Kza5WLvB86uqndsyBOjOxu1c8820/u4j88AfzLL8quA91XVNj3TXavqmAFiqQHb/6m1/XY7Xk9ngOO1gGZ6HsNwFXDFtOO7VVVNnSG8mq5omrJra5sptquAJ0zb3pbt7NRiej/dpZG7tIFU3s7tr/FVdJfg9jPXsZ/r+EjSRLLAkjRpTqS75+QvmfnyQJI8PcmKqroN+ElrXg98l+4MxBOTbE53z8cWPatuBfwUuDnJA4C/GTCuo4G7An83j+cy3ceB30ry5HaG5HC6+1Bm8grgEUn+Jck9AZLcL8m/pxto4t+BJyV5fJJNk2zZBg+YrWibch3dJWxzfRfVVnSXbv2k3fP14gG23Wsqrqmp74AiG+EaYGUrgIftHOCnbaCKO7dj/MAkD2nLTwb+PsmKdn/dy+lek5m8HTg6yb0B2nr7jyDu+doK+HFV/SLJ3nT37U05CXhskgOSbJbk7kn2bMuuYfb3z3yPjyRNBAssSROl3Rf1Fbpi5vRZuu4LXJzue47eCBzY7l25kW7QincBP6Q7o9V7Gd6L6P6BvInufppTBgztILob9m/I7SMJHjzwEwOq6nq6YdZfS3f53+5090zdMkP/NXQDcayke643Ah9u69zU7gHbn+4s3HV0ZxtezAC/+6vqZ3RF43+20eEeNkPXVwIPBm6kKxA/Mshz7XEE3aVtU9Pn5rn+XKYuZftRkvOGueF2T9aT6O5buwK4nu59tXXr8hq61+JC4CK6QTleM8sm30j3nv5UkpuAr9ENarHYng28qsX0cnoGjKmqH9Dd0/dC4MfA+dx+L9e7gd3b++c/+mx3vsdHkiZCqkZ59YQkaUO1sy5rgYOr6uy5+kuSpMXnGSxJGiPtcr5t2vcBTd3/NdAIiJIkafFZYEnSeHk43ahs19NdfvbkNoS3JEmaAF4iKEmSJElD4hksSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSC6xlKsn6JOcnuTjJBUlekGTW90OSlUn+dKFilEYhSSV5X8/jzZJcl+RjixTPA1ou/leSVdOWfT7JpW35+UnusRgxanmZsBw5OslVSW6e1r5FklOSXJbk60lWLmjQWjYmJV+S3CXJx5Nc0v73O6ZnmfkyZBZYy9fPq2rPqtoDeBywH/CKOdZZCVhgadL9D/DAJHdujx8H/HAR43ky8NGq2quq1vRZfnDL1T2r6toFjk3L0yTlyBnA3n3WORS4oaruB7wB+OfRhqhlbJLy5XVV9QBgL+CRSZ7Q2s2XIbPAEu2ftsOA56SzMsmXkpzXpke0rscAj26fjDx/ln7SuPsE8MQ2fxBw8tSCJHdNcnySb7RPAPdv7X3f70n2aWe
aPtQ+GTwpSabvMMmeSb6W5MIkpyXZNsl+wPOAv0hy9qiftDQPE5EjVfW1qlrXJ/79gfe2+Q8Bj+m3T2lIxj5fqupnVXV2m/9f4Dxg57bYfBm2qnJahhNwc5+2G4AdgLsAW7a23YBz2/w+wMd6+vft5+Q0zhNwM/DbdH9EtgTO731vA/8IPL3NbwN8F7jrHHlxI90fqk2ArwKP6rPfC4HfbfOvAo5t80cBL5oh1s8DF7UY/wHIYh8/p6U/TVKO9MY87fG3gJ17Hq8Btl/sY+u09KYJzZdtgMuB+7bH5suQp82Qbjf1acXmwJuT7AmsB+4/Q/9B+0ljpaoubNeYHwScOW3x7wN/mORF7fGWwK7A1cz8fj+nqtYCJDmf7nLaL08tTLI1sE1VfaE1vRf44AChHlxVP0yyFfBh4BnAiYM9S2nDTVCOzKTfp++1EduTZjRJ+ZJkM7ozbP9WVZdPNfd7WoNsT/1ZYAmAJPelS/Br6e7FugZ4EN2nJ7+YYbXnD9hPGkenA6+j+7Tw7j3tAf6kqi7t7ZzkKGZ+v9/SM7+eIf1uraoftp83JXk/3b0mFlhaKGOfI7NYC+wCrG3/UG4N/HjE+9TyNin5chzwvao6tqfNfBky78ESSVYAbwfeXN254a2BdVV1G90n5pu2rjcBW/WsOlM/aRIcD7yqqi6a1n4W8LdT158n2au1b/D7vapuBG5I8ujW9AzgC7OsMjUS1fZtfnPgD+gu45AWyljnyBxOBw5p808BPtf+vkmjMvb5kuQ1bb/Pm7bIfBkyC6zl685pw7QDnwE+BbyyLXsrcEiSr9Gdsv6f1n4hcGu6Yd2fP0s/aexV1dqqemOfRa+mu/z1wiTfao9h49/vhwD/kuRCYE+6a+ZnswVwVut/Pt2oVO+c5z6lDTYBOUKS1yZZC9wlydp2VgDg3cDdk1wGvAA4Yp6xSPMy7vmSZGfgZcDuwHntf8C/aIvNlyGLBaokSZIkDYdnsCRJkiRpSCywJEmSJGlILLAkSZIkaUgssCRJkiRpSCywJEmSJGlIJrrA2nfffYvum6adnCZ9GjnzxWkJTQvCnHFaQtPImS9OS2jaaBNdYF1//fWLHYI0McwXaX7MGWlw5ot0u4kusCRJkiRpnFhgSZIkSdKQWGBJkiRJ0pBYYEmSJEnSkGy22AFIkiRJk2bNmjV3eLxq1apFikTjxjNYkiRJkjQkFliSJEmSNCQWWJIkSZI0JCMrsJLskuTsJN9JcnGSv2vtRyX5YZLz27RfzzpHJrksyaVJHj+q2CRJkiRpFEY5yMWtwAur6rwkWwHfTPLptuwNVfW63s5JdgcOBPYA7gV8Jsn9q2r9CGOUJEmSpKEZ2RmsqlpXVee1+ZuA7wA7zbLK/sAHquqWqroCuAzYe1TxSZIkSdKwLcg9WElWAnsBX29Nz0lyYZLjk2zb2nYCrupZbS2zF2SSJEmSNFZGXmAluRvwYeB5VfVT4G3AKmBPYB3wr1Nd+6xefbZ3WJJzk5x73XXXjSZoaYkwX6T5MWekwZkvUn8jLbCSbE5XXJ1UVR8BqKprqmp9Vd0GvJPbLwNcC+zSs/rOwNXTt1lVx1XV6qpavWLFilGGL00880WaH3NGGpz5IvU3ylEEA7wb+E5Vvb6nfceebn8EfKvNnw4cmGSLJPcBdgPOGVV8kiRJkjRsoxxF8JHAM4CLkpzf2l4KHJRkT7rL/64E/gqgqi5OcirwbboRCA93BEFJkiRJk2RkBVZVfZn+91WdOcs6RwNHjyomSZIkSRqlBRlFUJIkSZKWAwssSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSCyxJkiRJGhILLEmSJEkaEgssSZIkSRoSCyxJkiRJGhILLEmSJEkaks1GteEkuwAnAvcEbgOOq6o3JtkOOAVYCVwJHFBVN7R1jgQOBdYDz62qs0YVnyRJkjSoNWvWLHYImhAjK7C
AW4EXVtV5SbYCvpnk08Azgc9W1TFJjgCOAF6SZHfgQGAP4F7AZ5Lcv6rWjzBGSZIkaaNNL8BWrVq1SJFosY2swKqqdcC6Nn9Tku8AOwH7A/u0bu8FPg+8pLV/oKpuAa5IchmwN/DVUcUoSZJu5z+IkrTxFuQerCQrgb2ArwM7tOJrqgi7R+u2E3BVz2prW5skSZIkTYSRF1hJ7gZ8GHheVf10tq592qrP9g5Lcm6Sc6+77rphhSktSeaLND/mjDQ480Xqb6QFVpLN6Yqrk6rqI635miQ7tuU7Ate29rXALj2r7wxcPX2bVXVcVa2uqtUrVqwYXfDSEmC+SPNjzkiDM1+k/kZWYCUJ8G7gO1X1+p5FpwOHtPlDgI/2tB+YZIsk9wF2A84ZVXySJEmSNGyjHEXwkcAzgIuSnN/aXgocA5ya5FDgB8BTAarq4iSnAt+mG4HwcEcQlCRJkjRJRjmK4Jfpf18VwGNmWOdo4OhRxSRJkiRJo7QgowhKkiRJ0nJggSVJkiRJQ2KBJUmSJElDYoElSZIkSUNigSVJkiRJQ2KBJUmSJElDYoElSZIkSUMyyi8aliRJY2zNmjWLHYIkLTmewZIkSZKkIRmowEryyEHaJEmSJGk5G/QM1psGbJMkSZKkZWvWe7CSPBx4BLAiyQt6Fv0GsOkoA5MkSZKkSTPXIBd3Au7W+m3V0/5T4CmjCkqSJEmSJtGsBVZVfQH4QpITqur7CxSTJEmSJE2kQe/B2iLJcUk+leRzU9NsKyQ5Psm1Sb7V03ZUkh8mOb9N+/UsOzLJZUkuTfL4DXw+kiRJkrRoBv0erA8CbwfeBawfcJ0TgDcDJ05rf0NVva63IcnuwIHAHsC9gM8kuX9VDbovSZIkSVp0gxZYt1bV2+az4ar6YpKVA3bfH/hAVd0CXJHkMmBv4Kvz2ackSRqe6V9EvGrVqkWKRJImx6CXCJ6R5NlJdkyy3dS0gft8TpIL2yWE27a2nYCrevqsbW2SJEmSNDEGPYN1SPv54p62Au47z/29DXh1W/fVwL8Cfw6kT9/qt4EkhwGHAey6667z3L20vJgv0vyYM9LgzJfZeQZ4+RqowKqq+wxjZ1V1zdR8kncCH2sP1wK79HTdGbh6hm0cBxwHsHr16r5FmKTOcs+X6X/cwD9wmt1yzxlpPswXqb+BCqwkf9avvaqmD2Ax13Z2rKp17eEfAVMjDJ4OvD/J6+kGudgNOGc+25YkSZKkxTboJYIP6ZnfEngMcB6/PkLgryQ5GdgH2D7JWuAVwD5J9qS7/O9K4K8AquriJKcC3wZuBQ53BEFJ89XvjJUkSdJCGvQSwb/tfZxka+B9c6xzUJ/md8/S/2jg6EHikSRJkqRxNOgogtP9jO4yPkmSJElSM+g9WGdw+6h+mwK/CZw6qqAkaVQc1UmSJI3SoPdgva5n/lbg+1W1dgTxSJIkSdLEGvQerC8k2YHbB7v43uhCkiRJ48gzwJI0t4HuwUpyAN2w6U8FDgC+nuQpowxMkiRJkibNoJcIvgx4SFVdC5BkBfAZ4EOjCkySJEmSJs2gowhuMlVcNT+ax7qSJEmStCwMegbrk0nOAk5uj58GnDmakCRJ0rD5RdyStDBmLbCS3A/YoapenOSPgUcBAb4KnLQA8UmSJEnSxJjrMr9jgZsAquojVfWCqno+3dmrY0cbmiRJkiRNlrkuEVxZVRdOb6yqc5OsHE1IkjQYL3mSJEnjZq4zWFvOsuzOwwxEkiRJkibdXAXWN5L85fTGJIcC3xxNSJIkSZI0mea6RPB5wGlJDub2gmo1cCfgj2ZbMcnxwB8A11bVA1vbdsApwErgSuCAqrqhLTsSOBRYDzy3qs6a/9ORJEmSpMUz6xmsqrqmqh4BvJKuILoSeGVVPbyq/nuObZ8A7Dut7Qjgs1W1G/DZ9pgkuwMHAnu0dd6aZNN5PRNJkiRJWmQDfQ9WVZ0NnD2fDVfVF/sMhLE/sE+bfy/weeAlrf0DVXULcEW
Sy4C96YaDlyRJkqSJMOgXDQ/LDlW1DqCq1iW5R2vfCfhaT7+1rU2SRmr6SISrVq1apEgkSdJSMNcgFwslfdqqb8fksCTnJjn3uuuuG3FY0mQzX6T5MWekwZkvUn8LXWBdk2RHgPbz2ta+Ftilp9/OwNX9NlBVx1XV6qpavWLFipEGK00680WaH3NGGpz5IvW30AXW6cAhbf4Q4KM97Qcm2SLJfYDdgHMWODZJkiRJ2igjuwcrycl0A1psn2Qt8ArgGODU9j1aPwCeClBVFyc5Ffg2cCtweFWtH1VskiRJkjQKIyuwquqgGRY9Zob+RwNHjyoeSZIkSRq1cRnkQpIkSZIm3kIP077sOAS0JEmStHx4BkuSJEmShsQzWJIkSdI0069CkgblGSxJkiRJGhILLEmSJEkaEgssSZIkSRoS78FaZI4yKI0Xc1KSJG0MCyxJE8MbjiVJ0rizwBpzfpouSZIkTQ7vwZIkSZKkIfEM1oTzDJckSZI0Piywxoz3mEiSJEmTywJrgVlASZIkSUvXohRYSa4EbgLWA7dW1eok2wGnACuBK4EDquqGxYhPkiTNzcvUJenXLeYZrN+rqut7Hh8BfLaqjklyRHv8ksUJbXAL/cfFM2CSJEnS+BqnSwT3B/Zp8+8FPs8EFFjTWQBJkiRJy9diFVgFfCpJAe+oquOAHapqHUBVrUtyj34rJjkMOAxg1113Xah4pYlkvkjzs5RyZjE+8POSweVlKeXLQjA/lo/F+h6sR1bVg4EnAIcn+Z1BV6yq46pqdVWtXrFixegilJYA82XjrVmz5g6TljZzRhqc+SL1tyhnsKrq6vbz2iSnAXsD1yTZsZ292hG4djFikzQ+LGgkSdKkWfACK8ldgU2q6qY2//vAq4DTgUOAY9rPjy50bEtBv39IPQUtSZIkLYzFOIO1A3Bakqn9v7+qPpnkG8CpSQ4FfgA8dRFiW5K85leSJElaGAteYFXV5cCD+rT/CHjMQscjSfPhBxaSJGk24zRMuxaJ/zBKkiRJw7FYowhKkiRJ0pJjgSVJkiRJQ+Ilgvo1cw2N7SWEkiRJUn+ewZIkSZKkIfEM1jwthS8+XQrPQUuT701JkjTpLLC04By1UJIkSUuVBZY22lwFk2cltJT5gYHGhb9rJWk8eA+WJEmSJA2JZ7A0b35KKkmSlhr/v9GwWGDNwWSTJGnDeAmtpOXIAktDN+yi1D/QmiS+XyVJg/DvxdJlgaVF5y8YSZIkLRVjV2Al2Rd4I7Ap8K6qOmaRQ5I0Il6CKy0vfqAmaTkYqwIryabAW4DHAWuBbyQ5vaq+Pap9+st+/PhPtyRJkibVWBVYwN7AZVV1OUCSDwD7AyMrsKbzn/vxtyGv0VzfzTXf7+6yENeg/BBHo7IU/l6ZH5KWonErsHYCrup5vBZ46DB3sBT+IGn+5nrd5/u+mG+B5j8NHfPP4l2ajb87tZDG7W+S7/+lY9wKrPRpqzt0SA4DDmsPb05y6Szb2x64fkixjYoxbrxxjw/mjvGTVbXvsHc6z3yB8T+W4x4fGOMwLEq+wJL7GzPu8YExDsMg8Y3D35hxP44w/jGOe3ywNGLc6HxJVc3da4EkeThwVFU9vj0+EqCq/mkDt3duVa0eYohDZ4wbb9zjg8mIEcY/znGPD4xxGMY9vinjHue4xwfGOAzjHt+USYhz3GMc9/jAGKdsMsqNb4BvALsluU+SOwEHAqcvckySJEmSNJCxukSwqm5N8hzgLLph2o+vqosXOSxJkiRJGshYFVgAVXUmcOaQNnfckLYzSsa48cY9PpiMGGH84xz3+MAYh2Hc45sy7nGOe3xgjMMw7vFNmYQ4xz3GcY8PjBEYs3uwJEmSJGmSjds9WJIkSZI0uapqSU7AvsClwGXAEQuwvyuBi4DzgXNb23bAp4HvtZ/b9vQ/ssV2KfD4nvb/27ZzGfBv3H6WcQvglNb+dWDlADEdD1wLfKunbUFiAg5p+/gecMg84jsK+GE7juc
D+y1WfK3fLsDZwHeAi4G/G7fjaL4sj3yZhJzBfDFfxuh1niHGozBfzBfzxXwZcb6M9E2+WBPdABlrgPsCdwIuAHYf8T6vBLaf1vZa2i8T4Ajgn9v87i2mLYD7tFg3bcvOAR5O951gnwCe0NqfDby9zR8InDJATL8DPHhasow8pvZmv7z93LbNbztgfEcBL+rTd8Hja313BB7c5rcCvttiGZvjaL4sj3yZhJzBfDFfxuh1niHGozBf5nUczRfzxXyZf76M7A2+mFM7aGf1PD4SOHLE+7ySX0/oS4Ede94Yl/aLh27UxIe3Ppf0tB8EvKO3T5vfjO4L0jJAXCunJcvIY+rt05a9AzhowPiOon8yL0p8feL4KPC4cTuO5svyyJdJyxnMF/PFfBn4vYj5Yr6YLwO/FxnzfFmq92DtBFzV83htaxulAj6V5Jvtm80BdqiqdQDt5z3miG+nNj+9/Q7rVNWtwI3A3TcgzoWIaWOP/3OSXJjk+CTbjkt8SVYCe9GdNp6E4zgo82Vmk/I6j13OmC9DZb6YL4se4wYyX2Y2Ka+z+bIBMS7VAit92mrE+3xkVT0YeAJweJLfmaXvTPHNFveon9MwY9qYWN8GrAL2BNYB/zoO8SW5G/Bh4HlV9dPZui5mnBvIfJm/cXqdxy5nzJehM18GW2cQ5ov50st8mZ35soHHcakWWGvpboabsjNw9Sh3WFVXt5/XAqcBewPXJNkRoP28do741rb5fnH/ap0kmwFbAz/egFAXIqYNPv5VdU1Vra+q24B30h3HRY0vyeZ0yXxSVX2kNY/1cZwn82VmY/86j1vOmC/DZ76YL4sZ40YyX2Y29q+z+bIRx3GuaxwncaK7bvJyupvapm6q3GOE+7srsFXP/FfoRs35F+54491r2/we3PHGu8u5/ca7bwAP4/Yb7/Zr7YdzxxvvTh0wtpXc8XrakcdEdxPgFXQ3Am7b5rcbML4de+afD3xgkeMLcCJw7LT2sTqO5svyyJdxzxnMF/NlzF7nPjGaL+aL+WK+jDxfRvIGH4cJ2I9uhJE1wMtGvK/7thfxArqhI1/W2u8OfJZuSMfP9r4YwMtabJfSRi9p7auBb7Vlb4ZfDR25JfBBuqEjzwHuO0BcJ9Od0v0lXfV96ELFBPx5a78MeNY84nsf3dCZFwKnc8fkXtD4Wr9H0Z0GvpCeYUrH6TiaL8sjXyYhZzBfzJcxep1niNF8MV/MF/Nl5PkytUFJkiRJ0kZaqvdgSZIkSdKCs8CSJEmSpCGxwJIkSZKkIbHAkiRJkqQhscCSJEmSpCGxwFrikty82DFIk8J8kQZnvkiDM1+WFwssSZIkSRoSC6xlIsk+ST6f5ENJLklyUpK0ZQ9J8pUkFyQ5J8lWSbZM8p4kFyX5ryS/1/o+M8l/JDkjyRVJnpPkBa3P15Js1/qtSvLJJN9M8qUkD1jM5y/Nh/kiDc58kQZnviwTo/xGbafFn4Cb2899gBuBnekK66/SfSv2nYDLgYe0fr8BbAa8EHhPa3sA8AO6b7h+Jt23WG8FrGjb/OvW7w3A89r8Z4Hd2vxDgc8t9rFwcpprMl+cnAafzBcnp8En82V5TZuh5eScqloLkOR8YCVdQq6rqm8AVNVP2/JHAW9qbZck+T5w/7ads6vqJuCmJDcCZ7T2i4DfTnI34BHAB9uHMgBbjPapSUNnvkiDM1+kwZkvS5wF1vJyS8/8errXP0D16Zs+bf22c1vP49vaNjcBflJVe25wpNLiM1+kwZkv0uDMlyXOe7B0CXCvJA8BaNf7bgZ8ETi4td0f2BW4dJANtk9drkjy1LZ+kjxoFMFLC8x8kQZnvkiDM1+WEAusZa6q/hd4GvCmJBcAn6a7tvetwKZJLgJOAZ5ZVbfMvKVfczBwaNvmxcD+w41cWnjmizQ480UanPmytKSq39lISZIkSdJ8eQZLkiRJkobEAkuSJEmShsQCS5IkSZKGxAJLkiRJkobEAkuSJEmShsQCS5IkSZK
GxAJLkiRJkobEAkuSJEmShuT/AwH09XeVBggsAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "g = sns.FacetGrid(results, col='type', col_wrap=4, height=3, aspect=1)\n", "g.map(plt.hist, 'income', range=[0, 200000], bins=40, facecolor='gainsboro')\n", "g.set_axis_labels('Income', 'Count')\n", "g.set_titles('{col_name}')\n", "g.fig.suptitle('Visualizing Central Limit Theorem in action')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Confidence interval\n", "\n", "- x% confidence interval: It is the interval that encloses the central x% of the *bootstrap sampling distribution* of a sample statistic\n", "- Bootstrap sampling based confidence interval computation\n", " 1. Draw a random sample of size n with replacement from the data (a resample)\n", " 2. Record the statistic of interest for the resample\n", " 3. Repeat steps 1–2 many (R) times\n", " 4. For an x% confidence interval, trim [(100-x) / 2]% of the R resample results from\n", "either end of the distribution\n", " 5. 
The trim points are the endpoints of an x% bootstrap confidence interval\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "from sklearn.utils import resample\n", "#print('Data Mean: '+str(loans_income.mean()))\n", "np.random.seed(seed=3) \n", "# create a sample of 20 loan income data\n", "#sample20 = resample(loans_income, n_samples=20, replace=False)\n", "#print('Sample Mean: '+str(sample20.mean()))\n", "results = []\n", "for _ in range(500):\n", " sample = resample(loans_income, n_samples=20, replace=True)\n", " #sample = resample(sample20) # One could also use a small initial sample, to keep re-sampling\n", " results.append(sample.mean())\n", "results = pd.Series(results)\n", "\n", "confidence_interval = list(results.quantile([0.05, 0.95]))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAATwAAAD0CAYAAAAGyZprAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAjsklEQVR4nO3deXxU9bn48c9DEheCQFhNoBGURfYIoSyKAYIsYkNFIWDpjYqm3gtaS8GiXi/S0oKttcqvvGhxI/dqQZAl1KqAqUoFlU1ADCAEFDApoKAiIcjg8/tjTsYAIUyGnMxMzvN+veZ1zvnOWZ6ZgSff71m+X1FVjDHGC2qFOwBjjKkulvCMMZ5hCc8Y4xmW8IwxnmEJzxjjGZbwjDGe4WrCE5FfiMhHIrJVROaJyCUi0kBEVorITmea4GYMxhhTyrWEJyLNgPuAVFXtCMQAo4DJQJ6qtgbynGVjjHGd203aWOBSEYkFagOFwDAgx3k/B/ixyzEYYwzgT0iuUNXPRORxYC9wHFihqitEpKmqFjnrFIlIk/K2F5FsIBsgPj6+29VXX+1WqMaYCLFhw4bPVbWxW/t3LeE55+aGAS2BL4GFIjIm2O1VdQ4wByA1NVXXr1/vRpjGmAgiIp+6uX83m7QDgD2qekhVTwKLgd7AARFJBHCmB12MwRhjAtxMeHuBniJSW0QESAe2AcuALGedLCDXxRiMMSbAzXN474vIy8BGwAd8gL+JWgdYICJj8SfFEW7FYIwxZbmW8ABUdQow5YziE/hre8YYU63sSQtjjGdYwjPGeIYlPGOMZ1jCM8Z4hiU8Y4xnWMIzxniGJTxjjGdYwjPGeIYlPGOMZ1jCM8Z4hiU8Y4xnWMIzxniGJTxjjGdYwjPGeIYlPGOMZ1jCM8Z4hiW8ELRo0YJOnTqRkpJCamoqAJmZmaSkpJCSkkKLFi1ISUkJrL9lyxZ69epFhw4d6NSpEyUlJQDMmzePTp060blzZwYPHsznn38OwF/+8pfA
/q+77jry8/Or/TPWRH/605/o0KEDHTt2ZPTo0ZSUlLBw4UI6dOhArVq1OHOgqOnTp9OqVSvatm3L8uXLA+WDBw+mS5cudOjQgXvuuYdTp04B8Itf/CLwb6BNmzbUr1+/Oj+eCYaquvIC2gKbyry+Bu4HGgArgZ3ONOF8++rWrZtGkiuuuEIPHTp0zvcnTJigU6dOVVXVkydPaqdOnXTTpk2qqvr555+rz+fTkydPauPGjQP7mTRpkk6ZMkVVVb/66qvAvnJzc3XQoEEufRLv2L9/v7Zo0UKLi4tVVXXEiBH6/PPPa35+vm7fvl3T0tJ03bp1gfU/+ugj7dy5s5aUlOju3bv1yiuvVJ/Pp6rf/z7fffedDh8+XOfNm3fW8WbOnKl33HFHNXyymgVYry7lJFV1r4anqjtUNUVVU4BuQDGwBJgM5KlqayDPWa4xVJUFCxYwevRoAFasWEHnzp3p0qULAA0bNiQmJibwAxw7dgxV5euvvyYpKQmAunXrBvZ37Ngx/GMgmQvl8/k4fvw4Pp+P4uJikpKSaNeuHW3btj1r3dzcXEaNGsXFF19My5YtadWqFWvXrgW+/318Ph/ffvttub/PvHnzAv8GTOSoriZtOlCgqp/iH6s2xynPAX5cTTFUGRFh4MCBdOvWjTlz5pz23r/+9S+aNm1K69atAfj4448REQYNGkTXrl35/e9/D0BcXByzZ8+mU6dOJCUlkZ+fz9ixYwP7mTVrFldddRUPPPAAM2fOrL4PV0M1a9aMiRMnkpycTGJiIvXq1WPgwIHnXP+zzz7jBz/4QWC5efPmfPbZZ4HlQYMG0aRJEy677DJuvfXW07b99NNP2bNnD/3796/6D2IuSHUlvFHAPGe+qaoWATjTJtUUQ5VZvXo1Gzdu5LXXXmPWrFmsWrUq8N6Zf9l9Ph/vvPMOL774Iu+88w5LliwhLy+PkydPMnv2bD744AMKCwvp3Lkz06dPD2w3btw4CgoKeOyxx5g2bVq1fr6a6MiRI+Tm5rJnzx4KCws5duwYL7zwwjnX97euTle2Jrd8+XKKioo4ceIE//znP09bb/78+dx6663ExMRU3QcwVcL1hCciFwEZwMJKbpctIutFZP2hQ4fcCS5EpU3PJk2acPPNNweaOj6fj8WLF5OZmRlYt3nz5qSlpdGoUSNq167NjTfeyMaNG9m0aRMAV111FSLCyJEjWbNmzVnHGjVqFEuXLnX9M9V0b7zxBi1btqRx48bExcUxfPjwcr/vUs2bN2ffvn2B5f379wd+91KXXHIJGRkZ5OaePrTy/PnzrTkboaqjhjcE2KiqB5zlAyKSCOBMD5a3karOUdVUVU1t3LhxNYQZnGPHjnH06NHA/IoVK+jYsSPg/0919dVX07x588D6gwYNYsuWLRQXF+Pz+Xj77bdp3749zZo1Iz8/n9JkvnLlStq1awfAzp07A9v/4x//CDSPTeiSk5N57733KC4uRlXJy8sLfN/lycjIYP78+Zw4cYI9e/awc+dOfvjDH/LNN99QVFQE+P/Avfrqq1x99dWB7Xbs2MGRI0fo1auX65/JVJ6r49I6RvN9cxZgGZAFzHCmueVtFKkOHDjAzTffDPj/wd92220MHjwYKP8ve0JCAhMmTKB79+6ICDfeeCNDhw4FYMqUKVx//fXExcVxxRVXMHfuXAD+/Oc/88YbbxAXF0dCQgI5OTmYC9OjRw9uvfVWunbtSmxsLNdccw3Z2dksWbKEe++9l0OHDjF06FBSUlJYvnw5HTp0YOTIkbRv357Y2FhmzZpFTEwMx44dIyMjgxMnTnDq1Cn69+/PPffcEzjOvHnzGDVqlF1oilBS3rmKKtu5SG1gH3Clqn7llDUEFgDJwF5ghKoermg/qampeuY9UsaYmkdENqhqqlv7d7WGp6rFQMMzyr7Af9XWGGOqlT1pYYzxDEt4xhjPsIRnjPEMS3jGGM+whGeM8QxLeMYYz7CEZ4zxDEt41ej+++/n/vvv
D3cYphLsN6tZquPRMuMo7TDARA/7zWoWq+EZYzzDEp4xxjMs4RljPMMSnjHGMyzhGWM8wxKeMcYzLOEZYzzDEp4xxjMs4dUwTz31FB07dqRDhw48+eSTgfLDhw9zww030Lp1a2644QaOHDkC+Iec7Ny5M927d2fXrl0AfPnllwwaNKjcoQoB7rrrLvLz8yuMY+nSpeddpyrMnTuX8ePHu34cUzO4mvBEpL6IvCwi20Vkm4j0EpEGIrJSRHY60wQ3Y/CSrVu38vTTT7N27Vo2b97MK6+8EhgBbcaMGaSnp7Nz507S09OZMWMGAH/84x9ZtGgRv/vd75g9ezYAv/nNb3jooYfOORDNM888Q/v27SuMJZSE5/P5KrW+MZXldg3vKeB1Vb0a6AJsAyYDearaGshzlk0V2LZtGz179qR27drExsaSlpbGkiVLAMjNzSUrKwuArKyswFi3cXFxHD9+nOLiYuLi4igoKOCzzz4jLS3tnMfp27cvpYMq1alTh4cffpguXbrQs2dPDhw4wJo1a1i2bBmTJk0iJSWFgoICCgoKGDx4MN26daNPnz5s374dgNtvv50JEybQr18/Jk2aRIsWLfjyyy8Dx2rVqhUHDhzg73//Oz169OCaa65hwIABHDhwoLzQjKmYqrryAuoCe3BGRitTvgNIdOYTgR3n21e3bt20JkhLS9O0tDTX9p+fn6+tW7fWzz//XI8dO6Y9e/bU8ePHq6pqvXr1Tlu3fv36qqr6wQcfaI8ePbRv3766b98+zczM1I8//rjC46Slpem6detUVRXQZcuWqarqpEmT9De/+Y2qqmZlZenChQsD2/Tv3z+w3/fee0/79esXWG/o0KHq8/lUVfW+++7T5557LrBeenq6qqoePnxYv/vuO1VVffrpp3XChAmqqvr888/ruHHjKvtVBc3t38ycDlivLuUkVXW184ArgUPA8yLSBdgA/BxoqqpFTrItEpEm5W0sItlANvgHUTbn165dO371q19xww03UKdOHbp06UJsbMU/cUpKCu+99x4Aq1atIikpCVUlMzOTuLg4/vjHP9K0adNzbn/RRRdx0003AdCtWzdWrlx51jrffPMNa9asYcSIEYGyEydOBOZHjBhBTEwMAJmZmfz617/mjjvuYP78+WRmZgKwf/9+MjMzKSoq4ttvv6Vly5ZBfivGfM/NJm0s0BWYrarXAMeoRPNVVeeoaqqqpjZu3NitGGucsWPHsnHjRlatWkWDBg1o3bo1AE2bNqWoqAiAoqIimjQ5/e+MqjJt2jQeeeQRpk6dytSpUxkzZgwzZ86s8HhxcXGBc30xMTHlnof77rvvqF+/Pps2bQq8tm3bFng/Pj4+MN+rVy927drFoUOHWLp0KcOHDwfg3nvvZfz48Xz44Yf89a9/paSkJIRvx3idmwlvP7BfVd93ll/GnwAPiEgigDM96GIMnnPwoP/r3Lt3L4sXL2b06NEAZGRkkJOTA0BOTg7Dhg07bbucnByGDh1KQkICxcXF1KpVi1q1alFcXBxSHJdddhlHjx4FoG7durRs2ZKFCxcC/uS6efPmcrcTEW6++WYmTJhAu3btaNjQP6zxV199RbNmzQKxGhMK15q0qvpvEdknIm1VdQf+wbfznVcWMMOZ5roVgxfdcsstfPHFF8TFxTFr1iwSEvwXwSdPnszIkSN59tlnSU5ODiQfgOLiYnJyclixYgUAEyZM4JZbbuGiiy5i3rx5IcUxatQo7r77bmbOnMnLL7/Miy++yH/+538ybdo0Tp48yahRo+jSpUu522ZmZtK9e3fmzp0bKHv00UcZMWIEzZo1o2fPnuzZsyekuIy3iZ7jXqsq2blICvAMcBGwG7gDf61yAZAM7AVGqOrhivaTmpqqpVcFo1nfvn0BeOutt8Iahwme/WbVS0Q2qGqqW/t3tcdjVd0ElBd8upvHNcaY8tiTFsYYz7CEZ4zxDEt4xhjPsIRnjPEMS3jGGM+whGeM8QwbiLsa
7dq1i2+++SZwb5eJfJs2baJOnTrhDsNUEavhGWM8w2p41ahVq1aA3bUfTaw2XrNYDc8Y4xmW8IwxnmEJzxjjGZbwjDGeYQnPGOMZlvCMMZ5hCc8Y4xmW8IwxnuHqjcci8glwFDgF+FQ1VUQaAC8BLYBPgJGqesTNOIwxBqqnhtdPVVPK9FM/GchT1dZAHpUYutEYYy5EOJq0w4DScfZygB+HIQZjjAe5nfAUWCEiG0Qk2ylrqqpFAM60SXkbiki2iKwXkfWHDh1yOUwTir59+9qzpiaquN15wLWqWigiTYCVIrI92A1VdQ4wB/zDNLoVoDHGO1yt4alqoTM9CCwBfggcEJFEAGd60M0YjDGmlGs1PBGJB2qp6lFnfiDwa2AZkAXMcKa5bsVg/AoKCs753lVXXRXyftu0aRPytsaEg5tN2qbAEhEpPc7fVPV1EVkHLBCRscBeYISLMRgXzZkzJ9whGFMpriU8Vd0NdCmn/Asg3a3jGmPMudiTFiZk2dnZZGdnn39FYyKEdfFuQvbxxx+HOwRjKsVqeMYYz7CEZ4zxDEt4xhjPqPQ5PBFJAH6gqltciMdEkZSUlHCHYEylBJXwROQtIMNZfxNwSETeVtUJ7oVmIt2TTz4Z7hCMqZRgm7T1VPVrYDjwvKp2Awa4F5YxxlS9YBNerPPc60jgFRfjMVFkzJgxjBkzJtxhGBO0YM/hTQWWA++o6joRuRLY6V5YJhrs378/3CEYUynBJrwiVe1cuqCqu0XkCZdiMsYYVwTbpP1/QZYZY0zEqrCGJyK9gN5AYxEpe0W2LhDjZmDGGFPVztekvQio46x3WZnyr4Fb3QrKRIdevXqFOwRjKqXChKeqbwNvi8hcVf20mmIyUWL69OnhDsGYSgn2osXFIjIH/1iygW1Utb8bQZnIUVFvyXBhPSYbU92CTXgLgb8Az+AfVNsYxo0bR3x8PIsWLQp3KMYEJdiE51PV2aEcQERigPXAZ6p6k4g0AF7CX1v8BBipqkdC2bcJryNHjnDkyJHz1gLPxWqHproFe1vK30Xkv0QkUUQalL6C3PbnwLYyy5OBPFVtDeQ5y8YY47pgE14WMAlYA2xwXuvPt5GINAeG4m8KlxoG5DjzOcCPg4zBGGMuSFBNWlVtGeL+nwQe4PRbWpqqapGz3yJnkO6ziEg2kA2QnJwc4uGNMeZ7wXYP9R/llavq/1awzU3AQVXdICJ9KxuYqs4B5gCkpqZqZbc37uvdu3e4QzCmUoK9aNG9zPwl+IdZ3AicM+EB1wIZInKjs01dEXkBOCAiiU7tLhE4GELcJgKMHz8+3CEYUynBNmnvLbssIvWA/zvPNg8CDzrr9wUmquoYEfkD/nOCM5xpbqWjNsaYEIQ6pkUx0DrEbWcAN4jITuAGZ9lEoTvvvJM777wz3GEYE7Rgz+H9HSg9jxYDtAMWBHsQVX0LeMuZ/wJ/k9hEuZKSknCHYEylBHsO7/Ey8z7gU1W13h+NMVElqCat04nAdvy3lyQA37oZlDHGuCGohCciI4G1wAj841q8LyLWPZQxJqoE26R9GOiuqgcBRKQx8AbwsluBmcjXv791lmOiS7AJr1ZpsnN8QehXeE0Ncdddd4U7BGMqJdiE97qILAfmOcuZwKvuhGTKU1GPJNHa60hN/Ewmsp1vTItW+J99nSQiw4HrAAHeBV6shvhMBLvtttsA+Nvf/hbmSIwJzvlqeE8CDwGo6mJgMYCIpDrv/cjF2Ew1CLUvO7dZT8vGDec7D9dCVbecWaiq6/F34GmMMVHjfAnvkgreu7QqAzHGGLedL+GtE5G7zywUkbH4OwE1xpiocb5zePcDS0TkJ3yf4FLxj1d7s4txmShw4403hjsEYyrlfOPSHgB6i0g/oKNT/A9V/afrkZmIN2bMmHCHYEylBNsf3pvAmy7HYqLM8ePHAbj0Ujuda6JDsDceG3OWsWPHAnYfnoke9niYMcYzLOEZYzzDtYQnIpeIyFoR2SwiH4nI
VKe8gYisFJGdzjTBrRiMMaYsN2t4J4D+qtoFSAEGi0hPYDKQp6qtgTxn2RhjXOfaRQtVVeAbZzHOeSkwDOjrlOfgH+viV27FYdxzyy23hDsEYyrF1au0IhKD/4blVsAsVX1fRJqqahGAMzZtk3Nsmw1kAyQnJ7sZpgmRJTwTbVy9aKGqp1Q1BWgO/FBEOp5nk7LbzlHVVFVNbdy4sWsxmtAdPnyYw4cPhzsMY4JWLffhqeqXIvIWMBg4ICKJTu0uEThY8dYmUo0fPx6w+/BM9HDzKm1jEanvzF8KDMA/8tkyIMtZLQvIdSsGY4wpy80aXiKQ45zHqwUsUNVXRORdYIHT48pe/COhGWOM69y8SrsFuKac8i+AdLeOa4wx52LP0pqoZF3Am1BYwjMh+8lPfhLuEIypFEt4NUC4BuIZOnRoWI5rTKis8wATssLCQgoLC8MdhjFBsxqeCdnEiRMBuw/PRA9LeBEiUseHNaYmsSatMcYzLOEZYzzDEp4xxjPsHJ4JWekgPsZEC0t4JmTp6faEoIku1qQ1Idu9eze7d+8OdxjGBM1qeCZk//3f/w3YfXgmelgNzxjjGZbwjDGeYQnPGOMZlvCMMZ7h2kULEfkB8L/A5cB3wBxVfUpEGgAvAS2AT4CRqnrErTiMe8aNGxfuEIypFDev0vqAX6rqRhG5DNggIiuB24E8VZ0hIpOBydhA3FHp2muvDXcIxlSKa01aVS1S1Y3O/FFgG9AMGAbkOKvlAD92Kwbjrvz8fPLz88MdhjFBq5b78ESkBf4Bfd4HmqpqEfiToog0Occ22UA2QHJycnWEaSpp2rRpgN2HZ6KH6xctRKQOsAi4X1W/DnY7VZ2jqqmqmtq4cWP3AjTGeIarCU9E4vAnuxdVdbFTfEBEEp33E4GDbsZgjDGl3LxKK8CzwDZVfaLMW8uALGCGM811KwbjXRX1IG1DOHqXm+fwrgV+CnwoIpucsofwJ7oFIjIW2AuMcDEGY4wJcC3hqeo7gJzjbetXqAYoHcTHmGhhvaWYkHXt2jXcIRhTKfZomQnZxo0b2bhxY7jDMCZoVsOrJgUFBRw/fjwwXxM8/vjjgN2HZ6KH1fCMMZ5hCc8Y4xmW8IwxnmEJzxjjGXbRwoSsdBAfY6KFJTwTsvbt24c7BFeUvYp+5pV1eywtulmT1oRs9erVrF69OtxhGBM0q+GZkM2aNQuwno9N9LAanjHGM6yGF0USEhKIjY2cn+yRRx4BINI6aPX5fBw5YuNCmbNFzv8ec16xsbH4fL5whxFQVFQEEFExARH1R8FEFmvSGmM8w/4UmpD1798/3CEYUylWw4tyDRs25J577gks+3w+2rRpw+jRo10/dkJCAgkJCQC88847pKWl0bt3b370ox8F1pk9eza9e/fm2muv5e6776akpASADz/8kIEDB5KWlkb//v3ZsGEDABs2bCAtLY20tDSuv/56XnnllcC+Nm3axHXXXUdqaiqTJ09GVV3/jKZmcS3hichzInJQRLaWKWsgIitFZKczTXDr+F4RHx/P9u3bAzfIvvXWWyQmJlbLsffs2cOePXv46quvmDRpEi+++CJr1qzhueeeA6CwsJA5c+aQl5fH6tWrOXXqFIsX+8dyevTRR3nggQd4++23efDBB5k6dSoA7dq1Iy8vj7fffpsFCxbwy1/+MnCOcOLEifzpT39i3bp17N69m7y8vGr5nKbmcLOGNxcYfEbZZCBPVVsDec6yuUDp6emsXLkSgEWLFjF8+PDAe8eOHePee+8lPT2dvn378uqrrwKwd+9ehg4dSr9+/ejXrx9r164F/DW1jIwMbr/9dnr06MHPfvazc9akSjsAffnll7npppto3rw5cPpVW5/PR0lJCT6fj+PHjweSsYhw9OhRAL7++msuv/xyAGrXrh246HDixAn8Y0HBv//9b44ePUr37t0RETIzMwOfpbIKCgoqfJmay7WEp6qrgMNnFA8Dcpz5HODHbh3fS4YPH87i
xYspKSkhPz+fbt26Bd574okn6NOnD3l5eeTm5vLoo49y7NgxGjVqxKJFi3jzzTd55plnmDz5+789W7Zs4be//S3vvvsun3zyCe+//z4A06dP57XXXjvr+AUFBXz55ZdkZGTQv39/5s+fD0BSUhLjx4+nS5cutG/fnrp169KvXz8Afvvb3zJlyhQ6derE//zP/wRucQFYv349vXv3pk+fPjz++OPExsZSVFREUlJSYJ2kpKTAVWJjglXdFy2aqmoRgKoWiUiTaj5+jdShQwf27t3LokWLGDBgwGnvvfnmm7z++uuBpyJKSkrYv38/iYmJPPDAA2zdupWYmJjTajZdu3alWbNmAHTs2JG9e/fSs2dPHnzwwXKP7/P52Lx5M0uWLKGkpITBgweTmppKo0aNePXVV9m4cSP16tXjjjvuYMGCBYwcOZLnn3+eadOmkZGRwdKlS7nvvvtYsmQJAKmpqaxZs4YdO3Ywbtw4BgwYUG4ts7T2Z0ywIvYqrYhkA9kAycnJYY4m8g0ZMoQpU6awbNkyDh8+vWI9d+5cWrdufVrZY489RpMmTVi1ahXffffdabWniy++ODAfExNz3vvskpKSaNiwIfHx8cTHx9OrVy8++ugjAK644goaNWoEwE033cTatWsZOXIk8+fPZ/r06QAMGzaMn//852ftt23btsTHx7Nt2zaSkpIoLCwMvFdYWBhoBhsTrOq+SntARBIBnOnBc62oqnNUNVVVUyPtTv5IdNtttzFp0qSzejDp168fTz/9dKCGtGXLFsB/3qxp06bUqlWLl156iVOnToV87CFDhvDuu+/i8/koLi5mw4YNtGnThmbNmrF+/XqKi4tRVVatWkWbNm0AuPzyywMdD6xatSrQC8mnn34aSLD79u1j586dJCcnc/nll1OnTh3WrVuHqvLSSy8xZMiQkGM23lTdNbxlQBb+wbizgNxqPn6N1axZM372s5+dVT5x4kQefvhh+vTpg6qSnJzMvHnzuPPOO7n99tvJzc3luuuuIz4+/rzHmD59OikpKYFEM3DgQAAuu+wy0tPT6dOnD7Vq1eKnP/0p7dq1AyAjI4N+/foRGxtLp06dyMrKAuDJJ5/koYcewufzcfHFF/PEE08A8N577/HUU08RFxdHrVq1+MMf/kDDhg0B/6BB48ePp6SkhPT09LOa78acj7h1L5OIzAP6Ao2AA8AUYCmwAEgG9gIjVPXMCxtnSU1N1fXr17sSZ3UpKCjgtttuA0If5atx48YR9xhXJIqNjeXQoUNVsq8zfzPrD89dIrJBVVPd2r9rNTxVPdedr+luHdNUr507dwKcdX7QmEgVsRctTOQrPR9oCc9EC3u0zBjjGVbDO0NFd9qf7/yN3aVvzvdvwM4BhpfV8IwxnmE1vCji8/kiqnPL0udiIykmiLwOSU3kiKx/qaZCkdZtedOmTQGq7BYQY9xmCc+ErEGDBuEOwZhKsXN4JmSLFi1i0aJF4Q7DmKBZwjMhs4Rnoo0lPGOMZ9g5PGMq4ULvtbyQ+zzNhbManjHGMyzhGWM8w3NNWnv8q+o8++yz4Q6hRrHH0tznuYRnqs6ll14a7hCMqRRr0pqQvfDCC7zwwgvhDsOYoNW4Gp6bTVZrDp+udFzYMWPGhDkSY4JjNTxjjGeEpYYnIoOBp4AY4BlVnRGOOIyJJm721XghF0Si6d7Caq/hiUgMMAsYArQHRotI+4q3MsaYCxeOJu0PgV2qultVvwXmA8PCEIcxxmPC0aRtBuwrs7wf6HHmSiKSDWQ7i9+IyI4QjtUI+DyE7VzVqlWrYFaLyNjLU87niZrYy1Fu7EH+ZuFWE773K9w8SDgSnpRTdtbguKo6B5hzQQcSWe/mGJdustjDw2IPj+qKPRxN2v3AD8osNwcKwxCHMcZjwpHw1gGtRaSliFwEjAKWhSEOY4zHVHuTVlV9IjIeWI7/tpTnVPUjlw53QU3iMLPYw8NiD49qiV1Uzzp9ZowxNZI9aWGM8QxL
eMYYz4iKhCcin4jIhyKySUTWO2UNRGSliOx0pgll1n9QRHaJyA4RGVSmvJuzn10iMlNExCm/WERecsrfF5EWVRh7fRF5WUS2i8g2EekVDbGLSFvn+y59fS0i90dD7M6+fyEiH4nIVhGZJyKXRFHsP3fi/khE7nfKIjJ2EXlORA6KyNYyZdUSq4hkOcfYKSJZQQWsqhH/Aj4BGp1R9ntgsjM/GXjMmW8PbAYuBloCBUCM895aoBf+ewFfA4Y45f8F/MWZHwW8VIWx5wB3OfMXAfWjJfYynyEG+Df+m0IjPnb8N7fvAS51lhcAt0dJ7B2BrUBt/BcV3wBaR2rswPVAV2Brdf7fBBoAu51pgjOfcN54q/o/hxsvyk94O4BEZz4R2OHMPwg8WGa95c4XmQhsL1M+Gvhr2XWc+Vj8d3xLFcRd1/mPJ9EW+xnxDgRWR0vsfP80TwNnv684nyEaYh+Bv0ON0uVHgAciOXagBacnPNdjLbuO895fgdHnizUqmrT4n8RYISIbxP/IGUBTVS0CcKZNnPLyHl1r5rz2l1N+2jaq6gO+AhpWQdxXAoeA50XkAxF5RkTioyT2skYB85z5iI9dVT8DHgf2AkXAV6q6Ihpix1+7u15EGopIbeBG/DfqR0Pspaoj1nPtq0LRkvCuVdWu+HtYGSci11ew7rkeXavokbagHncLQSz+6v5sVb0GOIa/in8ukRS7f+f+m8MzgIXnW/UccVR77M45o2H4m01JQLyIVNRLacTErqrbgMeAlcDr+JuAvgo2iZjYg1CVsYb0GaIi4alqoTM9CCzB3+PKARFJBHCmB53Vz/Xo2n5n/szy07YRkVigHnC4CkLfD+xX1fed5ZfxJ8BoiL3UEGCjqh5wlqMh9gHAHlU9pKongcVA7yiJHVV9VlW7qur1zj53RkvsjuqINaRHVCM+4YlIvIhcVjqP/1zMVvyPo5VemckCcp35ZcAo5+pOS/wnfNc6VeujItLTuQL0H2dsU7qvW4F/qnNi4EKo6r+BfSLS1ilKB/KjIfYyRvN9c/bM40Vq7HuBniJS2zlmOrAtSmJHRJo402RgOP7vPypiL2f/bsW6HBgoIglOjX6gU1axCz3J6vYL/3mwzc7rI+Bhp7whkIf/r18e0KDMNg/jvwK0A+dqj1Oeij9ZFgB/5vsnTS7B32Tbhf9q0ZVVGH8KsB7YAizFf0UpWmKvDXwB1CtTFi2xTwW2O8f9P/xXBqMl9n/h/8O4GUiP5O8dfzIuAk7ir3WNra5YgTud8l3AHcHEa4+WGWM8I+KbtMYYU1Us4RljPMMSnjHGMyzhGWM8wxKeMcYzLOEZYzzDEp4xxjP+P7zWAJIJd1A0AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "ax = results.plot.hist(bins=30, facecolor='gainsboro', figsize=(4.5,3.5))\n", "ax.plot(confidence_interval, [55, 55], color='black')\n", "for x in confidence_interval:\n", " ax.plot([x, x], [0, 65], color='black')\n", " ax.text(x, 70, f'{x:.0f}', horizontalalignment='center', verticalalignment='center')\n", "ax.text(sum(confidence_interval) / 2, 60, '90% interval', horizontalalignment='center', verticalalignment='center')\n", "meanIncome = results.mean()\n", "ax.plot([meanIncome, meanIncome], [0, 50], color='black', linestyle='--')\n", "ax.text(meanIncome, 10, f'Mean: {meanIncome:.0f}', bbox=dict(facecolor='white', edgecolor='white', alpha=0.5),\n", " horizontalalignment='center', verticalalignment='center')\n", "ax.set_ylim(0, 80)\n", "ax.set_ylabel('Counts')\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Interpreting confidence interval\n", "\n", "> A note of caution: Confidence interval DOES NOT answer the question “What is the probability that the true value lies within a certain interval?” " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Statistical experiments & significance" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> In 2012 a Microsoft employee working on Bing had an idea about changing the way the search engine displayed ad headlines. Developing it wouldn’t require much effort—just a few days of an engineer’s time—but it was one of hundreds of ideas proposed, and the program managers deemed it a low priority. So it languished for more than six months, until an engineer, who saw that the cost of writing the code for it would be small, launched a simple online controlled experiment—an A/B test—to assess its impact. 
Within hours the new headline variation was producing abnormally high revenue, triggering a “too good to be true” alert. Usually, such alerts signal a bug, but not in this case. An analysis showed that the change had increased revenue by an astonishing 12%—which on an annual basis would come to more than $100 million in the United States alone—without hurting key user-experience metrics. It was the best revenue-generating idea in Bing’s history, but until the test its value was underappreciated.\n", "\n", "Source: https://hbr.org/2017/09/the-surprising-power-of-online-experiments" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Statistical inference pipeline\n", "\n", "\"Statistical\n", "\n", "
Design experiment (typically, an A/B test):\n", "An A/B test is an experiment with two groups to establish which of two treatments,\n", "products, procedures, or the like is superior. Often one of the two treatments is the\n", "standard existing treatment, or no treatment. If a standard (or no) treatment is used,\n", "it is called the control. A typical hypothesis is that a new treatment is better than the\n", "control." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\"Bing
\n", "\n", "Image source: Harvard Business Review" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\"Key
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Randomization \n", "- Ideally, subjects ought to be assigned randomly to treatments.\n", "- Then, any difference between the treatment groups is due to one of two things\n", " - Effect of different treatments\n", " - Luck of the draw\n", " * Ideal randomization may not always be possible" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Control group\n", "\n", "- Why not just compare with the original baseline? \n", " * Without a control group, there is no assurance that “all other things are equal”" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Blinding\n", "\n", "- Blind study: A blind study is one in which the subjects are unaware of whether they are getting treatment A or treatment B\n", "- Double-blind study: A double-blind study is one in which the investigators and facilitators also are unaware which subjects are getting which treatment\n", " * Blinding maynot always be feasible" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Ethical & legal considerations\n", "\n", "- In the context of web applications and data products, do you need permission to carry out the study?\n", " * Anecdotes \n", " * Facebook's emotion study on its feeds was hugely controversial\n", " * OKCupid's study on compatibilty & matches " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Interpreting A/B test results with statistical rigour\n", "\n", "- Human brain/intuition is (typically) not good at comprehending or interpreting randomness \n", "- Hypothesis testing\n", " > Assess whether random chance is a reasonable explanation for the observed difference between treatment groups" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { 
"slide_type": "slide" } }, "source": [ "#### Hypothesis testing\n", "\n", "- Null hypothesis $H_{0}$\n", "- Alternative hypothesis $H_1$\n", " - Variations:\n", " * one-way/one-tail, e.g. $H_{0} \\leq \\mu $, $H_1 > \\mu$ \n", " * two-way/two-tail, e.g., $H_{0} = \\mu$, $H_1 \\neq \\mu$ \n", "- Caution: One only rejects, or fails to reject the Null hypothesis\n", " - DOES NOT PROVE anything\n", " * May nevertheless be `good enough' for a lot of decisions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Resampling\n", "\n", "> Resampling in statistics means to repeatedly sample values from observed data, with a\n", "general goal of assessing random variability in a statistic\n", "\n", "> Broadly, two variations:\n", "> - Bootstrap\n", "> * We saw this variant earlier, when exposing the idea of Central Limit Theorem\n", "> - Permutation tests\n", "> * Typically, this is what is used for hypothesis testing \n", " * A special case is an exhaustive permutation test (practical only for small data sets)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Permutation test\n", "\n", "\"Statistical" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Permutation test\n", "\n", "Compare observed difference between groups and to the set of permuted differences. \n", "\n", ">If the observed difference lies well within the set of permuted differences\n", ">- Can NOT reject Null hypothesis\n", ">\n", ">Else\n", ">- The difference is statistically significant, i.e., reject Null hypothesis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Web stickiness example" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PageTime
0Page A21.0
1Page B253.0
2Page A35.0
3Page B71.0
4Page A67.0
\n", "
" ], "text/plain": [ " Page Time\n", "0 Page A 21.0\n", "1 Page B 253.0\n", "2 Page A 35.0\n", "3 Page B 71.0\n", "4 Page A 67.0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Data\n", "session_times = pd.read_csv(practicalstatspath+'web_page_data.csv')\n", "session_times.Time = 100 * session_times.Time\n", "session_times.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAARgAAAEOCAYAAABSAQgKAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAXTElEQVR4nO3de5CddX3H8fcnIVwExGDCuobLUgxMiNUgW9QhrQlYQKwFHLHZqZrRtLFTsPVS28S0A5TuGDuCo7VQY+OQVgKGImMgGogxp5peiATCJawp0USI7BC1CoRJYxK//eP8Fk42u+c8Cfs7189r5sw55/c8zznfw0M++3uuP0UEZmY5jGt0AWbWvhwwZpaNA8bMsnHAmFk2Dhgzy8YBY2bZOGCsLiRtljSr0XVYfR3R6AKsPUjaVfH2FcAeYH96/+GImF7/qqzR5BPtbKxJ2g78UUR8u9G1WGN5E8nqQtJ2SW9Pr6+VdIekr0p6XtKjks6UtFDSTklPSbqoYtkTJC2VNCjpJ5L+TtL4xv0aK8oBY43yLuBfgYnAQ8C9lP9/nAL8LfClinmXAfuA1wHnABcBf1TPYu3wOGCsUb4XEfdGxD7gDmAysDgi9gK3Az2SXiWpC3gH8NGIeCEidgKfA+Y0rHIrzDt5rVGeqXi9G/hZROyveA9wHPBaYAIwKGlo/nHAU/Uo0l4eB4w1u6coH5GalHo71kK8iWRNLSIGgfuAGyS9UtI4SWdIeluja7PaHDDWCj4AHAk8DvwC+Degu6EVWSE+D8bMsnEPxsyyccCYWTYOGDPLxgFjZtk4YMwsm5Y+0W7SpEnR09PT6DKyeuGFFzj22GMbXYa9DJ2wDjdu3PiziJg8vL2lA6anp4cHHnig0WVkVSqVmDVrVqPLsJehE9ahpB+P1O5NJDPLxgFjZtk4YMwsGweMmWXjgDGzbBwwZpaNA8bMsnHAmFk2LX2inVkzqrh38Kg65T5M2Xowko6WtEHSw2nY0OtS+7VpbJtN6XFpxTILJW2VtEXSxblqM8spIg54nPZX9xzU1ily9mD2ABdExC5JE4D1kr6Vpn0uIj5bObOksykPRTGd8p3kvy3pzIo7zZtZi8nWg4myofGKJ6RHtei+DLg9IvZExDZgK3BervrMLL+sO3kljZe0CdgJrImI+9OkqyU9IukrkiamtikcONbNjtRmZi0q607etHkzQ9KrgLskvR64Gbiecm/meuAG4EPASHvGDurxSJoPzAfo6uqiVCplqb1Z7Nq1q+1/Yyfo2HU4fOdTrgdwDfAXw9p6gMfS64XAwopp9wJvrfaZ5557brSr5cuXx/Tp02PcuHExffr0WL58eaNLssN02l/d0+gSsgMeiBH+jWbrwUiaDOyNiF9KOgZ4O/AZSd1RHkwL4ArgsfR6JbBc0o2Ud/JOBTbkqq+Z3XbbbSxatIilS5eyf/9+xo8fz7x58wDo6+trcHVmx
eXcB9MNrJP0CPB9yvtg7gH+XtKjqX028DGAiNgMrKA8uNZq4Kro0CNI/f39LF26lNmzZ3PEEUcwe/Zsli5dSn9/f6NLMzsk2XowEfEIcM4I7e+vskw/0PH/igYGBpg5c+YBbTNnzmRgYKBBFZkdHl8q0ISmTZvG+vXrD2hbv34906ZNa1BFZofHAdOEFi1axLx581i3bh379u1j3bp1zJs3j0WLFjW6NLND4muRmtDQjtyPfOQjDAwMMG3aNPr7+72D11qOA6ZJ9fX10dfX1xF3pLf25U0kM8vGAWNm2ThgzCwbB4yZZeOAMbNsHDBmlo0DxsyyccCYWTY+0a7J+I701k7cg2kyw2/Y08l3pLfW54Axs2wcMGaWjQPGzLJxwJhZNg4YM8vGAWNm2ThgzCybbAEj6WhJGyQ9LGmzpOtS+4mS1kh6Ij1PrFhmoaStkrZIujhXbWZWHzl7MHuACyLijcAM4BJJbwEWAGsjYiqwNr1H0tnAHGA6cAlwk6TxGeszs8yyBUwaUXJXejshPQK4DFiW2pcBl6fXlwG3R8SeiNgGbAXOy1WfmeWXdR+MpPGSNgE7KY/seD/QNTR0bHo+Kc0+BXiqYvEdqc3MWlTWix3T0K8zJL0KuEvS66vMPtJVfgddeCNpPjAfoKuri1KpNAaVNrdO+I3trlPXYV2upo6IX0oqUd638oyk7ogYlNRNuXcD5R7LKRWLnQw8PcJnLQGWAPT29kbbD+mxepWHLWl1HbwOcx5Fmpx6Lkg6Bng78ANgJTA3zTYX+EZ6vRKYI+koSacDU4ENueozs/xy9mC6gWXpSNA4YEVE3CPpv4AVkuYBTwJXAkTEZkkrgMeBfcBVaRPLzFpUtoCJiEeAc0Zo/zlw4SjL9AP9uWoys/rymbxmlo0DxsyyccCYWTYOGDPLxgFjZtk4YMwsGweMmWXjgDGzbBwwZpaNA8bMsnHAmFk2Dhgzy8YBY2bZOGDMLBsHjJll44Axs2wcMGaWjQPGzLJxwJhZNjXvySvpaOD3gN8GXgvsBh4DVkXE5rzlmVkrqxowkq4F3gWUgPspj2F0NHAmsDiFzyfSDb7NzA5Qqwfz/Yi4dpRpN0o6CTh1pImSTgH+BXgN8GtgSUR8PoXWHwM/TbN+KiK+mZZZCMwD9gN/FhH3HsJvMbMmUzVgImLV8DZJ44DjIuK5iNjJSyMzDrePcu/mQUnHAxslrUnTPhcRnx32uWcDc4DplDfFvi3pTI+NZNa6Cu3klbRc0islHUt5YLQtkj5ZbZmIGIyIB9Pr54EBqg9mfxlwe0TsiYhtwFbgvCL1mVlzKnoU6eyIeA64HPgm5c2i9xf9Ekk9lAdhuz81XS3pEUlfkTQxtU0BnqpYbAfVA8nMmlzRkR0nSJpAOWC+GBF7JUWRBSUdB9wJfDQinpN0M3A9EOn5BuBDgEZY/KDvkDQfmA/Q1dVFqVQq+BNaVyf8xnbXqeuwaMB8CdgOPAx8V9JpwHO1FkqhdCdwa0R8HSAinqmY/mXgnvR2B3BKxeInA08P/8yIWAIsAejt7Y1Zs2YV/AktavUq2v43trsOXoeFNpEi4gsRMSUiLo2yHwOzqy0jScBSYCAibqxo766Y7QrK59QArATmSDpK0unAVGDDIfwWM2sytc6D+XiN5W+sMu18yvtpHpW0KbV9CuiTNIPy5s924MMAEbFZ0grKO5H3AVf5CJJZa6u1iXR8ej4L+C3KvQwon3z33WoLRsR6Rt6v8s0qy/QD/TVqMrMWUes8mOsAJN0HvCkdbh46w/eO7NWZWUsrepj6VOBXFe9/BfSMeTVm1laKHkX6V2CDpLso7zu5gvJlAGZmoyoUMBHRL2k1MDM1fTAiHspXlpm1g6I9GIBNwODQMpJOjYgncxRl1kreeN19PLt7b9V5ehYcdFnfAU44ZgIPX3PRWJbVFAoFjKSPANcAz1C+0lmUN5XekK80s9bw7O69bF/8zlGnl0qlmifa1QqgVlW0B/Pnw
FkR8fOcxZhZeyl6FOkp4NmchZhZ+ynag/kRUJK0Ctgz1Fh5CYCZ2XBFA+bJ9DgyPczMaip6mHrojN7jy29jV9aqzKwtFL2j3eslPUT5yufNkjZKmp63NDNrdUV38i4BPh4Rp0XEacAngC/nK8vM2kHRgDk2ItYNvYmIEnBslorMrG0UPook6W8oX5ME8D5gW56SzKxdFO3BfAiYDHw9PSYBH8xVlJm1h6JHkX4B/FnmWsyszRQ9irRG0qsq3k+U5FEXzayqoptIkyLil0NvUo/mpCwVmVnbKBowv5b04hjUadiSQuMimVnnKnoUaRGwXtK/p/e/Qxr8zMxsNEXHRVoNvAn4GrACODciqu6DkXSKpHWSBiRtlvTnqf3EtE/nifQ8sWKZhZK2Stoi6eLD/1lm1gyK7uQVcAnlkQXuBl4hqdbA9PuAT0TENOAtwFWSzgYWAGsjYiqwNr0nTZsDTE/fdZOk8Yfxm8ysSRTdB3MT8FagL71/HvjHagtExGBEPJhePw8MUB7M/jJgWZptGeXxrkntt0fEnojYBmwFaoWYmTWxogHz5oi4Cvg/ePEoUuHbNkjqAc4B7ge6ImIwfc4gLx2NmkL5xlZDdqQ2M2tRRXfy7k2bKwEgaTLw6yILSjoOuBP4aEQ8V97aGnnWEdoOOlIlaT5pB3NXVxelUqlIGS2tE35jq6u2jnbt2lVoHbbjei4aMF8A7gJOktQPvAf461oLSZpAOVxujYivp+ZnJHVHxKCkbmBnat8BnFKx+MnA08M/MyKWUL66m97e3qh1M+WWt3pVzRtGW4PVWEdFbvrdruu56FGkW4G/BD5NeeiSyyOi6tCxacfwUmBg2K01VwJz0+u5wDcq2udIOkrS6cBUYEPRH2JmzafosCVnANsi4h8lzQJ+V9Jg5dm9IzgfeD/wqKRNqe1TwGJghaR5lG/DeSVARGyWtAJ4nPIRqKsiYv8h/yIzaxpFN5HuBHolvQ74Z+BuYDlw6WgLRMR6Rt6vAnDhKMv0A/0FazKzJlf4UoGI2Ae8G/h8RHwM6M5Xlpm1g6IBs1dSH/AB4J7UNiFPSWbWLooGzAcpn2jXHxHb0k7Yr+Yry8zaQdEbTj1OxQ2n0pm2i3MVZWbtoWgPxszskDlgzCwbB4yZZVP0RLszgU8Cp1UuExEXZKrLzNpA0RPt7gD+ifJojj671swKKRow+yLi5qyVmFnbKboP5m5JfyqpO93y8kRJJ2atzMxaXtEezNDVz5+saAvgN8a2nM7yxuvu49nde2vO17Ng1ajTTjhmAg9fc9FYlmU2ZoqeaHd67kI60bO797J98TurzlPrXiLVwses0aoGjKQLIuI7kt490vSKm0iZmR2kVg/mbcB3gHeNMC0AB4yZjapqwETENen5g/Upx8zaSdWjSJLeJ2nUeSSdIWnm2JdlZu2g1ibSq4GHJG0ENgI/BY4GXkd58+lnpIHTzMyGq7WJ9HlJXwQuoHyP3TcAuykPovb+iHgyf4lm1qpqHqZON95ekx5mZoX5amozyyZbwEj6iqSdkh6raLtW0k8kbUqPSyumLZS0VdIWSRfnqsvM6idnD+YW4JIR2j8XETPS45sAks4G5gDT0zI3paFqzayFFQoYSV2Slkr6Vnp/dho4bVQR8V3gfwvWcRlwe0TsSff73QqcV3BZM2tSRXswtwD3Aq9N7/8H+OhhfufVkh5Jm1ATU9sU4KmKeXakNjNrYUWvpp4UESskLQSIiH2SDufGUzcD11O+zOB64AbgQ4w8AmSM9AGS5gPzAbq6uiiVSodRRvOoVf+uXbtqztPq/w3aQbV1UGQd1vqMVlU0YF6Q9GrSP3pJbwGePdQvi4hnhl5L+jIvDeK2AzilYtaTgadH+YwlwBKA3t7eqHalcdNbvarqldJQ+2rqIp9hmdVYBzXXYYHPaFVFA+bjwErgDEn/AUwG3nOoXyapOyIG09srgKEjTCuB5ZJupLwZNhXYcKifb9YIx09bwG8uq3FC+
7JanwFQ/dYdrajo/WAelPQ24CzKmzNbIqLqnZIk3QbMAiZJ2gFcA8ySNINyT2g78OH0+ZslrQAeB/YBV6UT/Mya3vMDi6ve16dID6Zd7+tTdFSB8cClQE9a5iJJRMSNoy0TEX0jNC+tMn8/0F+kHjNrDUU3ke4G/g94FPh1vnLMrJ0UDZiTI+INWSsxs7ZT9DyYb0nynaXN7JAU7cH8N3BXuvnUXso7eiMiXpmtMjNreUUD5gbgrcCjETHiCXBmZsMV3UR6AnjM4WJmh6JoD2YQKKWLHfcMNVY7TG1mVjRgtqXHkelhZlZT0TN5r8tdiJm1n1ojO34xIq6WdDcjXN0cEb+frTIza3m1ejAfAK4GPluHWsyszdQKmB8CRMS/16EWM2sztQJmsqSPjzbRR5HMrJpaATMeOI6R7zhnZlZVrYAZjIi/rUslZtZ2ap3J656LmR22WgFzYV2qMLO2VDVgIqLouEZmZgfx2NRmlo0DxsyyccCYWTbZAiYNDbtT0mMVbSdKWiPpifQ8sWLaQklbJW2RdHGuusysfnL2YG4BLhnWtgBYGxFTgbXpPZLOBuYA09MyN6WhUsyshWULmIj4LjD8KNRlvDTG3TLg8or22yNiT0RsA7YC5+Wqzczqo977YLqGho5Nzyel9inAUxXz7UhtZtbCit7RLreRzhge8f6/kuYD8wG6uroolUoZy8qvVv27du2qOU+r/zdoB9XWQZF1WOszWlW9A+YZSd0RMSipG9iZ2ncAp1TMdzLw9EgfEBFLgCUAvb29UWvM36a2elXNMYtrjmtc4DMssxrroMjY1O26Huu9ibQSmJtezwW+UdE+R9JRkk4HpgIb6lybmY2xbD0YSbcBs4BJknYA1wCLgRWS5gFPAlcCRMRmSSuAx4F9wFURsT9Xbc3i+GkL+M1lC2rPuGz0ScdPA3jnWJVkNqayBUxE9I0yacQLKCOiH+jPVU8zen5gMdsXVw+HWt3rngWrxrgqs7HjM3nNLBsHjJll44Axs2wcMGaWjQPGzLJxwJhZNg4YM8umWa5FMmtpNc9HWl19+gnHTBjDapqHA8bsZap1smTPglU152lX3kQys2wcMGaWjQPGzLJxwJhZNg4YM8vGAWNm2fgwdYMVup9LlXMo2vX8CWsPDpgGKnJuRCefQ2Gtz5tIZpaNA8bMsnHAmFk2Dhgzy6YhO3klbQeeB/YD+yKiV9KJwNeAHmA78N6I+EUj6jOzsdHIHszsiJgREb3p/QJgbURMBdam92bWwpppE+kyXhpibBlweeNKMbOx0KiACeA+SRvTYPYAXRExCJCeT2pQbWY2Rhp1ot35EfG0pJOANZJ+UHTBFEjzAbq6uiiVSplKbB6d8BvbXaeuw4YETEQ8nZ53SroLOA94RlJ3RAxK6gZ2jrLsEmAJQG9vb1QbVrUtrF5VdehYawEdvA7rvokk6VhJxw+9Bi4CHgNWAnPTbHOBb9S7NjMbW43owXQBd0ka+v7lEbFa0veBFZLmAU8CVzagNjMbQ3UPmIj4EfDGEdp/DlxY73rMxlr643lg22cOfB8RdaqmsZrpMLVZW4iIAx7r1q07qK1TOGDMLBsHjJll44Axs2wcMGaWjQPGzLJxwJhZNg4YM8vGAWNm2ThgzCwbB4yZZeOAMbNsHDBmlo2Hjm0yvhLX2ol7ME3GV+JaO3HAmFk2Dhgzy8YBY2bZOGDMLBsHjJll44Axs2wcMGaWjQPGzLJRK5+4JemnwI8bXUdmk4CfNboIe1k6YR2eFhGThze2dMB0AkkPRERvo+uww9fJ69CbSGaWjQPGzLJxwDS/JY0uwF62jl2H3gdjZtm4B2Nm2ThgMpO0X9ImSY9JukPSKzJ/32RJeyV9OOf3dJJ6rkNJJUlb0vcNSJqf67vqwQGT3+6ImBERrwd+BfxJ5u+7EvhvoC/z93SSeq/DP4yIGcD5wGckHZn5+7JxwNTX94DXSXqXpPslPSTp25K64MXexxpJD0r6kqQfS
5qUpr1P0ob0l+1LksaP8h19wCeAkyVNqc/P6ij1WIdDjgNeAPbn/Un5OGDqRNIRwDuAR4H1wFsi4hzgduAv02zXAN+JiDcBdwGnpmWnAX8AnJ/+su0H/nCE7zgFeE1EbABWpGVsjNRjHSa3SnoE2AJcHxEtGzC+6Xd+x0jalF5/D1gKnAV8TVI3cCSwLU2fCVwBEBGrJf0itV8InAt8P90U/Bhg5wjfNYdysED5f/qlwI1j+WM6VD3XIZQ3kR6QNBn4T0mrI6IlL4lxwOS3O/3FepGkfwBujIiVkmYB1w5NGuUzBCyLiIU1vqsP6JI09JfxtZKmRsQTh1O4vaie6/BFEfFTSQ8Cb6ZFr7nzJlJjnAD8JL2eW9G+HngvgKSLgImpfS3wHkknpWknSjqt8gMlnQUcGxFTIqInInqAT1Pu1djYG/N1OFw6WnUO8MMxrLuuHDCNcS1wh6TvceBVttcBF6W/Wu8ABoHnI+Jx4K+B+9K2+Rqge9hn9lHe5q90Jz6alMu1jP06HHJr2iTbCNwSERvz/IT8fCZvE5F0FLA/IvZJeitw8/CuuTU3r8MDeR9MczkVWCFpHOXzLf64wfXYofM6rOAejJll430wZpaNA8bMsnHAmFk2Dhgzy8YBY2bZOGDMLJv/BzgriICqvDu0AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Understanding the data visually\n", "ax = session_times.boxplot(by='Page', column='Time', figsize=(4, 4))\n", "ax.set_xlabel('')\n", "ax.set_ylabel('Time (in seconds)')\n", "plt.suptitle('')\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "35.66666666666667\n" ] } ], "source": [ "# We will use \"mean\" as the statistics\n", "mean_a = session_times[session_times.Page == 'Page A'].Time.mean()\n", "mean_b = session_times[session_times.Page == 'Page B'].Time.mean()\n", "print(mean_b - mean_a)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-8.790476190476198\n" ] } ], "source": [ "# Permutation test example with stickiness\n", "# Creating the permutation functionality\n", "def perm_fun(x, nA, nB):\n", " n = nA + nB\n", " idx_B = set(random.sample(range(n), nB))\n", " idx_A = set(range(n)) - idx_B\n", " return x.loc[idx_B].mean() - x.loc[idx_A].mean()\n", " \n", "nA = session_times[session_times.Page == 'Page A'].shape[0]\n", "nB = session_times[session_times.Page == 'Page B'].shape[0]\n", "print(perm_fun(session_times.Time, nA, nB))" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAWAAAAD3CAYAAAAjdY4DAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAf1klEQVR4nO3deZgU1b3/8fcno7K4RAlocLsoQQx4BVlMjOIS3KIEXIOJCy758YtmwRgNIokx3oc85mbx6tVco4lRI5HF3WiiiPsSDaMo4I4rkasTiYrADxz5/v6oM9jgDNMzTHU1M5/X88zTVaerq751uvs7p09VnVJEYGZmlfepogMwM+uonIDNzAriBGxmVhAnYDOzgjgBm5kVxAnYzKwgGxQdwLro3r179OrVq+gwzKpCbW0tgwcPLjoMa0Rtbe0/I6LHmuXK6zxgSVcCI4C3I2KXVPYL4KvACmA+cFJEvJuemwCcAnwEfC8i7mxuG0OGDIlZs2blEr/Z+kYSPq+/OkmqjYgha5bn2QVxFXDwGmUzgF0iYlfgBWBCCq4fcAzQP73mN5JqcozNzKxwuSXgiHgAWLRG2V0RUZ9m/wZsm6ZHAVMiYnlEvAK8BOyeV2xmZtWgyINwJwN/SdPbAG+UPLcglX2CpLGSZkmaVVdXl3OIZmb5KSQBS5oI1AOTG4oaWazRzqyIuDwihkTEkB49PtGnbWa23qj4WRCSxpAdnBseHx8xWABsV7LYtsCblY7NzKySKtoClnQwMB4YGRFLS566FThGUidJOwB9gMcrGZuZWaXl1gKWdB2wL9Bd0gLgJ2RnPXQCZkgC+FtEfCsi5kmaBjxD1jXx7Yj4KK/YzMyqQW7nAVeCzwO2dTF//vyyl+3du3eOkbQNnwdcvYo4D9jMzNbCCdjMrCBOwGZmBXECNjMryHo9GppZpbW3A3dWLLeAzcwK4gRsZlYQJ2Azs4I4AZt1AAsWLGDUqFH06dOH3r17M27cOFasWMFVV13Fd77znaLD+4RNNtmk6BAqwgnYrJ2LCI444ggOO+wwXnzxRV544QU++OADJk6cmMv26uvrm1/IACdgs3bvnnvuoXPnzpx00kkA1NTUcOGFF3LllVeydOlS3njjDQ4++GD69u3LT3/6UwCWLFnCoYceyoABA9hll12YOnUqkN13bp999mHw4MEcdNBBLFy4EIB9992Xc845h3322YdJkybRq1cvVq5cCcDSpUvZbrvt+PDDD5k/fz4HH3wwgwcPZtiwYTz33HMAvPLKK+yxxx4MHTqUH//4x5WuosL4NDRrF3x6WNPmzZv3iZt1brbZZmy//fbU19fz+OOPM3fuXLp27crQoUM59NBDee2119h66625/fbbAXjvvff48MMP+e53v8stt9xCjx49mDp1KhMnTuTKK68E4N133+X+++8H4IknnuD+++9nv/3247bbbuOggw5iww03ZOzYsVx22WX06dOHxx57jNNOO4177rmHcePGceqpp3LCCSdw6aWXVraCCuQEbNbORQRp9MFGyw844AA+85nPAHDEEUfw0EMPccghh3DmmWcyfvx4RowYwbBhw5g7dy5z587lgAMOAOCjjz6iZ8+eq9Y3evTo1aanTp3Kfvvtx5QpUzjttNP44IMPeOSRRzj66KNXLbd8+XIAHn74YW644QYAjj/+eMaPH9/2FVGFnIDN2rn+/fuvSm4N3n//fd544w1qamo+kZwlsdNOO1FbW8sdd9zBhAkTOPDAAzn88MPp378/jz76aKPb2XjjjVdNjxw5kgkTJrBo0SJqa2v58pe/zJIlS9h8882ZPXt2o69v7J9Ee+c+YLN2bvjw4SxdupRrrrkGyFquP/jBDzjxxBPp2rUrM2bMYNGiRSxbtoybb76ZPffckzfffJOuXbty3HHHceaZZ/LEE0/Qt29f6urqViXgDz/8kHnz5jW6zU022YTdd9+dcePGMWLECGpqathss83YYYcdmD59OpC1wJ966ik
A9txzT6ZMmQLA5MmTG11ne+QEbNbOSeKmm25i+vTp9OnTh5122onOnTvzs5/9DIC99tqL448/noEDB3LkkUcyZMgQ5syZw+67787AgQOZNGkSP/rRj9hoo424/vrrGT9+PAMGDGDgwIE88sgjTW539OjRXHvttat1TUyePJnf//73DBgwgP79+3PLLbcAcNFFF3HppZcydOhQ3nvvvXwrpIp4QHZrF1pzEK5Sr6kUD8hevTwgu5lZlXECNjMriBOwmVlBnIDNOpjzzjuPX/7yl5x77rncfffdADz44IP079+fgQMHsmzZMs466yz69+/PWWedVXC07ZvPAzbroM4///xV05MnT+bMM89cdbnyb3/7W+rq6ujUqVNZ66qvr2eDDZxOWsotYLMOYNKkSfTt25f999+f559/HoATTzyR66+/nt/97ndMmzaN888/n2OPPZaRI0eyZMkSvvCFLzB16lTq6uo48sgjGTp0KEOHDuXhhx8Gspb02LFjOfDAAznhhBPWutzJJ5/Mvvvuy4477sjFF1+8Kq5rrrmGXXfdlQEDBnD88ccDNLme9sj/sszaudraWqZMmcKTTz5JfX09gwYNWm1siG9+85s89NBDjBgxgqOOOgrILqRouGLtG9/4Bt///vfZa6+9eP311znooIN49tlnV637oYceokuXLmtd7rnnnuPee+9l8eLF9O3bl1NPPZUXXniBSZMm8fDDD9O9e3cWLVoEwLhx45pcT3uTWwKWdCUwAng7InZJZd2AqUAv4FXgaxHxr/TcBOAU4CPgexFxZ16xmXUkDz74IIcffjhdu3YFssuEW+Luu+/mmWeeWTX//vvvs3jx4lXr6tKlS7PLHXrooXTq1IlOnTqx5ZZb8tZbb3HPPfdw1FFH0b17dwC6deu21vVsuummLd31qpdnC/gq4BLgmpKys4GZEXGBpLPT/HhJ/YBjgP7A1sDdknaKiI9yjM+qVDVf7LC+WpdxFlauXMmjjz66KtGWKh3/YW3LlfYl19TUUF9f3+QgQWtbT3uTWx9wRDwALFqjeBRwdZq+GjispHxKRCyPiFeAl4Dd84rNrCPZe++9uemmm1i2bBmLFy/mtttua9HrDzzwQC655JJV800NplPucg2GDx/OtGnTeOeddwBWdUG0dD3rs0ofhNsqIhYCpMctU/k2wBslyy1IZZ8gaaykWZJm1dXV5RqsWXswaNAgRo8evWqsh2HDhrXo9RdffDGzZs1i1113pV+/flx22WXrtFyD/v37M3HiRPbZZx8GDBjAGWec0ar1rM9yHQtCUi/gzyV9wO9GxOYlz/8rIraQdCnwaERcm8p/D9wRETc0stpVPBZE+1TN4zpUc/eIx4KoXtUyFsRbknqmgHoCb6fyBcB2JcttC7xZ4djMzCqq0gn4VmBMmh4D3FJSfoykTpJ2APoAj1c4NjOzisrzNLTrgH2B7pIWAD8BLgCmSToFeB04GiAi5kmaBjwD1APf9hkQZtbe5ZaAI+LrTTw1vInlJwGT8orHzKza+FJkM7OCOAGbmRXECdjMrCBOwGZmBXECNjMriBOwmVlBPB6w5aqaL901K5oTsFnO/E/ImuIuCDOzgjgBm5kVxAnYzKwgTsBmZgVxAjYzK4gTsJlZQZyAzcwK4gRsZlYQJ2Azs4I4AZuZFcQJ2MysIE7AZmYFcQI2MyuIE7CZWUGcgM3MCuIEbGZWkEISsKTvS5onaa6k6yR1ltRN0gxJL6bHLYqIzcysUiqegCVtA3wPGBIRuwA1wDHA2cDMiOgDzEzzZmbtVlFdEBsAXSRtAHQF3gRGAVen568GDismNDOzyqh4Ao6IfwC/BF4HFgLvRcRdwFYRsTAtsxDYsrHXSxoraZakWXV1dZUK28yszRXRBbEFWWt3B2BrYGNJx5X7+oi4PCKGRMSQHj165BWmmVnuiuiC2B94JSLqIuJD4EbgS8BbknoCpMe3C4jNzKxiikjArwNflNRVkoDhwLPArcC
YtMwY4JYCYjMzq5gNKr3BiHhM0vXAE0A98CRwObAJME3SKWRJ+uhKx2ZmVkkVT8AAEfET4CdrFC8naw2bmXUIvhLOzKwgZSVgSbvkHYiZWUdTbgv4MkmPSzpN0uZ5BmRm1lGUlYAjYi/gWGA7YJakP0k6INfIzMzaubL7gCPiReBHwHhgH+BiSc9JOiKv4MzM2rNy+4B3lXQh2fm6Xwa+GhGfT9MX5hifmVm7Ve5paJcAVwDnRMSyhsKIeFPSj3KJzKwDmz9/ftnL9u7dO8dILE/lJuBDgGUR8RGApE8BnSNiaUT8MbfozMzasXL7gO8GupTMd01lZmbWSuUm4M4R8UHDTJrumk9IZmYdQ7kJeImkQQ0zkgYDy9ayvJmZNaPcPuDTgemS3kzzPYHRuURkZtZBlJWAI+LvknYG+gICnktj+ZqZWSu1ZDS0oUCv9JrdJBER1+QSlZlZB1BWApb0R6A3MBv4KBUH4ARsZtZK5baAhwD9IiLyDMbMrCMp9yyIucBn8wzEzKyjKbcF3B14RtLjZHeuACAiRuYSlZlZB1BuAj4vzyDMzDqick9Du1/SvwF9IuJuSV2BmnxDMzNr38odjvL/ANcDv01F2wA35xSTmVmHUO5BuG8DewLvw6rB2bfMKygzs46g3D7g5RGxQhIAkjYgOw/YOhCPUWvWtsptAd8v6RygS7oX3HTgtvzCMjNr/8pNwGcDdcAc4P8Cd5DdH65VJG0u6fp0T7lnJe0hqZukGZJeTI9btHb9Zmbrg3LvirwyIq6IiKMj4qg0vS5dEBcBf42InYEBZPeaOxuYGRF9gJlp3sys3Sp3LIhXaKTPNyJ2bOkGJW0G7A2cmNaxAlghaRSwb1rsauA+sjswm5m1Sy0ZC6JBZ+BooFsrt7kjWXfGHyQNAGqBccBWEbEQICIWSmr0LAtJY4GxANtvv30rQzAzK165XRDvlPz9IyL+i+yW9K2xATAI+J+I2A1YQgu6GyLi8ogYEhFDevTo0coQzMyKV24XxKCS2U+RtYg3beU2FwALIuKxNH89WQJ+S1LP1PrtCbzdyvWbma0Xyu2C+FXJdD3wKvC11mwwIv5X0huS+kbE88Bw4Jn0Nwa4ID3e0pr1m5mtL8odC2K/Nt7ud4HJkjYCXgZOImtZT5N0CvA6WT+zmVm7VW4XxBlrez4ift2SjUbEbFY/sNdgeEvWY2a2PmvJWRBDgVvT/FeBB4A38gjKzKwjaMmA7IMiYjGApPOA6RHxzbwCMzNr78q9FHl7YEXJ/AqyOySbmVkrldsC/iPwuKSbyK6IOxzfEdnMbJ2UexbEJEl/AYalopMi4sn8wjIza//K7YIA6Aq8HxEXAQsk7ZBTTGZmHUK5tyT6CdnAOBNS0YbAtXkFZWbWEZTbB3w4sBvwBEBEvCmptZcim1lOyr1rie9YUh3K7YJYkcb/DQBJG+cXkplZx1BuAp4m6bfA5ukOyXcDV+QXlplZ+9dsF4SyO3FOBXYmuytyX+DciJiRc2xmZu1aswk4IkLSzRExGHDSNTNrI+V2QfxN0tBcIzEz62DKPQtiP+Bbkl4lu4OFyBrHu+YVmJlZe7fWBCxp+4h4HfhKheIxM+swmmsB30w2Ctprkm6IiCMrEJOZWYfQXB+wSqZbfAt6MzNrWnMJOJqYNjOzddRcF8QASe+TtYS7pGn4+CDcZrlGZ2bWjq01AUdETaUCMTPraFoyHKWZmbUhJ2Azs4I4AZuZFcQJ2MysIIUlYEk1kp6U9Oc0303SDEkvpsctiorNzKwSyh0LIg/jgGeBhlPZzgZmRsQFks5O8+OLCq69K/fOCeC7J5jlpZAWsKRtgUOB35UUjwKuTtNXA4dVOCwzs4oqqgviv4AfAitLyraKiIUA6XHLxl4oaaykWZJm1dXV5R6omVleKp6AJY0A3o6I2ta8PiIuj4ghETGkR48ebRydmVnlFNEHvCcwUtIhQGdgM0nXAm9J6hkRCyX1BN4uIDYzs4qpeAs
4IiZExLYR0Qs4BrgnIo4DbgXGpMXGALdUOjYzs0qqpvOALwAOkPQicECaNzNrt4o8DY2IuA+4L02/AwwvMh4zs0qqphawmVmH4gRsZlYQJ2Azs4I4AZuZFcQJ2MysIIWeBWFtwwPrmK2f3AI2MyuIE7CZWUGcgM3MCuIEbGZWEB+EM+ugWnLwFnwANw9uAZuZFcQJ2MysIE7AZmYFcQI2MyuIE7CZWUGcgM3MCuIEbGZWECdgM7OCOAGbmRXECdjMrCBOwGZmBXECNjMriBOwmVlBKj4amqTtgGuAzwIrgcsj4iJJ3YCpQC/gVeBrEfGvSsdnZk3zCGptq4gWcD3wg4j4PPBF4NuS+gFnAzMjog8wM82bmbVbFU/AEbEwIp5I04uBZ4FtgFHA1Wmxq4HDKh2bmVklFdoHLKkXsBvwGLBVRCyELEkDWzbxmrGSZkmaVVdXV7FYzczaWmEJWNImwA3A6RHxfrmvi4jLI2JIRAzp0aNHfgGameWskAQsaUOy5Ds5Im5MxW9J6pme7wm8XURsZmaVUsRZEAJ+DzwbEb8ueepWYAxwQXq8pdKxVYOWHGX2EWaz9VsRN+XcEzgemCNpdio7hyzxTpN0CvA6cHQBsZmZVUzFE3BEPASoiaeHVzIWM7Mi+Uo4M7OCOAGbmRXECdjMrCBOwGZmBXECNjMriBOwmVlBnIDNzApSxIUYHYbHTjWztXEL2MysIE7AZmYFcReEmeXKXXFNcwvYzKwgTsBmZgVxAjYzK4gTsJlZQXwQzsyqTkc5cOcWsJlZQZyAzcwK4gRsZlYQ9wGXqaP0SZlZ5bgFbGZWECdgM7OCuAvCzNqF1nQTFt21WHUtYEkHS3pe0kuSzi46HjOzvFRVApZUA1wKfAXoB3xdUr9iozIzy0e1dUHsDrwUES8DSJoCjAKeacuNFP2zw8wMqqwFDGwDvFEyvyCVmZm1O9XWAlYjZbHaAtJYYGya/UDS802sqzvwzzaMbV04lk+qljigHcXyuc99rg1DqZp6qZY4oPWx/FtjhdWWgBcA25XMbwu8WbpARFwOXN7ciiTNioghbRte6ziW6o0DHEtTqiWWaokD2j6WauuC+DvQR9IOkjYCjgFuLTgmM7NcVFULOCLqJX0HuBOoAa6MiHkFh2VmlouqSsAAEXEHcEcbrKrZbooKciyfVC1xgGNpSrXEUi1xQBvHoohofikzM2tz1dYHbGbWYaz3CVjS0ZLmSVopacgaz01IlzQ/L+mgkvLBkuak5y6W1Njpb+sa11RJs9Pfq5Jmp/JekpaVPHdZW2+7kVjOk/SPkm0eUvJco3WUYyy/kPScpKcl3SRp81Re8XpJ2y3k0ndJ20m6V9Kz6fM7LpU3+V7lHM+r6TsxW9KsVNZN0gxJL6bHLSoQR9+SfZ8t6X1Jp1eqXiRdKeltSXNLypqsh3X+/kTEev0HfB7oC9wHDCkp7wc8BXQCdgDmAzXpuceBPcjOO/4L8JWcY/wVcG6a7gXMrXAdnQec2Uh5k3WUYywHAhuk6Z8DPy+wXmrSPu8IbJTqol+Ftt0TGJSmNwVeSO9Ho+9VBeJ5Fei+Rtl/Amen6bMb3qsKvz//S3YObUXqBdgbGFT6WWyqHtri+7Pet4Aj4tmIaOxijFHAlIhYHhGvAC8Bu0vqCWwWEY9GVovXAIflFV9qXX8NuC6vbayDRusozw1GxF0RUZ9m/0Z2rndRVl36HhErgIZL33MXEQsj4ok0vRh4luq76nMUcHWavpocvydNGA7Mj4jXKrXBiHgAWLRGcVP1sM7fn/U+Aa9FU5c1b5Om1yzPyzDgrYh4saRsB0lPSrpf0rAct13qO+ln/5UlP6GKvvT7ZLJfIA0qXS9F7z+Qdb8AuwGPpaLG3qu8BXCXpNp0tSnAVhGxELJ/GMCWFYqlwTGs3nApol6g6XpY58/PepGAJd0taW4jf2trrTR1WXOzlzu3cVxfZ/UP0UJg+4jYDTgD+JOkzVqz/RbE8j9Ab2B
g2v6vGl7WyKrW+bSYcupF0kSgHpicinKpl+ZCbaSsoqcFSdoEuAE4PSLep+n3Km97RsQgspEIvy1p7wptt1HKLsQaCUxPRUXVy9qs8+en6s4DbkxE7N+KlzV1WfMCVv/Z+4nLndsqLkkbAEcAg0tesxxYnqZrJc0HdgJmtSaGcmMpiekK4M9pttlLv/OIRdIYYAQwPHUD5VYvzchl/8slaUOy5Ds5Im4EiIi3Sp4vfa9yFRFvpse3Jd1E9lP6LUk9I2Jh6rp7uxKxJF8Bnmioj6LqJWmqHtb587NetIBb6VbgGEmdJO0A9AEeTz8hFkv6YuqfPQG4JacY9geei4hVXR6Seigb9xhJO6a4Xs5p+w3b7FkyezjQcIS30TrKOZaDgfHAyIhYWlJe8XqhwEvf02fv98CzEfHrkvKm3qs8Y9lY0qYN02QHSueS1cWYtNgY8vueNGa1X45F1EuJpuph3b8/lTyqmdNRy8PJ/hMtB94C7ix5biLZkcnnKTnTARhC9gbOBy4hXZCSQ2xXAd9ao+xIYB7Z0dMngK9WoI7+CMwBnk4fmp7N1VGOsbxE1m82O/1dVlS9pO0eQnYGwnxgYiW2mba7F9nP1adL6uKQtb1XOcayY6r3p9J7MDGVfwaYCbyYHrtVqG66Au8Any4pq0i9kCX9hcCHKa+csrZ6WNfvj6+EMzMrSHvugjAzq2pOwGZmBXECNjMriBOwmVlBnIDNzAriBGxmVhAn4ConaaKy4QqfTsPwfaGN1ru1pOvbaF2nS+paMn+H0jCTeZF0laSj0vTvJPVL00crG+Lx3jR/Xaq77+cZT1uT1FPSn9P0EEkXFx1TSygbXrTJiyUkbSTpgXS1aIfVoXe+2knag+yS3UERsVxSd7JhE9dZZJeeHtUW6wJOB64FlqZ1V2QM2wYR8c2S2VOA0yLiXkmfBb4UEY3eErwxkjaIj0drK9IZwBUAETGLfC/JrriIWCFpJjCaj8cD6XDcAq5uPYF/RjZOAhHxz5Q4GwaVvz+NXnVnw6Wakr4n6ZnU6puSyvbRxwNZPylp09IWiqTOkv6gbEDuJyXtl8pPlHSjpL8qG4z6P9cMUNL3gK2Be0tana9K6p628Vxqoc6VNFnS/pIeTuvbPS2/sbIRrv6etv+JQZaUuSTt2+2UjMwl6b7USjyX7AqzyyT9ArgL2DLt9zBJvdO+1Ep6UNLO6fVXSfp1iv/nzSx3saRHJL3c0AJPz/0w1d9Tki5IZU2t5+hUH09JeqCJ9/5I4K9p+X1LWsPnpbq6L8XwvUbqqibFOjfF9P1m4tlK2eD4T6W/L6XyM/TxQEqnp7Jeyn5hXKHsl9ldkrqUfCafkvQo8O2SePpLejy9D09L6pOeuhk4ton97xgqdeml/1p1WeQmZJeovgD8BtgnlW8IPAL0SPOjye4gDdlgIJ3S9Obp8Tay0a4a1rkBJQOgAz8A/pCmdwZeBzoDJ5KNx/DpNP8asF0jcb5KyWDeDfNpG/XAv5P9s68FriQbRWoUcHNa/mfAcQ0xp/3deI1tHAHMIBuke2vgXeCo9Nx9pMH415hetY9pfibQJ01/AbgnTV9FNrhLTRnLTU/70o9sLGHIBo55BOia5rs1s545wDal79Ea+7oDUFsyvy/w5zR9XtpWp1TH7wAbrvH6wcCMkvnNm4lnKtlobKT6/XRaxxxgY7LPzDyyITMb3tOBaflpJe/d03z8Gf0FH3++/hs4Nk1vBHQp2VZd0d+zIv/cBVHFIuIDSYPJxhTeD5iq7LY5s4BdgBnK7qZUQ3b9OmRfgsmSbiZrYQA8DPxa0mTgxohYoNXvwrQX2ZeEiHhO0mtkI5EBzIyI9wAkPUN2d4LSMVCb80pEzEmvn5fWF5LmkH2ZIRv8ZaSkM9N8Z2B7skHKG+wNXBcRHwFvSrqnBTE0DPv4JWB6yb53KllkekR8VMZyN0fESuAZSVulsv3J/oE1dMEsamY9DwNXSZoG3NhIuD2BurXszu2RRo+
T9DawFauPcf0ysKOk/wZuJxvnd23xfJlsUCpS/b4naS/gpohYAiDpRrLP4a1k7+ns9NpaoJekT5Ml+vtT+R/J/jEBPApMlLQt2efvxYZtSVohadPIBqXvcJyAq1z6QtwH3JeS1hiyD/28iNijkZccSpasRgI/ltQ/Ii5IP9sPAf4maX/g/5W8Zm33xFteMv0RLf/MlL5+Zcn8ypJ1CTgyGr+zSal1GbjkU8C7ETGwieeXlLlc6f6o5HHN2JpcT0R8S9nB1EOB2ZIGRsQ7JYssI/sn1JS1vicR8S9JA4CDyLoCvkbWT7+2/VpTSz4TXWi8Dhri+ZOkx8j2905J34yIhn+gnVj9s9ihuA+4iim7QWGfkqKBZN0AzwM9lB2kQ9KGqZ/tU2RdBPcCPyT7Ob+JpN4RMScifk7Wet55jU09QOqLk7QTWeuzuWRYajHZfc1a607gu0pNM0m7NbLMA2RD/9Uo6+/eryUbiGyw81ckHZ22oZSkWrXcGu4CTlY6E0RSt7WtJ70fj0XEucA/WX1MWci6YHq1ZP9KKTtY+6mIuAH4MdlB3LXt10zg1FReo2wg/AeAwyR1VTZE5eHAg01tMyLe5eOWM5T07SobXvTliLiYrAW9ayr/DFkXxIet3df1nRNwddsEuFrpoBrppo2R3b/sKLIDRk+R9RN/iawr4trUUn4SuDB9MU5PB1KeImtd/WWN7fwGqEmvmwqcmH7iluty4C9KB+Fa4T/I+rWfVnZg8D8aWeYmsuEA55DdHeH+RpZpzrHAKake5tH0/d/KXQ6AiPgrWWKZpezu1w1dKU2t5xfKDo7NJUt0T62xviXAfEmfa+H+NdiG7BfTbLJ+6wnNxDMO2C+9/7VA/8juV3cV2fi2jwG/i4gnm9nuScCl6SDcspLy0cDcFM/OZPdhhOyf6B2t28X2wcNRmlUhSYcDgyPiR0XHkpfUrzyhjK6ndst9wGZVKCJuSj/R2yVldyC5uSMnX3AL2MysMO4DNjMriBOwmVlBnIDNzAriBGxmVhAnYDOzgvx/DPu5zTAa0MoAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Repeating the permutation experiment R times\n", "R=1000\n", "random.seed(1) # Using a seed helps make the randomized expeirments deterministic\n", "perm_diffs = [perm_fun(session_times.Time, nA, nB) for _ in range(R)]\n", "\n", "fig, ax = plt.subplots(figsize=(5, 3.54))\n", "ax.hist(perm_diffs, bins=21, rwidth=0.9,facecolor='gainsboro')\n", "ax.axvline(x = mean_b - mean_a, color='black', lw=1)\n", "ax.text(40, 100, 'Observed\\ndifference')\n", "ax.set_xlabel('Session time differences (in seconds)')\n", "ax.set_ylabel('Frequency')\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### p-value\n", "\n", "In what fraction of permutations does the difference in means exceed the observed difference?" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.121" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len([x for x in perm_diffs if x > (mean_b - mean_a)])/len(perm_diffs)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "In what fraction of permutations does the difference in means exceed the observed difference?\n", "\n", "> A rather large fraction! \n", "\n", "For the observed difference to be \"meaningful\", it needs to be outside the range of chance variation. Otherwise, it is NOT statistically significant. 
\n", "\n", "> Therefore, we do NOT reject the Null hypothesis" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> The smaller the p-value, the stronger the evidence to reject the Null hypothesis\n", "\n", "\"Statistical\n", "\n", "Image source: https://www.simplypsychology.org/p-value.html" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Level of confidence, significance & p-value\n", "\n", "- Choose the rigour of your statistical test upfront:\n", "    - Level of confidence: $C$\n", "    - Level of significance: $\\alpha=1-C$\n", "    > If the p-value is smaller than $\\alpha$, then reject the Null hypothesis! " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "What the p-value represents:\n", "> The probability that, given a chance model, results as extreme as the observed results\n", "could occur.\n", "\n", "What the p-value does NOT represent:\n", "> The probability that the result is due to chance. 
(which is what, one would ideally like to know!)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Errors\n", "\n", "> Type-1 error: Mistakenly conclude an effect is real (and thus, reject the Null hypothesis), when it is really just due to chance\n", "> - Related to the concept of precision (more nuances apply)\n", " \n", "\n", "> Type-2 error: Mistakenly conclude that an effect is not real, i.e., due to chance (and thus fail to reject the Null hypothesis), when it actually is real\n", "> - In the context of hypothesis testing, generally an issue of inadequate data\n", "> - Complement of recall" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### ANOVA: Analysis of Variance \n", "\n", "Web stickiness example with 4 pages" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAARgAAAENCAYAAADUlXqkAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAZXElEQVR4nO3dfZRkdX3n8feHGUSCgIYBFkWmh0VIMw2yDHF1M+I0k4VsFEliovRZlNVOsrMLk+PDugynWXk4p3cxa8gREZHYCMmRRsyoy0OCYKgWxiNBxhlwoGWd4UEmsBJAecoE6Oa7f9zboaanuvtWT/1uVd3+vM65Z6p+dR++v5meb9/63Xt/X0UEZmYp7NHuAMysupxgzCwZJxgzS8YJxsyScYIxs2QWtzuA3bFkyZLo6ekp9Zgvvvgi++yzT6nHLFPV+wfV72M7+rdx48anIuLA6e1dnWB6enq45557Sj3m2NgYq1atKvWYZap6/6D6fWxH/yQ92qjdX5HMLBknGDNLxgnGzJJxgjGzZJxgzCyZZAlG0lWSnpS0pa7t7ZJ+IOnHkm6UtF/e3iNph6TN+XJFqrjMrDwpz2CuBn5rWttXgHURcQzwLeDTdZ9ti4jj8mVNwrjMrCTJEkxE3AE8M635KOCO/PVtwAdSHd/M2q/sG+22AO8H/g/wB8Bb6z5bJmkT8BxwXkTc2WgHkv4Y+GOAgw8+mLGxsZYH2d/fP6/tarVaiyMp3wsvvJDk77STVL2PHdW/iEi2AD3Alrr3vwbcCmwEzgeeztv3Ag7IX68AHgP2m2v/K1asiLItPeem0o9Zplqt1u4Qkqt6H9vRP+CeaPB/tNQzmIj4CXAygKQjgffm7S8BL+WvN0raBhwJlPscgJm1VKmXqSUdlP+5B3AecEX+/kBJi/LXhwNvAx4qMzYza71kZzCSRoFVwBJJ28m+Er1B0ln5Kt8Evpq/PhG4SNIEMAmsiYjpA8Rm1mWSJZiIGJjho883WHc9sD5VLGbW
Hr6T18yScYIxs2ScYGzBGB0dpa+vj9WrV9PX18fo6Gi7Q6q8rp7Rzqyo0dFRhoaGGBkZYXJykkWLFjE4OAjAwMBMw4W2u3wGYwvC8PAwIyMj9Pf3s3jxYvr7+xkZGWF4eLjdoVWaE4wtCOPj46xcuXKntpUrVzI+Pt6miBYGJxhbEHp7e9mwYcNObRs2bKC3t7dNES0MTjC2IAwNDTE4OEitVmNiYoJarcbg4CBDQ0PtDq3SPMhrC8LUQO7atWsZHx+nt7eX4eFhD/Am5gRjC8bAwAADAwOVr4vUSfwVycyScYIxs2ScYMwsGScYM0vGCcbMkvFVpAVK0ry2y6ZfNSvGZzALVKMJmiOCpefcNNdE7maFdURlx/yzcyVtlfSgpFNSxWVm5emIyo6SjgZOB5bn21w+NQm4mXWvTqnseBpwXUS8FBEPA1uBd6SKzczK0SmVHd8C3FW33va8bRdlVHacS8dUzUuk6v3rqMqHCXRS/8pOMB8DLpX0GeAG4OW8vdEljYYjihFxJXAlwAknnBClP1Nyy83Vfo6l6v2Dyj+L1En964jKjmRnLPV1qg8FHi8zNjNrvY6o7Eh2NnO6pL0kLSOr7Hh3mbGZWet1RGXHiLhf0vXAA8AEcFZETKaKzczK0RGVHfP1hwHPwGxWIb6T18yScYIxs2ScYMwsGScYM0vGCcbMknGCMbNknGDMLBknGDNLxgnGzJJxgjGzZJxgzCwZJxgzS8YJxsyScYIxs2ScYMwsGScYM0um7MJrx0m6S9JmSfdIekfe3iNpR96+WdIVM+/ZzLpF2YXX/hS4MCKOAz6Tv5+yLSKOy5c1CeMys5KUXXgtgKlysfvjygFmlVZ2XaSPA9+R9Dmy5Pbv6j5bJmkT8BxwXkTc2WgHLrxW3Fl/9yIvvtL8dj3rbm5q/X32hC+u3qf5A7VJJxUmS6Gj+hcRyRagB9hS9/5S4AP56w8C381f7wUckL9eATwG7DfX/lesWBFlW3rOTaUfc77mE2utVivlOO00nz52k3b0D7gnGvwfLfsq0plk5UoAvkFefzqymtRP5683AtuAI0uOzcxarOwE8zjwnvz1ScBPASQdKGlR/vpwssJrD5Ucm5m1WNmF1/4I+LykxcA/k4+lACcCF0maACaBNRExfYDYzLpMOwqvrWiw7npgfapYzKw9fCevmSXjBGNmyTjBmFkyTjBmlsycg7ySXg+8D3g38GZgB7AFuDki7k8bnpl1s1kTjKQLgFOBMeDvgSeB15PdBHdxnnw+FRH3pQ3TzLrRXGcwP4yIC2b47BJJBwGHtTYkM6uKWRNMROzy1JukPYA3RMRzEfEk2VmNmdkuCg3ySrpW0n6S9gEeAB6U9Om0oZlZtyt6FenoiHgO+B3gb8i+Fn04VVBmVg1FHxXYU9KeZAnmsoh4RVKkC8taYd/edRxzzbrmN7ym2eMAvLf541jlFU0wXwYeAe4F7pC0lGxiKOtgz49fzCMXN/cff2xsjFWrVjW1TbMTVNnCUSjBRMSlZJNFTXlUUn+akMysKua6D+aTc2x/SQtjMbOKmesMZt/8z6OAXwduyN+fCtyRKigzq4a57oO5EEDSrcDxEfF8/v4CsikvzcxmVPQy9WHAy3XvXyab0NvMbEZFE8xfAXdLukDS+WTPJf3lbBs0U9kx/+xcSVslPSjplPl0xsw6S6EEExHDwMeAXwC/BD4aEf9zjs2upmBlR0lHA6cDy/NtLp+aBNzMulczc/JuBp6Y2kbSYRHxs5lWjog7JPVMb6ZxZcfTgOsi4iXgYUlbyUqa/KCJ+MyswxRKMJLWklUF+DnZrP8iSxbHNnm8j9O4suNbgLvq1tuetzWKxZUdm9BsrPOtCthNfycdVfkwgY7qX6NqbNMXYCt55cVmFopXdvwicEbdeiNT6822uLLj7FzZsTFXdmw9drOy42PAsy3IZw0rO5Kdsby1br1Dee3rk5l1qaJjMA8BY5JuBl6aaoyIZu/knarsOEZd
ZUeyG/iulXQJ2bScbwPubnLfZtZhiiaYn+XL6/JlTs1UdoyI+yVdTzbXzARwVkRMNtEPM+tARR92nLqjd9/sbbxQYJvClR3z9YeB4SLxmFl3KDqjXZ+kTWTVBO6XtFHS8rShmVm3KzrIeyXwyYhYGhFLgU8Bf5EuLDOrgqIJZp+IqE29iYgxYJ8kEZlZZRS+iiTpf5A9kwRwBvBwmpDMrCqKJpiPARfy2j0sdwAfTRKRtdS8prO8pblt9t97z+aPYQtC0atIvwD+JHEs1mLNzscLWUKaz3ZmjRS9inSbpDfWvX+TpO8ki8rMKqHoIO+SiPjl1Jv8jOagJBGZWWUUTTCvSvqXGtR52RLXRTKzWRUd5B0CNkj6Xv7+RPLb/LvZ2y+8lWd3vNL0ds0MnO6/957ce/7JTR8jNUkzf/bZmbfLHpw1K6boIO8tko4H3kk2F8wnIuKppJGV4NkdryQvTNapRclmShTzKbxmNpOig7wim8ry+Ii4EfiV+vl0zcwaKToGcznwLmDqAcbnySaJMjObUdExmH8bEcfnDzwSEb+QVGjaBjNbuIommFfyWf4DQNKBwKvJojIraKaB+kc/+7557W/pOTft0tapA/XdoGiCuRT4FnCQpGHg94HzkkVlVtCMA/UXz3y1qyoD9d2g6FWkr0naCKwmu4r0OxExPts2kq4C3gc8GRF9edvXyepcA7wR+GVEHJeXNxkHHsw/uysi1jTZFzPrMEXLlvxr4OGI+KKkVcC/l/RE/d29DVwNXEZdBciI+FDdPv+MnScS3xZZQTYzq4iiV5HWA5OSjgC+AiwDrp1tg4i4A3im0Wf5Ze8PAqPFQzWzblP4UYGImAB+D/h8RHwCOGQ3jvtu4OcR8dO6tmWSNkn6nqR378a+zaxDNHMVaQD4CHBq3rY7k4AMsPPZyxPAYRHxtKQVwLclLY+I56Zv2OrKjmVUPuyYKnsFdFRVwAL27V3HMdesa37Da5o5BoyNdc8Ejh31b9ioGtv0BTia7ErSQP5+GbCuwHY91FV2zNsWk5WgPXSW7caAE+ba/+5Wdiyj8qGrHqblf8NddVJlx6JXkR6gbsKpiHgYuHieOe03gZ9ExPaphvy+mmciYlLS4WSF1x6a5/7NrEMUHYNpWl547QfAUZK2SxrMPzqdXQd3TwTuk3Qv8NfAmohoOEBsZt2j6BhM02KGwmsR8Z8atK0nu1JlZhWS7AzGzKzojXZHAp8GltZvExEnJYrLzCqg6FekbwBXkFVzdFF6MyukaIKZiIgvJY3EzCqn6BjMjZL+q6RDJP3q1JI0MjPrekXPYM7M//x0XVsAh7c2HDOrkqI32i1LHYiZVc+sCUbSSRFxu6Tfa/R5RHyzUbuZGcx9BvMe4HZee8CxXgBOMGY2o1kTTEScn//50XLCMbMqmesr0hnAtRHRcILvfKa7QyJiQ4rgUivrUX9orribWVXM9RXpAGBTPh/vRuAfgdcDR5B9fXoKmMf/0M7w/PjFC7ayo1kZ5vqK9HlJlwEnAb8BHAvsIJug+8MR8bP0IZpZt5rzMnVETAK35YuZWWF+mtrMkkk2H4xZWRqNc7W6sqPNjxOMdbUZB+lbWNnR5q/QVyRJB0sakfS3+fuj66bAnGmbqyQ9KWlLXdvXJW3Ol0ckba777FxJWyU9KOmUefbHzDpI0TGYq4HvAG/O3/9f4OMFtvmt+oaI+FBEHBdZBcf15HcCSzqabK7e5fk2l0taVDA2M+tQRRPMkoi4HngVILIibLNOPBXNVXY8DbguIl7KKxZsBd5RMDYz61BFx2BelHQA2fNHSHonO9eVbtb0yo5vAe6q+3x73rYLF15Lq6OKdiVSlT729/fPa7tardbiSGbRqFjS9AU4Hvg+WVL5PtlXpGMLbNfDtMJrefuXgE/Vvf8icEbd+xHgA3Pt34XXWq/bCq/NR9X72I6fOXaz8NqPJL0HOAoQ8GBEvDKfhCZpMVmN6xV1zduBt9a9PxR4fD77N7PO
UbSqwCLgt8nOSBYDJ0siIi6ZxzF3qewI3ABcK+kSsoHktwF3z2PfZtZBio7B3Aj8M/Bj8oHeueSVHVcBSyRtB86PiBEaVHaMiPslXQ88AEwAZ0X2iIKZdbGiCebQiDi2mR1HE5Ud8/ZhYLiZY5hZZyt6mfpvJZ2cNBIzq5yiZzB3Ad+StAfwCtlAb0TEfskiM7OuVzTB/BnwLuDH+SUpM7M5Ff2K9FOy+1mcXMyssKJnME8AY/nDji9NNc7zMrWZNeHtF97Kszuau+2s2ala9997T+49v/XDrEUTzMP58rp8MbOSPLvjlabmjp7PdBSp5o4ueifvhUmObmaVNlfZkssi4mxJN5I/6FgvIt6fLDIz63pzncF8BDgb+FwJsZhZxcyVYLYBRMT3SojFzCpmrgRzoKRPzvShryKZ2WzmSjCLgDeQ3blrZtaUuRLMExFxUSmRmFnlzHUnr89czGze5jqDWV1KFGY2o31713HMNeua2+iaZo8BUPxmvqJmTTAR0bAqgJmV5/nxi7v2Tl7XpjazZJIlmEaVHfP2tXn1xvsl/Wne1iNpR13VxytSxWVm5UlZm/pq4DLgL6caJPWTFVk7NiJeknRQ3frbIqv4aGYVkewMJhpXdvwvwMUR8VK+zpOpjm9m7ZfyDKaRI4F3Sxomq1Lw3yLih/lnyyRtAp4DzouIOxvtwJUd06pK1cPZdGMfm4l3vv1L8nfSqBpbqxamVXYEtgCXkt1f8w6yOWYE7AUckK+zAngM2G+u/buyY+tVvephRPf1sdmfofn0b3d/TpmhsmPZV5G2A9/MY7qbrMbSksiK3j8NEBEbyR6yPLLk2MysxcpOMN8GTgKQdCTZ7HhPSTowrx6JpMPJKjs+VHJsZtZiycZgGlV2BK4CrsovXb8MnBkRIelE4CJJE8AksCZ8k59Z10uWYGKGyo7AGQ3WXQ+sTxXLbOZ1B+MtxbfZf+89m9+/2TSNfk4f/ez75rWvpefctEtbqp/Tsq8idZRmbr+e0rPu5nltZzZfM/68Xdy4itB8HhVIxY8K2IIxOjpKX18fq1evpq+vj9HR0XaHVHkL+gzGFo7R0VGGhoYYGRlhcnKSRYsWMTg4CMDAwEzf5m13+QzGFoTh4WFGRkbo7+9n8eLF9Pf3MzIywvDwcLtDqzQnGFsQxsfHWbly5U5tK1euZHx8vE0RLQxOMLYg9Pb2smHDhp3aNmzYQG9vb5siWhicYGxBGBoaYnBwkFqtxsTEBLVajcHBQYaGhtodWqV5kNcWhKmB3LVr1zI+Pk5vby/Dw8Me4E3MCcYWjIGBAQYGBjrqPpGq81ckM0vGCcbMknGCMbNknGDMLBknGDNLxgnGzJJxgjGzZDqi8Frefq6krflnp6SKy8zK0xGF1yQdDZwOLAfeDHxX0pERMZkwPjNLrFMKr50GXJdXF3gY2EpW1sTMulinFF57C3BX3Xrb87ZdtLrw2nx0W9GuZnRjUbJmVb2PndS/shPMYuBNwDuBXweuz8uUqMG6DSccjYgrgSsBTjjhhCj9mZJbbq70cywL4Tmdqvexk/rXEYXX8va31q13KPB4ybGZWYt1ROE14AbgdEl7SVpGVnjt7pJjM7MW64jCa8D9kq4HHgAmgLN8Bcms+3VE4bV8/WHAMzCbVYjv5DWzZJxgzCwZJxgzS8YJxsyScYIxs2ScYMwsGScYM0vGCcbMknGCMbNknGDMLBknGDNLxgnGzJJxgjGzZJxgzCyZsqfM7ApSoxk86z7/bOP2bGobM5viM5gGImLGpVarzfiZme3MCcbMkim1sqOkCyT9g6TN+fLbeXuPpB117VekisusqkZHR+nr62P16tX09fUxOjra7pDKreyY+/OI+FyD9bdFxHEJ4zGrrNHRUYaGhhgZGWFycpJFixYxODgIwMDATLPXpld2ZUczS2B4eJiRkRH6+/tZvHgx/f39jIyMMDzc3mmu23EV6WxJHwHuAT4VEb/I25dJ2gQ8B5wXEXc2
2rjdlR07qWpeClXvH1Szj+Pj40xOTjI2NvYv/ZucnGR8fLy9fZ3tisnuLkAPsKXu/cHAIrIzp2Hgqrx9L+CA/PUK4DFgv7n2v2LFiihbrVYr/Zhlqnr/IqrZx+XLl8ftt98eEa/17/bbb4/ly5eXcnzgnmjwf7TUq0gR8fOImIyIV4G/IC9wH1nR+6fz1xuBbWR1rM2sgKGhIQYHB6nVakxMTFCr1RgcHGRoaKitcZX6FUnSIRHxRP72d4EtefuBwDMRMZnXqn4b8FCZsZl1s6mB3LVr1zI+Pk5vby/Dw8NtHeCF8is7rpJ0HFlh+0eA/5yvfiJwkaQJYBJYExEeIDZrwsDAAAMDA4yNjbFq1ap2hwOUX9lxZIZ11wPrU8ViZu3hO3nNLBknGDNLxgnGzJJxgjGzZBRdPM2ApH8EHi35sEuAp0o+Zpmq3j+ofh/b0b+lEXHg9MauTjDtIOmeiDih3XGkUvX+QfX72En981ckM0vGCcbMknGCad6V7Q4gsar3D6rfx47pn8dgzCwZn8GYWTJOMGaWzIJLMJIm84nFt0j6hqRfSXissyVtlRSSlqQ6zrRjltm/r0l6MD/WVZL2THWsaccts48jku6VdJ+kv5b0hlTHmnbc0vpYd8wvSHqhlftccAkG2BERx0VEH/AysCbhsb4P/Cbl3gxYZv++BvwacAywN/CHCY9Vr8w+fiIi3h4RxwI/A85OeKx6ZfYRSScAb2z1fhdigql3J3CEpFMl/b2kTZK+K+lgyCbCknSbpB9J+rKkR6fORCSdIenu/LfMlyUtmr7ziNgUEY+U26WdpO7f39RNmXg3cGipvcuk7uNz+boiS6LtuCqStI952/8G/nurA1+wCUbSYuA/AD8GNgDvjIh/A1zHa3/R5wO3R8TxwLeAw/Jte4EPAb8RWamVSeA/ltqBOZTZv/yr0YeBW5J0ZubjltJHSV8F/h/Z2doXUvVnhmOX0cezgRvqZptsmYVYm3pvSZvz13eSTYJ1FPB1SYcArwMezj9fSTa1JxFxi6SpCgirySYn/2H2i429gSdLiX5u7ejf5cAdMUMliARK7WNEfDT/Lf8Fsv+wX211hxoopY+S3gz8Adnsk63XaCbwKi/ACw3axoD3569XAWP563uBZXXrPUP2INla4H81ccxHgCVV7B/Zb89vA3tU+d8w3/Y9wE1V6iPwXrKzs0fy5VVga6v6sWC/Ik2zP/AP+esz69o3AB8EkHQy8Ka8/e+A35d0UP7Zr0paWlKs85Gkf5L+EDgFGIisUkQ7tbyPyhwx9Ro4FfhJsh7MreV9jIibI+JfRURPRPQA/xQRR7Qs4rJ+63TKQuPfDKeRVTG4k2ywa+o3w0H5P9KPgD8HHgf2yj/7ELAZuA/YSPbdePp+/wTYDkzk236lYv2bICsxszlfPlOlf0OyMcrvk41/bCG7ajZnva5u6mOR4+7O4kcFZiFpL2AyIiYkvQv4UlSofnbV+wfuY7stxEHeZhwGXC9pD7J7Ef6ozfG0WtX7B+5jW/kMxsyS8SCvmSXjBGNmyTjBmFkyHuS1lpM0SXZpdzEwDpwZEf/U3qisHXwGYymU+iSwdS4nGEst6ZPA1tmcYCyZqj+xbnPzGIylUPUn1q0gJxhLYcf0W9UlfQG4JCJukLQKuGDqoxn2IeCaiDg3UYxWAn9FsrJU/Yl1a8AJxspyAfANSXeyc2H2C4GTJf2IbLzmCeD5iHgAOA+4VdJ9wG3AIeWGbLvLzyJZW3Xyk8C2+zwGY+3WsU8C2+7zGYyZJeMxGDNLxgnGzJJxgjGzZJxgzCwZJxgzS+b/AwJINX1WMlzqAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "four_sessions = pd.read_csv(practicalstatspath+'four_sessions.csv')\n", "\n", "ax = four_sessions.boxplot(by='Page', column='Time', figsize=(4, 4))\n", "ax.set_xlabel('Page')\n", "ax.set_ylabel('Time (in seconds)')\n", "plt.suptitle('')\n", "plt.title('')\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> Could all the pages have the same underlying stickiness, and the differences among them be due to randomness?\n", "\n", "- Null hypothesis: Pages do not have under distinct stickiness\n", "- Alternative hypothesis: Pages do have statistically significant distinct stickiness\n", " * Target level of confidence: 95%" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Observed means: [172.8 182.6 175.6 164.6]\n", "Variance: 55.426666666666655\n", "18.94666666666669\n" ] } ], "source": [ "print('Observed means:', four_sessions.groupby('Page').mean().values.ravel())\n", "observed_variance = four_sessions.groupby('Page').mean().var()[0]\n", "print('Variance:', observed_variance)\n", "# Permutation test example with stickiness\n", "# Usually you will permute a small subset of each kind, but in this example, the data is small as is\n", "def perm_test(df):\n", " df = df.copy()\n", " df['Time'] = np.random.permutation(df['Time'].values)\n", " return df.groupby('Page').mean().var()[0]\n", " \n", "print(perm_test(four_sessions))" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "p-value: 0.083\n", "Null hypothesis CANNOT be rejected\n" ] } ], "source": [ "random.seed(1)\n", "perm_variance = [perm_test(four_sessions) for _ in 
range(1000)]\n", "p_val=np.mean([var > observed_variance for var in perm_variance])\n", "print('p-value: ', p_val)\n", "if p_val<0.05:\n", " print('Null hypothesis rejected')\n", "else:\n", " print('Null hypothesis CANNOT be rejected')" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAD3CAYAAAAjdY4DAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAbbUlEQVR4nO3dfZQV1Znv8e/PFkOQISI0XkRJK0EUlAa6YeFFIkpQokTFaMBJ1DiZwQSTK4leEdFEZ4VkMkl0zPVtSGR8Q0BB1EyYTPAlOMYX0o2ooBhEURACbYggQhDwuX+cajxiQx+QOtV9+vdZq9ep2lW76qkWHjf77NpbEYGZmRXfflkHYGbWUjkBm5llxAnYzCwjTsBmZhlxAjYzy4gTsJlZRvbPOoBPomPHjlFRUZF1GGafSG1tLVVVVVmHYSmqra19OyLKdy5Xcx4HXF1dHTU1NVmHYfaJSKI5/z20xkmqjYjqncvdBWFmlhEnYDOzjDgBm5llxAnYzCwjqSVgSYdLelzSy5IWS7o0KT9Y0lxJS5PP9nl1Jkh6VdIrkk5NKzYzs6YgzRbwNuCyiDgGGAhcIqkncCXwaER0Bx5N9kmOjQZ6AcOBWySVpRifmVmmUkvAEbE6IhYk2+8CLwNdgDOBO5PT7gTOSrbPBKZHxJaIeB14FRiQVnxmZlkryosYkiqAvsCzwCERsRpySVpSp+S0LsAzedVWJmU7X2sMMAaga9euKUbd9C1btiyV63br1i2V69ons3LlSi655BJeeuklPvjgA0aMGMFPf/pT7r33XmpqarjpppuyDvEj2rZty8aNG7MOo0lL/Us4SW2BWcC4iNiwu1MbKPvY6PSImBwR1RFRXV7+sRdLzEpSRHD22Wdz1llnsXTpUv70pz+xceNGJk6cmMr9tm3blsp17aNSTcCSWpFLvlMj4oGkeI2kzsnxzsDapHwlcHhe9cOAVWnGZ9ZcPPbYY7Ru3ZqLLroIgLKyMm644QamTJnCpk2bWLFiBcOHD6dHjx5cd911ALz33nucfvrpVFZWcuyxxzJjxgwg9+rziSeeSFVVFaeeeiqrV68GYMiQIVx11VWceOKJTJo0iYqKCj744AMANm3axOGHH87WrVtZtmwZw4cPp6qqisGDB7NkyRIAXn/9dY4//nj69+/PNddcU+xfUbOUWheEJAG3Ay9HxPV5hx4GLgT+Jfl8KK/8XknXA4cC3YH5acVn1pwsXrz4Y/NFtGvXjq5du7Jt2zbmz5/PokWLaNOmDf379+f000/njTfe4NBDD+U3v/kNAOvXr2fr1q185zvf4aGHHqK8vJwZM2YwceJEpkyZAsA777zDvHnzAFiwYAHz5s3jpJNO4te//jWnnnoqrVq1YsyYMdx22210796dZ599lrFjx/LYY49x6aWX8q1vfYsLLriAm2++ubi/oGYqzT7gQcD5wIuSFiZlV5FLvPdJ+gbwJnAuQEQslnQf8BK5ERSXRMT2FOMzazYiglybpuHyYcOG0aFDBwDOPvtsnnzySU477TQuv/xyxo8fz4gRIxg8eDCLFi1i0aJFDBs2DIDt27fTuXPnHdcbNWrUR7ZnzJjBSSedxPTp0xk7diwbN27kqa
ee4txzz91x3pYtWwD4wx/+wKxZswA4//zzGT9+/L7/RZSY1BJwRDxJw/26AEN3UWcSMCmtmMyaq169eu1IbvU2bNjAihUrKCsr+1hylsRRRx1FbW0tc+bMYcKECZxyyimMHDmSXr168fTTTzd4nwMPPHDH9hlnnMGECRNYt24dtbW1nHzyybz33nscdNBBLFy4sMH6Df1PwnbNb8KZNQNDhw5l06ZN3HXXXUCu5XrZZZfx9a9/nTZt2jB37lzWrVvH5s2befDBBxk0aBCrVq2iTZs2fO1rX+Pyyy9nwYIF9OjRg7q6uh0JeOvWrSxevLjBe7Zt25YBAwZw6aWXMmLECMrKymjXrh1HHHEE999/P5BrgT///PMADBo0iOnTpwMwderUtH8lJcEJ2KwZkMTs2bO5//776d69O0cddRStW7fmRz/6EQAnnHAC559/Pn369OHLX/4y1dXVvPjiiwwYMIA+ffowadIkrr76ag444ABmzpzJ+PHjqayspE+fPjz11FO7vO+oUaO45557PtI1MXXqVG6//XYqKyvp1asXDz2U+xrnxhtv5Oabb6Z///6sX78+3V9IifB8wCkpxhhdjwMuDZ4PuPR5PmAzsybGCdjMLCNOwGZmGXECNitxp512Gu+8807WYVgDmvWqyGa2axFBRDBnzpysQ7FdcAvYrIkbP348t9xyy479a6+9luuuu46hQ4fSr18/jjvuuB1DwZYvX84xxxzD2LFj6devHytWrKCiooK3334bgLPOOouqqip69erF5MmTd1yzbdu2TJw4kcrKSgYOHMiaNWsAWLNmDSNHjqSyspLKysodQ9buueeeHUPcLr74YrZv90ure6NFDkMrlSFiHoZWGhobhvbcc88xbty4HXM09OzZk9/+9rccdNBBtGvXjrfffpuBAweydOlS3njjDY488kieeuopBg4cCEBFRQU1NTV07NiRdevWcfDBB7N582b69+/PvHnz6NChA5J4+OGH+dKXvsQVV1xBu3btuPrqqxk1ahTHH38848aNY/v27WzcuJFVq1ZxxRVX8MADD9CqVSvGjh3LwIEDueCCC4ry+2qOdjUMzV0QZk1c3759Wbt2LatWraKuro727dvTuXNnvvvd7/LEE0+w33778dZbb+1otX72s5/dkXx39otf/ILZs2cDsGLFCpYuXUqHDh044IADGDFiBABVVVXMnTsXyM3CVv/2XVlZGZ/5zGe4++67qa2tpX///gBs3ryZTp06NXA3a4wTsFkzcM455zBz5kz+/Oc/M3r0aKZOnUpdXR21tbW0atWKiooK/va3vwEfnc8h3+9//3seeeQRnn76adq0acOQIUN21GnVqtWOeRzKysp2Ox9wRHDhhRfy4x//eB8/ZcvjPmCzZmD06NFMnz6dmTNncs4557B+/Xo6depEq1atePzxx3njjTcavcb69etp3749bdq0YcmSJTzzzDON1hk6dCi33norkJt/YsOGDQwdOpSZM2eydm1uKu9169YVdH/7OCdgs2agV69evPvuu3Tp0oXOnTvz1a9+lZqaGqqrq5k6dSpHH310o9cYPnw427Zto3fv3lxzzTW77KbId+ONN/L4449z3HHHUVVVxeLFi+nZsyc//OEPOeWUU+jduzfDhg3bMam77Rl/CbcP+Us42xueC6L0FX0uCElTJK2VtCivbIakhcnP8vqJ2iVVSNqcd+y2tOIyM2sq0vwS7g7gJuCu+oKI2DGnnaSfA/lz1i2LiD4pxmNm1qSkuSLGE8ly9B+TrBf3FeDktO5vZtbUZfUl3GBgTUQszSs7QtJzkuZJGryripLGSKqRVFNXV5d+pGZmKckqAZ8HTMvbXw10jYi+wPfIrY7crqGKETE5Iqojorq8vLwIoZqZpaPoCVjS/sDZwIz6sojYEhF/SbZrgWXAUcWOzcysmLJoAX8BWBIRK+sLJJVLKku2jwS6A69lEJuZWdGkOQxtGvA00EPSSknfSA6N5qPdDw
CfB16Q9DwwE/hmRKxLKzYzs6YgzVEQ5+2i/OsNlM0CZqUVi5lZU+RXkc3MMuIEbGaWESdgM7OMOAGbmWXECdjMLCNOwGZmGXECNjPLiBOwmVlGnIDNzDLiBGxmlhEnYDOzjDgBm5llxAnYzCwjTsBmZhlxAjYzy0iaE7JPkbRW0qK8smslvSVpYfJzWt6xCZJelfSKpFPTisvMrKlIswV8BzC8gfIbIqJP8jMHQFJPcitl9Erq3FK/RJGZWalKLQFHxBNAocsKnQlMTxbnfB14FRiQVmxmZk1BFn3A35b0QtJF0T4p6wKsyDtnZVL2MZLGSKqRVFNXV5d2rGZmqSl2Ar4V6Ab0AVYDP0/K1cC50dAFImJyRFRHRHV5eXkqQZqZFUNRE3BErImI7RHxAfBLPuxmWAkcnnfqYcCqYsZmZlZsRU3Akjrn7Y4E6kdIPAyMlvQpSUcA3YH5xYzNzKzYUluWXtI0YAjQUdJK4AfAEEl9yHUvLAcuBoiIxZLuA14CtgGXRMT2tGIzM2sKUkvAEXFeA8W37+b8ScCktOIxM2tq/CacmVlGnIDNzDLiBGxmlhEnYDOzjDgBm5llxAnYzCwjTsBmZhlxAjYzy4gTsJlZRpyAzcwy4gRsZpYRJ2Azs4w4AZuZZcQJ2MwsIwUlYEnHph2ImVlLU2gL+DZJ8yWNlXRQIRWSRTfXSlqUV/ZTSUuSRTln119LUoWkzZIWJj+37fGTmJk1MwVNyB4RJ0jqDvwDUCNpPvAfETF3N9XuAG4C7sormwtMiIhtkn4CTADGJ8eWRUSfPYzfUrZs2bJUrtutW7dUrmvWnBTcBxwRS4GrySXME4FfJK3Zs3dx/hPAup3KfhcR25LdZ8gtvmlm1iIV2gfcW9INwMvAycCXIuKYZPuGvbz3PwD/lbd/hKTnJM2TNHg3sYyRVCOppq6ubi9vbWaWvUJbwDcBC4DKiLgkIhYARMQqcq3iPSJpIrnFN6cmRauBrhHRF/gecK+kdg3VjYjJEVEdEdXl5eV7emszsyaj0EU5TwM2169ULGk/oHVEbIqIu/fkhpIuBEYAQyMiACJiC7Al2a6VtAw4CqjZk2ubmTUnhbaAHwE+nbffJinbI5KGk+tDPiMiNuWVl0sqS7aPBLoDr+3p9c3MmpNCW8CtI2Jj/U5EbJTUZncVJE0DhgAdJa0EfkBu1MOngLmSAJ6JiG8Cnwf+WdI2YDvwzYhY1+CFzcxKRKEJ+D1J/er7fiVVAZt3VyEizmug+PZdnDsLmFVgLGZmJaHQBDwOuF/SqmS/MzAqlYjMzFqIQl/E+KOko4EegIAlEbE11cjMzEpcoS1ggP5ARVKnryQi4q7dVzEzs10pKAFLuhvoBiwk9yUZQPDR14zNzGwPFNoCrgZ61o/bNTOzT67QccCLgP+VZiBmZi1NoS3gjsBLySxoW+oLI+KMVKIyM2sBCk3A16YZhJlZS1ToMLR5kj4LdI+IR5K34MrSDc3MrLQVOh3lPwEzgX9PiroAD6YUk5lZi1Dol3CXAIOADbBjcvZOaQVlZtYSFNoHvCUi3k8m0EHS/uTGAZt9ImkteQRe9siavkJbwPMkXQV8WtIw4H7g1+mFZWZW+gpNwFcCdcCLwMXAHPZiJQwzM/tQoaMgPgB+mfyYmdk+UOgoiNclvbbzTyN1pkhaK2lRXtnBkuZKWpp8ts87NkHSq5JekXTq3j+SmVnzUGgXRDW52dD6A4OBXwD3NFLnDmD4TmVXAo9GRHfg0WQfST2B0UCvpM4t9UsUmZmVqoIScET8Je/nrYj4N3JL0u+uzhPAzssKnQncmWzfCZyVVz49IrZExOvAq8CAwh7BzKx5KnQ6yn55u/uRaxH/3V7c75CIWA0QEasl1Y8l7gI8k3feyqSsoVjGAGMAunbtuhchmJk1DYWOA/553vY2YDnwlX0Yhxooa3CccURMBiYDVFdXeyyymTVbhY6COGkf3W+NpM5J67czsDYpXwkcnnfeYc
Cqj9U2MyshhXZBfG93xyPi+gLv9zBwIfAvyedDeeX3SroeOBToDswv8JpmZs3SnqyI0Z9cogT4EvAEsGJXFSRNA4YAHSWtBH5ALvHeJ+kbwJvAuQARsVjSfcBL5Lo4LomI7Q1e2MysROzJhOz9IuJdAEnXAvdHxD/uqkJEnLeLQ0N3cf4kYFKB8ZiZNXuFjgPuCryft/8+uRWSzcxsLxXaAr4bmC9pNrnRCSPxishmZp9IoaMgJkn6L3JvwQFcFBHPpReWmVnpK7QLAqANsCEibgRWSjoipZjMzFqEQifj+QEwHpiQFLWi8bkgzMxsNwptAY8EzgDeA4iIVezdq8hmZpYoNAG/HxFB8nqwpAPTC8nMrGUoNAHfJ+nfgYOSFZIfwZOzm5l9Io2OglBuJc4ZwNHkVkXuAXw/IuamHJuZWUlrNAFHREh6MCKqACddM7N9pNAuiGck9U81EjOzFqbQN+FOAr4paTm5kRAi1zjunVZgZmalbrcJWFLXiHgT+GKR4jEzazEaawE/SG4WtDckzYqILxchJjOzFqGxPuD8pYKOTDMQM7OWprEWcOxie69J6kFuWFu9I4HvAwcB/wTUJeVXRcScfXFPM7OmqLEEXClpA7mW8KeTbfjwS7h2e3rDiHgF6AMgqQx4C5gNXATcEBE/29Nrmpk1R7tNwBFRlvL9hwLLkj7mlG9lZta07Ml0lGkYDUzL2/+2pBckTZHUPqugzMyKIbMELOkAcjOs3Z8U3Qp0I9c9sRr4+S7qjZFUI6mmrq6uoVPMzJqFLFvAXwQWRMQagIhYExHbI+IDchP9DGioUkRMjojqiKguLy8vYrhmZvtWlgn4PPK6HyR1zjs2ElhU9IjMzIqo0FeR9ylJbYBhwMV5xf8qqQ+54W7LdzpmZlZyMknAEbEJ6LBT2flZxGJmlpWsR0GYmbVYTsBmZhlxAjYzy0gmfcBmxbRs2bLUrt2tW7fUrm2lzy1gM7OMOAGbmWXECdjMLCNOwGZmGXECNjPLiBOwmVlGnIDNzDLiBGxmlhEnYDOzjPhNOLN94JO+bbe7+n7brnS5BWxmlpGsJmRfDrwLbAe2RUS1pIOBGUAFuQnZvxIRf80iPjOzYsiyBXxSRPSJiOpk/0rg0YjoDjya7JuZlaym1AVxJnBnsn0ncFZ2oZiZpS+rBBzA7yTVShqTlB0SEasBks9OGcVmZlYUWY2CGBQRqyR1AuZKWlJoxSRhjwHo2rVrWvGZmaUukxZwRKxKPtcCs4EBwJr6pemTz7W7qDs5Iqojorq8vLxYIZuZ7XNFT8CSDpT0d/XbwCnAIuBh4MLktAuBh4odm5lZMWXRBXEIMFtS/f3vjYjfSvojcJ+kbwBvAudmEJuZWdEUPQFHxGtAZQPlfwGGFjseM7OsNKVhaGZmLYoTsJlZRpyAzcwy4gRsZpYRJ2Azs4w4AZuZZcQJ2MwsI07AZmYZcQI2M8uIE7CZWUacgM3MMuIEbGaWESdgM7OMOAGbmWXECdjMLCNZrIhxuKTHJb0sabGkS5PyayW9JWlh8nNasWMzMyumLFbE2AZcFhELkqWJaiXNTY7dEBE/yyAmM7Oiy2JFjNVA/fLz70p6GehS7DjMzLKWaR+wpAqgL/BsUvRtSS9ImiKpfXaRmZmlL7MELKktMAsYFxEbgFuBbkAfci3kn++i3hhJNZJq6urqihWumdk+l0UfMJJakUu+UyPiAYCIWJN3/JfAfzZUNyImA5MBqqurI/1ozVqOZcuWpXLdbt26pXLd5q7oCVi59ehvB16OiOvzyjsn/cMAI4FFxY7NrClzciw9WbSABwHnAy9KWpiUXQWcJ6kPEMBy4OIMYjMzK5osRkE8CaiBQ3OKHYuZWZb8JpyZWUacgM3MMuIEbGaWESdgM7OMOAGbmWXECdjMLCOZvAlnZi2XXyj5kFvAZmYZcQI2M8uIE7CZWUacgM3MMuIEbGaWESdgM7OMeBiamZWc5jLUzS1gM7OMNLkELG
m4pFckvSrpyqzjMTNLS5NKwJLKgJuBLwI9ya2S0TPbqMzM0tGkEjAwAHg1Il6LiPeB6cCZGcdkZpaKppaAuwAr8vZXJmVmZiWnqY2CaGituI8sPS9pDDAm2d0o6ZU9uH5H4O29jK058vM2E5/73Of2plqzfd691Jyf97MNFTa1BLwSODxv/zBgVf4JETEZmLw3F5dUExHVex9e8+LnLW1+3uavqXVB/BHoLukISQcAo4GHM47JzCwVTaoFHBHbJH0b+G+gDJgSEYszDsvMLBVNKgEDRMQcYE5Kl9+rrotmzM9b2vy8zZwiovGzzMxsn2tqfcBmZi1Gi0jApf56s6TDJT0u6WVJiyVdmpQfLGmupKXJZ/usY92XJJVJek7Sfyb7pf68B0maKWlJ8t/6+FJ+ZknfTf48L5I0TVLrUnvekk/ALeT15m3AZRFxDDAQuCR5xiuBRyOiO/Bosl9KLgVeztsv9ee9EfhtRBwNVJJ79pJ8ZkldgP8DVEfEseS+lB9NiT1vySdgWsDrzRGxOiIWJNvvkvuL2YXcc96ZnHYncFYmAaZA0mHA6cCv8opL+XnbAZ8HbgeIiPcj4h1K+JnJDRL4tKT9gTbk3gkoqedtCQm4Rb3eLKkC6As8CxwSEashl6SBThmGtq/9G3AF8EFeWSk/75FAHfAfSbfLryQdSIk+c0S8BfwMeBNYDayPiN9RYs/bEhJwo683lwpJbYFZwLiI2JB1PGmRNAJYGxG1WcdSRPsD/YBbI6Iv8B7N/J/fu5P07Z4JHAEcChwo6WvZRrXvtYQE3OjrzaVAUityyXdqRDyQFK+R1Dk53hlYm1V8+9gg4AxJy8l1KZ0s6R5K93kh9+d4ZUQ8m+zPJJeQS/WZvwC8HhF1EbEVeAD435TY87aEBFzyrzdLErm+wZcj4vq8Qw8DFybbFwIPFTu2NETEhIg4LCIqyP33fCwivkaJPi9ARPwZWCGpR1I0FHiJ0n3mN4GBktokf76Hkvtuo6Set0W8iCHpNHJ9hvWvN0/KNqJ9S9IJwP8AL/Jhn+hV5PqB7wO6kvsDfW5ErMskyJRIGgJcHhEjJHWghJ9XUh9yXzoeALwGXESuEVWSzyzpOmAUuVE+zwH/CLSlhJ63RSRgM7OmqCV0QZiZNUlOwGZmGXECNjPLiBOwmVlGnIDNzDLiBGzNlqTfSzp1p7Jxkm4psP4/S/pCOtGZNc7D0KzZknQxMDAiLsorewb4vxHxP43ULYuI7WnHaLY7bgFbczYTGCHpU7BjIqJDgb+XVJPMJXtd/cmSlkv6vqQngXMl3SHpnOTY9yX9MZl7dnLy9lV9K/snkuZL+pOkwUl5maSfSXpR0guSvpOUV0maJ6lW0n/XvzZr1hAnYGu2IuIvwHxgeFI0GpgBTEyWL+8NnCipd161v0XECRExfafL3RQR/ZO5Zz8NjMg7tn9EDADGAT9IysaQmyimb0T0BqYm83H8P+CciKgCpgAl9dal7VtOwNbcTSOXeEk+pwFfkbSA3OurvchNxF9vxi6uc5KkZyW9CJyc1KtXP7lRLVCRbH8BuC0itgEkr8P2AI4F5kpaCFxNbvInswY1uVWRzfbQg8D1kvqRa7n+Fbgc6B8Rf5V0B9A67/z3dr6ApNbALeRWX1gh6dqd6mxJPrfz4d8Z8fFpTQUsjojjP8kDWcvhFrA1axGxEfg9uX/uTwPakUuy6yUdQm4pqsbUJ9u3kzmVzymgzu+AbyarNSDpYOAVoFzS8UlZK0m9dnMNa+GcgK0UTCO3Rtr0iHieXNfDYnJJ+Q+NVU6W9vkludnkHiQ3hWljfkVuNq4XJD0P/H2y5NU5wE+SsoXk5rA1a5CHoZmZZcQtYDOzjDgBm5llxAnYzCwjTsBmZhlxAjYzy4gTsJlZRpyAzcwy4gRsZpaR/w/HTdRexED5tgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(figsize=(5, 3.54))\n", "ax.hist(perm_variance, bins=11, rwidth=0.9,facecolor='gainsboro')\n", "ax.axvline(x = observed_variance, color='black', lw=1)\n", "ax.text(58, 180, 'Observed\\nvariance')\n", "ax.set_xlabel('Variance')\n", "ax.set_ylabel('Frequency')\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Pragmatic (data product) practitioner and statistical tests! \n", "\n", "> Conduct A/A tests" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> Be careful about assumptions of causality\n", "> - Anecdotes\n", "> * Microsoft Office's advanced features/attrition experiment\n", "> * Two teams conducted separate observational studies of two advanced features for Microsoft Office. Each concluded that the new feature it was assessing reduced attrition.\n", "> * Yahoo's experiment on whether the display of ads for a brand increases searches for the brand name or related keywords\n", "> * Importance of control." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> Complex experiment designs and chance of bugs in experiment design/data collection" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "> Sometimes understanding the \"why?\" is useful, but sometimes, you may have to just go with whatever \"floats your boat\"! 
\n", "> - Scurvy versus Bing's design color" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Suggested additional readings and references\n", "\n", "> Data-Driven Metric Development for Online Controlled Experiments: Seven Lessons Learned by Deng & Shi at KDD 2016\n", "\n", "\n", "> The Surprising Power of Online Experiments by Ron Kohavi and Stefan Thomke, Harvard Business Review\n", "\n", "\n", "> A compilation of numerous statistical tests with Python code snippets (Good collection of Python statistical test APIs. However, the content has not been vetted for correctness by me. Apply your own caution, particularly regarding the suggested interpretation of the tests.) " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\"Hippo" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4.445833981123393\n", "4.4\n" ] } ], "source": [ "#### solution for the 'toy' exercise on weighted statistics\n", "# Weighted mean and median using state populations as weights, to determine the national figures \n", "print(np.average(state_df['Murder.Rate'], weights=state_df['Population']))\n", "print(wquantiles.median(state_df['Murder.Rate'], weights=state_df['Population'])) \n", "# wquantiles provides weighted quantiles\n", "# You could use your own custom code instead as well" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }