diff --git a/notebooks/3.2 Exploratory data analysis II and working with texts.ipynb b/notebooks/3.2 Exploratory data analysis II and working with texts.ipynb deleted file mode 100644 index 3e279e0..0000000 --- a/notebooks/3.2 Exploratory data analysis II and working with texts.ipynb +++ /dev/null @@ -1,2564 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# 3.2 Exploratory data analysis and working with texts\n", - "\n", - "In this notebook, we learn about:\n", - "1. descriptive statistics to explore data;\n", - "2. working with texts (hints)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Part 1: descriptive statistics\n", - "\n", - "*The goal of exploratory data analysis is to develop an understanding of your data. EDA is fundamentally a creative process. And like most creative processes, the key to asking quality questions is to generate a large quantity of questions.* \n", - "\n", - "Key questions:\n", - "* Which kind of variation occurs within variables?\n", - "* Which kind of co-variation occurs between variables?\n", - "\n", - "https://r4ds.had.co.nz/exploratory-data-analysis.html" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# imports\n", - "\n", - "import os, codecs\n", - "import pandas as pd\n", - "import numpy as np\n", - "import seaborn as sns\n", - "import matplotlib.pyplot as plt" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Import the dataset\n", - "Let us import the Venetian apprenticeship contracts dataset in memory." 
- ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "root_folder = \"../data/apprenticeship_venice/\"\n", - "df_contracts = pd.read_csv(codecs.open(os.path.join(root_folder,\"professions_data.csv\"), encoding=\"utf8\"), sep=\";\")\n", - "df_professions = pd.read_csv(codecs.open(os.path.join(root_folder,\"professions_classification.csv\"), encoding=\"utf8\"), sep=\",\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's take another look at the dataset." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<class 'pandas.core.frame.DataFrame'>\n", - "RangeIndex: 9653 entries, 0 to 9652\n", - "Data columns (total 47 columns):\n", - " # Column Non-Null Count Dtype \n", - "--- ------ -------------- ----- \n", - " 0 page_title 9653 non-null object \n", - " 1 register 9653 non-null object \n", - " 2 annual_salary 7870 non-null float64\n", - " 3 a_profession 9653 non-null object \n", - " 4 profession_code_strict 9618 non-null object \n", - " 5 profession_code_gen 9614 non-null object \n", - " 6 profession_cat 9597 non-null object \n", - " 7 corporation 9350 non-null object \n", - " 8 keep_profession_a 9653 non-null int64 \n", - " 9 complete_profession_a 9653 non-null int64 \n", - " 10 enrolmentY 9628 non-null float64\n", - " 11 enrolmentM 9631 non-null float64\n", - " 12 startY 9533 non-null float64\n", - " 13 startM 9539 non-null float64\n", - " 14 length 9645 non-null float64\n", - " 15 has_fled 9653 non-null int64 \n", - " 16 m_profession 9535 non-null object \n", - " 17 m_profession_code_strict 9508 non-null object \n", - " 18 m_profession_code_gen 9506 non-null object \n", - " 19 m_profession_cat 9489 non-null object \n", - " 20 m_corporation 9276 non-null object \n", - " 21 keep_profession_m 9653 non-null int64 \n", - " 22 complete_profession_m 9653 non-null int64 \n", - " 23 m_gender 9554 non-null float64\n",
- " 24 m_name 9623 non-null object \n", - " 25 m_surname 6960 non-null object \n", - " 26 m_patronimic 2620 non-null object \n", - " 27 m_atelier 1434 non-null object \n", - " 28 m_coords 9639 non-null object \n", - " 29 a_name 9653 non-null object \n", - " 30 a_age 9303 non-null float64\n", - " 31 a_gender 9522 non-null float64\n", - " 32 a_geo_origins 7149 non-null object \n", - " 33 a_geo_origins_std 4636 non-null object \n", - " 34 a_coords 9610 non-null object \n", - " 35 a_quondam 7848 non-null float64\n", - " 36 accommodation_master 9653 non-null int64 \n", - " 37 personal_care_master 9653 non-null int64 \n", - " 38 clothes_master 9653 non-null int64 \n", - " 39 generic_expenses_master 9653 non-null int64 \n", - " 40 salary_in_kind_master 9653 non-null int64 \n", - " 41 pledge_goods_master 9653 non-null int64 \n", - " 42 pledge_money_master 9653 non-null int64 \n", - " 43 salary_master 9653 non-null int64 \n", - " 44 female_guarantor 9653 non-null int64 \n", - " 45 period_cat 7891 non-null float64\n", - " 46 incremental_salary 9653 non-null int64 \n", - "dtypes: float64(11), int64(15), object(21)\n", - "memory usage: 3.5+ MB\n" - ] - } - ], - "source": [ - "df_contracts.info()" - ] - }, - { - "cell_type": "code", - "execution_count": 80, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " page_title \\\n", - "0 Carlo Della sosta (Orese) 1592-08-03 \n", - "1 Antonio quondam Andrea (squerariol) 1583-01-09 \n", - "2 Cristofollo di Zuane (batioro in carta) 1591-0... \n", - "\n", - " register annual_salary \\\n", - "0 asv, giustizia vecchia, accordi dei garzoni, 1... NaN \n", - "1 asv, giustizia vecchia, accordi dei garzoni, 1... 12.5 \n", - "2 asv, giustizia vecchia, accordi dei garzoni, 1... NaN \n", - "\n", - " a_profession profession_code_strict profession_code_gen \\\n", - "0 orese orese orefice \n", - "1 squerariol squerariol lavori allo squero \n", - "2 batioro batioro battioro \n", - "\n", - " profession_cat corporation \\\n", - "0 orefice Oresi \n", - "1 lavori allo squero Squerarioli \n", - "2 fabbricatore di foglie/fili/cordelle d'oro o a... Battioro \n", - "\n", - " keep_profession_a complete_profession_a ... personal_care_master \\\n", - "0 1 1 ... 1 \n", - "1 1 1 ... 0 \n", - "2 1 1 ... 0 \n", - "\n", - " clothes_master generic_expenses_master salary_in_kind_master \\\n", - "0 1 1 0 \n", - "1 0 1 0 \n", - "2 0 0 0 \n", - "\n", - " pledge_goods_master pledge_money_master salary_master female_guarantor \\\n", - "0 0 0 0 0 \n", - "1 0 0 1 0 \n", - "2 0 0 0 0 \n", - "\n", - " period_cat incremental_salary \n", - "0 NaN 0 \n", - "1 1.0 0 \n", - "2 NaN 0 \n", - "\n", - "[3 rows x 47 columns]" - ] - }, - "execution_count": 80, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_contracts.head(3)" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Index(['page_title', 'register', 'annual_salary', 'a_profession',\n", - " 'profession_code_strict', 'profession_code_gen', 'profession_cat',\n", - " 'corporation', 'keep_profession_a', 'complete_profession_a',\n", - " 'enrolmentY', 'enrolmentM', 'startY', 'startM', 'length', 'has_fled',\n", - " 'm_profession', 'm_profession_code_strict', 'm_profession_code_gen',\n", - " 
'm_profession_cat', 'm_corporation', 'keep_profession_m',\n", - " 'complete_profession_m', 'm_gender', 'm_name', 'm_surname',\n", - " 'm_patronimic', 'm_atelier', 'm_coords', 'a_name', 'a_age', 'a_gender',\n", - " 'a_geo_origins', 'a_geo_origins_std', 'a_coords', 'a_quondam',\n", - " 'accommodation_master', 'personal_care_master', 'clothes_master',\n", - " 'generic_expenses_master', 'salary_in_kind_master',\n", - " 'pledge_goods_master', 'pledge_money_master', 'salary_master',\n", - " 'female_guarantor', 'period_cat', 'incremental_salary'],\n", - " dtype='object')" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_contracts.columns" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Every row represents an apprenticeship contract. Contracts were registered both with the guild and at a public office. This is a sample of contracts from a much larger set of records.\n", - "\n", - "Some of the variables we will work with are:\n", - "* `annual_salary`: the annual salary paid to the apprentice, if any (in Venetian ducats).\n", - "* `a_profession` to `corporation`: increasingly generic classifications of the apprentice's stated profession.\n", - "* `startY` and `enrolmentY`: contract start year and registration year, respectively.\n", - "* `length`: contract duration, in years.\n", - "* `m_gender` and `a_gender`: gender of the master and of the apprentice, respectively.\n", - "* `a_age`: age of the apprentice at entry, in years.\n", - "* `female_guarantor`: boolean (0/1), 1 if at least one of the contract's guarantors was female." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " Trascrizione Standard \\\n", - "0 al negotio del libraro librer \n", - "1 arte de far arpicordi arte de far arpicordi \n", - "2 arte de' colori arte dei colori \n", - "\n", - " Gruppo 0 Gruppo 1 \\\n", - "0 libraio librai - diverse specializzazioni \n", - "1 fabbricatore di arpicordi fabbricatore di strumenti musicali \n", - "2 fabbricazione/vendita di colori colori \n", - "\n", - " Gruppo 2 Gruppo 3 Gruppo 4 \\\n", - "0 stampa altre lavorazioni manifatturiere beni \n", - "1 musica altri servizi servizi \n", - "2 colori decorazioni e mestieri dell'arte beni \n", - "\n", - " Corporazione \n", - "0 libreri, stampatori e ligadori \n", - "1 NaN \n", - "2 spezieri " - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_professions.head(3)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The professions data frame maps each profession, as transcribed from the records (first column, `Trascrizione`), to a classification system (`Standard` and `Gruppo 0` to `Gruppo 4`, increasingly generic). The last column is the guild (or corporation, `Corporazione`) that governed the given profession. The classification was done manually by historians. We don't use this table here, as the classifications we need are already part of the main data frame." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Questions\n", - "\n", - "* Plot the distribution (histogram) of the apprentices' age, contract length, annual salary and start year.\n", - "* Calculate the proportion of female apprentices and masters, and of contracts with a female guarantor.\n", - "* How likely is it for a female apprentice to have a female master? And for a male apprentice?"
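The proportions and the conditional probability asked for above can be computed with plain pandas. This is a minimal sketch, not the notebook's own solution. It makes three choices that are not taken from the notebook:

* It uses a small hypothetical DataFrame in place of `df_contracts`.
* It assumes that `a_gender` and `m_gender` code female as `1.0` and male as `0.0`. The notebook does not state the encoding.
* It drops rows with a missing gender before computing each share.

For the histograms in the first question, `df_contracts[["a_age", "length", "annual_salary", "startY"]].hist(bins=50)` draws one panel per column.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: in a_gender and m_gender, 1.0 codes "female" and 0.0 "male".
# The notebook does not state the encoding; verify it before use.
FEMALE = 1.0

# Hypothetical toy rows mimicking three columns of df_contracts.
df = pd.DataFrame({
    "a_gender": [1.0, 0.0, 0.0, 1.0, np.nan, 0.0],
    "m_gender": [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    "female_guarantor": [1, 0, 0, 1, 0, 0],
})


def female_share(s: pd.Series) -> float:
    """Share of female entries among the non-missing values of a gender column."""
    return float((s.dropna() == FEMALE).mean())


share_a = female_share(df["a_gender"])          # female apprentices
share_m = female_share(df["m_gender"])          # female masters
share_g = float(df["female_guarantor"].mean())  # contracts with a female guarantor

# P(master gender | apprentice gender): row-normalised cross-tabulation,
# computed only on contracts where both genders are known.
known = df.dropna(subset=["a_gender", "m_gender"])
cond = pd.crosstab(known["a_gender"], known["m_gender"], normalize="index")
p_ff = float(cond.loc[FEMALE, FEMALE])  # female apprentice -> female master
p_mf = float(cond.loc[0.0, FEMALE])     # male apprentice -> female master

print(share_a, share_m, share_g, p_ff, p_mf)
```

On the real data, replace `df` with `df_contracts`. With `normalize="index"`, each row of the cross-tabulation sums to 1. Each row can therefore be read as the distribution of the master's gender for a given apprentice gender.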
- ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "salaries_male_guarantor = df_contracts[df_contracts.female_guarantor == 0].annual_salary" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [], - "source": [ - "salaries_female_guarantor = df_contracts[df_contracts.female_guarantor == 1].annual_salary" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "[base64 PNG data omitted: overlaid histograms of annual_salary for contracts without and with a female guarantor]\n", - "text/plain": [ - "" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "salaries_male_guarantor.hist()\n", - "salaries_female_guarantor.hist()" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "[base64 PNG data omitted: histogram of annual_salary with 100 bins]\n", - "text/plain": [ - "" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "df_contracts.annual_salary.hist(bins=100)" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "[base64 PNG data omitted]
ulVfXXOMZwETmbmQ9Xzu2kExemIWA1QPZ5p6r+uafq1wHM1li9J6oOugyEzvwZ8NSLeXDVdSeOw0gFgW9W2Dbi3Gj4AbI2I8yLiImAD8HC3y5ck9UfdYwbvA+6KiNcC/wr8Go2w2R8RNwLPANcBZObRiNhPIzymgZsy85Way5ck9VitYMjMx4BWJzeunKX/LmBXnWVKkvrLbz5LkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgoGgySpYDBIkgq1gyEilkXEv0TEfdXzCyLigYh4qnpc1dT3log4HhFPRsRVdZctSeq9Xuwx3Awca3q+EziUmRuAQ9VzIuJiYCtwCbAZuD0ilvVg+ZKkHqoVDBGxFrga+Mum5i3Avmp4H3BtU/t4Zr6cmU8Dx4HL6yxfktR7kZndTxxxN/AHwBuA92fmNRHxQmaubOrzfGauiojbgMOZeWfVvhc4mJl3t5jvdmA7wPDw8Kbx8fG2a5qammJoaKjr19SJyWdf7Hia4RVw+qU+FNMjva5v45rzezczBvv+dsP66rG+eprrGxsbO5KZI93MZ3m3BUTENcCZzDwSEaPtTNKirWUqZeYeYA/AyMhIjo62M/uGiYkJOulfxw077+94mh0bp7l1suvV3ne9ru/E9aM9mxcM9v3thvXVY3319Kq+Op8AbwHeGRHvAL4beGNE3AmcjojVmXkqIlYDZ6r+J4F1TdOvBZ6rsXxJUh90fY4hM2/JzLWZuZ7GSeV/yMxfAQ4A26pu24B7q+EDwNaIOC8iLgI2AA93XbkkqS/6cUxjN7A/Im4EngGuA8jMoxGxH3gCmAZuysxX+rB8SVINPQmGzJwAJqrhrwNXztJvF7CrF8uUJPWH33yWJBUMBklSYfFeN6lXhfUdXtJ7YvfVfapEUrvcY5AkFQwGSVLBYJAkFQwGSVLBYJAkFQwGSVLBYJAkFQwGSVLBYJAkFQwGSVLBYJAkFbxXUpNO7+sjSa9G7jFIkgoGgySpYDBIkgoGgySpYDBIkgpelSTNo5Or1XZsnGa0f6VIA2EwaMnz34dKveWhJElSoetgiIh1EfFgRByLiKMRcXPVfkFEPBART1WPq5qmuSUijkfEkxFxVS9egCSpt+rsMUwDOzLzR4ArgJsi4mJgJ3AoMzcAh6rnVOO2ApcAm4HbI2JZneIlSb3XdTBk5qnMfLQa/hZwDFgDbAH2Vd32AddWw1uA8cx8OTOfBo4Dl3e7fElSf/TkHENErAd+AngIGM7MU9AID+BNVbc1wFebJjtZtUmSFpHIzHoziBgC/hHYlZn3RMQLmbmyafzzmbkqIv4M+Hxm3lm17wU+k5mfajHP7cB2gOHh4U3j4+Nt1zM1NcXQ0FBXr2Xy2Re7mq4Twyvg9Et9X0zXFrq+jWvOn3N8q/e30/dtvmXM1Mn8h1fAmy7obP6DVOf3YxCsr57m+sbGxo5k5kg386l1uWpEvAb4FHBXZt5TNZ+OiNWZeSoiVgNnqvaTwLqmydcCz7Wab2buAfYAjIyM5OjoaNs1TUxM0En/ZjcM4O6qOzZOc+vk4r1KeKHrO3H96JzjW72/nb5v8y1jpk7mv2PjNL/c5fY3CHV+PwbB+urpVX11rkoKYC9wLDP/pGnUAWBbNbwNuLepfWtEnBcRFwEbgIe7Xb4kqT/q/Gn4FuBXgcmIeKxq+11gN7A/Im4EngGuA8jMoxGxH3iCxhVNN2XmKzWWL0nqg66DITP/CYhZRl85yzS7gF3dLlOS1H+L92C3zknz3d5ix8bpgZwLks5l3hJDklQwGCRJBYNBklQwGCRJBYNBklQwGCRJBYNBklTwewzSEuS/M1U/uccgSSoYDJKkgsEgSSoYDJKkgiefdc7p9MStdK5xj0GSVDAYJEkFg0GSVPAcg6SBm+08z2z/iMkv6
A2WewySpILBIEkqGAySpILnGKRzwNlj+rMdw5+p02P6/f5uSDfz97xE917VweAXmSSpc6/qYJCWAv+A0WIz8GCIiM3Ah4FlwF9m5u5B1yBpbobV/F7N/xNjoMEQEcuAPwN+FjgJ/HNEHMjMJwZZh9RPfqguDq/mD+5+G/Qew+XA8cz8V4CIGAe2AAaDpAXVTpC0e/K+2/k3W8igiswc3MIifgnYnJm/UT3/VeAnM/O9M/ptB7ZXT98MPNnBYi4E/qMH5faL9dVjffVYXz1Lqb4fyMzv7WYmg95jiBZt35FMmbkH2NPVAiIeycyRbqYdBOurx/rqsb56zpX6Bv0Ft5PAuqbna4HnBlyDJGkOgw6GfwY2RMRFEfFaYCtwYMA1SJLmMNBDSZk5HRHvBf6exuWqd2Tm0R4vpqtDUANkffVYXz3WV885Ud9ATz5LkhY/b6InSSoYDJKkwpINhojYHBFPRsTxiNjZYnxExEeq8V+MiMsGWNu6iHgwIo5FxNGIuLlFn9GIeDEiHqt+fm9Q9VXLPxERk9WyH2kxfiHX35ub1stjEfHNiPjtGX0Guv4i4o6IOBMRjze1XRARD0TEU9XjqlmmnXNb7WN9fxwRX6rev09HxMpZpp1zW+hjfb8fEc82vYfvmGXahVp/n2yq7UREPDbLtINYfy0/U/q2DWbmkvuhceL6K8APAq8FvgBcPKPPO4CDNL47cQXw0ADrWw1cVg2/Afhyi/pGgfsWcB2eAC6cY/yCrb8W7/XXaHxZZ8HWH/A24DLg8aa2PwJ2VsM7gT+cpf45t9U+1vdzwPJq+A9b1dfOttDH+n4feH8b7/+CrL8Z428Ffm8B11/Lz5R+bYNLdY/h/26tkZn/BZy9tUazLcDHs+EwsDIiVg+iuMw8lZmPVsPfAo4Bawax7B5asPU3w5XAVzLz3xZg2f8nMz8HfGNG8xZgXzW8D7i2xaTtbKt9qS8zP5uZ09XTwzS+N7QgZll/7Viw9XdWRATwy8Aner3cds3xmdKXbXCpBsMa4KtNz0/ynR+87fTpu4hYD/wE8FCL0T8VEV+IiIMRcclgKyOBz0bEkWjcgmSmRbH+aHzXZbZfyIVcfwDDmXkKGr+4wJta9Fks6/HXaewBtjLfttBP760Odd0xy2GQxbD+3gqczsynZhk/0PU34zOlL9vgUg2Gdm6t0dbtN/opIoaATwG/nZnfnDH6URqHR34M+FPgbwZZG/CWzLwM+Hngpoh424zxi2H9vRZ4J/DXLUYv9Ppr12JYjx8EpoG7Zuky37bQLx8Ffgj4ceAUjcM1My34+gPexdx7CwNbf/N8psw6WYu2OdfhUg2Gdm6tsaC334iI19B4A+/KzHtmjs/Mb2bmVDX8GeA1EXHhoOrLzOeqxzPAp2nsbjZbDLcv+Xng0cw8PXPEQq+/yumzh9eqxzMt+iz0drgNuAa4PqsDzjO1sS30RWaezsxXMvN/gL+YZbkLvf6WA78IfHK2PoNaf7N8pvRlG1yqwdDOrTUOAO+urq65Anjx7C5Xv1XHJPcCxzLzT2bp831VPyLichrvxdcHVN/rI+INZ4dpnKR8fEa3BVt/TWb9S20h11+TA8C2angbcG+LPgt2G5ho/FOsDwDvzMz/nKVPO9tCv+prPmf1C7Msd6Fvo/N24EuZebLVyEGtvzk+U/qzDfbzTHo/f2hcNfNlGmfbP1i1vQd4TzUcNP4p0FeASWBkgLX9DI1dtS8Cj1U/75hR33uBozSuEDgM/PQA6/vBarlfqGpYVOuvWv7raHzQn9/UtmDrj0ZAnQL+m8ZfYDcC3wMcAp6qHi+o+n4/8Jm5ttUB1XecxrHls9vgn8+sb7ZtYUD1/VW1bX2RxgfV6sW0/qr2j53d5pr6LsT6m+0zpS/boLfEkCQVluqhJElSnxgMkqSCwSBJKhgMkqSCwSBJKhgMkqSCwSBJKvwvlUId3avGll4AAAAASUVORK5CYII=", - "text/plain": [ 
- "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "df_contracts[df_contracts.annual_salary < 20].annual_salary.hist(bins=25)" - ] - }, - { - "cell_type": "code", - "execution_count": 86, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 86, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "[base64-encoded PNG omitted: histogram of startY, 10 bins]", - "text/plain": [
- "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "df_contracts.startY.hist(bins=10)" - ] - }, - { - "cell_type": "code", - "execution_count": 81, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.026105873821609893" - ] - }, - "execution_count": 81, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# proportion of female apprentices (note: .sum() skips missing values but shape[0] counts every row, so a missing a_gender counts as female here; 1 - df_contracts.a_gender.mean() would skip them)\n", - "1-(df_contracts.a_gender.sum()/df_contracts.shape[0])" - ] - }, - { - "cell_type": "code", - "execution_count": 82, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.023723194861701047" - ] - }, - "execution_count": 82, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# proportion of female masters (note: a missing m_gender counts as female here; 1 - df_contracts.m_gender.mean() would skip them)\n", - "1-(df_contracts.m_gender.sum()/df_contracts.shape[0])" - ] - }, - { - "cell_type": "code", - "execution_count": 83, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.7310924369747899" - ] - }, - "execution_count": 83, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# proportion of female apprentices (contracts starting before 1800) with a male master (note: a missing m_gender counts as not male; .m_gender.mean() would skip it)\n", - "df_contracts[(df_contracts.a_gender == 0) & (df_contracts.startY < 1800)].m_gender.sum()\\\n", - " /df_contracts[(df_contracts.a_gender == 0) & (df_contracts.startY < 1800)].shape[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 84, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "0.9810528582193992" - ] - }, - "execution_count": 84, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# proportion of male apprentices (contracts starting before 1800) with a male master (note: a missing m_gender counts as not male; .m_gender.mean() would skip it)\n", - "df_contracts[(df_contracts.a_gender == 1) & (df_contracts.startY < 1800)].m_gender.sum()\\\n", - " /df_contracts[(df_contracts.a_gender == 1) & (df_contracts.startY < 1800)].shape[0]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Looking at empirical distributions" - ] - }, - {
"cell_type": "code", - "execution_count": 87, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 87, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "[base64-encoded PNG omitted: histogram of annual_salary values below 50, 40 bins]", - "text/plain": [
- "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "df_contracts[df_contracts.annual_salary < 50].annual_salary.hist(bins=40)" - ] - }, - { - "cell_type": "code", - "execution_count": 88, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 88, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "[base64-encoded PNG omitted: histogram of a_age values below 30, 25 bins]", - "text/plain": [
- "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "df_contracts[df_contracts.a_age < 30].a_age.hist(bins=25)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Two very important distributions" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Normal\n", - "\n", - "Also known as the Gaussian distribution, it is a symmetric, bell-shaped distribution with most of its mass around the mean and tails that decay faster than exponentially on both sides. It is fully characterized by two parameters: the mean (center of mass) and the standard deviation (spread).\n", - "\n", - "https://en.wikipedia.org/wiki/Normal_distribution" - ] - }, - { - "cell_type": "code", - "execution_count": 89, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 89, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "[base64-encoded PNG omitted]
WPAQxHx4VMcHu0+os1nbgQ2AkxOTp7wviT1i05XQ1sSERuBy5rPycxPdXJyZu6PiMdo9O2+GhHjmbk3IsZpXB1D44p3ZdNpK4A9HdYnSX2n0wD+Y+C/AfcBhzs5ISKWA+9U4Xse8LPA7wFbgJuBu6vHh6tTtgBfjYh7gEuACeCJDuuTpL7TaQAfyszfP83PHgc2VSMZzgE2Z+a3IuJ7wOaIuAV4CbgJIDN3RsRm4BngEHB71YUhSQOp0wD+k4i4DXiIxvAyADLzxyc7ITN/AHy0TfvrwHUnOWc9sL7DmiSpr3UawDdXj7/V1JbAFYtbjiQNj44CODMvr7sQSRo2HQVwRPzzdu2Z+YeLW44kDY9OuyCmmp6P0ujD3QEYwJJ0hjrtgvjN5tcR8feA/15LRZI6cvjQO8zMzJzQPjU1xejoaIGKdLrOdFv6/0djnK6kQt58ZTcbdr3N+O5jy7Lsn5vl3ttgzZo1BStTpzrtA/4Tjk0LHgF+EthcV1GSOrN0/AqWT6wuXYbOUKdXwF9sen4I2J2ZczXUI0lDo6P1gKtFeZ6jsRLaMuDv6ixKkoZBpzti/AqNdRluAn4F+H5ELLgcpSTp5Drtgvh3wFRm7oOjC+18B/haXYVJ0qDrdEuic46Eb+X10zhXktRGp1fAj0TEnwMPVK9/FfizekqSpOGw0J5wq2hsIfRbEfFPgX9MY+eK7wFf6UJ9kjSwFupGuBd4EyAzv5GZX8jMz9O4+r233tIkabAtFMCXVev6Hiczt9PYnkiSdIYWCuBTTSg/bzELkaRhs1AAT0fEv2ptrLYTerKekiRpOCw0CuIOGtvJ/zOOBe4k8F7gn9RYlyQNvFMGcGa+Cnw8In4G+HDV/KeZ+Re1VyZJA67T9YAfBR6tuRZJGirOZpOkQs50QXbpjBw4cIDp6enj2mZmZnj3Xa8FNHwMYHXV9PQ0d2z4JmMrVh1tm9vxXcY+OHWKs6TBZACr68ZWrDpuF4f9c7PlipEK8r5PkgrxCli1au3ztb9XOsYAVq1a+3zt75WOMYBVu+Y+X/t7pWO8F5SkQgxgSSrEAJakQgxgSSrEAJakQgxgSSrEAJakQgxgSSrEAJakQpwJJw2Qw4feYWZm5ri2qakpRkdPtcG5SjGApQHy5iu72bDrbcZ3jwCNqd/33gZr1qwpXJnaMYClAbN0/Irj1ltW77IPWJIKMYAlqRC7ILRo3HBTOj0GsBaNG25Kp8cA1qJyw02pc94bSlIhtQVwRKyMiEcj4tmI2BkRn6vaL4yIrRHxQvW4rOmcuyJiNiKej4jr66pNknpBnVfAh4B/nZk/CXwMuD0irgTuBLZl5gSwrXpN9d464CpgLbAhIkZqrE+SiqotgDNzb2buqJ6/CTwLXArcAGyqDtsE3Fg9vwF4MDMPZuYuYBa4pq76JKm0rvQBR8RlwEeB7wMXZ+ZeaIQ0cFF12KXAy02nzVVtkjSQag/giHg/8HXgjsx841SHtmnLNp93a0Rsj4jt8/Pzi1WmJHVdrQEcEe+hEb5fycxvVM2vRsR49f44sK9qnwNWNp2+AtjT+pmZuTEzJzNzcvny5fUVL0k1q3MURABfAp7NzHua3toC3Fw9vxl4uKl9XUQsiYjLgQngibrqk6TS6pyIcS3wWeCHEfFU1fY7wN3A5oi4BXgJuAkgM3dGxGbgGRojKG7PzMM11idJRdUWwJn5OO37dQGuO8k564H1ddUkSb3EmXCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVEidC7JLKuzwoXeYmZk5oX1qaorR0dECFamZASwNsDdf2c2GXW8zvnvkaNv+uVnuvQ3WrFlTsDKBAayzcODAAaanp4++npmZ4d137dXqNUvHr2D5xOrSZagNA1hnbHp6mjs2fJOxFasAmNvxXcY+OFW4Kql/GMA6K2MrVh29uto/N1u2GKnPe
L8oSYUYwJJUiAEsSYUYwJJUiF/CSUPGyRm9wwCWhoyTM3qHASwNISdn9Ab7gCWpEANYkgqxC0IdaV33AVz7QTpbBrA60rruA7j2g3S2DGB1rHndB3DtB+lsef8oSYUYwJJUiAEsSYUYwJJUiAEsSYU4CkJS2wV6XJynfgawpBMW6HFxnu4wgCUBLtBTgn3AklSIV8A6ges+SN1hAOsErvsgdYcBrLZc90Gqn/eUklSIASxJhRjAklSIASxJhdQWwBFxf0Tsi4inm9oujIitEfFC9bis6b27ImI2Ip6PiOvrqkuSekWdV8BfBta2tN0JbMvMCWBb9ZqIuBJYB1xVnbMhIkZqrE2SiqstgDPzr4AftzTfAGyqnm8CbmxqfzAzD2bmLmAWuKau2iSpF3S7D/jizNwLUD1eVLVfCrzcdNxc1XaCiLg1IrZHxPb5+flai5WkOvXKRIxo05btDszMjcBGgMnJybbHSDo77ZanBJeoXGzdDuBXI2I8M/dGxDiwr2qfA1Y2HbcC2NPl2iRVWpenBJeorEO3A3gLcDNwd/X4cFP7VyPiHuASYAJ4osu1SWri8pT1qy2AI+IB4JPAT0TEHPDvaQTv5oi4BXgJuAkgM3dGxGbgGeAQcHtmHq6rNknqBbUFcGb+2kneuu4kx68H1tdVjyT1GmfCSVIhBrAkFWIAS1IhvTIOWAW1bkHk9kNSdxjAOmELIrcfkrrDABZw/BZEbj+kdtrNjnNm3NkxgCV1pHV2nDPjzp4BLKljzo5bXH7TIkmFGMCSVIgBLEmF2Ac8ZFrH/ILjfqVSDOAh0zrmFxz3K5ViAA+h5jG/4LhfqRTvOyWpEANYkgoxgCWpEANYkgoxgCWpEEdBSDoj7VZHA1dIOx0GsKQz0ro6GsCPdz/HLZ+Y4eqrrz7aZiCfnAEs6Yy1ro62f26WDd95xiUrO2QAS1pULlnZOQN4wLnfm9S7DOAB535vUu8ygIeA+71Jvcl7UUkqxACWpEIMYEkqxACWpEL8Ek5SbZyufGoG8ABxvzf1mnbTlZ0dd4wBPEDc7029yJlxJ2cA96mTXe1ecMk/cL83qU8YwH3Kq12p/xnAfczdjaX+ZgBL6ipHRhxjAEvqKkdGHGMA9wmXldQgaR0Z0e6qeBiuiA3gPuGykhpkrVfF7bY2gsELZQO4j7ispAZZ81Vx69ZGR9oGrZvCAO4Brd0LBw8eBGDJkiVH2+xy0LAZhgkcBnAPOLF74TFGln6A8YmPHD3GLgdp8BjAPaK1e+HcsXHH+EoDzntaSSrEK+Auc8UyaXG0+1mC/hopYQDX6GRhe//jL7Js5cTRNvt3pYW1jhVu97PUbyMlDOAanWrBHPt3pdPTOla43c9Sv01z7rkAjoi1wH8GRoD7MvPuwiWdFRfMkRZP61jhVos1zblb3Rs9FcARMQL8V+DngDlgOiK2ZOYzi/V7dDLmttO21v8ZTheWyltomnMnP9/d6t7oqQAGrgFmM/NFgIh4ELgBWLQAnp6e5l/8zhc5/wPjALz24tOMnLeUZeN//+gxnbT97et7+fyv/txxUyVnZmb4T/9j63GfPXbFP+Scc+LoMW/tm2Pk7beZf//5J21brGP8bD+79GeX/v0B9v7gf/J7/+sNln3naaCzn+8jP7t1i8ys/TfpVET8MrA2M/9l9fqzwD/KzN9oOuZW4Nbq5YeA59t81E8Ar9Vc7umyps5Y08J6rR6wpoW8lplrWxt77Qo42rQd9y9EZm4ENp7yQyK2Z+bkYhZ2tqypM9a0sF6rB6zpTPVaB+UcsLLp9QpgT6FaJKlWvRbA08BERFweEe8F1gFbCtckSbXoqS6IzDwUEb8B/DmNYWj3Z+bOM/ioU3ZRFGJNnbGmhfVaPWBNZ6SnvoSTpGHSa10QkjQ0DGBJKmSgAjgi7o+IfRHxdOlaACJiZUQ8GhHPRsTOiPhcD9Q0GhFPRMRMVdPvlq7piIgYiYi/johvla4FICJ+FBE/jIinImJ76
XoAImIsIr4WEc9Vf69+unA9H6r++xz59UZE3FGypqquz1d/v5+OiAciovcWgmDA+oAj4hPAW8AfZuaHe6CecWA8M3dExFLgSeDGxZxafQY1BXB+Zr4VEe8BHgc+l5n/u1RNR0TEF4BJ4ILM/HQP1PMjYDIze2UwPxGxCfhuZt5XjRR6X2buL1wWcHQpgb+hMXlqd8E6LqXx9/rKzHw7IjYDf5aZXy5V08kM1BVwZv4V8OPSdRyRmXszc0f1/E3gWeDSwjVlZr5VvXxP9av4v8IRsQL4JeC+0rX0qoi4APgE8CWAzPy7XgnfynXA/ykZvk3OBc6LiHOB99Gj8wkGKoB7WURcBnwU+H7hUo7c6j8F7AO2ZmbxmoB7gd8G3i1cR7MEvh0RT1ZT4Eu7ApgH/qDqqrkvIs5f6KQuWgc8ULqIzPwb4IvAS8Be4P9m5rfLVtWeAdwFEfF+4OvAHZn5Rul6MvNwZq6mMdPwmogo2l0TEZ8G9mXmkyXraOPazPwp4BeA26surpLOBX4K+P3M/Cjwt8CdZUtqqLpDPgP8cQ/UsozGIl6XA5cA50fEr5etqj0DuGZVP+vXga9k5jdK19Osun19DDhhkZAuuxb4TNXn+iDwqYj4o7IlQWbuqR73AQ/RWK2vpDlgrumO5Ws0ArkX/AKwIzNfLV0I8LPArsycz8x3gG8AHy9cU1sGcI2qL7y+BDybmfeUrgcgIpZHxFj1/Dwaf1mfK1lTZt6VmSsy8zIat7F/kZlFr1gi4vzqi1Oq2/yfB4qOrsnMV4CXI+JDVdN1LOJSrWfp1+iB7ofKS8DHIuJ91c/gdTS+f+k5AxXAEfEA8D3gQxExFxG3FC7pWuCzNK7ojgzT+cXCNY0Dj0bED2isvbE1M3ti2FePuRh4PCJmgCeAP83MRwrXBPCbwFeq/3+rgf9QthyIiPfR2EShJ+7wqjuErwE7gB/SyLmenJY8UMPQJKmfDNQVsCT1EwNYkgoxgCWpEANYkgoxgCWpEANYkgoxgCWpkP8PxbwSTFWF+boAAAAASUVORK5CYII=", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "s1 = np.random.normal(5, 1, 10000)\n", - "sns.displot(s1)" - ] - }, - { - "cell_type": "code", - "execution_count": 90, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 90, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAD4CAYAAADSIzzWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAN7ElEQVR4nO3df0zc933H8dfbwBI7XtTViVAG0W7VSammyWob1P2IVFmLvQBpuv25SZvJDwUJJkyzSMsmoTiR0P5arCX8MSlKt4FWZdrSTko2sBprnbZK2zrI0jkDB3+XXD1c16GXrCkGYg6/9wd3J+44jDHcve/M8yGhcF9/+X7fxnyefO97oJi7CwBQe/uiBwCAvYoAA0AQAgwAQQgwAAQhwAAQpHk7O991112eSqWqNAoA3JqmpqZ+5O53l2/fVoBTqZQmJyd3byoA2APM7PuVtnMLAgCCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIMi2/p9wwM0YGRlRkiQ1PefFixclSW1tbTU7Zzqd1sDAQM3Oh8ZHgFF1SZLo7XdmtHrg0zU7Z9PijyVJP/ykNl/iTYsf1uQ8uLUQYNTE6oFPa+mz3TU73/5z45JUs3MWzgdsB/eAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBrmBkZEQjIyPRYwB71l5Zg83RA9SjJEmiRwD2tL2yBrkCBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAgtQkwNlsVidOnFA2m92wLUkS9fX1qb+/X9lstuK+Wx2rIEkSPfzww0qSpGT/vr4+9fb2qq+vT5OTk+ru7taTTz6pJEnU39+vxx9/XEePHtWRI0d07Ngxvfvuu5qdnd10BgDVtbKyotnZWfX29hbfHnvsMXV2dqqzs1NTU1N69NFHdeTIET300EM6fvy4urq6imu/sO4LXSlX3pH1j7PZrPr7+4vNSJLkuk3aiZoEeHR0VGfPntXY2NiGbcPDw5qZmdH09LTGxsYq7rvVsQqGh4d15coVDQ8Pl+w/MzOj2dlZzczM6LnnntPi4qLOnz+v4eFhTU9P67333lMul5O09g+/vLyspaWlTWcAUF2XL1/W0tKSZmdni2/vv/++lpeXt
by8rJMnTyqTyUiSPvnkE124cEFLS0vFtV9Y94WulCvvyPrHo6Ojmp6eLjZjeHj4uk3aiaoHOJvN6vTp03J3nT59uvgdprCt8EmUpPHxcU1MTJTsu9WxCpIkKR4rk8koSRJls1lNTEyUHGNhYaH4/vpzVzIxMcFVMFBjhUZcz/p1vF4mk9HU1FTJui9fx+UdSZKk+HhiYmJDMzKZzKZN2qnmXT1aBaOjo7p27ZokaXV1VWNjY3L34rb1VlZWZGYl+z711FPXPVbhz9df9RYeHz58uHhlezOuXr2q3t5etbe33/QxsPbNcd9Vjx6jqvYtf6wk+YkGBwejR2l4c3NzO/r4kydPlqz7lZWVklaUd2R4eLj4eGVlRe6Vv1YrNWmntrwCNrNeM5s0s8n5+fltn+DMmTPFT0Yul9Obb75Zsq1c4S9f2HerYxWUX81mMhmdOXNm00/mjfroo4929PEAtmena25hYaFk3bt7SSvKO5LJZIqPr9eLSk3aqS2vgN39ZUkvS1JHR8e2a3b06FGNj48rl8upublZx44dk7sXt5UzM7l7cd+tjlWQSqVKIpxKpXT48GG98cYbNx1hM9Mjjzyyq9/x9qLBwUFNvXc5eoyqunb7nUp/plUvvvhi9CgN79SpU3r99ddv+uMPHjyoK1euFNe9mZW0orwj7e3tmpubUy6XK/ankkpN2qmq3wPu6enRvn1rp2lqatLx48dLtq3X0tKi5ubmkn23OlbB0NBQyb5DQ0Pq6ekpHu9mtLS0bJgBQHX19PTs6OOff/75knVfvo7LOzI0NFR83NLSopaWlorHrdSknap6gA8dOqTOzk6ZmTo7O3Xo0KGSbalUqrhvd3e3urq6Svbd6lgF6XS6eKxUKqV0Oq1Dhw6pq6ur5BgHDx4svr/+3JV0dXVtmAFAdRUacT3r1/F6qVRK999/f8m6L1/H5R1Jp9PFx11dXRuakUqlNm3STlX9RThp7TtOJpPZ8F0ok8noxIkTeuGFF2RmxT8v33erYxUMDQ1pcHCw5Gq4p6dH58+f1+rqqpqamvTEE0/o2WefVVtbm5555hmdOnVKy8vLunDhgnK5nFpaWtTU1FQyD4Daam1t1eLiou69997itpWVFV26dEnS2lXuyMiIMpmMbrvtNrW2tmp+fr649gvrfrN1XN6R8sdJkiiXy6mpqUlPP/20Xnrppar0wLZzf7Sjo8MnJyd3fYh6U3glm/t5u6NwD3jps901O+f+c+OSVLNz7j83rvu5B7xrbrU1aGZT7t5Rvp1fRQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAII0Rw9Qj9LpdPQIwJ62V9YgAa5gYGAgegRgT9sra5BbEAAQhAADQBACDABBCDAABCHAABCEAANAEAIMAEEIMAAEIcAAEIQAA0AQAgwAQQgwAAQhwAAQhAADQBACDABBCDAABCHAABCEAANAEAIMAEEIMAAEIcAAEIQAA0AQAgwAQQgwAAQhwAAQhAADQBACDABBCDAABCHAABCEAANAEAIMAEEIMAAEIcAAEIQAA0AQAgwAQQgwAAQhwAAQhAADQBACDABBCDAABGmOHgB7Q9Pih9p/bryG58tKUs3O2bT4oaTWmpwLtw4CjKpLp9M1P+fFizlJUltbraLYGvL3RGMjwKi6gYGB6BGAusQ9YAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCEGAACEKAASAIAQaAIAQYAIIQYAAIQoABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCAEGACCmLvf+M5m85K+X+GP7pL0o90aqkoaYUapMeZsh
BmlxpizEWaUGmPOep7x59z97vKN2wrwZsxs0t07dnygKmqEGaXGmLMRZpQaY85GmFFqjDkbYcZy3IIAgCAEGACC7FaAX96l41RTI8woNcacjTCj1BhzNsKMUmPM2QgzltiVe8AAgO3jFgQABCHAABBkRwE2sz83sw/M7J3dGmi3mdm9ZvZtM5sxs/82s8HomcqZ2e1m9l0z+15+xuejZ9qMmTWZ2X+a2d9Hz7IZM8uY2Vkze9vMJqPn2YyZfcrMXjOzc/mvz1+Jnmk9M7sv/zksvH1sZl+NnqsSM3sqv3beMbNXzez26JluxI7uAZvZlyQtSBpz91/ctal2kZndI+ked3/LzH5a0pSk33T36eDRiszMJN3h7gtm1iLpO5IG3f3fgkfbwMx+X1KHpDvd/cvR81RiZhlJHe5erz+UL0kys1FJ/+Lur5jZT0k64O7/FzxWRWbWJOmipF9y90q/jBXGzNq0tmZ+wd2XzOxvJI27+1/GTra1HV0Bu/s/S/pwl2apCne/5O5v5d//iaQZSW2xU5XyNQv5hy35t7p7ddTM2iU9LOmV6FkanZndKelLkr4mSe5+tV7jm/egpP+pt/iu0yxpv5k1Szog6QfB89yQPXUP2MxSkj4v6d+DR9kg/9T+bUkfSHrT3etuRkl/KukPJF0LnmMrLulbZjZlZr3Rw2ziM5LmJf1F/pbOK2Z2R/RQ1/Fbkl6NHqISd78o6U8kXZB0SdKP3f1bsVPdmD0TYDM7KOkbkr7q7h9Hz1PO3Vfd/XOS2iV90czq6paOmX1Z0gfuPhU9yw14wN2/IKlL0u/lb5XVm2ZJX5D0Z+7+eUlXJP1h7EiV5W+PfEXS30bPUomZ/Yyk35D085J+VtIdZvY7sVPdmD0R4Px91W9I+rq7fzN6nuvJPw39J0mdsZNs8ICkr+Tvr/61pF8zs7+KHakyd/9B/r8fSPo7SV+MnaiiOUlz657pvKa1INejLklvufvl6EE2cVTS++4+7+4rkr4p6VeDZ7oht3yA8y9wfU3SjLufip6nEjO728w+lX9/v9a+oM6FDlXG3f/I3dvdPaW1p6P/6O51d5VhZnfkX2xV/in9r0uqu5/ScfcfSvpfM7svv+lBSXXzwnCZ31ad3n7IuyDpl83sQH69P6i113rq3k5/DO1VSf8q6T4zmzOzJ3ZnrF31gKTf1doVW+HHabqjhypzj6Rvm9l/SfoPrd0Drtsf86pzrZK+Y2bfk/RdSf/g7qeDZ9rMgKSv5//dPyfpj2PH2cjMDkg6prWryrqUfxbxmqS3JJ3VWtca4teS+VVkAAhyy9+CAIB6RYABIAgBBoAgBBgAghBgAAhCgAEgCAEGgCD/Dzz21aUFqjuaAAAAAElFTkSuQmCC", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# for boxplots see https://en.wikipedia.org/wiki/Interquartile_range (or ask!)\n", - "sns.boxplot(x=s1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Heavy-tailed\n", - "Distributions with a small but non-negligible amount of observations with high values. Several probability distributions follow this pattern: https://en.wikipedia.org/wiki/Heavy-tailed_distribution#Common_heavy-tailed_distributions.\n", - "\n", - "We pick the lognormal here: https://en.wikipedia.org/wiki/Log-normal_distribution" - ] - }, - { - "cell_type": "code", - "execution_count": 91, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 91, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAFgCAYAAACFYaNMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAaQElEQVR4nO3df7Bfd13n8ecrvwtIaTXtxKYOZTbD2jojYOzyw3FcqtuIjuk6VsNYyGJrXLe6gDs67fqHwx+ZYR3HqctukUipQZFORNhGZi3WAHZ2pQ0RUGlLttFqGhubwE5bFvO9uTf3vX98zw3fpjfJTXLP/dzc7/Mx851zzuf7Oee8PyR9cXK+50eqCknSwlvWugBJGlcGsCQ1YgBLUiMGsCQ1YgBLUiMrWhdwITZt2lQPPPBA6zIk6WwyW+NFfQT81a9+tXUJknTeLuoAlqSLmQEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY2MbQAPBgMGg0HrMiSNsbENYElqzQCWpEYMYElqxACWpEYMYElqZCwDuKoYDAZUVetSJI2xsQzgiYkJbrn700xMTLQuRdIYG8sABli+clXrEiSNubENYElqzQCWpEZ6DeAk707yaJIvJ/lokjVJLk/yYJInuullI/3vTHIgyf4kN/ZZmyS11lsAJ7kK+I/Axqr6LmA5sAW4A9hTVRuAPd0ySa7tvr8O2ATcnWR5X/VJUmt9n4JYAVySZAXwEuBpYDOws/t+J3BTN78ZuK+qJqrqSeAAcH3P9UlSM70FcFX9I/AbwEHgMPBcVf0pcGVVHe76HAau6Fa5CnhqZBOHurYXSLItyb4k+44ePdpX+ZLUuz5PQVzG8Kj2GuDbgZcmueVMq8zS9qI7JapqR1VtrKqNa9eunZ9iJamBPk9B/CDwZFUdrapJ4OPAG4FnkqwD6KZHuv6HgKtH1l/P8JSFJC1JfQbwQeD1SV6SJMANwOPAbmBr12crcH83vxvYkmR1kmuADcDeHuuTpKZW9LXhqnokyceALwBTwBeBHcDLgF1JbmUY0jd3/
R9Nsgt4rOt/e1Wd6Ks+SWqttwAGqKpfA37tlOYJhkfDs/XfDmzvsyZJWiy8E06SGjGAJakRA1iSGjGAJamRsQ1g34ohqbWxDeDpqUluu/dh34ohqZmxDWCAZb4VQ1JDYx3AktTS2AXwzLlfSWpt7AJ45o3I09PTrUuRNObGLoDBNyJLWhzGMoAlaTEwgCWpEQNYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgCWpkd4COMmrk3xp5PN8kncluTzJg0me6KaXjaxzZ5IDSfYnubGv2iRpMegtgKtqf1W9pqpeA3wP8M/AJ4A7gD1VtQHY0y2T5FpgC3AdsAm4O8nyvuqTpNYW6hTEDcDfVtU/AJuBnV37TuCmbn4zcF9VTVTVk8AB4PoFqk+SFtxCBfAW4KPd/JVVdRigm17RtV8FPDWyzqGu7QWSbEuyL8m+o0ePXlBRJyaP+4ZkSc30HsBJVgE/Bvzh2brO0lYvaqjaUVUbq2rj2rVr56NESWpiIY6Afxj4QlU90y0/k2QdQDc90rUfAq4eWW898PQC1CdJTSxEAL+Vb55+ANgNbO3mtwL3j7RvSbI6yTXABmDvAtQnSU2s6HPjSV4C/BDwcyPN7wV2JbkVOAjcDFBVjybZBTwGTAG3V9WJPuuTpJZ6DeCq+mfgW09p+xrDqyJm678d2N5nTZK0WHgnnCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiO9BnCSVyT5WJKvJHk8yRuSXJ7kwSRPdNPLRvrfmeRAkv1JbuyzNklqre8j4N8CHqiqfwl8N/A4cAewp6o2AHu6ZZJcC2wBrgM2AXcnWd5zfZLUTG8BnOTlwPcD9wBU1fGqehbYDOzsuu0EburmNwP3VdVEVT0JHACu76u+riYGgwFV1eduJGlWfR4Bvwo4Ctyb5ItJPpjkpcCVVXUYoJte0fW/CnhqZP1DXdsLJNmWZF+SfUePHr2gAqenJrnt3oeZmJi4oO1I0vnoM4BXAK8D3l9VrwW+QXe64TQyS9uLDk2rakdVbayqjWvXrr3gIpetXHXB25Ck89FnAB8CDlXVI93yxxgG8jNJ1gF00yMj/a8eWX898HSP9UlSU70FcFX9E/BUkld3TTcAjwG7ga1d21bg/m5+N7Alyeok1wAbgL191SdJra3oefu/CHwkySrg74B3MAz9XUluBQ4CNwNU1aNJdjEM6Sng9qo60XN9ktRMrwFcVV8CNs7y1Q2n6b8d2N5nTZK0WHgnnCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiMGsCQ1YgBLUiNjH8BVxWAwoKpalyJpzIx9AE9PTfL2DzzExMRE61IkjZmxD2CAZStXtS5B0hjqNYCT/H2Sv0nypST7urbLkzyY5IluetlI/zuTHEiyP8mNfdYmSa0txBHwv66q11TVxm75DmBPVW0A9nTLJLkW2AJcB2wC7k6yfAHqk6QmWpyC2Azs7OZ3AjeNtN9XVRNV9SRwALh+4cuTpIXRdwAX8KdJ/jLJtq7tyqo6DNBNr+jarwKeGln3UNf2Akm2JdmXZN/Ro0d7LF2S+rWi5+2/qaqeTnIF8GCSr5yhb2Zpe9G1YVW1A9gBsHHjRq8dk3TR6vUIuKqe7qZHgE8wPKXwTJJ1AN30SNf9EHD1yOrrgaf7rE+SWuotgJO8NMm3zMwD/wb4MrAb2Np12wrc383vBrYkWZ3kGmADsLev+iSptT5PQVwJf
CLJzH7+oKoeSPJ5YFeSW4GDwM0AVfVokl3AY8AUcHtVneixPklqqrcArqq/A757lvavATecZp3twPa+apKkxcQ74SSpEQNYkhoxgIETk8cZDAaty5A0ZgxgSWpkTgGc5E1zaZMkzd1cj4DfN8c2SdIcnfEytCRvAN4IrE3ySyNfvRzwSWWSdAHOdh3wKuBlXb9vGWl/HviJvoqSpHFwxgCuqj8H/jzJ71bVPyxQTZI0FuZ6J9zqJDuAV46uU1Vv7qMoSRoHcw3gPwR+G/gg4PMZJGkezDWAp6rq/b1WIkljZq6Xof1xkv+QZF33Us3Lk1zea2WStMTN9Qh45vm9vzzSVsCr5rccSRofcwrgqrqm70IkadzMKYCTvH229qr68PyWI0njY66nIL53ZH4NwweqfwEwgCXpPM31FMQvji4nuRT4vV4qkqQxcb6Po/xnhi/NlCSdp7meA/5jhlc9wPAhPN8J7OqrKEkaB3M9B/wbI/NTwD9U1aEe6pGksTGnUxDdQ3m+wvCJaJcBx/ssSpLGwVzfiPGTwF7gZuAngUeS+DhKSboAcz0F8avA91bVEYAka4E/Az7WV2GStNTN9SqIZTPh2/naOawrSZrFXI+AH0jyKeCj3fJPAf+zn5IkaTyc7Z1w/wK4sqp+OcmPA98HBPgc8JEFqE+SlqyznUa4C/g6QFV9vKp+qarezfDo965+S5Okpe1sAfzKqvrrUxurah/D1xOdVZLlSb6Y5JPd8uVJHkzyRDe9bKTvnUkOJNmf5MZzGMcFqSoGgwFVdfbOkjRPzhbAa87w3SVz3Mc7gcdHlu8A9lTVBmBPt0ySa4EtwHXAJuDuJMvnuI8LMj01yW33PszExMRC7E6SgLMH8OeT/OypjUluBf7ybBtPsh74EYbvkpuxGdjZze8Ebhppv6+qJqrqSeAAcP3Z9jFflq1ctVC7kiTg7FdBvAv4RJKf5puBuxFYBfzbOWz/LuBXGN5BN+PKqjoMUFWHk1zRtV8FPDzS71DX9gJJtgHbAL7jO75jDiVI0uJ0xiPgqnqmqt4IvAf4++7znqp6Q1X905nWTfKjwJGqOuuR8swqs5UwS007qmpjVW1cu3btHDctSYvPXJ8H/BngM+e47TcBP5bkLQzPJb88ye8DzyRZ1x39rgNmbvA4BFw9sv564Olz3KckXTR6u5utqu6sqvVV9UqGP659uqpuAXbzzZd8bgXu7+Z3A1uSrE5yDcPnDe/tqz5Jam2ud8LNp/cCu7of8g4yfMAPVfVokl3AYwwfeXl7VZ1oUJ8kLYgFCeCq+izw2W7+awzfKTdbv+3A9oWoSZJa84E6ktSIASxJjRjAktSIASxJjRjAktSIAdw5MXmcwWDQugxJY8QAlqRGDGBJasQAlqRGDGBJasQAlqRGDGBJasQAlqRGDGBJamTsAngwGDA9Pd26DEkavwCWpMXCAJakRgxgSWrEAJakRgxgSWrEAJakRgxgSWrEAJakRgxgSWrEAJakRgxgSWrEAJakRnoL4CRrkuxN8ldJHk3ynq798iQPJnmim142ss6dSQ4k2Z/kxr5qm01VMRgMqKqF3K2kMdbnEfAE8Oaq+m7gNcCmJK8H7gD2VNUGYE+3TJJrgS3AdcAm4O4ky3us7wWmpya57d6HmZiYWKhdShpzvQVwDf2/bnFl9ylgM7Cza98J3NTNbwbuq6qJqnoSOABc31d9s1m2ctVC7k7SmOv1HHCS5Um+BBwBHqyqR4Arq+owQDe9out+FfDUyOqHurZTt7ktyb4k+44ePdpn+ZLUq14DuKpOVNVrgPXA9Um+6wzdM9smZtnmjqraWFUb165dO0+VStLCW5CrIKrqWeCzDM/tPpNkHUA3PdJ1OwRcPbLaeuDphahPklro8yqItUle0c1fAvwg8BVgN7C167YVuL+b3w1sSbI6yTXABmBvX/VJUmsretz2OmBndyXDMmBXVX0yyeeAXUluBQ4CNwNU1aNJdgGPAVPA7VV1o
sf6JKmp3gK4qv4aeO0s7V8DbjjNOtuB7X3VJEmLiXfCSVIjBrAkNWIAS1IjBvAInwchaSEZwCOmpyZ5+wce8nkQkhaEAXwKnwchaaEYwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAEsSY0YwJLUiAF8iqnjEzz77LO+F05S7wzgU0xPTXLbvQ/7XjhJvTOAZ1HAYDBoXYakJc4AlqRGDGBJaqS3AE5ydZLPJHk8yaNJ3tm1X57kwSRPdNPLRta5M8mBJPuT3NhXbZK0GPR5BDwF/Keq+k7g9cDtSa4F7gD2VNUGYE+3TPfdFuA6YBNwd5LlPdYnSU31FsBVdbiqvtDNfx14HLgK2Azs7LrtBG7q5jcD91XVRFU9CRwAru+rPklqbUHOASd5JfBa4BHgyqo6DMOQBq7oul0FPDWy2qGu7dRtbUuyL8m+o0eP9lq3JPWp9wBO8jLgj4B3VdXzZ+o6S9uL7oaoqh1VtbGqNq5du3a+ypSkBddrACdZyTB8P1JVH++an0myrvt+HXCkaz8EXD2y+nrg6T7rk6SW+rwKIsA9wONV9ZsjX+0GtnbzW4H7R9q3JFmd5BpgA7C3r/okqbUVPW77TcDbgL9J8qWu7T8D7wV2JbkVOAjcDFBVjybZBTzG8AqK26vqRI/1SVJTvQVwVf0vZj+vC3DDadbZDmzvq6a5qioGgwFVxfBAXpLmn3fCzcIH8khaCAbwaSxbuap1CZKWOANYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgCWpEQNYkhoxgE9j9HkQktQHA/g0pqcmefsHHvJ5EJJ6YwCfgc+DkNQnA1iSGjGAJakRA1iSGjGAT+PE5HGmp6dblyFpCTOAJakRA1iSGjGAJakRA/gMvBtOUp8M4DPwbjhJfTKAz8K74ST1xQCWpEYMYElqxAA+ixOTxxkMBq3LkLQE9RbAST6U5EiSL4+0XZ7kwSRPdNPLRr67M8mBJPuT3NhXXedjMBgYwpLmXZ9HwL8LbDql7Q5gT1VtAPZ0yyS5FtgCXNetc3eS5T3WNmdeiiapL70FcFU9BPzfU5o3Azu7+Z3ATSPt91XVRFU9CRwAru+rtnMxPTXJz394r5eiSZp3C30O+MqqOgzQTa/o2q8Cnhrpd6hre5Ek25LsS7Lv6NGjvRZ7cp8rVnoULGneLZYf4TJL26xpV1U7qmpjVW1cu3Ztz2UNTU9Nctu9D3sULGleLXQAP5NkHUA3PdK1HwKuHum3Hnh6gWs7I2/IkDTfFjqAdwNbu/mtwP0j7VuSrE5yDbAB2LvAtUnSglrR14aTfBT4AeDbkhwCfg14L7Arya3AQeBmgKp6NMku4DFgCri9qk70VZskLQa9BXBVvfU0X91wmv7bge191SNJi81i+RFOksaOASxJjRjAc+QdcZLmmwE8Rz6cXdJ8M4DPgXfESZpPBvA58ChY0nwygM/BicnjsLy3K/ckjRkD+Bz5Y5yk+WIAnyNPQ0iaLwbwefDBPJLmgwF8jk5MHmd6erp1GZKWAH9ROg8nJo9z7NgxAFavXk0y2+OMJenMPAI+TxMTE/zU+/7Mc8GSzpsBfB5mroRY7rlgSRfAAD4PMy/q9FywpAthAJ8nr4SQdKEM4AvgTRmSLoQBfAFmbsoYDAYGsaRzZgBfgJlnQ3hFhKTzYQDPE6+IkHSuDOB5cmLyOIPBoHUZki4iBvAFGv0h7tlnnz15h5wknY0BfIEmj32Dd+x4iKnjE/z8h/cyGAw4duwYx44d80c5SWdkAM+DmWuCs2Ilzz33HD/+6/dz812f8kc5SWdkAM+jmTvksnwFy1auOnl5miTNxgCeZzNHwzNPTBsMBkxPT3udsKQXMYB7UlU8//zzvO23/5yDBw9y812f4rnnnjOEJZ1kAPdkemqSX/z9zzNdxb+/9y+YPD7BT9/9mZNXS
kxPT/tjnTTmFt0D2ZNsAn4LWA58sKre27ik8zZzOmLZylXDJ6clvPWuP2H56ku452fewG0f+gumT5zgv9+ykUsvvZQ1a9aQhNWrV3P8+HGqiiSsWbPmRdsevfxtpo8PhpcuLosqgJMsB/478EPAIeDzSXZX1WNtK5s/y1au4sTUJO/Y8RBZtozpKn7uQ/+b5StXUSemTobztg9/ng+8bSNVxZo1a7jkkktYvXo1ExMTHDt2jImJCX723s9R09NUFffc9n1ceumlACdDfGJi4kVH16NhXVVMTEywevVqgJPzo0E+02fVqlUnt9dX4I/WM7Pt0f0fP3785Hez9T3fbc+2jXPd/oVYyH3p/PT1Z7SoAhi4HjhQVX8HkOQ+YDMwrwE8PXl8+JmeJkCdmHrRPHBe38213/KVq07WMGry2Dd4+917WJZwy/seAGD5ilUsW7aM33rr63jnR7/AianheqvWvPTk9m553wMsX7GKmp4iy1aw42fexM/d8xBTJ068oH3l6jV85PYbWLNmDYPBgLe9/9P8zjveCMDP3vsX/N7Pv/lkLaf2+Zkdn2V6uli+ctXJbcyYueJjzZo1sx6xz8XMvkZrAE7uf6a+0bpmlk+3vdH50fVH9/c773jji+qerZbZ9jOzj/Md86n7upDtLJT5GPPFZubP6A/f/ZZ5HXcW0/nHJD8BbKqq27rltwH/qqp+YaTPNmBbt/hqYP957OrbgK9eYLkXi3EaK4zXeB3rxeOrVbXp1MbFdgQ827H9C/4foqp2ADsuaCfJvqraeCHbuFiM01hhvMbrWC9+i+0qiEPA1SPL64GnG9UiSb1abAH8eWBDkmuSrAK2ALsb1yRJvVhUpyCqairJLwCfYngZ2oeq6tEednVBpzAuMuM0Vhiv8TrWi9yi+hFOksbJYjsFIUljwwCWpEbGKoCTbEqyP8mBJHe0rud8JLk6yWeSPJ7k0STv7NovT/Jgkie66WUj69zZjXl/khtH2r8nyd903/3XLNLbsJIsT/LFJJ/slpfyWF+R5GNJvtL9Gb9hqY43ybu7v8NfTvLRJGuW6lhPq6rG4sPwR72/BV4FrAL+Cri2dV3nMY51wOu6+W8B/g9wLfDrwB1d+x3Af+nmr+3Guhq4pvvfYHn33V7gDQyvv/4T4Idbj+80Y/4l4A+AT3bLS3msO4HbuvlVwCuW4niBq4AngUu65V3Av1uKYz3TZ5yOgE/e5lxVx4GZ25wvKlV1uKq+0M1/HXic4V/mzQz/46Wb3tTNbwbuq6qJqnoSOABcn2Qd8PKq+lwN/xZ/eGSdRSPJeuBHgA+ONC/Vsb4c+H7gHoCqOl5Vz7JEx8vwKqxLkqwAXsLwmv+lOtZZjVMAXwU8NbJ8qGu7aCV5JfBa4BHgyqo6DMOQBq7oup1u3Fd186e2LzZ3Ab8CjD40Y6mO9VXAUeDe7pTLB5O8lCU43qr6R+A3gIPAYeC5qvpTluBYz2ScAvistzlfTJK8DPgj4F1V9fyZus7SVmdoXzSS/ChwpKr+cq6rzNJ2UYy1swJ4HfD+qnot8A2G/ww/nYt2vN253c0MTyd8O/DSJLecaZVZ2i6KsZ7JOAXwkrnNOclKhuH7kar6eNf8TPfPMbrpka79dOM+1M2f2r6YvAn4sSR/z/CU0ZuT/D5Lc6wwrPNQVT3SLX+MYSAvxfH+IPBkVR2tqkng48AbWZpjPa1xCuAlcZtz9wvvPcDjVfWbI1/tBrZ281uB+0fatyRZneQaYAOwt/vn3deTvL7b5ttH1lkUqurOqlpfVa9k+Of16aq6hSU4VoCq+ifgqSSv7ppuYPgo1qU43oPA65O8pKvxBoa/ZyzFsZ5e618BF/IDvIXhVQN/C/xq63rOcwzfx/CfWH8NfKn7vAX4VmAP8EQ3vXxknV/txryfkV+IgY3Al7vv/hvdnZGL8QP8AN+8CmLJjhV4DbCv+/P9H8BlS3W8wHuAr3R1/h7DKxyW5FhP9/FWZElqZJxOQUjSo
mIAS1IjBrAkNWIAS1IjBrAkNWIAS1IjBrAkNfL/AVVe4kXAOE9LAAAAAElFTkSuQmCC", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "s2 = np.random.lognormal(5, 1, 10000)\n", - "sns.displot(s2)" - ] - }, - { - "cell_type": "code", - "execution_count": 92, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 92, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAD4CAYAAADSIzzWAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAANZklEQVR4nO3dXWxb5R3H8d8/cQotHSpNu4oFVIOMVnE1oEKwTVM1gtZk07pLLlCzi2l3oVsvpiIiRRW+2TRVo5k0CRVN7d7QxNCGUFOt2cbdBEs3urKmLKZNIVkDxdV4WVOapM8ufOycOHYaG9v/xv5+JKv2eXnOOQ/pN86JKyyEIABA47V5nwAAtCoCDABOCDAAOCHAAOCEAAOAk0QlG2/atCkkk8k6nQoANKcTJ068H0LYXLy8ogAnk0mNjo7W7qwAoAWY2flSy7kFAQBOCDAAOCHAAOCEAAOAEwIMAE4IMAA4IcAA4IQAA4ATAgwATggwADghwADghAADgBMCDABOCDAAOCHAAOCEAAOAEwIMAE4IMAA4IcAA4KSi/yfcpzU0NKRMJiNJmpqakiR1dXUV1qdSKfX39zfylADATUMDnMlk9PobY5pft1Htlz+QJE1/kjuF9suXGnkqAOCu4bcg5tdt1My2Xs2v69T8uk7NbOuNXm9s9KkAgCvuAQOAEwIMAE4IMAA4IcAA4IQAA4ATAgwATggwADghwADghAADgBMCDABOCDAAOCHAAOCEAAOAEwIMAE4IMAA4IcAA4IQAA4ATAgwATggwADghwADghAADgBMCDABOCDAAOCHAAOCEAAOAEwIMAE4IMAA4IcAA4IQAA4ATAgwATggwADghwADghAADgBMCDABOCDAAOCHAAOCEAAOAEwIMAE4IMAA4IcAA4IQAA4CTRCMOMjQ0VJfx+vv7azouADRSQwKcyWRu6PEAwAO3IADACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcJ7xOoxsmTJyVJO3bscDl+IpGQmWl2dlYbN27UpUuXym67Zs0aSVJbW5s2bNig6enpwrqtW7eqra1N09PTGhoaUiqVkiRls1nt27dPk5OTSqfTOnTokMxMe/fu1cGDBzU4OKjOzk5ls1nt379fTzzxhA4cOKArV67owoULi8bKj5ffLr5/tfLjDQ4OStKS56WOE9+n3LGLxx0YGJCZ6emnn160TzabLbuukutdyTlVOy+1GM9DM1xDrdVzTngHXIW5uTnNzs5K0rLxlaSrV6/q6tWrunLlyqL4StL58+d17tw5zczMKJ1OF5YfPnxY4+PjmpmZ0eDgoMbGxnT69Gml02mdOnVKR44cKWx36tQppdNpnT59WmfPnl0yVvF28f2rlR/vyJEjJZ+XOk
58u5WOm7/u4n2WW1fJ9a7knCpR6/E8NMM11Fo952TVBdjrXW+9TUxMKJPJKJvNanh4uLD8448/XrRNCEHHjh1TJpPRsWPHFELQxMREybGk3Hfv+Hb5/bPZbFXnGR9veHhYw8PDhefljhPfp9yxi8c9evRoYd3w8HBhn+L5ia+r5HpXck7VzkstxvPQDNdQa/Wek4YEeGpqSplMRplMRm1XPix9Ilc+VCaT0Z49e5Z9NLN0Oq3Dhw8X3l2XMz8/r3Q6rWvXri07lpT77l283fz8fNXfzePjzc7Oam5urvC8+Lzzx4nvU+7Y5cbNv46/6y+3rpLrXck5VaLW43lohmuotXrPyXUDbGbfNbNRMxu9ePFiTQ+OxSYmJjQyMnLd7ebm5jQxMbEoRKXGkqSRkZEl283Nzen48eNVnWN8vBCCQghLnhcfJ75PuWMXjxsXQijsMzIysmh9fF0l17uSc6pErcfz0AzXUGv1npPrBjiE8GwIYXsIYfvmzZurOkhXV5dSqZRSqZSu3XxryW2u3XyrUqmUnnnmmWUfzSyZTKq7u/u62yUSCSWTSSUS5X+HmkwmJUnd3d1LtkskEnr00UerOsf4eGYmM1vyvPg48X3KHbt43DgzK+zT3d29aH18XSXXu5JzqkStx/PQDNdQa/Wek1V3D7iZDQwMqK+vTx0dHctu197eroGBAbW1lf/PNzAwIEnq6+tbsl17e7t2795d1TnGx+vo6Ch8cXZ0dCw57/xx4vuUO3a5cfOv8/v09fWVXVfJ9a7knCpR6/E8NMM11Fq952TVBfiVV17xPoW6SCaTSqVS6uzsVE9PT2H5+vXrF21jZtq5c6dSqZR27twpMyu82y0eS5I6OzsXbZffv9qP08TH6+npUU9PT+F5uePE9yl37OJxe3t7C+t6enoK+xTPT3xdJde7knOqdl5qMZ6HZriGWqv3nKzKzwF7q8fngPPvWKXcd92xsTFNTk5q//79Sz4HHH83ODExseRzwPGxireL71+t/Hj5cYqflzpO8T4rGXd8fFxmVvLda7l1lVzvSs6pErUez0MzXEOt1XNOrPgXHsvZvn17GB0drfgg8U8vnDj7rma29WrtmdzHjGa25d7prD1zVA/cvWVF93nz4zX7PWEAzcHMToQQthcvX3W3IACgWRBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHCSaMRBUqmUJCmTydR0PABYzRoS4P7+fknSnj17ajoeAKxm3IIAACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcBJotEHbL98SWvPHFX75awkae2Zo4Xl0pZGnw4AuGlogFOpVOH51NScJKmrKx/dLYvWA0Cza2iA+/v7G3k4ALihcQ8YAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcEGACcEGAAcEKAAcAJAQYAJwQYAJwQYABwQoABwAkBBgAnBBgAnBBgAHBCgAHACQEGACcWQlj5xmYXJZ2v8libJL1f5b7NhHlYwFwsYC4WNONcbA0hbC5eWFGAPw0zGw0hbG/IwW5gzMMC5mIBc7GgleaCWxAA4IQAA4CTRgb42QYe60bGPCxgLhYwFwtaZi4adg8YALAYtyAAwAkBBgAndQ+wme00sz
fNLGNm++p9vEYzszvN7C9mNmZm/zKzPdHyjWZ23MzGoz9vi+3zZDQfb5rZ12LLHzCzU9G6g2ZmHtf0aZlZu5n9w8xejl635FyY2QYze8HMzkRfHw+34lyY2fejvxtvmNlvzOzmVpyHkkIIdXtIapf0lqS7Ja2RdFLSvfU8ZqMfkm6XdH/0/DOS/i3pXkk/krQvWr5P0g+j5/dG83CTpLui+WmP1r0m6WFJJmlYUo/39VU5J3sl/VrSy9HrlpwLSYclfSd6vkbShlabC0ldks5JWhu9/q2kb7faPJR71Psd8IOSMiGEsyGEq5Kel7SrzsdsqBDChRDC36PnH0kaU+6LbpdyfwEV/fmt6PkuSc+HED4JIZyTlJH0oJndLunWEMJfQ+6r7Uhsn1XDzO6Q9HVJh2KLW24uzOxWSV+R9JwkhRCuhhD+qxacC0kJSWvNLCFpnaT/qDXnYYl6B7hL0jux15PRsqZkZklJ90l6VdKWEMIFKRdpSZ+NNis3J13R8+Llq81PJP1A0rXYslaci7slXZT08+h2zCEzu0UtNhchhClJP5b0tqQLkj4IIfxRLTYP5dQ7wKXu0TTl597MbL2k30n6Xgjhw+U2LbEsLLN81TCzb0h6L4RwYqW7lFjWFHOh3Lu++yX9LIRwn6T/KfejdjlNORfRvd1dyt1O+JykW8zs8eV2KbFs1c9DOfUO8KSkO2Ov71Dux4+mYmYdysX3VyGEF6PF70Y/Nin6871oebk5mYyeFy9fTb4k6ZtmNqHc7aavmtkv1ZpzMSlpMoTwavT6BeWC3Gpz0S3pXAjhYghhVtKLkr6o1puHkuod4L9JusfM7jKzNZIek/RSnY/ZUNFvYp+TNBZCOBBb9ZKkvuh5n6Q/xJY/ZmY3mdldku6R9Fr0Y9hHZvZQNObu2D6rQgjhyRDCHSGEpHL/rf8cQnhcrTkX05LeMbPPR4sekXRarTcXb0t6yMzWRef/iHK/J2m1eSit3r/lk9Sr3CcD3pL0lPdvHetwfV9W7kehf0p6PXr0SuqU9CdJ49GfG2P7PBXNx5uK/SZX0nZJb0TrfqroXyquxoekHVr4FERLzoWkL0gajb42fi/ptlacC0n7JZ2JruEXyn3CoeXmodSDf4oMAE74l3AA4IQAA4ATAgwATggwADghwADghAADgBMCDABO/g+N/5VHP4nZywAAAABJRU5ErkJggg==", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "sns.boxplot(x=s2)" - ] - }, - { - "cell_type": "code", - "execution_count": 93, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 93, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWAAAAFgCAYAAACFYaNMAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAYcUlEQVR4nO3df5Dc9X3f8efbAg6MhcQPgcQJKqLKHvNjGh8KzZmOxzZxQhKPcTyBKNM4mg6tMq0S4dAmQPKHp3+oQ2c8Hlcdko4GHCuJA1awAcV1wBgbVGcwGNZKFgEeIwOyDlUShMjCKQcS7/5xX533VivdSdx3P7u3z8eMZvf72e/39i2QXvruZz8/IjORJHXfO0oXIEmDygCWpEIMYEkqxACWpEIMYEkq5KTSBbwdV199dd5///2ly5Ck6USnxr6+A3755ZdLlyBJJ6yvA1iS+pkBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVEhfL0cpnYjx8XEajcaUtpGREYaGhgpVpEFVawBHxAvAAeAQcDAzV0bEWcCXgGXAC8B1mflqdf4twPXV+esy84E669NgajQarLvtXhYMLwdg/9gONqyF0dHRwpVp0HTjDvhDmdm6cO/NwEOZeWtE3Fwd3xQRFwOrgEuA84FvRMS7M/NQF2rUgFkwvJxzll9WugwNuBJ9wNcAm6rnm4CPt7TflZnjmfk88BxwRffLk6TuqDuAE/h6RDwZEWuqtvMyczdA9Xhu1T4M/Kjl2l1V2xQRsSYinoiIJ/bt21dj6ZJUr7q7IK7MzJci4lzgwYh49hjndtozKY9oyNwIbARYuXLlEa9LUr+o9Q44M1+qHvcC9zDRpbAnIpYAVI97q9N3ARe0XL4UeKnO+iSppNoCOCJOj4j5h58Dvwg8BWwBVlenrQbuq55vAVZFxFBEXASsAB6vqz5JKq3OLojzgHsi4vD7/FVm3h8R3wU2R8T1wE7gWoDM3B4Rm4GngYPAWkdAaDa0j/ttNpuknVfqAbUFcGb+EPhXHdpfAa46yjXrgfV11aS5r9Mki2azycZHdrBw6cS437FtW1m44vIS5UlTOBNOc0r7JAv4aeAeHve7f2xHqfKkKQxg9bVO3QtnnD91ksV0gfvWwTdpNptHtDs9WXUzgNXX2u94T6R74cCenWx48XUWt+S005PVDQaw+l7rtOIT7V6Yv3iZU5PVdS5HKUmFGMCSVIhdEOorjunVXGIAq6/MxpduUq8wgNV3ZuNLN6kX2AcsSYV4Byx10GlyhhMzNNsMYKmD9skZTsxQHQxg6SicnKG62QcsSYUYwJJUiAEsSYUYwJJUiAEsSYUYwJJUiAEsSYUYwJJUiAEsSYUYwJJUiAEsSYUYwJJUiAEsSYUYwJJUiAEsSYW4HrA0A+6QoToYwNIMuEOG6mAASzPkDhmabQawetb4+DiNRmNKW7PZJLNQQdIsM4DVsxqNButuu5cFw8sn28a2bWXhissLViXNHgNYPW3B8PIpH/v3j+0oWI00uxyGJkmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBL
EmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVIgBLEmFGMCSVEjtARwR8yLiexHx1er4rIh4MCJ+UD2e2XLuLRHxXER8PyJ+qe7aJKmkbuwJdwPwDHBGdXwz8FBm3hoRN1fHN0XExcAq4BLgfOAbEfHuzDzUhRrVA9p3QXYHZM11tQZwRCwFfhVYD9xYNV8DfLB6vgl4GLipar8rM8eB5yPiOeAK4NE6a1TvaN8F2R2QNdfVfQf8OeAPgfktbedl5m6AzNwdEedW7cPAd1rO21W1TRERa4A1ABdeeGENJauk1l2Q3QFZc11tfcAR8VFgb2Y+OdNLOrQd8QE0Mzdm5srMXLlo0aK3VaMklVTnHfCVwMci4leAU4EzIuIvgT0RsaS6+10C7K3O3wVc0HL9UuClGuuTpKJquwPOzFsyc2lmLmPiy7VvZuZvAVuA1dVpq4H7qudbgFURMRQRFwErgMfrqk+SSuvGKIh2twKbI+J6YCdwLUBmbo+IzcDTwEFgrSMgJM1lXQngzHyYidEOZOYrwFVHOW89EyMmpJ721sE3aTabU9reeOMNAE455ZTJtpGREYaGhrpam/pHiTtgqe8d2LOTDS++zuKWgRpj27Yyb/7ZLF5+CTAximPDWhgdHS1UpXqdASydoPmLl00OmYOJwD1p4eIpbdKxuBaEJBViAEtSIXZBqBjXftCgM4BVjGs/aNAZwCrKtR80yOwDlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQtiaSavHXwTZrN5pS2kZERhoaGClWkXmMASzU5sGcnG158ncXVVnf7x3awYS2Mjo6WLUw9wwCWajR/8bLJTUeldvYBS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFWIAS1IhBrAkFeK29KrF+Pg4jUbjiPaRkRGGhoYKVCT1ntoCOCJOBbYCQ9X73J2Zn46Is4AvAcuAF4DrMvPV6ppbgOuBQ8C6zHygrvpUr0ajwbrb7mXB8PLJtv1jO9iwFkZHRwtWJvWOOu+Ax4EPZ+ZrEXEy8O2I+FvgE8BDmXlrRNwM3AzcFBEXA6uAS4DzgW9ExLsz81CNNapGC4aXc87yy0qXIfWs2vqAc8Jr1eHJ1a8ErgE2Ve2bgI9Xz68B7srM8cx8HngOuKKu+iSptFq/hIuIeRGxDdgLPJiZjwHnZeZugOrx3Or0YeBHLZfvqtraf+aaiHgiIp7Yt29fneVLUq1qDeDMPJSZPwssBa6IiEuPcXp0+hEdfubGzFyZmSsXLVo0S5VKUvd1ZRhaZv4T8DBwNbAnIpYAVI97q9N2ARe0XLYUeKkb9UlSCbUFcEQsioiF1fPTgF8AngW2AKur01YD91XPtwCrImIoIi4CVgCP11WfJJVW5yiIJcCmiJjHRNBvzsyvRsSjwOaIuB7YCVwLkJnbI2Iz8DRwEFjrCIi55a2Db9JsNiePm80meUQn09zV/vsHx0UPutoCODP/AXhfh/ZXgKuOcs16YH1dNamsA3t2suHF11m8Y+J4bNtWFq64vGxRXdT++3dctJwJp66av3jZ5Njg/WM7ClfTfa2/f8kAlgrp1CUBdksMEgNYKqS9SwLslhg0MwrgiLgyM/9uujZJx8cuicE202Fo/3OGbZKkGTrmHXBEjALvBxZFxI0tL50BzKuzMEma66brgjgFeFd13vyW9h8Dv15XUZI0CI4ZwJn5CPBIRHwhM1/sUk2SNBBmOgpiKCI2MrGI+uQ1mfnhOoqSpEEw0wD+a+B/AbczsVuFJOltmmkAH8zMP621EkkaMDMdhvY3EfGfImJJRJx1+FetlUnSHDfTO+DDy0f+QUtbAj8zu
+VI0uCYUQBn5kV1FyJJg2amU5F/u1N7Zv757JYjSYNjpl0QP9fy/FQm1vNtAAawJJ2gmXZB/F7rcUQsAP6ilorUl8bHx2k0GpPHg7bbhXQiTnQ5yn9mYs82CYBGo8G62+5lwfByYPB2u5BOxEz7gP+Gn24RPw94L7C5rqLUnxYMLx/o3S6k4zXTO+DPtDw/CLyYmbtqqEeSBsaMJmJUi/I8y8SKaGcCb9RZlCQNghkFcERcBzzOxBby1wGPRYTLUUrS2zDTLog/Bn4uM/cCRMQi4BvA3XUVJklz3UzXgnjH4fCtvHIc10qSOpjpHfD9EfEAcGd1/BvA1+opSZIGw3R7wv1L4LzM/IOI+ATwb4AAHgW+2IX6JGnOmq4b4XPAAYDM/Epm3piZv8/E3e/n6i1Nkua26QJ4WWb+Q3tjZj7BxPZEkqQTNF0An3qM106bzUIkadBMF8DfjYj/0N4YEdcDT9ZTkiQNhulGQXwKuCci/i0/DdyVwCnAr9VYlyTNeccM4MzcA7w/Ij4EXFo1/+/M/GbtlUnSHDfT9YC/BXyr5lokaaCc6HrAkmrw1sE3aTabU9pGRkYYGhoqVJHqZABLPeTAnp1sePF1FlfLKe8f28GGtTA6Olq2MNXCAJZ6zPzFyyYXttfc5oI6klSIASxJhRjAklSIASxJhfglnE7I+Pg4jUZj8rjZbJJ5jAskHcEA1glpNBqsu+1eFgwvB2Bs21YWrri8cFVSfzGAdcIWDC+fHC61f2xH4Wqk/mMfsCQVYgBLUiEGsCQVYgBLUiEGsCQVYgBLUiEGsCQVYgBLUiEGsCQVUlsAR8QFEfGtiHgmIrZHxA1V+1kR8WBE/KB6PLPlmlsi4rmI+H5E/FJdtUlSL6jzDvgg8J8z873AzwNrI+Ji4GbgocxcATxUHVO9tgq4BLga+JOImFdjfZJUVG1rQWTmbmB39fxARDwDDAPXAB+sTtsEPAzcVLXflZnjwPMR8RxwBfBoXTVKvc5NOue2rizGExHLgPcBjwHnVeFMZu6OiHOr04aB77Rctqtqa/9Za4A1ABdeeGGNVUvluUnn3FZ7AEfEu4AvA5/KzB9HxFFP7dB2xAqzmbkR2AiwcuVKV6DVnOcmnXNXraMgIuJkJsL3i5n5lap5T0QsqV5fAuyt2ncBF7RcvhR4qc76JKmkOkdBBHAH8ExmfrblpS3A6ur5auC+lvZVETEUERcBK4DH66pPkkqrswviSuCTQDMitlVtfwTcCmyOiOuBncC1AJm5PSI2A08zMYJibWYeqrE+SSqqzlEQ36Zzvy7AVUe5Zj2wvq6aJKmXOBNOkgoxgCWpEANYkgpxV2RNa3x8nEajMaWt2WySjsKW3hYDWNNqNBqsu+1eFgwvn2wb27aVhSsuL1iV1P8MYM3IguHlU2Zj7R/bUbAaaW6wD1iSCjGAJakQA1iSCrEPWEdoH/XgiAepHgawjtA+6sERD1I9DGB11DrqwREPUj3sA5akQrwDlvpIpz3iwH3i+pUBLPWR9j3iwH3i+pkBLPUZ94ibO+wDlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsQAlqRCDGBJKsRt6cX4+DiNRmPyuNlsklmwIGlAGMCi0Wiw7rZ7WTC8HICxbVtZuOLywlVppt46+CbNZnNK28jICENDQ4Uq0kwZwAJgwfByzll+GQD7x3YUrkbH48CenWx48XUWV//b9o/tYMNaGB0dLVuYpmUAS3PA/MXLJv8BVf/wSzhJKsQAlqRCDGBJKsQ+YGmOcVRE/zCApTnGURH9wwCW5iBHRfQH+4AlqRADWJIKsQtiwLSv+wCu/SCVUlsAR8TngY8CezPz0qrtLOBLwDLgBeC6zHy1eu0W4HrgELAuMx+oq7ZB1r7uA7j2g1RKnV0QXwCubmu7GXgoM
1cAD1XHRMTFwCrgkuqaP4mIeTXWNtAOr/tw+Ne7Fg2XLkkaSLUFcGZuBf6xrfkaYFP1fBPw8Zb2uzJzPDOfB54DrqirNknqBd3+Eu68zNwNUD2eW7UPAz9qOW9X1XaEiFgTEU9ExBP79u2rtVhJqlOvjIKIDm0dvxbKzI2ZuTIzVy5atKjmsiSpPt0O4D0RsQSgetxbte8CLmg5bynwUpdrk6Su6nYAbwFWV89XA/e1tK+KiKGIuAhYATze5dokqavqHIZ2J/BB4JyI2AV8GrgV2BwR1wM7gWsBMnN7RGwGngYOAmsz81BdtUlSL6gtgDPzN4/y0lVHOX89sL6ueiSp1zgTbo5zx2OpdxnAc5w7Hku9ywAeAO54LPWmXhkHLEkDxwCWpELsgphj/NJN6h8G8Bzjl25S/zCA5yC/dJP6gwEszXGdtqkHt6rvBQawNMe1b1MPblXfKwxgaQC4TX1vchiaJBViAEtSIQawJBViAEtSIQawJBXiKAhpAHUaG+y44O4zgKUB1D422HHBZRjA0oBybHB59gFLUiEGsCQVYgBLUiH2AUtyVEQhBrAkR0UUYgBLAhwVUYJ9wJJUiAEsSYUYwJJUiH3Afc5t6KX+ZQD3Obehl/qXATwHuA291J8M4B7W3r3wxhtvAHDKKadMttnlIPUvA7iHdepemDf/bBYvv2TyHLscpP5lAPe49u6FkxYunjJY3i4HldL+CQ2cvny8DGBJJ6T9E5rTl4+fASzpCDNdnKf1E5qOnwEs6QguztMdBnAPcVKFeomL89TPAO4hTqpQr+rUJeENwttnAPcYJ1WoF7V3SYA3CLPBAJY0I+1dEt4gvH2uhiZJhXgHXEinQez2qUmDxQAupP0LN7BPTf3NjT2PnwFcUPsgdvvU1M8cO3z8DGBJs8axw8fHAO4SJ1lIamcAd4mTLCS1M4Br0umO94zznWQh6acM4Jp4xytpOgbwLDjamF7veDXIOg1LA4emtTKAT0Cn7oWNj+xg4VLH9EqHdVo/4tWd3+d3PtTksssmbkw67XPYHtDT7Y3Y6Wd0+jm9qOcCOCKuBv4HMA+4PTNvLVzSEY7WveCYXmmqTutHbPj69slQbt/nsNPY4en2Ruy0V2K/jEHuqQCOiHnAbcBHgF3AdyNiS2Y+PVvvcSL/mra32b0gnbjWUO60z2Enx9obsdPPaO/+mMnf8+nuvDud83b1VAADVwDPZeYPASLiLuAaYNYCuNFosPqPP8vpZy8B4OUfPsW80+Zz5pJ/0fH4aOcsuOgyIiZef23fGPNef52XTz998pr2tumOu3WNtfbONdZ65PH+sR20dxs3m80pNzkzeZ/dTz3Kf3vsAGcueQqY/u/5T17ZzX9Z9ZHJrpHD7/uZux6czIqfvLKbTetvnNW76sgemg0QEb8OXJ2Z/746/iTwrzPzd1vOWQOsqQ7fA7wCvNztWo/iHKylE2vpzFo6m4u1vJyZV7c39todcHRom/IvRGZuBDZOXhDxRGaurLuwmbCWzqylM2vpbJBq6bX1gHcBF7QcLwVeKlSLJNWq1wL4u8CKiLgoIk4BVgFbCtckSbXoqS6IzDwYEb8LPMDEMLTPZ+b2aS7bOM3r3WQtnVlLZ9bS2cDU0lNfwknSIOm1LghJGhgGsCQV0rcBHBGfj4i9EfFUD9RyQUR8KyKeiYjtEXFDwVpOjYjHI+Lvq1r+a6laqnrmRcT3IuKrJeuoankhIpoRsS0inihcy8KIuDsinq3+3BSZMxsR76n+exz+9eOI+FShWn6/+jP7VETcGRGnlqijquWGqo7tdf736Ns+4Ij4APAa8OeZeWnhWpYASzKzERHzgSeBj8/mFOrjqCWA0zPztYg4Gfg2cENmfqfbtVT13AisBM7IzI+WqKGllheAlZlZfJB/RGwC/k9m3l6N+HlnZv5T4ZrmAWNMTH56scvvPczEn9WLM/P/RcRm4GuZ+YVu1lHVcilwFxMzc98A7gf+Y2b+YLbfq2/vg
DNzK/CPpesAyMzdmdmonh8AngGGC9WSmfladXhy9avIv7IRsRT4VeD2Eu/fqyLiDOADwB0AmflG6fCtXAXs6Hb4tjgJOC0iTgLeSbk5AO8FvpOZ/5yZB4FHgF+r4436NoB7VUQsA94HPFawhnkRsQ3YCzyYmaVq+Rzwh8Bbhd6/XQJfj4gnqyntpfwMsA/4s6p75vaIOH26i7pgFXBniTfOzDHgM8BOYDewPzO/XqIW4CngAxFxdkS8E/gVpk4QmzUG8CyKiHcBXwY+lZk/LlVHZh7KzJ9lYibhFdVHqq6KiI8CezPzyW6/9zFcmZkjwC8Da6turBJOAkaAP83M9wE/AW4uVAsAVTfIx4C/LvT+ZzKx8NZFwPnA6RHxWyVqycxngP8OPMhE98PfAwfreC8DeJZU/a1fBr6YmV8pXQ9A9bH2YeCIRUC64ErgY1W/613AhyPiLwvUMSkzX6oe9wL3MNHHV8IuYFfLJ5O7mQjkkn4ZaGTmnkLv/wvA85m5LzPfBL4CvL9QLWTmHZk5kpkfYKKrc9b7f8EAnhXVF193AM9k5mcL17IoIhZWz09j4g/2s92uIzNvycylmbmMiY+238zMInc0ABFxevUFKdXH/V9k4qNm12Xm/wV+FBHvqZquYhaXXD1Bv0mh7ofKTuDnI+Kd1d+nq5j4LqWIiDi3erwQ+AQ1/bfpqanIxyMi7gQ+CJwTEbuAT2fmHYXKuRL4JNCs+l4B/igzv1agliXApuob7XcAmzOz+BCwHnAecM/E321OAv4qM+8vWM/vAV+sPvr/EPh3pQqp+jk/AvxOqRoy87GIuBtoMPFx/3uUnZL85Yg4G3gTWJuZr9bxJn07DE2S+p1dEJJUiAEsSYUYwJJUiAEsSYUYwJJUiAEsSYUYwJJUyP8HL5Wzx2fvt4cAAAAASUVORK5CYII=", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "# Why \"lognormal\"?\n", - "\n", - "sns.displot(np.log(s2))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Box plots\n", - "\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Outliers, missing values\n", - "\n", - "An *outlier* is an observation far from the center of mass of the distribution. It might be an error or a genuine observation: this distinction requires domain knowledge. Outliers influence the outcomes of many statistical and machine learning methods, so it is important to decide how to deal with them.\n", - "\n", - "A *missing value* is an observation without a value. There can be many reasons for a missing value: the value might not exist (hence its absence is informative and it should be left empty) or might not be known (hence the value exists but is missing from the dataset, and it should be marked as NA).\n", - "\n", - "*One way to think about the difference is with this Zen-like koan: An explicit missing value is the presence of an absence; an implicit missing value is the absence of a presence.*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Summary statistics\n", - "A statistic is a function of a collection of observations; in other words, a measure computed over a distribution.\n", - "\n", - "A statistic is said to be *robust* if it is not sensitive to outliers.\n", - "\n", - "* Not robust: min, max, mean, standard deviation.\n", - "* Robust: mode, median, other quartiles.\n", - "\n", - "A closer look at the mean:\n", - "\n", - "$\\bar{x} = \\frac{1}{n} \\sum_{i}x_i$\n", - "\n", - "And variance (the standard deviation is the square root of the variance):\n", - "\n", - "$Var(x) = \\frac{1}{n} \\sum_{i}(x_i - \\bar{x})^2$\n", - "\n", - "The mean, the median, etc.
are measures of location (i.e., they describe a typical value); the variance is a measure of dispersion." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "" - ] - }, - { - "cell_type": "code", - "execution_count": 95, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "4.993729761026251\n", - "241.22132048015996\n" - ] - } - ], - "source": [ - "# Not robust: min, max, mean, standard deviation\n", - "\n", - "print(np.mean(s1)) # should be close to 5\n", - "print(np.mean(s2))" - ] - }, - { - "cell_type": "code", - "execution_count": 96, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "5.0017132760085286\n", - "148.79782622743468\n" - ] - } - ], - "source": [ - "# Robust: median, other quartiles\n", - "\n", - "print(np.quantile(s1, 0.5)) # should be close to the mean (and mode) for normal data\n", - "print(np.quantile(s2, 0.5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Questions\n", - "\n", - "* Calculate the min, max, mode and standard deviation. *Hint: explore the numpy documentation; for the mode, see `scipy.stats.mode`.*\n", - "* Calculate the 0.9 quantile (90th percentile) of each sample.\n", - "* Consider our normally distributed data in s1. Add an outlier (e.g., value 100). What happens to the mean and mode? Write down your answer and then check." - ] - }, - { - "cell_type": "code", - "execution_count": 97, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " annual_salary a_age length\n", - "count 7870.000000 9303.000000 9645.000000\n", - "mean 5.916921 14.266688 5.005694\n", - "std 6.985214 2.902770 1.462343\n", - "min 0.166667 1.000000 0.083333\n", - "25% 3.000000 12.000000 4.000000\n", - "50% 4.000000 14.000000 5.000000\n", - "75% 6.000000 16.000000 6.000000\n", - "max 180.000000 50.000000 15.000000" - ] - }, - "execution_count": 97, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Let's explore our dataset\n", - "df_contracts[[\"annual_salary\",\"a_age\",\"length\"]].describe()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Relating two variables\n", - "\n", - "### Covariance\n", - "\n", - "A measure of association, specifically of the joint linear variability of two variables:\n", - "\n", - "$Cov(x, y) = \\frac{1}{n} \\sum_{i}(x_i - \\bar{x})(y_i - \\bar{y})$\n", - "\n", - "Its normalized version is called the (Pearson's) correlation coefficient:\n", - "\n", - "$\\rho(x, y) = \\frac{Cov(x, y)}{\\sqrt{Var(x)} \\sqrt{Var(y)}}$\n", - "\n", - "Correlation is helpful to spot possible relations, but it can be tricky to interpret and it does not capture every kind of relation (e.g., non-linear ones).\n", - "\n", - "\n", - "\n", - "See: https://en.wikipedia.org/wiki/Covariance and https://en.wikipedia.org/wiki/Pearson_correlation_coefficient.\n", - "\n", - "*Note: correlation is not causation!*" - ] - }, - { - "cell_type": "code", - "execution_count": 98, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " annual_salary a_age length\n", - "annual_salary 1.000000 0.205404 -0.361611\n", - "a_age 0.205404 1.000000 -0.430062\n", - "length -0.361611 -0.430062 1.000000" - ] - }, - "execution_count": 98, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_contracts[[\"annual_salary\",\"a_age\",\"length\"]].corr()" - ] - }, - { - "cell_type": "code", - "execution_count": 99, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 99, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYYAAAEGCAYAAABhMDI9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAABAjElEQVR4nO3deXycZbnw8d81k0wme9M0TdKWNk2bLqQbECoqm61oxdKyCaKCAr7Ie0A44oIbIEU9ighHhFdEdhAERWU5yBGBI+gBIQW6UaClG22TNE3b7DOTmbnfP2bpPJln2sxkpjNpru/nk0+SJzP3XJk2z/U893LdYoxBKaWUinBkOwCllFK5RRODUkopC00MSimlLDQxKKWUstDEoJRSyiIv2wEM17hx40xdXV22w1BKqRFl5cqVu40xVXY/G/GJoa6ujubm5myHoZRSI4qIbE30M+1KUkopZaGJQSmllIUmBqWUUhaaGJRSSlloYlBKKWUx4mclDUcwaNjS0Utbl4fqMjd1lcU4HJLtsJRSKqtGbWIIBg3PrmvlqsfewjMQxJ3v4OZzFrCksUaTg1JqVBu1XUlbOnqjSQHAMxDkqsfeYktHb5YjU0qp7Bq1iaGtyxNNChGegSC7uj1ZikgppXLDqE0M1WVu3PnWX9+d72B8qTtLESmlVG4YtYmhrrKYm89ZEE0OkTGGusriLEemlFLZldHBZxG5B1gK7DLGzAkfexSYGX7IGGCfMWaBiNQB64F3wz971RhzaaZicziEJY01zLriBHZ1exhfqrOSlFIKMj8r6T7gNuCByAFjzLmRr0Xk50BnzOPfN8YsyHBMUQ6HUF9VQn1VyaF6SaWUynkZTQzGmJfCdwJxRESAc4BFmYxBKaVUcrI5xnAC0GaM2RBzbKqIvCkifxeRExI9UUQuEZFmEWlub2/PfKRKKTWKZDMxnAc8EvN9CzDZGHMUcBXwsIiU2T3RGHOnMabJGNNUVWW7z4RSSqkUZSUxiEgecCbwaOSYMcZrjOkIf70SeB+YkY34lFJqNMvWHcPHgXeMMdsjB0SkSkSc4a/rgQZgU5biU0qpUSujiUFEHgFeAWaKyHYRuTj8o89i7UYCOBFYLSKrgD8Alxpj9mQyPqWUUvEyPSvpvATHv2Rz7HHg8UzGo5RS6uBG7cpnpZRS9jQxKKWUstDEoJRSykITg1JKKQtNDEoppSw0MSillLLQxKCUUspCE4NSSikLTQxKKaUsNDEopZSy0MSglFLKQhODUkopC00MSimlLDQxKKWUstDEoJRSykITg1JKKQtNDEoppSw0MSillLLI9J7P94jILhFZG3PsByKyQ0TeCn+cGvOz74jIRhF5V0Q+mcnYlFJK2cv0HcN9wBK
b47cYYxaEP54BEJEjgc8CjeHn/D8RcWY4PqWUUoNkNDEYY14C9gzx4cuB3xljvMaYzcBGYGHGglNKKWUrW2MMl4vI6nBXU0X42ETgg5jHbA8fiyMil4hIs4g0t7e3ZzpWpZQaVbKRGH4FTAMWAC3Az8PHxeaxxq4BY8ydxpgmY0xTVVVVRoJUSqnR6pAnBmNMmzEmYIwJAr9hf3fRduCImIdOAnYe6viUUmq0O+SJQURqY749A4jMWHoS+KyIFIjIVKABeO1Qx6eUUqNdXiYbF5FHgJOBcSKyHbgOOFlEFhDqJtoCfAXAGLNORB4D3gb8wGXGmEAm41NKKRVPjLHtxh8xmpqaTHNzc7bDUEqpEUVEVhpjmux+piuflVJKWWhiUEopZaGJQSmllIUmBqWUUhaaGJRSSlloYlBKKWWhiUEppZSFJgallFIWmhiUUkpZaGJQSilloYlBKaWUhSYGpZRSFpoYlFJKWWhiUEopZaGJQSmllIUmBqWUUhaaGJRSSlloYlBKKWWhiUEppZRFRhODiNwjIrtEZG3MsZ+JyDsislpE/iQiY8LH60SkX0TeCn/ckcnYlFJK2cv0HcN9wJJBx54D5hhj5gHvAd+J+dn7xpgF4Y9LMxybUkopGxlNDMaYl4A9g4791RjjD3/7KjApkzEopZRKTrbHGC4C/hLz/VQReVNE/i4iJyR6kohcIiLNItLc3t6e+SiVUmoUyVpiEJHvAX7gt+FDLcBkY8xRwFXAwyJSZvdcY8ydxpgmY0xTVVXVoQlYKaVGiawkBhH5IrAU+LwxxgAYY7zGmI7w1yuB94EZ2YhPKaVGs0OeGERkCXA1sMwY0xdzvEpEnOGv64EGYNOhjk8ppUa7vEw2LiKPACcD40RkO3AdoVlIBcBzIgLwangG0onAChHxAwHgUmPMHtuGlVJKZUxGE4Mx5jybw3cneOzjwOOZjEcppdTBDbkrSUTmZDIQpZRSuSGZMYY7ROQ1Efm3yGplpZRSh58hJwZjzPHA54EjgGYReVhETslYZEoppbIiqVlJxpgNwPcJzSo6Cbg1XPfozEwEp5RS6tBLZoxhnojcAqwHFgGnGWNmh7++JUPxKaWUOsSSmZV0G/Ab4LvGmP7IQWPMThH5ftojU0oplRVDSgzhhWcfGGMetPt5ouNKKaVGniF1JRljAkCliLgyHI9SSqksS6YraSvwTxF5EuiNHDTG3Jz2qJRSSmVNMolhZ/jDAZRmJhyllFLZNuTEYIy5PpOBKKWUyg1DTgwiUgV8C2gE3JHjxphFGYhLKaVUliSzwO23wDvAVOB6YAvwegZiUkoplUXJJIZKY8zdwIAx5u/GmIuA4zIUl1JKqSxJZvB5IPy5RUQ+TWggelL6Q1JKKZVNySSGH4pIOfB14JdAGfC1jESllFIqa5KZlfR0+MtO4GOZCUcppVS2HTQxiMgvAZPo58aYK9IakVJKqawayh1Dc6qNi8g9wFJglzFmTvjYWOBRoI7QzKZzjDF7wz/7DnAxoT2frzDG/Heqr61GlmDQsKWjl7YuD9Vlbuoqi3E4JNthKTUqHTQxGGPuH0b79xGqyvpAzLFvA88bY34iIt8Of3+1iBwJfJbQOokJwN9EZEa4TpM6jAWDhmfXtXLVY2/hGQjizndw8zkLWNJYo8lBqSxIZj+GKhG5SUSeEZEXIh8Heo4x5iVgz6DDy4FIsrkfOD3m+O+MMV5jzGZgI7BwqPGpkWtLR280KQB4BoJc9dhbbOnoPcgzlVKZkOwCt/UMf4FbtTGmBSD8eXz4+ETgg5jHbQ8fiyMil4hIs4g0t7e3pxCCyiVtXZ5oUojwDATZ1e3JUkRKjW65tMDNrs/AdtDbGHOnMabJGNNUVVWVxhBUNlSXuXHnW/8ruvMdjC91J3iGUiqTkkkMlgVuInIUqS1waxORWoDw513h49uBI2IeN4nQIjp1mKurLObmcxZEk0NkjKGusjjLkSk1OmVjgduTwBeBn4Q/PxF
z/GERuZnQ4HMD8FoK7asRxuEQljTWMOuKE9jV7WF8qc5KUiqbMrrATUQeAU4GxonIduA6QgnhMRG5GNgGfCbc/joReQx4G/ADl+mMpNHD4RDqq0qoryrJdihKjXrJlN2+Efgh0A88C8wH/t0Y81Ci5xhjzkvwo8UJHv8j4EdDjUkppVT6JTPG8AljTBehBWvbgRnANzMSlVJKqaxJJjHkhz+fCjxijBm8PkEppdRhIJnB56dE5B1CXUn/Ft7RbdRPNM9EKYd0t6nlJpRSyUhm8PnbIvJToMsYExCRPkKrlQEQkVOMMc9lIshclYlSDuluU8tNKKWSlUxXEsaYvZGZQsaYXmNMa8yPf5rWyEaATJRySHebWm5CKZWspBLDQYy6y89MlHJId5tabkIplax0JoaEezYcrjJRyiHdbWq5CaVUstKZGEadTJRySHebWm5CKZUsMSY9F/oi8kdjzJlpaSwJTU1Nprk55b2Ehi0y4yedpRzS3WYmYlRKjWwistIY02T7s4MlBhE54MneGPPHYcQ2bNlODEopNRIdKDEMZbrqaQf4mQGymhiUUkql11C29rzwUASilFIqNySz8hkR+TShPZmjU1qMMSvSHZRSSqnsSaa66h1AEaGS23cBZ6P7JWSElsRQSmVTMncMHzHGzBOR1caY60Xk5+j4QtppSQylVLYls46hP/y5T0QmENrqc2r6QxrdtCSGUirbkkkMT4vIGOBnwBvAFuB3GYhpVNOSGEqpbEumuuoN4S8fF5GnAbcxpjMzYY1ekRIWsSfzdJTESFd7SqnD35DvGETkgsgHcC6wPPx10kRkpoi8FfPRJSL/LiI/EJEdMcdPTaX9kUxLYiilsm3IJTFE5Jcx37oJ7dv8hjHm7GEFIOIEdgAfAi4EeowxNw31+Yfjyme/P8i6lk5aOj3UlrtprC0nLy/1slZaEkMpNdhwVz4DYIz56qBGy4EHhxkbhBLM+8aYrSJ6sgoGDX9d35bWWUQOh1BfVUJ9VUmao1VKHY6GU121D2hIQwyfBR6J+f5yEVktIveISEUa2h9RdBaRUirbkhljeEpEngx/PA28CzwxnBcXERewDPh9+NCvgGnAAqAF+HmC510iIs0i0tze3j6cEHKOziJKj2DQsKm9h1fe382m9h6CwVG3XYhSKUtmgVtsv78f2GqM2T7M1/8UoXGKNoDIZwAR+Q3wtN2TjDF3AndCaIxhmDHkFJ1FNHy6qE+p4UlmjOHvGXj984jpRhKRWmNMS/jbM4C1GXjNrBhqWYq6ymJu+9xRrN7eSdCAU2DupPKcmkWUiRIb6WwzUXfcrCtO0HEWpYYgmVpJZwI/BcYT2t9ZAGOMKUvlhUWkCDgF+ErM4RtFZAGhct5bBv1sxEr2CtbnN9z50ibLY3NFJq7G093mgbrjNDEodXDJDD7fCCwzxpQbY8qMMaWpJgUAY0yfMaYydpGcMeZ8Y8xcY8w8Y8yymLuHES2ZAeVcH3zORHzpblP3uVZqeJJJDG3GmPUZi2SEGsogZzIDyrk++JyJ+NLdpi7qU2p4khl8bhaRR4E/A97IwWxv7ZlNQ+0CSWZAOdcHnzMRX7rbdDiEJY01zLriBF3Up1QKkln5fK/NYWOMuSi9ISUnmyufN7X3cOqtL8ed0J4ZNMiZTB96rs+oCQYNL7zbFjc4vmhm9bDGGF7euIvu/gC9Xj/F7jxK3U5OmD4+J35npQ5H6Vr5rFt8DjLUQc5krmBHwtVuugfHg0FDe7eP7/95bbTNH54+h2DQ5NTvrdRokcyspCrg/wB1sc/L9h1DNiXTBZJMWYpcLmGRiamg61o6o0kh0ub3/7yWhvElzD9i1C1+Vyrrkhl8fgIoB/4G/FfMx6g1Ggc5MzH43NJp32ZrZ24MuCs12iQz+FxkjLk6Y5GMQCOh2yfdMjH4XFteaNtmTXluDLgrNdoku4PbqNsf4WAi3T7H1Y+jvqrksE4KkJm7pMbaMn54+hxLmz88fQ6NteV
piVkplZxkZiV1A8WEpqoOMMyVz+lyOO7H4PMFWL2zk9YuD7VlbuZOKMflcmY7rCiPx8+alk5au7zUlBUwt7YctzuZm894/f0DrGntoq3LS3VZAXNryigszE+5vUyU7VDqcJKuWUmlIjKWUKltvcfPEJ8vwJ9X7+TaJ/bP0FmxfA6nz5uQE8nB5wvw5NqWtMbn8wV4al1r2trM9Sm/SuW6ZMpufxn4O/As8IPw52szE9botXpnZ/QECaFB2GufWMvqnbmxvXYm4kt3m7leVkSpXJfMGMOVwLGEym1/DDgK2J2RqEax1gSzftq6cmOGTibiS3ebuV5WRKlcl0xi8BhjPAAiUmCMeQeYmZmwRq/aBAXgqstyo/cuE/HVltu3WZNim1pET6nhSSYxbBeRMYRqJT0nIk8AOzMR1Gg2d0I5K5ZbZ+isWD6HeRNyY4ZOJuKrLHZx3WmNljavO62RyhJXSu2NxvUlSqXTkGclWZ4kchKhxW7PGmN8aY8qCdmelZSJ2S+RWUmRNufl2KykdMf3yvu7+Y9n1vPlE6fR7/NT6Mrjrpfe57ufns1x9eNSajPy7zJa1pcolay0zEqKlaHd3LIu2ZN8pma/uFxOmurGpvz8TMvLczC22MVAIMjYYhd5ecnceMarLnPz3q4ernjkzeix4Xb95HJZEaVy3fD+og8jkZP8qbe+zHm/+Ren3voyz65rPeAm8pt3289+2bz78J39ksr7dDDa9aNUbhneqqTDSCrF4bbu6aWiyMWZR09CwjcIj6/czrY9vUwbf3heqWaiiJ7DIXxidjWPXnIcLZ0eassLaawt064fpbJEE0NYKvsEl7vzueDDU/jF8xuiXUlXLm6gzJ36it1cl4n9lINBw1/Xt+mCNKVyRNa6kkRki4isEZG3RKQ5fGysiDwnIhvCnw9ZzeVUpjjmOyWaFCB0gvzF8xvIdw7vZOb3B1n1wV6eXdvCqg/24fcHD/6kA/B4/Ly+uYOnVu3k9c0deDz+lNvKxFTQTCxIG8qWq0ope9keY/iYMWZBzMj4t4HnjTENwPPh7w+JVPq5e7wB26vnXl8g5Tj8/iB/XrWDc+98lUsfeoNz73yFP6/akXJy8Hj8PLmmhfPveY2vPvIm59/zGk+uaUk5OUyuKLIteDe5oiil9gBaE5TdTnWBWybGQZQaTXKtK2k5cHL46/uB/wEOSanvVEpoFxfk2ZaLLhrG1M10b1qzpqWTa58cVG7iybVMHVfEsVMrk25v294+fvnCBi4+vh4RMAZ++cIGjp5ckXJXUkGew/Z9zHemdt2SiXEQpUaTbN4xGOCvIrJSRC4JH6s2xrQAhD+Pt3uiiFwiIs0i0tze3p62gJItoe0LBLhiUYPl6vmKRQ0MBFLv+kn3pjWtXd4EV+PelNpr6/KwtaOf21/cyG0vbOT2FzeytaN/WOUmOj0+2/ex25PaEhktiaHU8GTzjuGjxpidIjKe0Erqd4b6RGPMncCdEFrglqkAD6ayuIBHm7dZrp4fbd7Gkjk1KbeZ7k1rasoKbNurLitIqb1MbNRTWezm0ea3497HWz97VM7EqNRokrU7BmPMzvDnXcCfgIVAm4jUAoQ/78pWfENRV1nM1Utmc/c/NnHbCxu5+x+buHrJ7JzatGZubTkrlg0qYbFsDnNTbC9TG/V8dVGD5X386qKGlH9nXReh1PCkVBJj2C8qUgw4jDHd4a+fA1YAi4EOY8xPROTbwFhjzLcO1FamS2IcaDV0MGh44d02Vm/vJGjAITBvUjmLZlYPa5ql3x9kXUsnrZ0easrdNNaWD2t1caY2wUlnuYl0b06kJTGUOrADlcTIVmKoJ3SXAKHurIeNMT8SkUrgMWAysA34jDFmz4HaymRiOFjJi03tPZx668txXRbP5NAg50jYtGYkxKjU4eZAiSErXUnGmE3GmPnhj0ZjzI/CxzuMMYuNMQ3hzwdMCpl2sPn1iQY5c2XvBBgZm9aMhBiVGk2yvY4hpx1sdku
RK892sddwpqum20iYoTMSYlRqNNHEcAAHW+Wbiemq6TYSNq0ZCTEqNZrk2gK3nBEMGhwCPz5jLt/90xpL33dkdktlcQFvfdDBr88/hr29A4wtzuehVzfbTleNDCjHFolLNKCczsHiuspifnneUazZERogdwrMmVg+rBk6vf1e1rX2RONrrCmhuDC16a+RGB+4qIlAUGjv9lJVWoDTYYYVYyb2yVBqtNDEYCN2MLSiyMUlJ9Yzo7qU2TVlTB23/wQzqbyQjx85ga88uDKaOFYsn8Ok8kJLe5EyF5EVzZEpqKfPnxiXHPr7B3hqbWt0tXJkeulpc2pSSg4+X4B9fQPc+dKmaHs3LJ+DzxfA7U7+n7+338t/rd0VF9+n54xPOTn0er1s2e2Ja3N2jZfSwuTvGnQwW6nh0a4kG7GDoS2dHm59fiPf+P0qRLCcWNa3dXH7i6HyEJcvms6XT6jn9hc3sL6ty9JeojIX61o64157TWuXbQmLNa1dcY8dijUtnVzzhLW9a55Yyxqb1x6Kda09tvGta+1JqT2A9a29tm2ub01t8FkHs5UaHr1jsDHU0tIdvV7ObZrMrS/sL7t9xaIG9vRay00cqMzF/CMGv3Z6S1ikvyRGetvLRJuZKA2u1Giidww2hjoYWuZ2RZMChE4+t76wgVK3dRP7SJmLwe3ZlbmoDpewGPzYVEtY1KS5vXTHl4k2dTBbqeEZlYnhYLX6E5VUmFxRZHmebyBIRZGLyz42ncsXhT4qilxxs5KSKXNRkOfg+mWNlsfGfp+sdJfEaKwpsW2vsSb1K/GKIicrBv3OK5Y1MrY4tWm/WhJDqeEZdV1JQxmYtCvBPbmiKG6Xsfu+dKztDm61g+4E8vIcnD5/Ig3jSw5a5qLUnc/KLbu550vH0tHjpbKkgD+/sY1j68am9Pu6XE7qqtzc+6Vjae/xUlVSgNNpUi43UVxYwKfmVFE3bmF0VtLsmuJhzUpySB5dff08cOFC2ro9VJe6Wf3BboTUfmcAV55wyYn10VIlrjwddFZqqEZdYhhqrf5ICe7IsU3tPXHPe/ODfbY7uJ0yuzrudfPyHMw/oiJuTGGwSeWFNE2t4qL7Xj/gTKeh2ranl5VbOuOS1/iSQurGJX+V7/cH+e+324c0w2qoHAIDxskF975miTHVCURbOnq5/OE3c7pUiVK5bNR1JaW6ytbueb0++x3cPtjbl3J869u6uHbQLKJrn1gbN9NpqFq7vLbJK9WB3WRmWA3V5o5eHnhla3R218XH1/PAK1tTnkU0EkqVKJXLRl1iSHVg0u55TiFBSYzUb8TSvVFPV/+AbXud/QM5ER9AsSvP0tUj4a6fVN/HkVCqRKlcNuoSQ6oDk3bPm39EOdedZh00ve60RmrKU+9vT2YG01CUuJ227RUXpHaSTHd8ADXlBVx60vTofgx3vbyJS0+anvL7OBJKlSiVy0bdGEMqezsnel4waPjda1v59fnHsK93gDHF+fz21c0sTHGgGEIzmO6/6BgwzujrIIGUN62pKMrnlnPmEwhCr9dPsTsPp8DY4tRKbDTWlnHTZ+bzXlt3tMRGQ3VpyvEB+APw4jstcaVFUn0fK4sL2NPTx30XLqQ9/B4+t24HY4tT31kPkitrotRINuoSA8QPLKf6vDe37eFjM2stJTGuW9pIV4p7FQP4Bvxs3e2NKw8xt8ZPXp7r4A0M4s5z0tnv5wdPrYu294PTGnHnpXbHEAwa+n2BuBIbg6f8JqPb42XxbOv7eP2yRro9XiD5weLakgJm1FTwpZjB7BXL5lBbkvqdXDJlTZQa6fR/9DD4A4bH39jGjWfP56dnzuVnZ8/n8Te24Q+kfpJc29ptWx5ibWt3Su3t6vZFk0KkvR88tY5d3aklr9U77UtsrN6Z+uCzLwDXPWmN8bon1+ELpNbe2gRlRdamWFYEMjPorlSuGpV3DOnS5/NzTtNkvvWHVTFXpo30+fwpt5nu8hDtPfbt7e5JtcRG+mf8JPq
dd+VIGRBIrqyJUiOd3jEMQ0lBPtcOutK99sl1lBTE99/7/UFWfbCXZ9e2sOqDffj99gOh6S4PMa7EZdteZUny3VIAtQlmdVWXpT74nOh3Hp8jZUAgM4PuSuWqrCQGETlCRF4UkfUisk5Ergwf/4GI7BCRt8Ifp2YjvqFKfDVu7aaJ9E+fe+erXPrQG5x75yv8edUO2+QwvtRlWxJjfGlqJ/LqkgLbmVPVKfa3z51Qzorlg0piLJ/DvAmpDz47xHDd0kExLm3EIal1yaW7DAgkV9ZEqZEuW11JfuDrxpg3RKQUWCkiz4V/dosx5qYsxZWUYldoKujgFbaFLmu+TdQ/3TC+hPlHVFge29Lp5Z2d+ywzap5ds4MplUXUjStNOsZOjx+nGG46ez69Pj/Frjz6fAN0elLr7nK5nJzaWEVd5f6SGEfWFKdcYgOgosjNrs5d3H/hQnZ1hzbWeWVDGwunpjYrye3Osy3bkcr+ExEOh1BV6rK8j6WFTt3fQR2WspIYjDEtQEv4624RWQ9MzEYsYN3ta3ypG6cj1KecaOevyOO9/iC3f+5oVjy9jq0d/dGr8bHFLja190R3D0umf7qmtIBZE8ZYZtRcvyz1K/wd+/p56NVtfPnEaWDAAA+9uo2vnDSNBZMrDvr8wfr7B3hmbXvaNhICyHMGqK0o5YuDZhHlOVMbfe7vH+AvaY5xS0cvX3nwDS2zoUaFrA8+i0gdcBTwL+CjwOUicgHQTOiuYm8mX9+uqN6Vixt44JWt7O3zxRXYs3v8NUuPpNszQLcnwB1/38hNZ8/n7DteoaLIxWeaJnFkbRlXLp7OY83baQmvEHbnO+KK7QF09Pr4fXNoplO/109RQR73/+8mpo0rZmoKv19tuZtPza21DJDbFfobqkQbCdWNK2Lh1MqU2mzr8tu2+cBFC5mSQpOZiVH3eFCjR1YHn0WkBHgc+HdjTBfwK2AasIDQHcXPEzzvEhFpFpHm9vb2YcVgV1TvF89v4MyjJ9nu/DX48RVFLtq6PEwcU8SsmlLK3fm0dnmoKHJx/nFTuPOlTVz60Bv8+qVNXPDhKdSWu6Mn51J3/NVr74Cfs44OzXS6+o9r+OYfVnHW0ZPpG0it68frD9jWSvImGPw+mGRnTR2sxDnArkSzkrpzZzMh3eNBjSZZSwwikk8oKfzWGPNHAGNMmzEmYIwJAr8BFto91xhzpzGmyRjTVFVVNaw4El0Jiuz/OrbAXuzja8vd0ZP/Vx95k2/+YRXnfWgKkyoKOfPoSXGb+Pzi+Q1899TZ0SJxbTaF+9x5eVz/tHWm0/VPr6MgL7Wbu85+f1prJSWcNVUa39UVubs69daXOe83/+LUW1/m2XWtccmhsjTBzKniFAfck4hxqHSPBzWaZKUrSUQEuBtYb4y5OeZ4bXj8AeAMYG2mY4lcCQ7uOzZm/9exV4Wxjz/z6Ek82ryNi4+vjyaSO/6+kRvPmocItifkd9u6uf3FjbjzHVSVxF9t7u7xUlHk4syjJ0XbfHzl9pTXHVSVFNj+fuNSHLOYUFHAr75wFE5xsCdcviJggkwYG9/elo5efvrsesv789Nn1zOrptTS/VKYJ1x3WiPXx6zOvu60RgptNieKHQ9KNAaUTIxD5XAIixqqePCihbR2eakpK2BubbkOPqvDUrbGGD4KnA+sEZG3wse+C5wnIgsIjZFuAb6S6UAiV4J2Ywx2V4WTK4pYsXwO1z6xllK303bP5y6PP1p51S7hRF7DaXO/NnGM23bzn4ljUuuyyHMaVixrjK63iCzCy3emNhW0pCCP9q6BuIHdoyZb/ysFg4bdPV4u/1gD2/f28Vjzdvb2+aJ7YscmhhJ3HhVFeZYZP3nOUAHAwW0ebJOlZGJMhs8X4Mm1LdGS6JFpuqfPmzCsGVlK5SIxJvXyDbmgqanJNDc3D6uNyFXorm4PVSWhWUmtXfYF9ja193Dhfa+xdN5Ejp9eyYXhDXUi3PkOHrhwISu37aWsMJ8bnn7
bUlPI5RQcDgd/XPkBl548naAxlivf5i17+MLd/4pr86GLP0RTCkXlXtvcwTf/sIql8yYiAsbA06t38LOz56c0EPva5g4uuOe1+N/5ooXR9uxO4FcsauDBV0MD+o9ecpxlmu7rmzv4hk2MN509n2NjYtzU3sOpt7580JlBQ4kxWen+d1Eq20RkpTGmye5nWZ+VlAvsiuol2t2srcvD1o5+bn9xI40TSm27izp6vfz02XepLXdz8fH1TB1XRLErj588u94yrfVbj6+Kfh+58t3VndpGQom0dXmj8Q4+PthQumnaurzMGF/Cl0+cFp019ZuX3re0Zzegf+sLG7j4+Hpuf3EjfYOKIO3qto9x8ODzUGcGZWLwOROlQJTKVZoYkhQ7xjAuQf99ZXGoL7ul08PtL27k6iUz+f7z1umT1z+1LnqijMx+mnXFCQnbHFecWv94dVkBUyoLo1fjAE+t2hFXHmKo3TSTxxZy3oemWKa/XndaI5PH7t969EAD+nblM6pK7X/nqkHjIInGgwbPDIoMPg9+3LBKYiR47eGUAlEqV2mtpAQ8Hj+vb+7gqVU7eX1zBx6PH78/SHf/AD87ez63fe4our0DthvCdPsGLMemVBYdcOZT5Ptd3R729du32ekZiJv2OZT6S0Uu4bKTGyyb4Fx2cgNFLuudQKK9sAdvrxk0hjv+vjG6DeeXT6jnjr9vJBjTJRk7tbO23M1lH5vOFYunM6umlNs+d1TcTJ4uT4L30WudOTXUmUEzaoq5YVDZjhuWz2FGTeoziDJRCkSpXKV3DDY8Hj9PrmmxDF7esHwOhflOvhFzpXzHF46xzEoyBh5t3sbNn1nAM1ecwJ5eL/lOB30+/wFnPkW+H1/qxiFe2zZ/dtb8aP+6O9/Bzz+zgDwnrNvZRdDA+pYutu3p5VONtZb9Afp9JuHisVhD7abZ2+fjy8fX097jjW7U8+Xj69nXt78+VOQE/tNn18cNzt98zoK497uiyMUL77SGFvX5/BS5Qov6Fk490vK4oW6ytLW9HxHDJSfWEzTgEBAxbG3vZ8zk1K7wXS4nSxurqassipbZmFNTqgPP6rCkicHGmpbOuJPpNU+s5ZIT6y3HbnvhPS77WEPcTJWyQgd1lcVs2t3D6u2dTB5bxC3nLuAnf7GOMdzx91CfeuyVr9Ph57ufms1AwER3XJs3cTYVxU7La3/9929x++eOssS9t9fHO21dzJk4Zv+xPp/t9NfYEzkMvZumsqiAd1t7LBv1XLm4gbFF+7tpIifwiWPcnHvnq3F3IbMGDRa7HMLnj6tj4679u8J9/rg6CpzxU0GHsslSj8/P9/+8Lu53ueeLtuNsQ+LzBXh6XZvOSlKjgiYGG4nq+U8cU8jli6bz+MpQaYsP1Vfxt7d3xm1JWT+ugW17etnQZj2BrljWSHlRqI7SI//ayo9On0ueUyxXvvv6DHv6BuLm9Nf4rCfoiiIXrV3euBN0j9e6QnpcSYHt9NfB6xjspu3addN0e/22K6nnTrR2qTgcQp8vMKS7kB6fn9ZOT9zvUpPimECv135RX483xZ1/CG1QdO2gDYqufWIt9eOKdVaSOuxoYhjE7w8mHLzctqefu/+xKTr1srbMxZSx1i0pVyxrJGAC7Or20j8Q4Msn1PP4yu0AbN/XjyF0EnblCSUFefQPWE9Wfb5ANCnA/oHqe790rOVxn2maFJ0KG3ncL57fwF0XNEUL+I0vddPrC/C7162L8H73+jbmTCy3FPqrqyweUjdNT4KTbq83vmRHdZmbpinlXPCRekvdp8F3If4gPL++Na4+1OBkM1SF+XkJqt6mfmWfaFZS6zBmJQ1lFlg221OjlyaGGJF9E/66bifXL2uMbjcZOw8/MvXykhPrmVlTxvkx8+U9A0Fu/5+NXLl4Bt/905roc7+zZBYDQcNNf303euxHZ8zlh/+1juatnZYZQB29PtsT0J5eX/Rk5853MLPafqqs1x/g1FtfjhbwWzC53HYRnj8
QtIxZ/PD0uTRUFUfHNRIZW+yyPelWFMWXr5hUXsg5x1pnMK1YPodJ5YWWx4kEo/WhondJSxsRSa2ekytPuHJxQ9xdkisv9ZNkZMOj+BXkqZXtGOossGy1p0Y3TQwxYvdNGPDDr88/hq7+Ad5p7eHBV7dGK6N6BoLMrCm1PYkvnTcxmhQij+3o80W7SSLHvvenNXzvU7OjV9NBY9i+r5dxJS7bq+zKEhcXH1/P5IpC9vT5qCq1n4ZaWpDPlYsboovrHrxoYVzNpltf2MCDFy20HPvlC+9x6UnTLV1YN5+zgE/Mrmbb3r7oVWifLzSDaHCisSvyt76ty7b7ZWa1dR8KlzOPO15aZy0t8tJGfnb2/Lg2/f4g61o6aen0UFteSGNtmWWwHSDf6YjbO8Hj95PvSH0SXpHLyQ9Oa4zun+3Od/CD0xopyk/tLiTRLLDB4y/Zai9C70JGJ00MMWL3TXjxvd28+N5uLl80nbv/sSnuSvHd1m6aplTEXUU6HfE1koLGvm5SeVH+oP2i5zD/iFIu+MhUy0DsBR+ZSmVx6J/K6RTqKovxBfxceuL0aMG9yFV2/4Cf/oH9FVX39dvfgcQW0astd3P1J2fxTlt3tOurpdPDVY+9xW8uaOL/PNAcfY0HLlpoP2sqfBKPPZHs7bN/7cH7UHR5Bmzvaro91umqkTu6SPIO3enM4fT5EwclhyDGiGUG2Q3L5wCp3YEA9HoHKHNby3Y4BHp9qRUjTDQLrK0rtTLemSgLrncho5cmhhiRfX1j/8CeWrWDa5YeaSltsWJZI7f8bQMnNoyL67KYXVsWbaO23M2ZR09ickWh7X4MG9t74qaRPvLlD7Fjb3/cQGxtuTuaoNz5Dn71hWNsq7A+cOFCSyIaW2Q/XjIm3PUTqRB71e/3n0S/s2QW3V4/Hn+QAX+QiiJXNGn2+Qb43MIp3PK396KP/9rHZ9Dn8/Pa5g76fH4K8pzs6vZSk2CmU0V4I6PI1WeZOz+uGGFssol4O8FOeDPGlzAv5g7EHxQefX1r3JjFt5bMTvn/htPh5Krfr4z7XQZP+x2qIpf9OEhRiuMgQ51VloxM3YWo3KeJIUZkX9/YK9Jzmybz6GvWK+TJlUXs7fPxwZ5eJo5xW+bLOzD8+Iy5/OL59+KugmM3AFqxbA63/O09y+tHZs7Yzfq58/xjLMfe3LbX9gpxX7+PD9WN5a7wSaIgfLV8zRPWNRnufAfufEdcefCKIhd9AwFuC6/Ijh1faen0UJDn3F/wLjydts87QKHLSWffAIGgoccTYGtHHy37+rnxrHl86/HVlqTa7RnA5w/yfnsP9eNK6Oy3v2PoCi/qi+6El+CquKXTy7yYO5D+AT9nHTM5bnW2x6a7q6/fx9rWbsvahKLC+HGDXd3p3TPCFwjwtY/PiEuwA4HU7mqGOqssGem+q1EjhyaGGA6HMKYoP3qib6wti9Y3Wr2jC4AplYVMH1/MlYsbKC5wsa/Py4zxpdHuhc7+AWbWlHH1J2dFr8Jh/wn+Z2fP553Wbvb1edk7aC2BO99xgFk/1tlL4xOUzhhbXMDXf/9WdBygfyBAv89vSV79vlB305WLG6gqLbC0cebRk+IS060vhOL+8TPrKXPn0dE7wLVPvm1JeDOqHfzwmbfjTvBXnTKDBy9eyKb2Xsrd+RS6HGzc1cuEMYUETWgTn/KifNtxkAcuWmgZIP/1+cfYjqsMrsJamJ/H9U+9YWnv+qfWxV3d9/Z7eaetC4yDUEFfWN/Wyezq8rjkkGjw2W7PiKH0y1eVFFCY77D8uxTmp14OfaiL/5KR7rsaNXJoYoixpaOXyx9+M/qHcPWSmVxy4jRLN9LVS2Zz1WNvUVHk4saz5/GjZ96PO1H98PS5dHvsT/CR/Rhqy91xXVTXndZIZYITUEWxdbe3aeOLueqUGdz83HuWk7AAp82fyLNrW7j4+Hr6fQHu/ufmaIxBA3f/czM3LJ9DZVE+JQV50ZNtQZ6DORPK+GO46yg
27n6fn/97Uj17+vy201/nTixn6byJcSf43/5rKwsmjaHYlUd5cT6dvR78QcPXYq5sbzlnvu171dHrtbTV3tXPv5083TJb7PpljYwtsr43Q7m6DwYNO7v6eb/dE1eeu6wwn4ZBiaHMnceNZ89j466e6NjPtPEllLnjy42/8G4bq7d3Rh83d1I5i2ZWW07SgSD8Y+MuPn/cVMsamI9Or4q2k+yg71AW/yXDFwjYTjRI9a5GjRyaGGIMvnXu9QUQIXpVZwxs3LV/XGAgELDtAtnb6+WDff0HLIOxt89Hr2eA2z53NPv6fJQW5BEwhqJ8Z9xU2euXNVKY57RMV+3y+ClwWq84C5wOdvf6uOvl/Wstjq0bw0UfmUpHny96orroI1MxxjBg4I9vbOP/njTdMtvmqlNmcO8/t1jGQ7bv66ekII9A0P539geDcZsT1Za7+dJHpnLBva9ZBoufe7vFcsKfOMZteycwYdC+1KVuFz/+i3Xjn//3Pxu58az9YxHBoKGq1D65xhbl29LRy96+wJDKhQAU5IE73xm6Owx3obnyHAzenXVrR/zCxisXNzBtXAlTY07Yvb4BPrtwCrGn+s8unEKvbyDlQd+hzNhKRmVxge1EgyVzalJuU40Mmhhi2A3gPfLatuiJsKLIxfdOnc13PzWTYnc+/gC2g6bXnHokj6/cHne1FbsB0Irlc5g2rpirfv8WWzv6uWLxdP53YzvXLG2kMF+4M7yauqI4n319XhC4+Ph6nA44dspYCvIdXPm7t+JOfvdfuDDaFXPx8fWUufPpGwjEnahK3fnc8PQb3HLOgujVO4ROjjc/9x5XLm7gp8++izvfwTVLj6TbM8Bv/7WVm86en3D6aySGyM/+7aRpPLFqe9wg8L+d3MBF9zdb4r7s5Ia4K/fBO7gFDbZJLlJgcHJFEX9d38aRtYXceNY8NrbHXN1XlVBdtr8LpK3Lw+4e+1lTduW5+32hFdWbd/dG26wbV0y/tTeQnZ39tmNE8yaVWxKDQ4T2bl/cBcD4UndKg75Dn7E1dHWVxVyz9Mjo3U+eA65ZeqRuZzoKaGKIMXgA76lVO7jsYw3c/uKG6NqAb/xhFRcfX8/Nf9vAz86eZ3v1bCQ09fXBV7dy8fH1zKoppa2rnyPGFvPvH2+g0JXHXS+9z1dOmsY3PjGLd9u6mTauGHeek45eL/6goavfj2cgQJdH8AcNHb3e6H4Fvzh3ASVup+1JraPXS225m5ZOD05HaKXygQazA0Fj286kiiIuXzSdI2tKueG/1kd3X+vo9dnux9DR6+OpVTssybCuqtB24VpBvvWqt6vf/sr9oYsWWu6SasoKQkkyhitPqCp28cRbO5g+vpSWfX1MKC9gX8zaEXe+g+9+ahb7+gqjg9lFrjyqy8T2zqK6rIBg0Fiuzvv9AZxiaJpSEd0udF+fl35/wDJAnqgLscdjHSPq9QZ4fn1LXDmVusrihG0caOrpup32M7YaqkqYP7nC9jlDUZTvsPzOw7gBUSOIJoYYdgN4kyuKaJpSQXu3ly+Gu0QiXSZlhfnc+sLquKvnu8PF2lo6Pdz9j03c+6Vj+eYfVsWdgIpcTnq8Ae56eRM/C1+JP/p/PkRbl4/v/sk6o6YyPL3Une9gTHE+hflO25NaMBgaQL77H5uYMb4Urz9oe5Lx+kNtlxfmx02vdTqgvDCPp1btYO7EI6MF+Lz+AFUlLtv9GKpKXCydNxGHA248ez5bdvdS4Mzj+qcHDQKHp9TGnvB3J1jt3dHr48az57NxVzeBYKjk90DAuix7IGAIGsOvX9oU2i5VwOsPcvc/N1vu5O7+52ZuPGtedDB7SmUh93/paG48ay4b23tj7iyKqSpx8syaFmbXljF1XKhvP88BXj9cElP+5Iblc8hzEN3Rz+mAE6aPs/13qR0zaFBZAiyebS2ncv2yRpAA1aVFSU893dnpsS2W2NLlIX6Z4NB8sLeXD/Z64u5qJu7tZUqlzkrKpkwvPNTEMIj
dAF59VUnc+IM730F7gkHOfX0D0cdcsaiB93d125Zo2NDWzb3/u5UrFzewtaMXz0CQvvAMmsEzau790rHRK+5fPv8eXzlpuu3A4PZ9fTgdsGJZI64864k/NvZydz5XLGqIlo/43evb4u5+/uOMuezr81nWT3x0+jjb+B64aKFlBzZ3voNpVfaDyu09Xu7+YhNd/QM4xUFpkf3sl7LCfP75fge3vRBq94SGSnp98d1i/vBdT2TWl2fAfhzEMxCwJIs9vcG4QoRXnTKDSWOKuPyRNy19+/4A0Sm/kd/jmifW8uBFCy2v9cRbO+L22P7mJ2dS4rL+qQnO6Ak30t51T4bex1SmntaW2xdLrClNfXOi1k6vbYz3X7gw5cSgK6mH71AsPMy5xCAiS4BfAE7gLmPMT7IcEmAdf4iMH+zu9ia4sivg8kXTMQYefHUrAN//9GzLQHFDdQnTq0qYd8QYasrcdPb7ue3FjXQk6Pfu6PXx0MUfIs8BN569gN099vs2LF8wkROmj0MEKooKeH93j20C6fQM8GjzNk6aMY76ccV88xOzoiuFI6+5uaM3rpRHom0z22Pei8hrJNyZrbSAXV1etu/ro88X4OjJ5bYxev0BnLJ/3GLAb2y7xX5zflP0+z6vn+rSAvvprxcutCS5Y+sqorO6Io+7+bn3+PX5x0S/j/TtH+iuJva1fH6DZyBg+bfOE2FnZz/140ujz23vsX8fd/d4U5p6OhCwf2+aUlyAByS88GnvSW3thq6kTo9DsfAwpxKDiDiB24FTgO3A6yLypDHm7exGZh1/aOn08GjzNn50xty4GUTf+MRM1u7ojF7lQujE1jihjNm1ZXF/6JG9pYNBw83nLEg4XXV8SYGlvPP2vb1xs4muXNzApIpCjjqiIjrguLfPPoHcdPZ8vrqogZnVZcyuLee/17XGnQTsSnmUF9pf3Y8tdvGf5yzAFwiysT1UW2r2hBKuW9oYV7aj1+dnfWt3NJ78PKdtjD8+Yy6nL5jIzJoyrnrsLfoH7PveIxVq3fkO2nu8lLjzEp7QYv+YPAP2ZcG9vqDl+13dHkoK7H/v4gLra5159CR+/Jd34h73wIXWE3TVQbZwTXbqaVuaF+ABjE9QZXh8inchupI6PTJR/mSwnEoMwEJgozFmE4CI/A5YDmQ9MdhdxTkEvvenNZYT2oOvbuGG5XMtV883n7OAyWOLo3/sB2r/7ZbOuK6IFcsaKXJZR/3KCwu49fkN3HLOAvxBQ7HLyfa9fUwdV2yZhZLvxHYzIVcelhkrdgu4Yq/WI7Z19HLdaY1x+0UUu5zs7Rvg1Dm1bNvbx0emVeIQ4fE3NkZ3Zit05fFAuDRFbLfTJ2Z/mM9/aErcmoxil5OpVSVMqSxm1hUn0NXvS3AHEor9ax+fwcOvhWZOJbpTiVWd6I6mzGX5fnypm/Zuj33FVqd1AHvwlF0IL1D0WVddd/bbFyPs9KRWe6kyQdXbsTYL8Iaqf8Bvm9j7bVaQD8WhOKGNBpkofzJYriWGicAHMd9vBz40+EEicglwCcDkyZMPTWTEX8UFgya64C02CXykvpJnUliBGnlMmdsZmq7aN0BFUT593gEM1uc31pbx2YVTLAvFfnj6HGZVl1keNxCAlj3d3H/hwmg8r25so35ckSWBzK0tZ8WyOZYpo9OqSvj5Z+bz9Zg6SlWlbhAsxeTy84TyImHOxBrLe7ShbR/nNE0eVCiwkfJC68rZ6jIHNeXW0iI15W5qyhyW931fv4cfnT6X7/15f0nzH50+l3GlDh695DgGAkFOObKaPp837ndZsWwOgyuDP/yvLfz4jLmWEuk/PmMuj/xrC2DdWa/XO0Cxy2mJsdjlpNDl5KdnzePqcNkPu2Tqzncweax1fKCiKD/hnVwqSgqcCRN2qkpc+Tz+xoa4xP7tTx158CfbOBQntNEgE+VPBhNzoOL7h5iIfAb4pDHmy+HvzwcWGmO+mug5TU1Nprm5OdGPMy4ymJauMgR+f5C
/rGthQ3iFrUOgYXxJ3F7Okceua+mktdNDTbmbxtryuMf09ftst6Rc2lgdV/bB4/GzpqUzWjdobm05LpfT8vvlO/109gfp8xrauj1Ul7opKhBK3Q6mVI6xtLe5vYcd+3pxOpzs7vEyrqSAQDCA0+Hkwvtej8bz5GUfxu2C1n2BaJs1Y5x4fDCjxtrmvn4P77X2RmOcUVPMmELriWXdjj386Y3tfLxxIu3dHqpK3fxt3Q6WzJ3IF+5+zfLHNHGMC88A0d/PnQ8lBQW091j/Pff1e/jHe3t4L1z11iEwY3wpx88YS1lBQfQ9qilzs25nN1///YH70Te37+P1Ld1xyauprpT6KuvvPBQej5+/v9/OgN9YEvZJ06pwu1O7/vP5Avx59c60bWeqYwzpk47zjoisNMbY7neba4nhw8APjDGfDH//HQBjzH8kek62E0MmDOWEn4yhFoobamzvtO2LSwyzqsfExTi4PIRDYN6kck5uGM+2vX3R/9QTSt28smU3vd5g9KRWXODgo/VVKZ2AfL4Af3m7NXoV78538NOz5vHJWdXs7PZY/piAIf+BDSUpRX7vg7Xp8wVY07KHQMARfR+dziBza8emvIe0XWJPNSnExrl6Z2d0FtG8CeXD2uM63RdSKnUjKTHkAe8Bi4EdwOvA54wx6xI953BMDLkumcQ11BNBuk9A6W4vE0ZCjOrwNWISA4CInAr8J6HpqvcYY350oMdrYlBKqeQdKDHk2uAzxphngGeyHYdSSo1WWvlEKaWUhSYGpZRSFpoYlFJKWWhiUEopZZFzs5KSJSLtwNYUnjoO2J3mcNIt12PM9fhAY0wXjXH4ci2+KcaYKrsfjPjEkCoRaU40VStX5HqMuR4faIzpojEOX67HF0u7kpRSSlloYlBKKWUxmhPDndkOYAhyPcZcjw80xnTRGIcv1+OLGrVjDEoppeyN5jsGpZRSNjQxKKWUshh1iUFElojIuyKyUUS+ne14BhORI0TkRRFZLyLrROTKbMeUiIg4ReRNEXk627HYEZExIvIHEXkn/H5+ONsxxRKRr4X/jdeKyCMikhNbmYnIPSKyS0TWxhwbKyLPiciG8OeKHIvvZ+F/59Ui8icRGZOt+MLxxMUY87NviIgRkXHZiG0oRlViEBEncDvwKeBI4DwRSW2fwszxA183xswGjgMuy8EYI64E1mc7iAP4BfCsMWYWMJ8cilVEJgJXAE3GmDmEysx/NrtRRd0HLBl07NvA88aYBuD58PfZch/x8T0HzDHGzCO0p8t3DnVQg9xHfIyIyBHAKcC2Qx1QMkZVYgAWAhuNMZuMMT7gd8DyLMdkYYxpMca8Ef66m9DJbGJ2o4onIpOATwN3ZTsWOyJSBpwI3A1gjPEZY/ZlNah4eUBheIOqImBnluMBwBjzErBn0OHlwP3hr+8HTj+UMcWyi88Y81djjD/87avApEMemDUeu/cQ4BbgW0BOz/oZbYlhIvBBzPfbycGTboSI1AFHAf/Kcih2/pPQf/DgQR6XLfVAO3BvuLvrLhFJ327pw2SM2QHcROjKsQXoNMb8NbtRHVC1MaYFQhcvwPgsx3MgFwF/yXYQg4nIMmCHMWZVtmM5mNGWGOw2l83JzC0iJcDjwL8bY7qyHU8sEVkK7DLGrMx2LAeQBxwN/MoYcxTQS3a7PyzCffTLganABKBYRL6Q3ahGPhH5HqHu2N9mO5ZYIlIEfA+4NtuxDMVoSwzbgSNivp9Ejty+xxKRfEJJ4bfGmD9mOx4bHwWWicgWQt1xi0TkoeyGFGc7sN0YE7nb+gOhRJErPg5sNsa0G2MGgD8CH8lyTAfSJiK1AOHPu7IcTxwR+SKwFPi8yb0FWtMIXQSsCv/dTALeEJGarEaVwGhLDK8DDSIyVURchAb7nsxyTBYiIoT6xdcbY27Odjx2jDHfMcZMMsbUEXoPXzDG5NTVrjGmFfhARGaGDy0G3s5iSINtA44TkaLwv/licmhw3MaTwBfDX38ReCK
LscQRkSXA1cAyY0xftuMZzBizxhgz3hhTF/672Q4cHf5/mnNGVWIID05dDvw3oT/Cx4wx67IbVZyPAucTugp/K/xxaraDGqG+CvxWRFYDC4AfZzec/cJ3Mn8A3gDWEPpbzImSCSLyCPAKMFNEtovIxcBPgFNEZAOhWTU/ybH4bgNKgefCfzN3ZCu+A8Q4YmhJDKWUUhaj6o5BKaXUwWliUEopZaGJQSmllIUmBqWUUhaaGJRSSlloYlDqIESkJwNtLoidhiwiPxCRb6T7dZRKhSYGpbJjAaDrU1RO0sSgVBJE5Jsi8nq47v/14WN14f0efhPeX+GvIlIY/tmx4ce+Et4zYG141f0K4NzwYqxzw80fKSL/IyKbROSKLP2KSmliUGqoROQTQAOh8u0LgGNE5MTwjxuA240xjcA+4Kzw8XuBS40xHwYCECoBTqiY2qPGmAXGmEfDj50FfDLc/nXhmllKHXKaGJQauk+EP94kVMpiFqGEAKGCeG+Fv14J1IV3ESs1xvxv+PjDB2n/v4wxXmPMbkJF6qrTGLtSQ5aX7QCUGkEE+A9jzK8tB0P7ZnhjDgWAQuzLvB/I4Db071Nlhd4xKDV0/w1cFN4rAxGZKCIJN6wxxuwFukXkuPCh2K07uwkVfVMq52hiUGqIwjusPQy8IiJrCFVHPdjJ/WLgThF5hdAdRGf4+IuEBptjB5+VyglaXVWpDBKREmNMT/jrbwO1xpgrsxyWUgekfZhKZdanReQ7hP7WtgJfym44Sh2c3jEopZSy0DEGpZRSFpoYlFJKWWhiUEopZaGJQSmllIUmBqWUUhb/H8hjVCoTroWfAAAAAElFTkSuQmCC", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "sns.scatterplot(x=df_contracts.length,y=df_contracts.annual_salary)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Other ways to measure correlation exist. For example, if you are interested in whether one variable tends to increase (or decrease) as another variable increases, without assuming that the relation is linear, the *Spearman’s or Kendall’s rank correlation coefficients* might work well." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Questions\n", - "\n", - "* Try exploring the correlations between other variables in the dataset.\n", - "* Can you think of a possible explanation for the trend we see: older apprentices with shorter contracts receiving, on average, a higher annual salary?" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Sampling and uncertainty\n", - "\n", - "Often, we work with samples and we want the sample to be representative of the population it is taken from, in order to draw conclusions that generalise from the sample to the full population.\n", - "\n", - "Sampling is *tricky*. Samples have *variance* (variation between samples from the same population) and *bias* (systematic deviation from the population)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Further reading\n", - "\n", - "* For a complementary introduction to statistics and data analysis, see https://www.humanitiesdataanalysis.org/statistics-essentials/notebook.html.\n", - "* Related to statistics and data analysis is the realm of probability theory, which allows us to formally model and calculate the likelihood of events. For an introduction, see https://www.humanitiesdataanalysis.org/intro-probability/notebook.html."
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Part 2: working with texts\n", - "\n", - "Let's cover the basics (or get a refresher) of working with texts in Python. Texts are sequences of discrete symbols (words or, more generically, tokens).\n", - "\n", - "Key challenge: representing text for further processing. Two mainstream approaches:\n", - "* *Bag of words*: a text is a collection of tokens, each occurring with a certain frequency and assumed to be independent of the others within the text. The mapping from texts to features is deterministic and straightforward: each text is represented as a vector the size of the vocabulary.\n", - "* *Embeddings*: a method (typically a neural network) is used to learn a mapping from each token to a (usually low-dimensional) vector representing it. A text can in turn be represented as an aggregation of these embeddings." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Import the dataset\n", - "Let us load the dataset of Elon Musk's tweets into memory.\n", - "\n", - "" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [], - "source": [ - "root_folder = \"../data/musk_tweets\"\n", - "df_elon = pd.read_csv(codecs.open(os.path.join(root_folder,\"elonmusk_tweets.csv\"), encoding=\"utf8\"), sep=\",\")\n", - "# drop the first character of each tweet (likely a leftover 'b' bytes-literal prefix)\n", - "df_elon['text'] = df_elon['text'].str[1:]" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " id created_at \\\n", - "0 849636868052275200 2017-04-05 14:56:29 \n", - "1 848988730585096192 2017-04-03 20:01:01 \n", - "2 848943072423497728 2017-04-03 16:59:35 \n", - "3 848935705057280001 2017-04-03 16:30:19 \n", - "4 848416049573658624 2017-04-02 06:05:23 \n", - "\n", - " text \n", - "0 'And so the robots spared humanity ... https:/... \n", - "1 \"@ForIn2020 @waltmossberg @mims @defcon_5 Exac... \n", - "2 '@waltmossberg @mims @defcon_5 Et tu, Walt?' \n", - "3 'Stormy weather in Shortville ...' \n", - "4 \"@DaveLeeBBC @verge Coal is dying due to nat g... " - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_elon.head(5)" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(2819, 3)" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_elon.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Natural Language Processing in Python" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [], - "source": [ - "# import some of the most popular libraries for NLP in Python\n", - "import spacy\n", - "import nltk\n", - "import string\n", - "import sklearn" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "[nltk_data] Downloading package punkt to\n", - "[nltk_data] /Users/giovannicolavizza/nltk_data...\n", - "[nltk_data] Unzipping tokenizers/punkt.zip.\n" - ] - }, - { - "data": { - "text/plain": [ - "True" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "nltk.download('punkt')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A typical NLP pipeline might look like the 
following:\n", - " \n", - "\n", - "\n", - "### Tokenization: splitting a text into constituent tokens" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [], - "source": [ - "from nltk.tokenize import TweetTokenizer, word_tokenize\n", - "tknzr = TweetTokenizer(preserve_case=True, reduce_len=False, strip_handles=False)" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\"@ForIn2020 @waltmossberg @mims @defcon_5 Exactly. Tesla is absurdly overvalued if based on the past, but that's irr\\xe2\\x80\\xa6 https://t.co/qQcTqkzgMl\"\n" - ] - } - ], - "source": [ - "example_tweet = df_elon.text[1]\n", - "print(example_tweet)" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "['\"', '@ForIn2020', '@waltmossberg', '@mims', '@defcon_5', 'Exactly', '.', 'Tesla', 'is', 'absurdly', 'overvalued', 'if', 'based', 'on', 'the', 'past', ',', 'but', \"that's\", 'irr', '\\\\', 'xe2', '\\\\', 'x80', '\\\\', 'xa6', 'https://t.co/qQcTqkzgMl', '\"']\n", - "['``', '@', 'ForIn2020', '@', 'waltmossberg', '@', 'mims', '@', 'defcon_5', 'Exactly', '.', 'Tesla', 'is', 'absurdly', 'overvalued', 'if', 'based', 'on', 'the', 'past', ',', 'but', 'that', \"'s\", 'irr\\\\xe2\\\\x80\\\\xa6', 'https', ':', '//t.co/qQcTqkzgMl', \"''\"]\n" - ] - } - ], - "source": [ - "tkz1 = tknzr.tokenize(example_tweet)\n", - "print(tkz1)\n", - "tkz2 = word_tokenize(example_tweet)\n", - "print(tkz2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Question: can you spot what the Twitter tokenizer does differently from a standard one?"
- ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "'!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'" - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "string.punctuation" - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, - "outputs": [], - "source": [ - "# some more pre-processing\n", - "\n", - "def filter_tokens(tokens):\n", - " \n", - " # remove punctuation, tokens of 3 characters or fewer, and URLs\n", - " return [t for t in tokens if t not in string.punctuation and len(t) > 3 and not t.startswith(\"http\")]\n", - "\n", - "def tokenize_and_string(tweet):\n", - " \n", - " tkz = tknzr.tokenize(tweet)\n", - " \n", - " tkz = filter_tokens(tkz)\n", - " \n", - " return \" \".join(tkz)" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "['\"', '@ForIn2020', '@waltmossberg', '@mims', '@defcon_5', 'Exactly', '.', 'Tesla', 'is', 'absurdly', 'overvalued', 'if', 'based', 'on', 'the', 'past', ',', 'but', \"that's\", 'irr', '\\\\', 'xe2', '\\\\', 'x80', '\\\\', 'xa6', 'https://t.co/qQcTqkzgMl', '\"']\n", - "['@ForIn2020', '@waltmossberg', '@mims', '@defcon_5', 'Exactly', 'Tesla', 'absurdly', 'overvalued', 'based', 'past', \"that's\"]\n" - ] - } - ], - "source": [ - "print(tkz1)\n", - "print(filter_tokens(tkz1))" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [], - "source": [ - "df_elon[\"clean_text\"] = df_elon[\"text\"].apply(tokenize_and_string)" - ] - }, - { - "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
\n", - "
" - ], - "text/plain": [ - " id created_at \\\n", - "0 849636868052275200 2017-04-05 14:56:29 \n", - "1 848988730585096192 2017-04-03 20:01:01 \n", - "2 848943072423497728 2017-04-03 16:59:35 \n", - "3 848935705057280001 2017-04-03 16:30:19 \n", - "4 848416049573658624 2017-04-02 06:05:23 \n", - "\n", - " text \\\n", - "0 'And so the robots spared humanity ... https:/... \n", - "1 \"@ForIn2020 @waltmossberg @mims @defcon_5 Exac... \n", - "2 '@waltmossberg @mims @defcon_5 Et tu, Walt?' \n", - "3 'Stormy weather in Shortville ...' \n", - "4 \"@DaveLeeBBC @verge Coal is dying due to nat g... \n", - "\n", - " clean_text \n", - "0 robots spared humanity \n", - "1 @ForIn2020 @waltmossberg @mims @defcon_5 Exact... \n", - "2 @waltmossberg @mims @defcon_5 Walt \n", - "3 Stormy weather Shortville \n", - "4 @DaveLeeBBC @verge Coal dying fracking It's ba... " - ] - }, - "execution_count": 44, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_elon.head(5)" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [], - "source": [ - "# save cleaned up version\n", - "\n", - "df_elon.to_csv(os.path.join(root_folder,\"df_elon.csv\"), index=False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Building a dictionary" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(2819, 7864)" - ] - }, - "execution_count": 46, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from sklearn.feature_extraction.text import CountVectorizer\n", - "count_vect = CountVectorizer(lowercase=False, tokenizer=tknzr.tokenize)\n", - "X_count = count_vect.fit_transform(df_elon.clean_text)\n", - "X_count.shape" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "6617" - ] - }, - "execution_count": 48, - "metadata": {}, - 
"output_type": "execute_result" - } - ], - "source": [ - "word_list = count_vect.get_feature_names_out()\n", - "count_list = np.asarray(X_count.sum(axis=0)).ravel()  # column sums, without densifying the sparse matrix\n", - "dictionary = dict(zip(word_list, count_list))\n", - "count_vect.vocabulary_.get(\"robots\")" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3" - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "X_count[:,count_vect.vocabulary_.get(\"robots\")].toarray().sum()" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3" - ] - }, - "execution_count": 50, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dictionary[\"robots\"]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Questions\n", - "\n", - "* Find the tokens Elon uses most often.\n", - "* Find the Twitter users Elon refers to most often (hint: use the @ handle to spot them)."
- ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('Tesla', 322),\n", - " ('Model', 236),\n", - " ('that', 223),\n", - " ('will', 218),\n", - " ('with', 177),\n", - " ('@SpaceX', 169),\n", - " ('from', 163),\n", - " ('this', 159),\n", - " ('@TeslaMotors', 149),\n", - " ('launch', 124)]" - ] - }, - "execution_count": 51, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "dictionary_list = sorted(dictionary.items(), key=lambda x: x[1], reverse=True)\n", - "dictionary_list[:10]" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('@SpaceX', 169),\n", - " ('@TeslaMotors', 149),\n", - " ('@elonmusk', 85),\n", - " ('@NASA', 48),\n", - " ('@Space_Station', 19),\n", - " ('@FredericLambert', 17),\n", - " ('@ID_AA_Carmack', 15),\n", - " ('@WIRED', 14),\n", - " ('@vicentes', 14),\n", - " ('@BadAstronomer', 11)]" - ] - }, - "execution_count": 52, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# keep only the tokens that are Twitter handles, reusing the sorted list above\n", - "dictionary_list_users = [d for d in dictionary_list if d[0].startswith('@')]\n", - "dictionary_list_users[:10]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Representing tweets as vectors\n", - "\n", - "Texts are of variable length and need to be represented numerically in some way. Typically, we represent them as *equally-sized vectors*.\n", - "\n", - "Actually, this is what we have already done! Let's take a closer look at `X_count` above." - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "id 849636868052275200\n", - "created_at 2017-04-05 14:56:29\n", - "text 'And so the robots spared humanity ... 
https:/...\n", - "clean_text robots spared humanity\n", - "Name: 0, dtype: object" - ] - }, - "execution_count": 53, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# This is the first Tweet of the data frame\n", - "\n", - "df_elon.loc[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [], - "source": [ - "# let's get the vector representation for this Tweet\n", - "\n", - "vector_representation = X_count[0,:]" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3" - ] - }, - "execution_count": 55, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# the entries sum to 3, as we would expect: the vector contains a 1 in each of the columns for the 3 words that make up the Tweet.\n", - "# An entry would be higher than 1 if a word occurred multiple times in the Tweet.\n", - "\n", - "np.sum(vector_representation)" - ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "1\n", - "1\n", - "1\n" - ] - } - ], - "source": [ - "# Let's check that the vector indeed contains 1s for the right words\n", - "# Remember, the vector has shape (1 x size of the vocabulary)\n", - "\n", - "print(vector_representation[0,count_vect.vocabulary_.get(\"robots\")])\n", - "print(vector_representation[0,count_vect.vocabulary_.get(\"spared\")])\n", - "print(vector_representation[0,count_vect.vocabulary_.get(\"humanity\")])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Term Frequency - Inverse Document Frequency\n", - "We can use boolean indicators (1/0) or raw counts (as we did before) to represent a Tweet over the space of the vocabulary, but there are improvements on this basic idea. 
For example, the TF-IDF weighting scheme:\n", - "\n", - "$tfidf(t, d, D) = tf(t, d) \\cdot idf(t, D)$\n", - "\n", - "$tf(t, d) = f_{t,d}$\n", - "\n", - "$idf(t, D) = \\log \\Big( \\frac{|D|}{|\\{d \\in D : t \\in d\\}|} \\Big)$\n", - "\n", - "where $f_{t,d}$ is the raw count of term $t$ in document $d$ and $D$ is the collection of documents. Note that scikit-learn's `TfidfVectorizer` uses a smoothed variant by default, $idf(t, D) = \\log \\frac{1 + |D|}{1 + |\\{d \\in D : t \\in d\\}|} + 1$, and L2-normalises each row, so its weights differ from the formula above." - ] - }, - { - "cell_type": "code", - "execution_count": 57, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(2819, 7864)" - ] - }, - "execution_count": 57, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from sklearn.feature_extraction.text import TfidfVectorizer\n", - "tfidf_vect = TfidfVectorizer(lowercase=False, tokenizer=tknzr.tokenize)\n", - "X_count_tfidf = tfidf_vect.fit_transform(df_elon.clean_text)\n", - "X_count_tfidf.shape" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "1.7226760995112569" - ] - }, - "execution_count": 58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "X_count_tfidf[0,:].sum()" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "3" - ] - }, - "execution_count": 59, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "X_count[0,:].sum()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Sparse vectors (mention)\n", - "How are these vectors represented in memory? Most of their cells are zero.\n", - "\n", - "We call a vector or matrix whose cells are mostly zero *sparse*.\n", - "Sparse formats store only the non-zero values and their positions, which saves a lot of memory."
- ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "<1x7864 sparse matrix of type '<class 'numpy.float64'>'\n", - "\twith 3 stored elements in Compressed Sparse Row format>" - ] - }, - "execution_count": 60, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "X_count_tfidf[0,:]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### spaCy pipelines\n", - "\n", - "spaCy pipelines are useful for chaining sequences of pre-processing steps: https://spacy.io/usage/processing-pipelines." - ] - }, - { - "cell_type": "code", - "execution_count": 65, - "metadata": {}, - "outputs": [], - "source": [ - "# Load a pre-trained pipeline (Web Small): https://spacy.io/usage/models\n", - "\n", - "#!python -m spacy download en_core_web_sm\n", - "nlp = spacy.load('en_core_web_sm')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "*.. the model’s meta.json tells spaCy to use the language \"en\" and the pipeline [\"tagger\", \"parser\", \"ner\"]. spaCy will then initialize spacy.lang.en.English, and create each pipeline component and add it to the processing pipeline. 
It’ll then load in the model’s data from its data directory and return the modified Language class for you to use as the nlp object.*\n", - "\n", - "Let's use the loaded pipeline to perform **lemmatization**, **part-of-speech tagging** and **named entity recognition** on each tweet.\n", - "\n", - "*If you don't know what these NLP tasks are, please ask!*" - ] - }, - { - "cell_type": "code", - "execution_count": 66, - "metadata": {}, - "outputs": [], - "source": [ - "tweet_pos = list()\n", - "tweet_ner = list()\n", - "tweet_lemmas = list()\n", - "\n", - "for tweet in df_elon.text.values:\n", - " spacy_tweet = nlp(tweet)\n", - " \n", - " local_tweet_pos = list()\n", - " local_tweet_ner = list()\n", - " local_tweet_lemmas = list()\n", - " \n", - " for sentence in list(spacy_tweet.sents):\n", - " # --- lemmatization, removing punctuation and stop words\n", - " local_tweet_lemmas.extend([token.lemma_ for token in sentence if not (token.is_punct or token.is_stop)])\n", - " local_tweet_pos.extend([token.pos_ for token in sentence if not (token.is_punct or token.is_stop)])\n", - " for ent in spacy_tweet.ents:\n", - " local_tweet_ner.append(ent)\n", - "\n", - " tweet_pos.append(local_tweet_pos)\n", - " tweet_ner.append(local_tweet_ner)\n", - " tweet_lemmas.append(local_tweet_lemmas)" - ] - }, - { - "cell_type": "code", - "execution_count": 67, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['robot', 'spare', 'humanity', 'https://t.co/v7JUJQWfCv']" - ] - }, - "execution_count": 67, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tweet_lemmas[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 68, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "['NOUN', 'VERB', 'NOUN', 'NOUN']" - ] - }, - "execution_count": 68, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tweet_pos[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 69, - "metadata": {}, - "outputs": [ - {
- "data": { - "text/plain": [ - "[https://t.co/v7JUJQWfCv]" - ] - }, - "execution_count": 69, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "tweet_ner[0]" - ] - }, - { - "cell_type": "code", - "execution_count": 70, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[Shortville]" - ] - }, - "execution_count": 70, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# the URL above is not a real entity, but on other tweets NER does work!\n", - "\n", - "tweet_ner[3]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "*Note: we are really just scratching the surface of spaCy, but it is worth knowing it's there.*" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Searching tweets\n", - "\n", - "Once we have represented Tweets as vectors, we can find related ones with basic operations, such as filtering for the tweets that share terms with a target tweet." - ] - }, - { - "cell_type": "code", - "execution_count": 71, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "robots spared humanity\n" - ] - } - ], - "source": [ - "target = 0\n", - "print(df_elon.clean_text[target])" - ] - }, - { - "cell_type": "code", - "execution_count": 72, - "metadata": {}, - "outputs": [], - "source": [ - "condition = X_count_tfidf[target,:] > 0" - ] - }, - { - "cell_type": "code", - "execution_count": 73, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " (0, 5198)\tTrue\n", - " (0, 6617)\tTrue\n", - " (0, 6949)\tTrue\n" - ] - } - ], - "source": [ - "print(condition)" - ] - }, - { - "cell_type": "code", - "execution_count": 74, - "metadata": {}, - "outputs": [], - "source": [ - "X_filtered = X_count_tfidf[:,np.ravel(condition.toarray())]" - ] - }, - { - "cell_type": "code", - "execution_count": 75, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "<2819x3 sparse matrix of type '<class 'numpy.float64'>'\n", - "\twith 16 stored elements in Compressed Sparse Row 
format>" - ] - }, - "execution_count": 75, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "X_filtered" - ] - }, - { - "cell_type": "code", - "execution_count": 76, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " (0, 0)\t0.495283407359234\n", - " (0, 2)\t0.6406029997190412\n", - " (0, 1)\t0.5867896924329815\n", - " (217, 0)\t0.2972381925908634\n", - " (271, 0)\t0.3284547085372313\n", - " (464, 0)\t0.2273880239746895\n", - " (473, 0)\t0.5667220639589731\n", - " (734, 1)\t0.3846355279044392\n", - " (940, 0)\t0.27312597149485407\n", - " (1004, 0)\t0.28161575586607157\n", - " (1550, 1)\t0.33303254164524276\n", - " (1862, 0)\t0.3196675199194523\n", - " (2493, 0)\t0.2685018991334563\n", - " (2559, 0)\t0.31145247014227906\n", - " (2565, 0)\t0.2645117238497897\n", - " (2661, 0)\t0.2729016388865858\n" - ] - } - ], - "source": [ - "print(X_filtered)" - ] - }, - { - "cell_type": "code", - "execution_count": 77, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(array([ 0, 217, 271, 464, 473, 940, 1004, 1862, 2493, 2559, 2565,\n", - " 2661, 0, 734, 1550, 0], dtype=int32),\n", - " array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 2], dtype=int32),\n", - " array([0.49528341, 0.29723819, 0.32845471, 0.22738802, 0.56672206,\n", - " 0.27312597, 0.28161576, 0.31966752, 0.2685019 , 0.31145247,\n", - " 0.26451172, 0.27290164, 0.58678969, 0.38463553, 0.33303254,\n", - " 0.640603 ]))" - ] - }, - "execution_count": 77, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from scipy import sparse\n", - "\n", - "sparse.find(X_filtered)" - ] - }, - { - "cell_type": "code", - "execution_count": 78, - "metadata": {}, - "outputs": [], - "source": [ - "tweet_indices = list(sparse.find(X_filtered)[0])" - ] - }, - { - "cell_type": "code", - "execution_count": 79, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "TARGET: 
robots spared humanity\n", - "1)@JustBe74 important make humanity proud this case particular duty owed American taxpayer\n", - "2)@pud Faith restored humanity French toast money\n", - "3)humanity have exciting inspiring future cannot confined Earth forever @love_to_dream #APSpaceChat\n", - "4)@ShireeshAgrawal like humanity\n", - "5)Creating neural lace thing that really matters humanity achieve symbiosis with machines\n", - "6)@tzepr Certainly agree that first foremost triumph humanity cheering good spirit\n", - "7)@ReesAndersen @FLIxrisk believe that critical ensure good future humanity\n", - "8)@NASA #Mars hard x99s worth risks extend humanity x99s frontier beyond Earth Learn about neighbor planet\n", - "9)Astronomer Royal Martin Rees soon will robots take over world @Telegraph\n", - "10)@thelogicbox @IanrossWins Mars critical long-term survival humanity life Earth know\n", - "11)humanity wishes become multi-planet species then must figure move millions people Mars\n", - "12)Sure feels weird find myself defending robots\n", - "13)Neil Armstrong hero humanity spirit will carry stars\n" - ] - } - ], - "source": [ - "print(\"TARGET: \" + df_elon.clean_text[target])\n", - "\n", - "for n, tweet_index in enumerate(list(set(tweet_indices))):\n", - " if tweet_index != target:\n", - " print(str(n) +\")\"+ df_elon.clean_text[tweet_index])" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Questions\n", - "\n", - "* Can you rank the matched tweets by their tf-idf weights, so as to put higher-weighted tweets first?\n", - "* What limitations do you think a bag-of-words representation has?\n", - "* Can you spot any limitations of this approach based on similarity measures over bag-of-words representations?"
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "---" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 (ipykernel)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.0" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -}
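The notebook suggests Spearman's or Kendall's rank correlation when you care about how one variable grows with another, without assuming a linear relationship. As a minimal, standard-library-only sketch (not the notebook's own code), Spearman's coefficient can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The helper names and the toy `length`/`salary` values are hypothetical. With pandas, `df_contracts[["length", "annual_salary"]].corr(method="spearman")` (or `method="kendall"`) gives this kind of coefficient directly.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: salary grows monotonically, but not linearly, with length.
length = [1, 2, 3, 4, 5]
salary = [10, 20, 40, 80, 160]
print(round(pearson(length, salary), 3))   # below 1: the trend is not linear
print(round(spearman(length, salary), 3))  # 1.0: the trend is perfectly monotonic
```

Because only the ordering of values matters, a single very large salary changes Spearman's coefficient far less than it changes Pearson's.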
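The notebook's token filter keeps tokens that satisfy `t not in string.punctuation`, `len(t) > 3` and `not t.startswith("http")`. Note that `t not in string.punctuation` is a substring test against the string `'!"#$%&...'`, so a run of dots such as `'....'` is kept, while shorter punctuation is dropped only because of the length check. Here is a slightly stricter sketch that checks every character instead; the name `filter_tokens` and the `min_len` parameter are hypothetical:

```python
import string


def filter_tokens(tokens, min_len=4):
    """Drop pure-punctuation tokens, tokens shorter than min_len, and URLs."""
    return [
        t for t in tokens
        if not all(ch in string.punctuation for ch in t)  # e.g. '"', '....'
        and len(t) >= min_len                              # same as len(t) > 3 by default
        and not t.startswith("http")                       # drop links
    ]


tokens = ['"', '@mims', 'Exactly', '.', 'is', '....', "that's", 'https://t.co/x']
print(filter_tokens(tokens))  # ['@mims', 'Exactly', "that's"]
```

Tokens that mix punctuation with letters, such as Twitter handles (`'@mims'`) or contractions (`"that's"`), are still kept.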
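`CountVectorizer` learns a vocabulary (a mapping from term to column, exposed as `vocabulary_`) and turns each text into a vector of raw term counts. The plain-Python sketch below imitates that idea on hypothetical, already-tokenized toy tweets. It illustrates the technique and is not scikit-learn's implementation. Columns are assigned in sorted term order, which is also how `CountVectorizer` orders the features it learns. With `lowercase=False`, uppercase terms sort before lowercase ones.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct token to a column index, in sorted token order."""
    return {term: idx for idx, term in enumerate(sorted({t for doc in docs for t in doc}))}


def count_vectors(docs, vocabulary):
    """Represent each document as a list of raw term counts over the vocabulary."""
    vectors = []
    for doc in docs:
        row = [0] * len(vocabulary)
        for term, count in Counter(doc).items():
            if term in vocabulary:  # ignore out-of-vocabulary tokens
                row[vocabulary[term]] = count
        vectors.append(row)
    return vectors


# Hypothetical, already-cleaned toy tweets (lists of tokens).
docs = [["robots", "spared", "humanity"], ["humanity", "humanity", "Mars"]]
vocab = build_vocabulary(docs)
X = count_vectors(docs, vocab)
print(vocab)  # {'Mars': 0, 'humanity': 1, 'robots': 2, 'spared': 3}
print(X)      # [[0, 1, 1, 1], [1, 2, 0, 0]]
```

As in the notebook's `X_count`, the first row sums to 3 because each of its three words occurs once.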
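To see the TF-IDF formula at work, the sketch below computes $tf(t, d) \cdot idf(t, D)$ by hand on hypothetical toy documents. Behind a `smooth` flag, it also implements the variant that scikit-learn's `TfidfVectorizer` uses by default: `smooth_idf=True`, which gives $\ln\frac{1+|D|}{1+df(t)} + 1$, followed by L2 normalisation of each row (`norm='l2'`). The L2 normalisation is why the first tweet's row sums to about 1.72 in the notebook. Its three weights (about 0.495, 0.641 and 0.587) have unit Euclidean length, not unit sum. The function name `tfidf` is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs, smooth=False, l2_normalize=False):
    """Return one {term: weight} dict per document (each a list of tokens).

    smooth=False follows idf(t, D) = ln(|D| / df(t)).
    smooth=True follows scikit-learn's default ln((1 + |D|) / (1 + df(t))) + 1.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: each term counted once per doc
    rows = []
    for doc in docs:
        row = {}
        for term, f in Counter(doc).items():  # f is the raw count f_{t,d}
            if smooth:
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1
            else:
                idf = math.log(n_docs / df[term])
            row[term] = f * idf
        if l2_normalize:
            norm = math.sqrt(sum(w * w for w in row.values()))
            if norm > 0:
                row = {t: w / norm for t, w in row.items()}
        rows.append(row)
    return rows


docs = [["robots", "spared", "humanity"],
        ["humanity", "Mars"],
        ["robots", "take", "over", "humanity"]]
plain = tfidf(docs)
print(plain[0])  # 'humanity' occurs in every document, so its plain idf is 0
sk_like = tfidf(docs, smooth=True, l2_normalize=True)
print(sum(w * w for w in sk_like[0].values()))  # ~1.0: unit Euclidean length
```

The smoothed variant avoids zero weights for terms that occur in every document and avoids division by zero for unseen terms.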
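The notebook's sparse matrices use the Compressed Sparse Row (CSR) format. As a rough illustration, the plain-Python sketch below converts a small dense matrix into the three lists that define CSR and then reads a row back. SciPy's `csr_matrix` exposes the same three arrays as `.data`, `.indices` and `.indptr`. The function names are hypothetical.

```python
def to_csr(dense):
    """Convert a list-of-lists matrix into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:           # store only non-zero cells
                data.append(value)
                indices.append(col)  # column of each stored value
        indptr.append(len(data))     # row i spans data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


def csr_row(data, indices, indptr, i):
    """Return row i as a {column: value} dict of its non-zero entries."""
    start, end = indptr[i], indptr[i + 1]
    return dict(zip(indices[start:end], data[start:end]))


dense = [[0, 1, 0, 0],
         [0, 0, 0, 0],
         [2, 0, 0, 3]]
data, indices, indptr = to_csr(dense)
print(data, indices, indptr)              # [1, 2, 3] [1, 0, 3] [0, 1, 1, 3]
print(csr_row(data, indices, indptr, 2))  # {0: 2, 3: 3}
```

A 1x7864 tweet vector with 3 non-zero terms therefore needs only 3 values, 3 column indices and 2 row pointers, instead of 7864 cells.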
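The first question at the end of the notebook asks how to rank the matched tweets. One common option, which is a swapped-in technique and not something the notebook implements, is cosine similarity between the target's TF-IDF vector and each other vector. Below is a plain-Python sketch over hypothetical sparse vectors stored as `{term: weight}` dicts with made-up weights. On the real data, `sklearn.metrics.pairwise.cosine_similarity(X_count_tfidf[target], X_count_tfidf)` would compute the same kind of score for every tweet at once.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_similar(vectors, target):
    """Return (index, score) pairs that share at least one term, best first."""
    scores = [(i, cosine(vectors[target], vec))
              for i, vec in enumerate(vectors) if i != target]
    return sorted([p for p in scores if p[1] > 0], key=lambda p: p[1], reverse=True)


# Hypothetical toy vectors (weights are made up for illustration).
vectors = [
    {"robots": 1.0, "spared": 1.0, "humanity": 1.0},  # target
    {"humanity": 1.0, "Mars": 1.0},
    {"robots": 1.0, "spared": 1.0},
    {"Coal": 1.0, "fracking": 1.0},
]
for index, score in rank_similar(vectors, target=0):
    print(index, round(score, 3))  # prints "2 0.816", then "1 0.408"
```

Tweets that share no terms with the target score 0 and are dropped, just as the notebook's filter only returns tweets containing at least one of the target's words.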