slides/class_4/class_4.Rmd

---
title: "Análisis de Datos Categóricos (SOC3070)"
subtitle: "Clase #4"
author: "<br> Mauricio Bucca<br> Profesor Asistente, Sociología UC"
date: "[github.com/mebucca](https://github.com/mebucca)"
output:
  xaringan::moon_reader:
    lib_dir: libs
    css: ["default","default-fonts","gentle-r.css"]
    df_print: default
    nature:
      ratio: 16:9
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: true
      slideNumberFormat: "%current%"
editor_options: 
  chunk_output_type: console
---

class: inverse, center, middle

#Valor Esperado y Varianza de variables discretas

```{r, echo=FALSE, message=FALSE}
library("tidyverse")
library("viridis")
```
---
## Valor Esperado

El valor esperado de una variable es el análogo teórico de un promedio. Los posibles valores de la variable se ponderan por su probabilidad de ocurrencia. En el caso de variables discretas:

<br>
--

\begin{align}
\mathbb{E}(X) &= \sum_{i} x_{i} \times \mathbb{P}(X=x_{i}) \\
&\equiv  \sum_{i} x_{i} \times f_{X}(x_{i})
\end{align}

<br>
--
Es teórico porque esta información la podemos saber *a priori*, sin necesidad de datos. 

--

Análogamente, para variables continuas:

\begin{align}
\mathbb{E}(X) =  \int x f(x)dx
\end{align}

---
## Valor Esperado

Por ejemplo, supongamos que $Y$ es una variable que resulta de tirar un dado "justo".  ¿Cuál es el valor esperado de $Y$?

<br>
--

\begin{align}
\mathbb{E}(Y) &= \sum_{i}y_{i} \times \mathbb{P}(Y=y_{i})  \\ \\
     &=  1 \times  \frac{1}{6}+ 2 \times \frac{1}{6} + \dots + 6 \times \frac{1}{6}  \\ \\
     &= 3.5
\end{align}


---
## Valor Esperado, algunas propiedades útiles  

1) El valor esperado de una constante es una constante.

$$\mathbb{E}(c)=c$$

--

2) Si $X$ es una variable aleatoria y $c$ una constante, entonces 

$$\mathbb{E}(c X)= c \mathbb{E}(X)$$
--

3) Si $X$ e $Y$ son variables aleatorias (sin importar si $X \bot Y$ o no), entonces

$$\mathbb{E}(X + Y)=  \mathbb{E}(X) + \mathbb{E}(Y)$$
--

4) Si $X$ es una variable aleatoria y $g(.)$ es una función lineal, entonces

$$\mathbb{E}(g(X))=  g(\mathbb{E}(X))$$

--

5) Si $X$ e $Y$ son variables aleatorias independientes, entonces

$$\mathbb{E}(XY) = \mathbb{E}(X)\mathbb{E}(Y)$$
---

## Valor Esperado, ejemplo

Por ejemplo, supongamos que $X_{i}$ es la variable que resulta de tirar un dado "justo". Participamos de un concurso que consiste en tirar el mismo dado 100 veces. El premio (G) es $ $1000$ de base, más el resultado de cada dado $i$ multiplicado por 10.
--

 ¿Cuánto es el premio esperado?

--

$$G = 1000 + \sum^{n=100}_{i=1} X_{i} \times 10 \quad \text{ por tanto,}$$
--

$$\mathbb{E}(G) = \mathbb{E}(1000 + \sum^{n=100}_{i=1} X_{i} \times 10)$$

--

$$\mathbb{E}(G) = 1000 + 10 \times \sum^{n=100}_{i=1}\mathbb{E}(X_{i})$$

--

$$\mathbb{E}(G) = 1000 + 10 (3.5 + 3.5 + \dots + 3.5)) = 1000 + 10 \times 100 \times 3.5 = \$4500$$ 

---

## Valor Esperado de variables discretas

###  Bernoulli

Si X es una variable Bernoulli, su valor esperado viene dado por:

\begin{align}
\mathbb{E}(X) = \sum_{i} x_{i} \times \mathbb{P}(X=x_{i}) &= \sum_{i} x_{i} \times p^{x_{i}}(1-p)^{1 - x_{i}} \\ 
     &= 1 \times p + 0 \times (1-p) \\ 
     &= p
\end{align}

--

### Binomial

Si X es una variable Binomial, su valor esperado viene dado por:

\begin{align}
\mathbb{E}(X) = np
\end{align}

--
.bold[Pregunta]: ¿Cuántas "Caras" debo esperar si tiro 200 monedas "justas"?

--

.bold[Respuesta]: $np = 200 \times 0.5 = 100$


---
## Varianza 

La varianza de una variable aleatoria es el análogo teórico de la varianza de los datos.
--
 Mide cuánta dispersión existe en torno al centro (la media). Formalmente, en el caso de variables aleatorias discretas:

<br>

$$\mathbb{Var}(X) = \sum_{i} \bigg( x_{i} - \mathbb{E}(X) \bigg)^{2} \times f_{X}(x_{i})$$
<br>
--

Análogamente, para variables continuas:

\begin{align}
\mathbb{Var}(X) =  \int \bigg(x -  \mathbb{E}(X)\bigg)^{2} f(x)dx
\end{align}

---
## Varianza
Por ejemplo, si $Y$ es una variable que resulta de tirar un dado "justo", ¿cuál es la varianza de $Y$?

<br>
--

\begin{align}
\mathbb{Var}(Y) &= \sum_{i} \bigg( y_{i} - \mathbb{E}(Y) \bigg)^{2} \times f_{Y}(y_{i})   \\ \\
     &=  (1 - 3.5)^{2} \times  \frac{1}{6} + (2-3.5)^{2} \times \frac{1}{6} + \dots + (6-3.5)^{2} \times \frac{1}{6}  \\ \\
     &= 2.91
\end{align}


---
## Varianza, algunas propiedades útiles  

1) La varianza de una constante es cero.

$$\mathbb{Var}(c)=0$$
--

2) Si $X$ es una variable aleatoria y $c$ una constante, entonces 

$$\mathbb{Var}(c X)= c^{2} \mathbb{Var}(X)$$
--

3) Si $X$ e $Y$ son dos variables aleatorias independientes, entonces

$$\mathbb{Var}(X \pm Y) =  \mathbb{Var}(X) + \mathbb{Var}(Y)$$
--

4) Si $X$ e $Y$ son dos variables aleatorias no independientes, entonces

$$\mathbb{Var}(X + Y) =  \mathbb{Var}(X) + \mathbb{Var}(Y) + \mathbb{Cov}(X,Y)$$
<br>
--

 - Def: $\mathbb{Cov}(X,Y) = \mathbb{E}(X,Y) -  \mathbb{E}(X)\mathbb{E}(Y)$

---
## Varianza, ejemplo  

Por ejemplo, supongamos que $X_{i}$ es la variable que resulta de tirar un dado "justo". Participamos de un concurso que consiste en tirar el mismo dado 100 veces. El premio (G) es $ $1000$ de base, más el resultado de cada dado $i$ multiplicado por 10.
--
 ¿Cuánto es la desviación estándar del premio?

--

$$G = 1000 + \sum^{n=100}_{i=1} X_{i} \times 10 \quad \text{ por tanto,}$$

--

$$\mathbb{Var}(G) = \mathbb{Var}(1000 + \sum^{n=100}_{i=1} X_{i} \times 10)$$

--

$$\mathbb{Var}(G) = \mathbb{Var}(1000) + 10^{2} \times \sum^{n=100}_{i=1}\mathbb{Var}(X_{i})$$

--

$$\mathbb{Var}(G) =  0 +  10^{2} \times 100 \times 2.91 = \$29100$$ 

--

$$\sigma_{G} = \sqrt{0 + 100 \times 100 \times 2.91} = \$ 170.6$$
---
## Varianza de variables discretas

### Bernoulli

Si X es una variable Bernoulli, su varianza viene dada por:

\begin{align}
\mathbb{Var}(X) &= \sum_{i} \bigg( x_{i} - \mathbb{E}(X) \bigg)^{2} \times f_{X}(x_{i})  \\ \\
 &= (1 - \mathbb{E}(X))^{2} \times \mathbb{P}(X=1) + (0 - \mathbb{E}(X))^{2} \times \mathbb{P}(X=0) \\ \\
 &= (1 - p)^{2} \times p +  (0 - p)^{2} \times (1-p) \\ \\
 &=p^{2} − p^{3} + p − 2p^{2} + p^{3} \\ \\
 &=p(1-p)
\end{align}

---
## Varianza de variables discretas

### Binomial

Si X es una variable Binomial, su varianza viene dada por:

\begin{align}
\mathbb{Var}(X) = n \times p(1-p)
\end{align}

<br>
--
.bold[Pregunta]: ¿Cuánta variabilidad debo esperar en el número de "Caras" si tiro 200 monedas "justas"?

--

.bold[Respuesta]: varianza es $n \times p(1-p) = 200 \times 0.5 \times 0.5 = 50$. La desviación estándar es $\sqrt{50} = 7.01$.

---
## Varianza de variables Binomial

.bold[Ilustración via Monte Carlo simulation]

```{r}

# Repeat experiment of tossing 200 coins 10000 times
coins200 <- rbinom(10000, size=200, p=0.5)
glimpse(coins200)
moments = list(mean=mean(coins200), var=var(coins200))
print(moments)
```

---
## Varianza de variables Binomial


```{r, echo=FALSE, fig.height=8, fig.width=12, message=FALSE}
library("cowplot")
theme_set(theme_cowplot())

plot <- ggplot(data = data.frame(p = 0), mapping = aes(x = p, color=""))
var_binom <- function(p,n) n*p*(1-p)

plot + stat_function(fun =var_binom, args = list(n= 1000), size=1.5) + 
  xlim(0,1) + labs(title="Varianza Binomial(n,p)", x="p", y=expression(paste(n,p,(1-p))) ) +
    scale_color_viridis_d() + 
    guides(fill=FALSE, color=FALSE) +
    theme(axis.text.y = element_text(size = 10), axis.text.x = element_text(size = 16),
    axis.title.y = element_text(size = 24), axis.title.x = element_text(size = 24), 
    legend.text = element_text(size = 12), legend.position="bottom") +
    geom_point(aes(x=0.5,y=1000/4), color="green", size=2.5) +
   annotate(geom="text", x=0.5, y=1040/4, label='bold("n/4")', color="black", parse=TRUE, size=8) 
```


---
## Distribución Binomial es asintóticamente Normal

.pull-left[


- Si $X \sim \text{Binomial}(n,p)$, entonces:

$$ X \xrightarrow[]{d} \mathcal{N}(\mu=np, \sigma= \sqrt{np(1-p)})$$
(cuando $n \to \infty$ )


- guideline: $\min \{np,  n(1-p) \} \geq 5$ 

- Resultado muy conveniente


]

.pull-right[

```{r, echo=FALSE, message=FALSE, warning=FALSE, fig.width=8}
library("tidyverse")

n = 25
p = 0.4

mydata <- data_frame(x = seq(from = 0, to = n, by = 1),
                    norm = dnorm(x,mean=n*p,sd=sqrt(n*p*(1-p))),
                    binom = dbinom(x,size=n,prob = p))

plot <- ggplot(data = mydata, mapping = aes(x = x)) +
    ## Entire curve
    geom_path(aes(y=binom,color="red"), size=1.5, alpha=0.8) +
    geom_path(aes(y=norm,color="blue"), size=1.5, alpha=0.7) +
  labs(y="f(y)", x="y", title="Probability function of Y | n=25, p=0.4") +
     scale_color_viridis_d(name = "",
                          breaks = c("red", "blue"),
                          labels = c("Binomial(n,p)", "N(mean=np,var=np(1-p))"),
                          guide = "legend") +
      theme(axis.text.y = element_text(size = 22), axis.text.x = element_text(size = 22),
      axis.title.y = element_text(size = 24), axis.title.x = element_text(size = 24), 
      legend.text = element_text(size = 18), legend.position="bottom") 
print(plot)
```
]

---
## Distribución Binomial es asintóticamente Normal

.pull-left[


- Si $X \sim \text{Binomial}(n,p)$, entonces:

$$\frac{X - np}{\sqrt{np(1-p)}}  \xrightarrow[]{d} \mathcal{N}(\mu=0, \sigma= 1)$$

(cuando $n \to \infty$ )

<br>

- Distribución "Standard Normal"

- La base de casi todas las aproximaciones para inferencia estadística

]

.pull-right[

```{r, echo=FALSE, message=FALSE, warning=FALSE, fig.width=8}
library("tidyverse")

sims = 5000
n = 1000
p = 0.5

mydata <- data_frame(
              binom = ((rbinom(sims,size=n,prob=p) - n*p)/(sqrt(n*p*(1-p)))), 
              normal = rnorm(sims)
                    )

data_bin  <- with(density(mydata$binom), data.frame(x, y))
data_norm <- with(density(mydata$normal), data.frame(x, y))

plot <- data_bin %>% ggplot(aes(x=x)) +
    geom_line(aes(y=y, color="red"), size=1.5, alpha=0.8) +
    geom_line(data =data_norm, aes(y=y, color="blue"), size=1.5, alpha=0.7) +
    xlim(-4,4) +
    labs(y="f(y)", x="y", title="Probability function of Y") +
     scale_color_viridis_d(name = "",
                          breaks = c("red", "blue"),
                          labels = c("(Binomial(np) - np)/sqrt(np(1-p))", "N(0,1)"),
                          guide = "legend") +
      theme(axis.text.y = element_text(size = 22), axis.text.x = element_text(size = 22),
      axis.title.y = element_text(size = 24), axis.title.x = element_text(size = 24), 
      legend.text = element_text(size = 16), legend.position="bottom") 


print(plot)
```
]

---
class: inverse, center, middle

#Estimación
##Maximum Likelihood Estimation (MLE)

---

##Estimación 

.bold[Modelos de probabilidad]: ¿Cuál es la probabilidad de observar los *datos* dado los *parámetros* que conocemos?

Ej. ¿Cuán probable es que obtengamos 9 "Caras" (1) si lanzamos una moneda "justa" ( $p=0.5$ ) 10 veces? 

```{r}
dbinom(x=9,size=10,prob=0.5)
```

---
##Estimación 

.bold[Modelos estadísticos]:  ¿Cuáles son los valores más .bold[plausibles][1].footnote[[1] Notar que no dice "más probables"!] de los *parámetros* dado los *datos* que observamos? 


Ej. Supongamos que alguien lanza 100 veces la misma moneda y registra los resultados en una base de datos. Los datos se ven así:  

.pull-left[
```{r, echo=FALSE, fig.height=5, fig.width=6, message=FALSE}
library("tidyverse")
set.seed(481)
data_coins <- data.frame(X = rbinom(n=100, size=1, prob=0.8))

data_coins %>% ggplot(aes(x=factor(X), fill="")) + 
    geom_bar() +
    geom_text(aes(label=..count..), stat='count', vjust=-0.2) +
    scale_fill_viridis_d() + 
    guides(fill=FALSE, color=FALSE) + labs(x="") +
    theme(axis.text.y = element_text(size = 22), axis.text.x = element_text(size = 22),
    axis.title.y = element_text(size = 24), axis.title.x = element_text(size = 24), 
    legend.text = element_text(size = 18), legend.position="bottom") 
```
]

--

.pull-right[

- Lo que vemos en la izquierda son .bold[datos]

- Datos: realización de $n$ variables aleatorias 

- Normalmente *no conocemos* la distribución de las variables

- Datos nos dan una pista sobre cuál podría ser esa distribución

- .bold[Estadística]: aprender de los datos para .bold[*estimar*] los parámetros que los generan

]

---
##Estimación via Maximum Likelihood (MLE) 

Previamente lanzamos la misma moneda 100 veces y obtuvimos "Cara" (1) 82 veces.
--
 ¿Qué valor de $p$ es más plausible ("likely") que genere estos datos?

MLE es justamente la formalización de esta pregunta. Pasos:

--

1) Decidir sobre la distribución subyacente que genera los datos. En este caso, podemos asumir que: 

  * Cada lanzamiento $X_{1}, X_{2}, \dots X_{100} \sim \text{Bernoulli}(p)$, donde X's son $iid$ 

--

2)  Escribir una función que cuantifique la plausibilidad de diferentes valores del parámetro. Dicha función se denomina .bold[likelihood function]: 

<br>
  * $\mathcal{L}(p \mid \text{ Datos}) = \mathbb{P}(\text{ Datos : \{1,0,1,1,....0,1\}} | \text{ } p)$

<br>
--

  * $\mathcal{L}(p \mid \text{ Datos}) = \mathbb{P}(x_{1})\mathbb{P}(x_{2}) \dots \mathbb{P}(x_{100}) = p^{82}(1-p)^{18}$


---
##Estimación via Maximum Likelihood (MLE) 

Podemos inspeccionar visualmente la "likelihood" de diferentes valores $p$.

```{r, echo=FALSE, fig.height=5, fig.width=9, message=FALSE}
plot <- ggplot(data = data.frame(p = 0), mapping = aes(x = p, color=""))
binom_distrib <- function(p,n,k) (p^(k))*((1-p)^(n-k))

plot + stat_function(fun = binom_distrib, args = list(n= 100, k= 82), size=1.5) + 
  xlim(0,1) + labs(title="Likelihood of p", x="p", y=expression(paste(p^{82}, (1-p)^{18})) ) +
    scale_color_viridis_d() + 
    guides(fill=FALSE, color=FALSE) +
    theme(axis.text.y = element_text(size = 10), axis.text.x = element_text(size = 16),
    axis.title.y = element_text(size = 24), axis.title.x = element_text(size = 24), 
    legend.text = element_text(size = 12), legend.position="bottom") 
```

Intuitivamente: habiendo obtenido 82 caras, $p=0.82$ es el valor más plausible de $p$


---

##Estimación via Maximum Likelihood (MLE) 

3) Encontrar matemáticamente el valor de $p$ que maximiza $\mathcal{L}(p \mid \text{ Datos})$.


- $\mathcal{L}(p \mid \text{ Datos}) = \mathbb{P}(x_{1})\mathbb{P}(x_{2}) \dots \mathbb{P}(x_{n}) =\prod_{i=1}^{n} f(x_{i}) =  p^{k}(1-p)^{n-k} \quad \text{   donde  } k= \sum x_{i}$

--

- Para facilitar el cálculo tomamos logaritmo natural en ambos lados (.bold[log-likelihood])

  - $\ell\ell(p) = \ln \mathcal{L}(p \mid \text{ Datos})  = k \ln(p) + (n - k) \ln(1-p)$ 

--
-  Calcular la primera* derivada de $\ell\ell(p)$ con respecto a $p$: pendiente de la curva en cada valor de $p$.

  - $\ell\ell^{\text{ '}}(p) = \frac{k}{p} -  \frac{n-k}{1-p}$

--

- Encontrar el máximo de la función $\ell\ell(p)$: valor de $p$ en el cual la curva no cambia, es decir cuando $\ell\ell^{\text{ '}}(p)=0$ 

  - $\frac{k}{p} -  \frac{n-k}{1-p} = 0$
  
--

- Después de resolver por $p$ obtenemos:
  
   $$p = \frac{k}{n} = \frac{\sum x_{i}}{n}$$


---
##Estimación via Maximum Likelihood (MLE) 

<br>

- El estimador ML de $p$ es ....


- $\hat{p} = \frac{\sum x_{i}}{n}$


- Es decir, el porcentaje de 1's en la muestra!

--

- Intuitivo y elegante


---

.bold["Optimización" numérica en R]

```{r, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE}

# log-likelihood function
ll <- function(p,n,k) {
  ell = k * log(p) + (n - k)*log(1-p)
  return(ll = ell)
}


# Evaluate the log-likelihood function for some arbitrary values
ll(p=0.1,n=100,k=82); ll(p=0.7,n=100,k=82)
```
--

```{r, include=TRUE, echo=TRUE, warning=FALSE, message=FALSE}

# Evaluate the log-likelihood function for many possible values of p

parameter_space <- tibble(p=seq(0,1,by=0.01)) %>% 
  rowwise() %>% mutate(loglik = ll(p,n=100,k=82)) 

# Find the value of p that yield the largest value for the log-likelihood function

parameter_space %>% as.matrix() -> m
m[which.max(m[,2]),]
```

---
.bold["Optimización" numérica en R]

.center[
```{r loglik_density,  include=TRUE, echo=FALSE, warning=FALSE, message=FALSE, fig.height=10, fig.width=12}
parameter_space %>% as.data.frame() %>% ggplot(aes(x=p, y=loglik, colour=loglik)) + 
geom_line(size=1.5) + geom_point(aes(x=m[which.max(m[,2]),1], m[which.max(m[,2]),2]), size=2.5) +
scale_color_viridis() + guides(fill=FALSE, color=FALSE) + labs(title="Log-likelihood function",x="p", y="82*log(p) + (100 - 82)*log(1-p)") +
annotate(geom="text", x=0.82, y=-35, label='bold("0.82")', color="black", parse=TRUE, size=8) +
theme(axis.text.y = element_text(size = 22), axis.text.x = element_text(size = 22),
axis.title.y = element_text(size = 24), axis.title.x = element_text(size = 24), title=element_text(size = 24),
legend.text = element_text(size = 18), legend.position="bottom") 
```
]

---
##Estimación via Maximum Likelihood (MLE) 

.bold[Generalización]

<br>


$$\hat{\boldsymbol{\beta}}_{MLE} = \underset{\beta}{\arg\max\ } \mathcal{L}(\boldsymbol{\beta} \mid \boldsymbol{X})$$
$\hat{\boldsymbol{\beta}}$ es el MLE de $\boldsymbol{\beta}$ si es el(los) valor(es) que maximiza(n) la "likelihood function", condicional en los datos observados.

<br>
--

- Recordar que   $\mathcal{L}(\boldsymbol{\beta} \mid \boldsymbol{X}) = \mathbb{P}(\boldsymbol{X} \mid \boldsymbol{\beta})$.
  
--

- Requiere especificar de antemano la distribución conjunta de las observaciones (dif. de OLS, por ejemplo).

--

- ML es probablemente el approach de estimación más popular. 

--

- Intuitivo, pero, por lo general, no tan simple como el ejemplo que vimos hoy.

--

- Normalmente la maximización se realiza numéricamente (ej. método Newton–Raphson)

---
class: fullscreen,left, top, text-white
background-image: url(valdorcia.jpeg)

##Estimación via Maximum Likelihood (MLE) 


---
class: inverse, center, middle

#Inferencia
##Inferencia Estadística para una proporción

---
##"Sampling distribution" de una proporción


- Nuestro MLE para una poporción es $\hat{p} = \frac{\sum x_{i}}{n}$

  - Donde $x_{i} \sim \text{Bernoulli}(p)$

<br>
--

- ¿Cómo podemos cuantificar la incertidumbre en torno a nuestras estimaciones?

  - En ejemplo anterior $\hat{p} = 0.82$, pero los datos fueron simulados con $p=0.8$ (pámetro verdadero)


<br>
--

- Para responder esta pregunta debemos conocer la .bold[sampling distribution] de nuestro estimador, especialmente su _variabilidad_.

 - El estimador, aplicado a diferentes muestras, produce distintos resultados, pero sigue una distribución


---
##"Sampling distribution" de una proporción


- $\mathbb{Var}\big(\hat{p}\big)$ o  $\sqrt{\mathbb{Var}\big(\hat{p}\big)}$ "Error Estándar" (SE)?


--

  - $\mathbb{Var}\big(\hat{p}\big) = \mathbb{Var}\Big( \frac{\sum_{i} x_{i}}{n} \Big) = \mathbb{Var}\Big(\frac{x_{i} + ... + x_{n}}{n}\Big)$

--

  - $\mathbb{Var}\big(\hat{p}\big) = \frac{1}{n^2} \Big(\mathbb{Var}(x_{i}) + ... + \mathbb{Var}(x_{n})\Big)$

--

  - $\mathbb{Var}\big(\hat{p}\big) = \frac{1}{n^2}( \sigma^2 + ... + \sigma^2 ) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} = \frac{p(1-p)}{n}$

<br>
--

.content-box-blue[
$$\sigma_{\hat{p}} =  \frac{\sigma}{\sqrt{n}} =  \sqrt{\frac{p(1-p)}{n}}$$
]

---
##"Sampling distribution" de una proporción

  - $\mathbb{E}\big(\hat{p}\big) = \mathbb{E}\Big( \frac{\sum_{i} x_{i}}{n} \Big) =  \frac{1}{n} \sum_{i} \mathbb{E}(x_{i}) =  \frac{1}{n}\sum_{i} p = \frac{np}{n} = p$


--

- Importante resultado teórico: la sampling distribution de $\hat{p}$ es _asintóticamente_ normal:

$$\hat{p} \sim \mathcal{N}(\mu,\sigma) \quad \text{ con }   \quad \mu = p \quad \text{ y } \quad \sigma = \sqrt{\frac{p(1-p)}{n}}$$

--

Podemos usar este resultado para construir un intervalo de confianza para $\hat{p}$, al (1 - $\alpha$)% de confianza. Para un nivel de significación estadística de $\alpha=0.05$,

\begin{align}
  95\% \text{ CI}_{\hat{p}} &= \hat{p} \pm 1.96 \times SE \\ \\
          &=\hat{p} \pm 1.96  \sqrt{\frac{p(1-p)}{n}} 
\end{align}

<br>
--

.bold[Nota importante]: cuando no conocemos los _verdaderos_ parámetros (casi siempre!), reemplazamos por sus valores estimados en la muestra (en este caso, $\hat{p}$ en vez de $p$).

---
##"Sampling distribution" de una proporción

.bold[Ilustración via Monte Carlo simulation]

```{r}
# Theoretical parameters of Bernoulli random variable X: "tosing a coin"
p = 0.35

# Our sample: a random sample of 500 coin toses 
n = 500
our_sample <- rbinom(n,size=1,prob=p); our_sample %>% glimpse()
p_hat <- mean(our_sample); p_hat

```

---
##"Sampling distribution" de una proporción

.bold[Ilustración via Monte Carlo simulation]

```{r}
#  Demonstration of the distrution of p_hat across 1000 samples of size 500.
p_hat_n = NULL
for (i in 1:1000) {
  a_sample <- rbinom(n,size=1,prob=p)
  p_hat_n[i] <- mean(a_sample)
}
```
```{r,echo=FALSE}
p_hat_n %>% glimpse()
```

--

```{r}
# mean of p_hat_n  = p
moment_1 <- c(mean_samp_dis = mean(p_hat_n), our_estimate_p= p_hat)
```
```{r,echo=FALSE} 
print(moment_1) 
```

--

```{r}
# sd of p_hat_n  = sqrt(p*(1-p)/n)
moment_2 <- c(sd_samp_dis = sd(p_hat_n), our_estimate_sd= sqrt((p_hat*(1-p_hat))/n))
```
```{r,echo=FALSE} 
print(moment_2) 
```

---
##"Sampling distribution" de una proporción

.bold[Ilustración via Monte Carlo simulation]

```{r, echo=FALSE, message=FALSE, warning=FALSE, fig.width=12}
library("tidyverse")

# our estimate, z-score

ci_ll = p_hat - 1.96*(((p_hat*(1-p_hat))/n)^(1/2))
ci_ul = p_hat + 1.96*(((p_hat*(1-p_hat))/n)^(1/2))

samp_distrib <- with(density(p_hat_n), data.frame(x, y)) 

plot <- samp_distrib %>% ggplot(aes(x=x, y=y)) +
    geom_line(aes(color=""), size=1.5, alpha=0.8) +
    geom_ribbon(data = samp_distrib %>% as_tibble() %>% filter(x >= ci_ul), aes(ymin = 0, ymax = y), alpha=0.5) +
    geom_ribbon(data = samp_distrib %>% as_tibble() %>% filter(x <= ci_ll), aes(ymin = 0, ymax = y), alpha=0.5) +
    labs(y="f(y)", x="y", title="Sampling Distribution p_hat") +
    scale_color_viridis_d() +
    theme(axis.text.y = element_text(size = 22), axis.text.x = element_text(size = 22),
    axis.title.y = element_text(size = 24), axis.title.x = element_text(size = 24), 
    legend.text = element_text(size = 16), legend.position="none") +
    geom_vline(xintercept = p_hat, color = "blue", size=1.5) +
    annotate(geom="text", x=p_hat+0.017, y=20, label='bold("nuestro p_hat")', color="black", parse=TRUE, size=8) 

print(plot)
```

---


class: inverse, center, middle

.huge[
**Hasta la próxima clase. Gracias!**
]

<br>
Mauricio Bucca <br>
https://mebucca.github.io/ <br>
github.com/mebucca