
Commit

forest bad model notebook
tijana-zrnic authored Oct 2, 2023
1 parent 8440ec3 commit 1c009c7
Showing 1 changed file with 53 additions and 7 deletions.
60 changes: 53 additions & 7 deletions examples/baselines/forest_badmodel.ipynb
@@ -1,5 +1,24 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "b1a2661f",
"metadata": {},
"source": [
"# Cases Where Prediction-Powered Inference is Underpowered: Bad Model\n",
"\n",
"The goal of this experiment is to demonstrate a case where prediction-powered inference is underpowered due to the machine-learning model not being accurate enough.\n",
"The inferential target is the fraction of the Amazon rainforest lost between 2000 and 2015. The same problem is studied in the notebook [```forest.ipynb```](https://github.com/aangelopoulos/ppi_py/blob/main/examples/forest.ipynb), however here a worse predictive model is trained for the purpose of the demonstration."
]
},
{
"cell_type": "markdown",
"id": "0c1f0f0a",
"metadata": {},
"source": [
"### Import necessary packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -29,7 +48,9 @@
"id": "5cf90ae6",
"metadata": {},
"source": [
"# Import the forest data set using the linear regression model"
"### Import the forest data set with predictions made via a linear model\n",
"\n",
"Load the data. The data set contains gold-standard deforestation labels (```Y```) and deforestation labels predicted via linear regression (```Yhat```)."
]
},
{
@@ -40,8 +61,8 @@
"outputs": [],
"source": [
"data = np.load(\n",
" \"../data/forest_badmodel.npz\"\n",
") # This data can be downloaded from this Google Drive link:\n",
" \"forest_badmodel.npz\"\n",
") \n",
"Y_total = data[\"Y\"]\n",
"Yhat_total = data[\"Yhat\"]"
]
@@ -51,7 +72,11 @@
"id": "8969f9db",
"metadata": {},
"source": [
"# Problem setup"
"### Problem setup\n",
"\n",
"Specify the error level (```alpha```), range of values for the labeled data set size (```ns```), and number of trials (```num_trials```).\n",
"\n",
"Compute the ground-truth value of the estimand."
]
},
{
@@ -77,7 +102,14 @@
"id": "83ce18be",
"metadata": {},
"source": [
"# Construct intervals"
"### Construct intervals\n",
"\n",
"Form confidence intervals for all methods and problem parameters. A dataframe with the following columns is formed:\n",
"1. ```method``` (one of ```PPI```, ```Classical```, and ```Imputation```)\n",
"2. ```n``` (labeled data set size, takes values in ```ns```)\n",
"3. ```lower``` (lower endpoint of the confidence interval)\n",
"4. ```upper``` (upper endpoint of the confidence interval)\n",
"5. ```trial``` (index of trial, goes from ```0``` to ```num_trials-1```)"
]
},
{
@@ -164,7 +196,11 @@
"id": "d15ba288",
"metadata": {},
"source": [
"# Plot results"
"### Plot results\n",
"\n",
"Plot:\n",
"1. Five randomly chosen intervals from the dataframe for PPI and the classical method, and the imputed interval;\n",
"2. The average interval width for PPI and the classical method, together with a scatterplot of the widths from the five random draws."
]
},
{
@@ -197,6 +233,16 @@
")"
]
},
{
"cell_type": "markdown",
"id": "6f7398dd",
"metadata": {},
"source": [
"### Power experiment\n",
"\n",
"For PPI and the classical approach, find the smallest value of ```n``` such that the method has power 80% against the null that there is no deforestation, $H_0: \\text{deforestation} \\leq 0$."
]
},
{
"cell_type": "code",
"execution_count": 6,
@@ -285,7 +331,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
"version": "3.8.15"
}
},
"nbformat": 4,
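The comparison described in the notebook's markdown cells can be illustrated with a small standalone sketch. It is not the notebook's code: it uses only the Python standard library and synthetic data instead of `forest_badmodel.npz`, the helper names `classical_mean_ci` and `ppi_mean_ci` are hypothetical, and the sample sizes, error level, and the two toy "models" are assumptions chosen for illustration. The PPI interval uses the usual prediction-powered mean estimator (mean prediction on unlabeled data plus the mean labeled residual) with a normal approximation.

```python
import random
from statistics import NormalDist, fmean, variance


def classical_mean_ci(y, alpha=0.1):
    """Normal-approximation interval for the mean using labeled data only."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * (variance(y) / len(y)) ** 0.5
    center = fmean(y)
    return center - half_width, center + half_width


def ppi_mean_ci(y, yhat, yhat_unlabeled, alpha=0.1):
    """Prediction-powered interval for the mean (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Residuals on the labeled data correct the bias of the predictions.
    rectifier = [yi - fi for yi, fi in zip(y, yhat)]
    center = fmean(yhat_unlabeled) + fmean(rectifier)
    std_err = (
        variance(yhat_unlabeled) / len(yhat_unlabeled)
        + variance(rectifier) / len(y)
    ) ** 0.5
    return center - z * std_err, center + z * std_err


def width(interval):
    return interval[1] - interval[0]


rng = random.Random(0)
n, N = 200, 5000  # assumed labeled / unlabeled sample sizes

# Synthetic binary outcomes (assumption: about 15% positives).
y = [1.0 if rng.random() < 0.15 else 0.0 for _ in range(n)]
# Labels of the "unlabeled" pool exist only to generate predictions here.
y_pool = [1.0 if rng.random() < 0.15 else 0.0 for _ in range(N)]


def good_model(label):
    return label + rng.gauss(0.0, 0.1)  # predictions close to the label


def bad_model(label):
    return rng.random()  # predictions unrelated to the label


width_classical = width(classical_mean_ci(y))
width_good = width(
    ppi_mean_ci(y, [good_model(v) for v in y], [good_model(v) for v in y_pool])
)
width_bad = width(
    ppi_mean_ci(y, [bad_model(v) for v in y], [bad_model(v) for v in y_pool])
)

print(f"classical: {width_classical:.4f}")
print(f"PPI, good model: {width_good:.4f}")
print(f"PPI, bad model: {width_bad:.4f}")
```

When the predictions carry no information about the labels, the residual variance is larger than the label variance, so this basic PPI interval is no narrower than the classical one. With accurate predictions the residual variance is small and the interval shrinks. A power check against $H_0: \text{deforestation} \leq 0$ could then, under the same assumptions, reject whenever the lower endpoint exceeds zero.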
