
Commit

Release/1.0.0 (#29)
* started updates to readme

* changed readme image

* adjusted curves img in readme

* adjusted image again

* pylint clean-up

* started dividing up the GPS class

* more work on dividing GPS up

* started updating TMLE core

* made good progress on TMLE

* got the code working

* start revising unit tests. Done with tests of Core class

* finished revising tests

* whoops, needed more tests

* started making big changes to docs

* tested outside of project folder

* revised documentation

* final changes

* fixed docs

Co-authored-by: rkobrosly <[email protected]>
ronikobrosly and rkobrosly authored Jan 3, 2021
1 parent 50213c1 commit ab10e30
Showing 36 changed files with 2,285 additions and 1,193 deletions.
598 changes: 598 additions & 0 deletions .pylintrc

Large diffs are not rendered by default.

36 changes: 14 additions & 22 deletions README.md
@@ -4,30 +4,28 @@
[![codecov](https://codecov.io/gh/ronikobrosly/causal-curve/branch/master/graph/badge.svg)](https://codecov.io/gh/ronikobrosly/causal-curve)
[![DOI](https://zenodo.org/badge/256017107.svg)](https://zenodo.org/badge/latestdoi/256017107)

Python tools to perform causal inference when the treatment of interest is continuous.


<p align="center">
<img src="/imgs/curves.png" align="middle"/>
</p>





## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Documentation](#documentation)
- [Contributing](#contributing)
- [Citation](#citation)
- [References](#references)

## Overview

(**Version 1.0.0 released in January 2021!**)

There are many implemented methods to perform causal inference when your intervention of interest is binary,
but few methods exist to handle continuous treatments.

@@ -61,15 +59,6 @@ pip install .
[Documentation is available at readthedocs.org](https://causal-curve.readthedocs.io/en/latest/)


## Contributing

Your help is absolutely welcome! Please do reach out or create a feature branch!
@@ -83,19 +72,22 @@ Kobrosly, R. W., (2020). causal-curve: A Python Causal Inference Package to Esti
Galagate, D. Causal Inference with a Continuous Treatment and Outcome: Alternative
Estimators for Parametric Dose-Response function with Applications. PhD thesis, 2016.

Hirano K and Imbens GW. The propensity score with continuous treatments.
In: Gelman A and Meng XL (eds) Applied bayesian modeling and causal inference
from incomplete-data perspectives. Oxford, UK: Wiley, 2004, pp.73–84.

Imai K, Keele L, Tingley D. A General Approach to Causal Mediation Analysis. Psychological
Methods. 15(4), 2010, pp.309–334.

Kennedy EH, Ma Z, McHugh MD, Small DS. Nonparametric methods for doubly robust estimation
of continuous treatment effects. Journal of the Royal Statistical Society, Series B. 79(4), 2017, pp.1229-1245.

Moodie E and Stephens DA. Estimation of dose–response functions for
longitudinal data using the generalised propensity score. In: Statistical Methods in
Medical Research 21(2), 2010, pp.149–166.

van der Laan MJ and Gruber S. Collaborative double robust penalized targeted
maximum likelihood estimation. In: The International Journal of Biostatistics 6(1), 2010.

van der Laan MJ and Rubin D. Targeted maximum likelihood learning. In: ​U.C. Berkeley Division of
Biostatistics Working Paper Series, 2006.

6 changes: 4 additions & 2 deletions causal_curve/__init__.py
@@ -4,8 +4,10 @@

from statsmodels.genmod.generalized_linear_model import DomainWarning

from causal_curve.gps_classifier import GPS_Classifier
from causal_curve.gps_regressor import GPS_Regressor

from causal_curve.tmle_regressor import TMLE_Regressor
from causal_curve.mediation import Mediation


70 changes: 67 additions & 3 deletions causal_curve/core.py
@@ -1,14 +1,16 @@
"""
Core classes (with basic methods) that will be invoked when other model classes are defined
"""
import pkg_resources

import numpy as np
from scipy.stats import norm


class Core:
"""Base class for causal_curve module"""

__version__ = "1.0.0"

def get_params(self):
"""Returns a dict of all of the object's user-facing parameters
@@ -26,4 +28,66 @@ def get_params(self):
[(k, v) for k, v in list(attrs.items()) if (k[0] != "_") and (k[-1] != "_")]
)

def if_verbose_print(self, string):
"""Prints the input statement if verbose is set to True
Parameters
----------
string: str, some string to be printed
Returns
----------
None
"""
if self.verbose:
print(string)

@staticmethod
def rand_seed_wrapper(random_seed=None):
"""Sets the random seed using numpy
Parameters
----------
random_seed: int, random seed number
Returns
----------
None
"""
if random_seed is None:
pass
else:
np.random.seed(random_seed)

@staticmethod
def calculate_z_score(ci):
"""Calculates the critical z-score for a desired two-sided
confidence interval width.
Parameters
----------
ci: float, the confidence interval width (e.g. 0.95)
Returns
-------
Float, critical z-score value
"""
return norm.ppf((1 + ci) / 2)

@staticmethod
def clip_negatives(number):
"""Helper function to clip negative numbers to zero
Parameters
----------
number: int or float, any number that needs a floor at zero
Returns
-------
Int or float of modified value
"""
if number < 0:
return 0
return number
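The two static helpers added above are small enough to sanity-check in isolation. A minimal standalone sketch, substituting the stdlib `statistics.NormalDist` for the `scipy.stats.norm` import the module actually uses:

```python
from statistics import NormalDist  # stdlib stand-in for scipy.stats.norm


def calculate_z_score(ci):
    """Critical z-score for a two-sided interval of width `ci`."""
    # For a 95% interval, half of the remaining 5% sits in each tail,
    # so we invert the standard normal CDF at (1 + 0.95) / 2 = 0.975.
    return NormalDist().inv_cdf((1 + ci) / 2)


def clip_negatives(number):
    """Floor a value at zero."""
    return 0 if number < 0 else number


print(round(calculate_z_score(0.95), 2))  # → 1.96
print(clip_negatives(-0.25))              # → 0
```

The round-trip agrees with the familiar 1.96 critical value for a 95% interval, which is the sanity check these helpers need.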
110 changes: 110 additions & 0 deletions causal_curve/gps_classifier.py
@@ -0,0 +1,110 @@
"""
Defines the Generalized Propensity Score (GPS) classifier model class
"""

import numpy as np
from scipy.special import logit

from causal_curve.gps_core import GPS_Core


class GPS_Classifier(GPS_Core):
"""
A GPS tool that handles binary outcomes. Inherits from the GPS_Core
base class. See that base class's code and docstring for more details.
"""

def __init__(
self,
gps_family=None,
treatment_grid_num=100,
lower_grid_constraint=0.01,
upper_grid_constraint=0.99,
spline_order=3,
n_splines=30,
lambda_=0.5,
max_iter=100,
random_seed=None,
verbose=False,
):
GPS_Core.__init__(
self,
gps_family=gps_family,
treatment_grid_num=treatment_grid_num,
lower_grid_constraint=lower_grid_constraint,
upper_grid_constraint=upper_grid_constraint,
spline_order=spline_order,
n_splines=n_splines,
lambda_=lambda_,
max_iter=max_iter,
random_seed=random_seed,
verbose=verbose,
)

def _cdrc_predictions_binary(self, ci):
"""Returns the predictions of CDRC for each value of the treatment grid. Essentially,
we're making predictions using the original treatment and gps_at_grid.
To be used when the outcome of interest is binary.
"""
# To keep track of cdrc predictions, we create an empty 2d array of shape
# (n_samples, treatment_grid_num, 2). The last dimension is of length 2 because
# we are going to keep track of the point estimate (log-odds) of the prediction, as well as
# the standard error of the prediction interval (again, this is for the log odds)
cdrc_preds = np.zeros((len(self.T), self.treatment_grid_num, 2), dtype=float)

# Loop through each of the grid values, predict point estimate and get prediction interval
for i in range(0, self.treatment_grid_num):

temp_T = np.repeat(self.grid_values[i], repeats=len(self.T))
temp_gps = self.gps_at_grid[:, i]

temp_cdrc_preds = logit(
self.gam_results.predict_proba(np.column_stack((temp_T, temp_gps)))
)

temp_cdrc_interval = logit(
self.gam_results.confidence_intervals(
np.column_stack((temp_T, temp_gps)), width=ci
)
)

standard_error = (
temp_cdrc_interval[:, 1] - temp_cdrc_preds
) / self.calculate_z_score(ci)

cdrc_preds[:, i, 0] = temp_cdrc_preds
cdrc_preds[:, i, 1] = standard_error

return np.round(cdrc_preds, 3)
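The standard-error recovery in the loop above divides the half-width of the model's confidence interval by the critical z-score, so the full interval can later be rebuilt from just the point estimate and SE. A self-contained numeric sketch of that step, with hypothetical values standing in for the `gam_results` output:

```python
from statistics import NormalDist  # stdlib stand-in for scipy.stats.norm

ci = 0.95
z = NormalDist().inv_cdf((1 + ci) / 2)  # ≈ 1.96

# Hypothetical point estimate and upper CI bound on the log-odds scale,
# standing in for the predict_proba / confidence_intervals output above.
point_estimate = 0.40
upper_bound = 1.18

# Same recovery step as in _cdrc_predictions_binary:
# half-width of the interval divided by the critical z-score.
standard_error = (upper_bound - point_estimate) / z
print(round(standard_error, 3))  # → 0.398

# The interval can be rebuilt from (point, SE), which is why only these
# two numbers per grid value need to be stored in cdrc_preds.
rebuilt_upper = point_estimate + z * standard_error
print(round(rebuilt_upper, 2))  # → 1.18, matching upper_bound
```

Storing `(point, SE)` rather than the raw interval keeps the `cdrc_preds` array compact while losing no information.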

def estimate_log_odds(self, T):
"""Calculates the estimated log odds of the highest integer class. Can
only be used when the outcome is binary. Log odds can be estimated for a single
data point or in batch for many observations. Extrapolation will produce
untrustworthy results; the provided treatment should be within
the range of the training data.
Parameters
----------
T: Numpy array, shape (n_samples,)
A continuous treatment variable.
Returns
----------
array: Numpy array
Contains a set of log odds
"""
if self.outcome_type != "binary":
raise TypeError("Your outcome must be binary to use this function!")

return np.apply_along_axis(self._create_log_odds, 0, T.reshape(1, -1))

def _create_log_odds(self, T):
"""Takes a single treatment value and produces the log odds of the higher
integer class, in the case of a binary outcome.
"""
return logit(
self.gam_results.predict_proba(
np.array([T, self.gps_function(T).mean()]).reshape(1, -1)
)
)
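`estimate_log_odds` and `_create_log_odds` report results on the log-odds (logit) scale. A standalone sketch of the transform and its inverse, using `math.log` in place of the `scipy.special.logit` call the class actually makes, with hypothetical probabilities standing in for `predict_proba` output:

```python
import math


def logit(p):
    """Log-odds of probability p (stdlib stand-in for scipy.special.logit)."""
    return math.log(p / (1 - p))


def inv_logit(log_odds):
    """Map log-odds back to a probability (the logistic function)."""
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical predicted probabilities for the positive class at a few
# treatment values.
probs = [0.5, 0.73, 0.9]
log_odds = [logit(p) for p in probs]
print([round(x, 3) for x in log_odds])  # logit(0.5) is exactly 0.0

# Round-trip check: inv_logit undoes logit.
assert all(abs(inv_logit(lo) - p) < 1e-12 for lo, p in zip(log_odds, probs))
```

A probability of 0.5 maps to log odds of 0, which makes the logit scale convenient for reading off whether the positive class is more or less likely than chance.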
