Commit

doc revisions (#17)
ronikobrosly authored Aug 14, 2020
1 parent ba94fe1 commit 93f9219
Showing 5 changed files with 92 additions and 51 deletions.
8 changes: 7 additions & 1 deletion docs/changelog.rst
@@ -4,10 +4,16 @@
Change Log
==========

Version 0.3.5
-------------
- Re-organized documentation
- Added `Introduction` section to explain purpose and need for the package


Version 0.3.4
-------------
- Removed XGBoost as dependency.
- Now using sklearn's gradient boosting implementation.


Version 0.3.3
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -22,7 +22,7 @@
author = 'Roni Kobrosly'

# The full version, including alpha/beta/rc tags
release = '0.3.4'
release = '0.3.5'

# -- General configuration ---------------------------------------------------

54 changes: 6 additions & 48 deletions docs/index.rst
@@ -1,11 +1,13 @@
Welcome to causal-curve's documentation!
========================================


.. toctree::
:maxdepth: 2
:hidden:
:caption: Getting Started

intro
install
contribute

@@ -52,32 +54,16 @@ Welcome to causal-curve's documentation!
**causal-curve** is a Python package with tools to perform causal inference
using observational data when the treatment of interest is continuous.


.. image:: ../imgs/welcome_plot.png



Summary
-------

Sometimes it would be nice to run a randomized, controlled experiment to determine whether drug `A`
is superior to drug `B`, whether the blue button gets more clicks than the orange button on your
e-commerce site, etc. Unfortunately, it isn't always possible (resources are finite, the
test might not be ethical, you lack a proper A/B testing infrastructure, etc).
In these situations, there are methods that can be employed to help you infer causality from observational data.

There are many methods to perform causal inference when your intervention of interest is binary
(see the drug and button examples above), but few methods exist to handle continuous treatments.

This is unfortunate because there are many scenarios (in industry and research) where these methods would be useful.
For example, when you would like to:

* Estimate the causal response to increasing or decreasing the price of a product across a wide range.
* Understand how the number of hours per week of aerobic exercise causes positive health outcomes.
* Estimate how decreasing order wait time will impact customer satisfaction, after controlling for confounding effects.
* Estimate how changing neighborhood income inequality (Gini index) could be causally related to neighborhood crime rate.

This library attempts to address this gap, providing tools to estimate causal curves (AKA causal dose-response curves).
There are many available methods to perform causal inference when your intervention of interest is binary,
but few methods exist to handle continuous treatments. This is unfortunate because there are many
scenarios (in industry and research) where these methods would be useful. This library attempts to
address this gap, providing tools to estimate causal curves (AKA causal dose-response curves).


Quick example (of the ``GPS`` tool)
@@ -113,31 +99,3 @@ generalized propensity scores.
5. Estimate the points of the causal curve (along with 95% confidence interval bounds) with the ``.calculate_CDRC()`` method.

6. Explore or plot your results!
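At its core, the ``GPS`` approach models the conditional density of the treatment given the confounders (the generalized propensity score). As a rough illustration of that idea only (this is *not* causal-curve's actual implementation, which uses sklearn's gradient boosting), here is a minimal pure-Python sketch with a single confounder and a normal treatment model; all function names and the simulated data are hypothetical:

```python
import math
import random

def fit_linear(x, y):
    # ordinary least squares for y = a + b*x (closed form, one predictor)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def gps_scores(treatment, confounder):
    """Generalized propensity scores under an assumed normal treatment model:
    T | X ~ Normal(a + b*X, sigma^2). Each score is the estimated density
    of the treatment value a unit actually received, given its confounder."""
    a, b = fit_linear(confounder, treatment)
    resid = [t - (a + b * x) for t, x in zip(treatment, confounder)]
    sigma2 = sum(r * r for r in resid) / len(resid)
    return [
        math.exp(-r * r / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        for r in resid
    ]

# simulated data: treatment depends on the confounder plus noise
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
t = [2 * xi + random.gauss(0, 1) for xi in x]
scores = gps_scores(t, x)
```

In the real workflow these scores are then used to adjust the treatment-outcome relationship before estimating the curve points with ``.calculate_CDRC()``.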

None of the methods provided in causal-curve rely on inference via instrumental variables; they rely
only on data from the observed treatment, confounders, and the outcome of interest (as in the GPS example above).



A caution about assumptions
---------------------------

There is a well-documented set of assumptions one must make to infer causal effects from
observational data. These are covered elsewhere in more detail, but briefly:

- Causes always occur before effects: The treatment variable needs to have occurred before the outcome.
- SUTVA: The treatment status of a given individual does not affect the potential outcomes of any other individuals.
- Positivity: Any individual has a positive probability of receiving all values of the treatment variable.
- Ignorability: All major confounding variables are included in the data you provide.

Violations of these assumptions will lead to biased results and incorrect conclusions!

In addition, any covariates that are included in `causal-curve` models are assumed to only
be **confounding** variables.



`Getting started <install.html>`_
---------------------------------

Information to install, test, and contribute to the package.
77 changes: 77 additions & 0 deletions docs/intro.rst
@@ -0,0 +1,77 @@
.. _intro:

============================
Introduction to causal-curve
============================

In academia and industry, randomized controlled experiments (or simply experiments or colloquially
known as "A/B tests") are considered the gold standard approach for assessing the true, causal impact
of a treatment or intervention. For example:

* We want to increase the number of times per day new customers log into our business's website. Will it help if we send daily emails to our customers? We take a group of 2000 new business customers; half are randomly chosen to receive daily emails, while the other half receive one email per week. We follow both groups forward in time for a month and compare each group's average number of logins per day.

However, for ethical or financial reasons experiments may not always be feasible to carry out.

* It's not ethical to randomly assign some people to receive a possible carcinogen in pill form while others receive a sugar pill, and then see which group is more likely to develop cancer.
* It's not feasible to increase the household incomes of some New York neighborhoods while leaving others unchanged, to see whether reducing a neighborhood's income inequality would lower the local crime rate.


"Causal inference" methods are a set of approaches that attempt to estimate causal effects
from observational rather than experimental data, correcting for the biases that are inherent
to analyzing observational data (e.g. confounding and selection bias) [@Hernán:2020].

As long as you have varying observational data on some treatment, your outcome of interest,
and potentially confounding variables across your units of analysis (in addition to meeting the assumptions described below),
you can essentially simulate a proper experiment and make causal claims.


Interpreting the causal curve
------------------------------

Two of the methods contained within this package produce causal curves for continuous treatments
(see the GPS and TMLE methods).

.. image:: ../imgs/welcome_plot.png

Using the above causal curve as an example, we see that employing a treatment value between 50 and 60
causally produces the highest outcome values. We also see that
the treatment produces a smaller effect below or above that range. The confidence
intervals become wider on the parts of the curve where we have fewer data points (near the minimum and
maximum treatment values).

This curve differs from a simple bivariate plot of the treatment and outcome or even a similar-looking plot
generated through standard multivariable regression modeling in a few important ways:

* This curve represents the estimated causal effect of a treatment on an outcome, not the association between treatment and outcome.
* This curve represents a population-level effect, and should not be used to infer effects at the individual-level (or whatever the unit of analysis is).
* To generate a similar-looking plot using multivariable regression, you would have to hold covariates constant, and any treatment effect that is inferred occurs within the levels of the covariates specified in the model. The causal curve averages out across all of these strata and gives us the population marginal effect.
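To make the contrast with the first point concrete, the naive "simple bivariate plot" mentioned above can be sketched as a binned average of outcomes across treatment values. This sketch is purely illustrative (the function and simulated data are hypothetical); it captures only the *association* between treatment and outcome and does nothing to correct for confounding, which is exactly what the causal curve improves on:

```python
import random
from collections import defaultdict

def binned_curve(treatment, outcome, n_bins=5):
    """Naive dose-response sketch: mean outcome within equal-width treatment
    bins. This is the confounded, associational curve -- NOT a causal
    estimate, and NOT what causal-curve computes."""
    lo, hi = min(treatment), max(treatment)
    width = (hi - lo) / n_bins or 1.0
    sums = defaultdict(lambda: [0.0, 0])
    for t, y in zip(treatment, outcome):
        b = min(int((t - lo) / width), n_bins - 1)  # clamp max value to last bin
        sums[b][0] += y
        sums[b][1] += 1
    return {b: total / count for b, (total, count) in sorted(sums.items())}

# simulated data with a positive treatment-outcome relationship
random.seed(1)
t = [random.uniform(0, 100) for _ in range(500)]
y = [0.5 * ti + random.gauss(0, 5) for ti in t]
curve = binned_curve(t, y)
```

With confounders present, a curve like this can be badly biased even though it looks superficially similar to a causal dose-response curve.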


A caution about causal inference assumptions
--------------------------------------------

There is a well-documented set of assumptions one must make to infer causal effects from
observational data. These are covered elsewhere in more detail, but briefly:

- Causes always occur before effects: The treatment variable needs to have occurred before the outcome.
- SUTVA: The treatment status of a given individual does not affect the potential outcomes of any other individuals.
- Positivity: Any individual has a positive probability of receiving all values of the treatment variable.
- Ignorability: All major confounding variables are included in the data you provide.

Violations of these assumptions will lead to biased results and incorrect conclusions!

In addition, any covariates that are included in `causal-curve` models are assumed to only
be **confounding** variables.
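The positivity assumption above can be informally checked before modeling: within strata of a confounder, does the observed treatment span roughly the same range as it does overall? A minimal pure-Python sketch of such an overlap check follows; the function, the quantile-stratum approach, and any threshold you apply to its output are illustrative choices, not part of causal-curve:

```python
import random

def positivity_overlap(treatment, confounder, n_strata=4):
    """Split units into confounder quantile strata and report, for each
    stratum, the fraction of the overall treatment range observed there.
    A stratum with a much narrower treatment range than the rest hints
    at a positivity violation."""
    pairs = sorted(zip(confounder, treatment))
    size = len(pairs) // n_strata
    overall = max(treatment) - min(treatment)
    coverages = []
    for i in range(n_strata):
        # last stratum absorbs any remainder from integer division
        chunk = pairs[i * size:] if i == n_strata - 1 else pairs[i * size:(i + 1) * size]
        ts = [t for _, t in chunk]
        coverages.append((max(ts) - min(ts)) / overall)
    return coverages

# simulated data: treatment tracks the confounder but with wide noise,
# so every stratum should still see a broad slice of treatment values
random.seed(2)
x = [random.gauss(0, 1) for _ in range(400)]
t = [xi + random.gauss(0, 2) for xi in x]
coverage = positivity_overlap(t, x)
```

A check like this is a heuristic only; it cannot prove positivity holds, but very low coverage in a stratum is a useful warning sign.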

None of the methods provided in causal-curve rely on inference via instrumental variables; they rely
only on data from the observed treatment, confounders, and the outcome of interest (as in the GPS example above).


References
----------

Hernán M. and Robins J. Causal Inference: What If. Chapman & Hall, 2020.

Ahern J, Hubbard A, and Galea S. Estimating the Effects of Potential Public Health Interventions
on Population Disease Burden: A Step-by-Step Illustration of Causal Inference Methods. American Journal of Epidemiology.
169(9), 2009. pp.1140–1147.
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setuptools.setup(
name="causal-curve",
version="0.3.4",
version="0.3.5",
author="Roni Kobrosly",
author_email="[email protected]",
description="A python library with tools to perform causal inference using \
