Commit

doc revisions (#17)
ronikobrosly authored Aug 14, 2020
1 parent ba94fe1 commit 93f9219
Showing 5 changed files with 92 additions and 51 deletions.
8 changes: 7 additions & 1 deletion docs/changelog.rst
@@ -4,10 +4,16 @@
Change Log
==========

Version 0.3.5
-------------
- Re-organized documentation
- Added `Introduction` section to explain purpose and need for the package


Version 0.3.4
-------------
- Removed XGBoost as dependency.
- Now using sklearn's gradient boosting implementation.


Version 0.3.3
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -22,7 +22,7 @@
author = 'Roni Kobrosly'

# The full version, including alpha/beta/rc tags
release = '0.3.4'
release = '0.3.5'

# -- General configuration ---------------------------------------------------

54 changes: 6 additions & 48 deletions docs/index.rst
@@ -1,11 +1,13 @@
Welcome to causal-curve's documentation!
========================================


.. toctree::
:maxdepth: 2
:hidden:
:caption: Getting Started

intro
install
contribute

@@ -52,32 +54,16 @@ Welcome to causal-curve's documentation!
**causal-curve** is a Python package with tools to perform causal inference
using observational data when the treatment of interest is continuous.


.. image:: ../imgs/welcome_plot.png



Summary
-------

Sometimes it would be nice to run a randomized, controlled experiment to determine whether drug `A`
is superior to drug `B`, whether the blue button gets more clicks than the orange button on your
e-commerce site, etc. Unfortunately, it isn't always possible (resources are finite, the
test might not be ethical, you lack a proper A/B testing infrastructure, etc).
In these situations, there are methods that can be employed to help you infer causality from observational data.

There are many methods to perform causal inference when your intervention of interest is binary
(see the drug and button examples above), but few methods exist to handle continuous treatments.

This is unfortunate because there are many scenarios (in industry and research) where these methods would be useful.
For example, when you would like to:

* Estimate the causal response to increasing or decreasing the price of a product across a wide range.
* Understand how the number of hours per week of aerobic exercise causes positive health outcomes.
* Estimate how decreasing order wait time will impact customer satisfaction, after controlling for confounding effects.
* Estimate how changing neighborhood income inequality (Gini index) could be causally related to neighborhood crime rate.

This library attempts to address this gap, providing tools to estimate causal curves (AKA causal dose-response curves).
There are many available methods to perform causal inference when your intervention of interest is binary,
but few methods exist to handle continuous treatments. This is unfortunate because there are many
scenarios (in industry and research) where these methods would be useful. This library attempts to
address this gap, providing tools to estimate causal curves (AKA causal dose-response curves).


Quick example (of the ``GPS`` tool)
@@ -113,31 +99,3 @@ generalized propensity scores.
5. Estimate the points of the causal curve (along with 95% confidence interval bounds) with the ``.calculate_CDRC()`` method.

6. Explore or plot your results!
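At its core, the ``GPS`` approach models the conditional density of the treatment given the confounders (the generalized propensity score). As a rough illustration of that idea only (this is *not* causal-curve's actual implementation, which uses sklearn's gradient boosting), here is a minimal pure-Python sketch with a single confounder and a normal treatment model; all function names and the simulated data are hypothetical:

```python
import math
import random

def fit_linear(x, y):
    # ordinary least squares for y = a + b*x (closed form, one predictor)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

def gps_scores(treatment, confounder):
    """Generalized propensity scores under an assumed normal treatment model:
    T | X ~ Normal(a + b*X, sigma^2). Each score is the estimated density
    of the treatment value a unit actually received, given its confounder."""
    a, b = fit_linear(confounder, treatment)
    resid = [t - (a + b * x) for t, x in zip(treatment, confounder)]
    sigma2 = sum(r * r for r in resid) / len(resid)
    return [
        math.exp(-r * r / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
        for r in resid
    ]

# simulated data: treatment depends on the confounder plus noise
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
t = [2 * xi + random.gauss(0, 1) for xi in x]
scores = gps_scores(t, x)
```

In the real workflow these scores are then used to adjust the treatment-outcome relationship before estimating the curve points with ``.calculate_CDRC()``.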

None of the methods provided in causal-curve rely on inference via instrumental variables; they rely
only on data from the observed treatment, confounders, and the outcome of interest (as in the GPS example above).



A caution about assumptions
---------------------------

There is a well-documented set of assumptions one must make to infer causal effects from
observational data. These are covered elsewhere in more detail, but briefly:

- Causes always occur before effects: The treatment variable needs to have occurred before the outcome.
- SUTVA: The treatment status of a given individual does not affect the potential outcomes of any other individuals.
- Positivity: Any individual has a positive probability of receiving all values of the treatment variable.
- Ignorability: All major confounding variables are included in the data you provide.

Violations of these assumptions will lead to biased results and incorrect conclusions!

In addition, any covariates that are included in `causal-curve` models are assumed to only
be **confounding** variables.



`Getting started <install.html>`_
---------------------------------

Information to install, test, and contribute to the package.
77 changes: 77 additions & 0 deletions docs/intro.rst
@@ -0,0 +1,77 @@
.. _intro:

============================
Introduction to causal-curve
============================

In academia and industry, randomized controlled experiments (or simply experiments or colloquially
known as "A/B tests") are considered the gold standard approach for assessing the true, causal impact
of a treatment or intervention. For example:

* We want to increase the number of times per day new customers log into our business's website. Will it help if we send daily emails to our customers? We take a group of 2000 new business customers; half are randomly chosen to receive daily emails, while the other half receive one email per week. We follow both groups forward in time for a month and compare each group's average number of logins per day.

However, for ethical or financial reasons experiments may not always be feasible to carry out.

* It's not ethical to randomly assign some people to receive a possible carcinogen in pill form while others receive a sugar pill, and then see which group is more likely to develop cancer.
* It's not feasible to increase the household incomes of some New York neighborhoods while leaving others unchanged, to see whether reducing a neighborhood's income inequality would lower the local crime rate.


"Causal inference" methods are a set of approaches that attempt to estimate causal effects
from observational rather than experimental data, correcting for the biases that are inherent
to analyzing observational data (e.g. confounding and selection bias) [@Hernán:2020].

As long as you have varying observational data on some treatment, your outcome of interest,
and potentially confounding variables across your units of analysis (in addition to meeting the assumptions described below),
you can essentially simulate a proper experiment and make causal claims.


Interpreting the causal curve
------------------------------

Two of the methods contained within this package produce causal curves for continuous treatments
(see the GPS and TMLE methods).

.. image:: ../imgs/welcome_plot.png

Using the above causal curve as an example, we see that employing a treatment value between 50 and 60
causally produces the highest outcome values. We also see that
the treatment produces a smaller effect below or above that range. The confidence
intervals become wider on the parts of the curve where we have fewer data points (near the minimum and
maximum treatment values).

This curve differs from a simple bivariate plot of the treatment and outcome or even a similar-looking plot
generated through standard multivariable regression modeling in a few important ways:

* This curve represents the estimated causal effect of a treatment on an outcome, not the association between treatment and outcome.
* This curve represents a population-level effect, and should not be used to infer effects at the individual-level (or whatever the unit of analysis is).
* To generate a similar-looking plot using multivariable regression, you would have to hold covariates constant, and any treatment effect that is inferred occurs within the levels of the covariates specified in the model. The causal curve averages out across all of these strata and gives us the population marginal effect.
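To make the contrast with the first point concrete, the naive "simple bivariate plot" mentioned above can be sketched as a binned average of outcomes across treatment values. This sketch is purely illustrative (the function and simulated data are hypothetical); it captures only the *association* between treatment and outcome and does nothing to correct for confounding, which is exactly what the causal curve improves on:

```python
import random
from collections import defaultdict

def binned_curve(treatment, outcome, n_bins=5):
    """Naive dose-response sketch: mean outcome within equal-width treatment
    bins. This is the confounded, associational curve -- NOT a causal
    estimate, and NOT what causal-curve computes."""
    lo, hi = min(treatment), max(treatment)
    width = (hi - lo) / n_bins or 1.0
    sums = defaultdict(lambda: [0.0, 0])
    for t, y in zip(treatment, outcome):
        b = min(int((t - lo) / width), n_bins - 1)  # clamp max value to last bin
        sums[b][0] += y
        sums[b][1] += 1
    return {b: total / count for b, (total, count) in sorted(sums.items())}

# simulated data with a positive treatment-outcome relationship
random.seed(1)
t = [random.uniform(0, 100) for _ in range(500)]
y = [0.5 * ti + random.gauss(0, 5) for ti in t]
curve = binned_curve(t, y)
```

With confounders present, a curve like this can be badly biased even though it looks superficially similar to a causal dose-response curve.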


A caution about causal inference assumptions
--------------------------------------------

There is a well-documented set of assumptions one must make to infer causal effects from
observational data. These are covered elsewhere in more detail, but briefly:

- Causes always occur before effects: The treatment variable needs to have occurred before the outcome.
- SUTVA: The treatment status of a given individual does not affect the potential outcomes of any other individuals.
- Positivity: Any individual has a positive probability of receiving all values of the treatment variable.
- Ignorability: All major confounding variables are included in the data you provide.

Violations of these assumptions will lead to biased results and incorrect conclusions!

In addition, any covariates that are included in `causal-curve` models are assumed to only
be **confounding** variables.
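The positivity assumption above can be informally checked before modeling: within strata of a confounder, does the observed treatment span roughly the same range as it does overall? A minimal pure-Python sketch of such an overlap check follows; the function, the quantile-stratum approach, and any threshold you apply to its output are illustrative choices, not part of causal-curve:

```python
import random

def positivity_overlap(treatment, confounder, n_strata=4):
    """Split units into confounder quantile strata and report, for each
    stratum, the fraction of the overall treatment range observed there.
    A stratum with a much narrower treatment range than the rest hints
    at a positivity violation."""
    pairs = sorted(zip(confounder, treatment))
    size = len(pairs) // n_strata
    overall = max(treatment) - min(treatment)
    coverages = []
    for i in range(n_strata):
        # last stratum absorbs any remainder from integer division
        chunk = pairs[i * size:] if i == n_strata - 1 else pairs[i * size:(i + 1) * size]
        ts = [t for _, t in chunk]
        coverages.append((max(ts) - min(ts)) / overall)
    return coverages

# simulated data: treatment tracks the confounder but with wide noise,
# so every stratum should still see a broad slice of treatment values
random.seed(2)
x = [random.gauss(0, 1) for _ in range(400)]
t = [xi + random.gauss(0, 2) for xi in x]
coverage = positivity_overlap(t, x)
```

A check like this is a heuristic only; it cannot prove positivity holds, but very low coverage in a stratum is a useful warning sign.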

None of the methods provided in causal-curve rely on inference via instrumental variables; they rely
only on data from the observed treatment, confounders, and the outcome of interest (as in the GPS example above).


References
----------

Hernán M. and Robins J. Causal Inference: What If. Chapman & Hall, 2020.

Ahern J, Hubbard A, and Galea S. Estimating the Effects of Potential Public Health Interventions
on Population Disease Burden: A Step-by-Step Illustration of Causal Inference Methods. American Journal of Epidemiology.
169(9), 2009. pp.1140–1147.
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@

setuptools.setup(
name="causal-curve",
version="0.3.4",
version="0.3.5",
author="Roni Kobrosly",
author_email="[email protected]",
description="A python library with tools to perform causal inference using \
