diff --git a/docs/changelog.rst b/docs/changelog.rst index 21ed728..1d146dd 100644 --- a/docs/changelog.rst +++ b/docs/changelog.rst @@ -4,6 +4,13 @@ Change Log ========== +Version 0.3.0 +------------- +- Added full, end-to-end example of package usage to documentation +- Cleaned up documentation +- Added example folder with end-to-end notebook +- Added manuscript to paper folder + Version 0.2.4 ------------- - Strengthened unit tests diff --git a/docs/conf.py b/docs/conf.py index 428b69f..e2324d7 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -22,8 +22,7 @@ author = 'Roni Kobrosly' # The full version, including alpha/beta/rc tags -release = '0.2.4' - +release = '0.3.0' # -- General configuration --------------------------------------------------- diff --git a/docs/full_example.rst b/docs/full_example.rst new file mode 100644 index 0000000..46917fa --- /dev/null +++ b/docs/full_example.rst @@ -0,0 +1,130 @@ +.. _full_example: + +============================================================= +Health data: generating causal curves and examining mediation +============================================================= + +The causal effect of blood lead levels on cognitive performance in children +--------------------------------------------------------------------------- + +To provide an end-to-end example of the sorts of analyses `causal-curve` can be used for, we'll +begin with an epidemiology topic. A notebook containing the pipeline to produce the following +output `is available here `_. +Note: Specific examples of the individual `causal-curve` tools with +code are available elsewhere in this documentation. + +Despite bans on lead-based paint and on the use of lead in gasoline in the United +States, lead exposure remains an enormous public health problem for children and adolescents. This +is particularly true for poorer children living in older homes in inner-city environments. 
+For children, there is no known safe level of exposure to lead, and even small levels of +lead measured in their blood have been shown to affect IQ and academic achievement. +One of the most troubling aspects of lead exposure is that its effects are permanent. Blood lead levels (BLLs) +of 5 ug/dL or higher are considered elevated. + +There is a large body of research on lead abatement and many government programs devoted to it. In terms of +public policy, it would be helpful to understand how childhood cognitive outcomes would be affected by +reducing BLLs in children. This is the causal question to answer, with blood lead +levels as the continuous treatment and cognitive test scores as the outcome of interest. + +.. image:: https://upload.wikimedia.org/wikipedia/commons/6/69/LeadPaint1.JPG + +(Photo attribution: Thester11 / CC BY (https://creativecommons.org/licenses/by/3.0)) + +To explore that problem, we can analyze data collected from the National Health and Nutrition +Examination Survey (NHANES) III. This was a large, national study of families throughout the United +States, carried out between 1988 and 1994. Participants took part in extensive interviews and +medical examinations, and provided biological samples. As part of this project, BLLs +were measured, and four scaled sub-tests of the Wechsler Intelligence Scale for Children-Revised +and the Wide Range Achievement Test-Revised (WISC/WRAT) cognitive tests were administered. These data +are de-identified and publicly available on the Centers for Disease Control and Prevention (CDC) +government website. + +After the data were processed and missing values were dropped, there were 1,764 children between +6 and 12 years of age with complete data. BLLs among these children were log-normally +distributed, as one would expect: + +.. 
image:: ../imgs/full_example/BLL_dist.png + +The four scaled sub-tests of the WISC/WRAT included a math test, a reading test, a block design +test (a test of spatial visualization ability and motor skill), and a digit spanning test +(a test of memory). Their distributions are shown here: + +.. image:: ../imgs/full_example/test_dist.png + +Using a well-known study by Bruce Lanphear published in 2000 as a guide, we used the following +features as potentially confounding "nuisance" variables: + +- Child age +- Child sex (in 1988 - 1994 the CDC assumed binary sex) +- Child race/ethnicity +- The education level of the guardian +- Whether someone smokes in the child's home +- Whether the child spent time in a neonatal intensive care unit as a baby +- Whether the child is experiencing food insecurity (is food sometimes not available due to lack of resources?). + +In our simulated "experiment", the confounders listed above will be controlled for. + +By using either the GPS or TMLE tools included in `causal-curve`, one can generate the causal +dose-response curves for BLLs in relation to the four outcomes: + +.. image:: ../imgs/full_example/test_causal_curves.png + +Note that the lower limit of detection for the blood lead test in this version of NHANES was +0.7 ug/dL, so measured lead levels below that value are not possible. + +In the case of the math test, these results indicate that reducing BLLs in this population +to their lowest value would cause scaled math scores to increase by around 2 points, relative +to BLLs around 10 ug/dL. Similar results are found for the reading and block design tests, +while the digit spanning test causal curve appears possibly flat (although with the sparse +observations at the higher end of the BLL range and the wide confidence intervals it is +difficult to say). 
+ +The above curves differ from standard regression curves in a few big ways: + +- Even though the data that we used to generate these curves are observational, if causal inference assumptions are met, these curves can be interpreted as causal. +- These models were created using the potential outcomes / counterfactual framework, while standard models are not. Also, the approach we used here essentially simulates experimental conditions by balancing out treatment assignment across the various confounders, and controlling for their effects. +- Even if complex interactions between the variables are modelled, these curves average over the various interaction effects and subgroups. In this sense, these are "marginal" curves. +- These curves should not be used to make predictions at the individual level. These are population-level estimates and should remain that way. + + + +Do blood lead levels mediate the relationship between poverty and cognitive performance? +---------------------------------------------------------------------------------------- + +There is a well-known link between household income and child academic performance. Now that we +have some evidence of a potentially causal relationship between BLLs and test performance in +children, one might wonder if lead exposure might mediate the relationship between household income +and academic performance. In other words, in this population does low income cause one to be +exposed more to lead, which in turn causes lower performance? Or is household income linked +with academic performance directly, or through other variables? + +NHANES III captured each household's Poverty Income Ratio (the ratio of total family income to +the federal poverty level for the year of the interview). For this example, let's focus just +on the math test as an outcome. Using `causal-curve`'s mediation tool, +we found that the overall, mediating indirect effect of BLLs is 0.20 (0.17 - 0.23). 
This means +that lead exposure accounts for roughly 20% of the relationship between low income and low test +performance in this population. The mediation tool also allows you to see how the indirect effect +varies as a function of the treatment. As the plot shows, the mediating effect is relatively flat, +although, interestingly, there is a hint of an increase as income increases relative to the poverty line. + +.. image:: ../imgs/full_example/mediation_curve.png + + +References +---------- + +Centers for Disease Control and Prevention. NHANES III (1988-1994). +https://wwwn.cdc.gov/nchs/nhanes/nhanes3/default.aspx. Accessed on July 2, 2020. + +Centers for Disease Control and Prevention. Blood Lead Levels in Children. +https://www.cdc.gov/nceh/lead/prevention/blood-lead-levels.htm. Accessed on July 2, 2020. + +Environmental Protection Agency. Learn about Lead. https://www.epa.gov/lead/learn-about-lead. +Accessed on July 2, 2020. + +Pirkle JL, Kaufmann RB, Brody DJ, Hickman T, Gunter EW, Paschal DC. Exposure of the +U.S. population to lead, 1991-1994. Environmental Health Perspectives, 106(11), 1998, pp. 745–750. + +Lanphear BP, Dietrich K, Auinger P, Cox C. Cognitive Deficits Associated with +Blood Lead Concentrations <10 ug/dL in US Children and Adolescents. +Public Health Reports, 115, 2000, pp. 521-529. diff --git a/docs/index.rst b/docs/index.rst index b7aed64..594c9f6 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,22 +9,31 @@ Welcome to causal-curve's documentation! install contribute + .. toctree:: - :maxdepth: 1 - :hidden: - :caption: Module details + :maxdepth: 1 + :hidden: + :caption: End-to-end demonstration - modules + full_example .. toctree:: - :maxdepth: 1 - :hidden: - :caption: Tutorial - Examples + :maxdepth: 1 + :hidden: + :caption: Tutorials of Individual Tools + + GPS_example + TMLE_example + Mediation_example + +.. toctree:: + :maxdepth: 1 + :hidden: + :caption: Module details + + modules - GPS_example - TMLE_example - Mediation_example .. 
toctree:: :maxdepth: 1 diff --git a/examples/NHANES_BLL_example.ipynb b/examples/NHANES_BLL_example.ipynb new file mode 100644 index 0000000..67ea65e --- /dev/null +++ b/examples/NHANES_BLL_example.ipynb @@ -0,0 +1,750 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Full `causal-curve` tutorial: analyzing the causal impact of reducing blood lead levels in children on achievement and cognitive scores\n", + "All NHANES III data obtained here: https://wwwn.cdc.gov/nchs/nhanes/nhanes3/datafiles.aspx#core" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from os.path import expanduser\n", + "\n", + "import matplotlib.pyplot as plt\n", + "import numpy as np\n", + "import pandas as pd\n", + "from scipy.interpolate import interp1d\n", + "\n", + "from causal_curve import GPS\n", + "from causal_curve import Mediation\n", + "\n", + "%matplotlib inline\n", + "pd.options.mode.chained_assignment = None\n", + "plt.rcParams['figure.dpi'] = 200\n", + "plt.rcParams['figure.figsize'] = [5, 4]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Household Youth Data" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Location of columns in ASCII text file\n", + "cols = [\n", + " (0,5), # Sequence number, SEQN, columns 1-5, page \n", + " (5,10), # Family sequence number, DMPFSEQ, columns 6-10, page\n", + " (10,11), # Examination/interview status, DMPSTAT, columns 11, page \n", + " (11,12), # Race-ethnicity, DMARETHN, columns 12, page\n", + " (12,13), # Race, DMARACER, columns 13, page\n", + " (13,14), # Ethnicity, DMAETHNR, columns 14, page\n", + " (14,15), # Sex, HSSEX, columns 15, page\n", + " (20,24), # Age in months, HSAITMOR, columns 21-24, page \n", + " (35,41), # Poverty Income Ratio, DMPPIR, columns 36-41, page \n", + " (1291,1292), # persons who smoke cigarettes in home, HFF1, columns 
1292, page \n", + " (1312,1313), # Do you have enough food to eat, sometimes not enough to eat, or often not enough to eat?, HFF4, columns 1313, page \n", + " (1358,1360), # Highest grade or yr of school completed, HFHEDUCR, columns 1359-1360, page \n", + " (1378,1379), # Did mother smoke while pregnant with SP, HYA3, columns 1379, page\n", + " (1382,1383), # Did SP receive newborn intensive care, HYA6, columns 1383, page\n", + "]\n", + "\n", + "column_names = [\n", + " 'SEQN',\n", + " 'FAMILY_SEQN',\n", + " 'STATUS',\n", + " 'RACE_ETH',\n", + " 'RACE',\n", + " 'ETH',\n", + " 'SEX',\n", + " 'AGE',\n", + " 'PIR',\n", + " 'SMOKE_HOME',\n", + " 'FOOD',\n", + " 'EDU',\n", + " 'SMOKE_PREG',\n", + " 'BABY_NICU'\n", + "]\n", + "\n", + "raw_youth_file = []\n", + "with open(expanduser('~/Desktop/NHANES_III/youth.dat'), 'r') as f:\n", + " for line in f.readlines():\n", + " raw_youth_file.append([line[c[0]:c[1]] for c in cols])\n", + "\n", + "f.close()\n", + "\n", + "raw_youth_df = pd.DataFrame(raw_youth_file, columns = column_names)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Examination Data" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "# NEED TO PARSE THIS ASCII FILE\n", + "# See: https://stackoverflow.com/questions/45286642/reading-values-from-a-text-file-using-specific-column-numbers-in-python\n", + "\n", + "# Location of columns\n", + "cols = [\n", + " (0,5), # Sequence number, SEQN, columns 1-5, page \n", + " (5,10), # Family sequence number, DMPFSEQ, columns 6-10, page\n", + " (10,11), # Examination/interview status, DMPSTAT, columns 11, page \n", + " (4432,4434), # WISC/WRAT Math scaled score, WWPMSCSR, columns 4433-4434, page\n", + " (4434, 4436), # WISC/WRAT Reading scaled score, WWPRSCSR, columns 4435-4436, page\n", + " (4436, 4438), # WISC/WRAT Block design scaled score, WWPBSCSR, columns 4437-4438, page\n", + " (4438, 4440) # WISC/WRAT Digit span scaled score, 
WWPDSCSR, columns 4439-4440, page\n", + "]\n", + "\n", + "column_names = [\n", + " 'SEQN',\n", + " 'FAMILY_SEQN',\n", + " 'STATUS',\n", + " 'MATH',\n", + " 'READING',\n", + " 'BLOCK',\n", + " 'DIGIT'\n", + "]\n", + "\n", + "raw_exam_file = []\n", + "with open(expanduser('~/Desktop/NHANES_III/exam.dat'), 'r') as f:\n", + " for line in f.readlines():\n", + " raw_exam_file.append([line[c[0]:c[1]] for c in cols])\n", + "\n", + "f.close()\n", + "\n", + "raw_exam_df = pd.DataFrame(raw_exam_file, columns = column_names)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Laboratory Data" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# NEED TO PARSE THIS ASCII FILE\n", + "# See: https://stackoverflow.com/questions/45286642/reading-values-from-a-text-file-using-specific-column-numbers-in-python\n", + "\n", + "# Location of columns\n", + "cols = [\n", + " (0,5), # Sequence number, SEQN, columns 1-5, page \n", + " (5,10), # Family sequence number, DMPFSEQ, columns 6-10, page\n", + " (10,11), # Examination/interview status, DMPSTAT, columns 11, page \n", + " (1422,1426), # Lead (ug/dL), PBP, columns 1423-1426, page \n", + "]\n", + "\n", + "column_names = [\n", + " 'SEQN',\n", + " 'FAMILY_SEQN',\n", + " 'STATUS',\n", + " 'BLL'\n", + "]\n", + "\n", + "raw_lab_file = []\n", + "with open(expanduser('~/Desktop/NHANES_III/lab.dat'), 'r') as f:\n", + " for line in f.readlines():\n", + " raw_lab_file.append([line[c[0]:c[1]] for c in cols])\n", + "\n", + "f.close()\n", + "\n", + "raw_lab_df = pd.DataFrame(raw_lab_file, columns = column_names)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Merge these together" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "raw_merged_df = raw_youth_df.merge(\n", + " raw_exam_df.drop(['FAMILY_SEQN', 'STATUS'], axis = 1), how = \"left\", on = \"SEQN\"\n", + 
").merge(\n", + " raw_lab_df.drop(['FAMILY_SEQN', 'STATUS'], axis = 1), how = \"left\", on = \"SEQN\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Column formatting and some subsetting" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "# STATUS\n", + "raw_merged_df['STATUS'] = raw_merged_df['STATUS'].str.replace(\"1\", \"No_exam\")\n", + "raw_merged_df['STATUS'] = raw_merged_df['STATUS'].str.replace(\"2\", \"MEC_exam\")\n", + "raw_merged_df['STATUS'] = raw_merged_df['STATUS'].str.replace(\"3\", \"Home_exam\")\n", + "\n", + "# RACE_ETH\n", + "raw_merged_df['RACE_ETH'] = raw_merged_df['RACE_ETH'].str.replace(\"1\", \"NH_White\")\n", + "raw_merged_df['RACE_ETH'] = raw_merged_df['RACE_ETH'].str.replace(\"2\", \"NH_Black\")\n", + "raw_merged_df['RACE_ETH'] = raw_merged_df['RACE_ETH'].str.replace(\"3\", \"Mex_Am\")\n", + "raw_merged_df['RACE_ETH'] = raw_merged_df['RACE_ETH'].str.replace(\"4\", \"Other\")\n", + "\n", + "# RACE\n", + "raw_merged_df['RACE'] = raw_merged_df['RACE'].str.replace(\"1\", \"White\")\n", + "raw_merged_df['RACE'] = raw_merged_df['RACE'].str.replace(\"2\", \"Black\")\n", + "raw_merged_df['RACE'] = raw_merged_df['RACE'].str.replace(\"3\", \"Other\")\n", + "raw_merged_df['RACE'] = raw_merged_df['RACE'].str.replace(\"8\", \"Mex_Am\")\n", + "\n", + "# ETH\n", + "raw_merged_df['ETH'] = raw_merged_df['ETH'].str.replace(\"1\", \"Mex_Am\")\n", + "raw_merged_df['ETH'] = raw_merged_df['ETH'].str.replace(\"2\", \"Other_Hisp\")\n", + "raw_merged_df['ETH'] = raw_merged_df['ETH'].str.replace(\"3\", \"Not_Hisp\")\n", + "\n", + "# SEX\n", + "raw_merged_df['SEX'] = raw_merged_df['SEX'].str.replace(\"1\", \"Male\")\n", + "raw_merged_df['SEX'] = raw_merged_df['SEX'].str.replace(\"2\", \"Female\")\n", + "\n", + "# AGE\n", + "raw_merged_df['AGE'] = raw_merged_df['AGE'].astype(float) / 12\n", + "\n", + "# PIR\n", + "raw_merged_df['PIR'] = 
raw_merged_df['PIR'].astype(float) \n", + "raw_merged_df['PIR'][raw_merged_df['PIR'] == 888888.000] = np.nan\n", + "\n", + "# EDU\n", + "raw_merged_df['EDU'] = raw_merged_df['EDU'].astype(int)\n", + "raw_merged_df['EDU'][raw_merged_df['EDU'] == 88] = None\n", + "raw_merged_df['EDU'][raw_merged_df['EDU'] == 99] = None\n", + "\n", + "raw_merged_df['EDU_CAT'] = np.where(raw_merged_df['EDU'] < 9, 'LT_HS', \n", + " np.where(\n", + " ((raw_merged_df['EDU'] >= 9) & (raw_merged_df['EDU'] < 12)), 'HS', \n", + " np.where(raw_merged_df['EDU'] == 12, 'GRAD_HS', None)\n", + " )\n", + ")\n", + "\n", + "# SMOKE_HOME\n", + "raw_merged_df['SMOKE_HOME'] = raw_merged_df['SMOKE_HOME'].str.replace(\"1\", \"Yes\")\n", + "raw_merged_df['SMOKE_HOME'] = raw_merged_df['SMOKE_HOME'].str.replace(\"2\", \"No\")\n", + "raw_merged_df['SMOKE_HOME'] = raw_merged_df['SMOKE_HOME'].str.replace(\"8\", \"None\")\n", + "\n", + "# FOOD\n", + "raw_merged_df['FOOD'] = raw_merged_df['FOOD'].str.replace(\"1\", \"Good\")\n", + "raw_merged_df['FOOD'] = raw_merged_df['FOOD'].str.replace(\"2\", \"Sometimes_bad\")\n", + "raw_merged_df['FOOD'] = raw_merged_df['FOOD'].str.replace(\"3\", \"Often_bad\")\n", + "raw_merged_df['FOOD'] = raw_merged_df['FOOD'].str.replace(\"8\", \"None\")\n", + "\n", + "# SMOKE_PREG\n", + "raw_merged_df['SMOKE_PREG'] = raw_merged_df['SMOKE_PREG'].str.replace(\"1\", \"Yes\")\n", + "raw_merged_df['SMOKE_PREG'] = raw_merged_df['SMOKE_PREG'].str.replace(\"2\", \"No\")\n", + "raw_merged_df['SMOKE_PREG'] = raw_merged_df['SMOKE_PREG'].str.replace(\"8\", \"None\")\n", + "raw_merged_df['SMOKE_PREG'] = raw_merged_df['SMOKE_PREG'].str.replace(\" \", \"None\")\n", + "\n", + "# BABY_NICU\n", + "raw_merged_df['BABY_NICU'] = raw_merged_df['BABY_NICU'].str.replace(\"1\", \"Yes\")\n", + "raw_merged_df['BABY_NICU'] = raw_merged_df['BABY_NICU'].str.replace(\"2\", \"No\")\n", + "raw_merged_df['BABY_NICU'] = raw_merged_df['BABY_NICU'].str.replace(\"8\", \"None\")\n", + "raw_merged_df['BABY_NICU'] = 
raw_merged_df['BABY_NICU'].str.replace(\"9\", \"None\")\n", + "raw_merged_df['BABY_NICU'] = raw_merged_df['BABY_NICU'].str.replace(\" \", \"None\")\n", + "\n", + "# Drop Nans at this point\n", + "raw_merged_df = raw_merged_df.dropna()\n", + "\n", + "# MATH\n", + "raw_merged_df['MATH'] = raw_merged_df['MATH'].str.replace(\"NaN\", \"\")\n", + "raw_merged_df['MATH'] = raw_merged_df['MATH'].str.replace(\" \", \"\")\n", + "raw_merged_df['MATH'] = raw_merged_df['MATH'].str.replace(\"88\", \"\")\n", + "raw_merged_df = raw_merged_df[raw_merged_df['MATH'] != '']\n", + "raw_merged_df['MATH'] = raw_merged_df['MATH'].astype(float)\n", + "\n", + "# READING\n", + "raw_merged_df['READING'] = raw_merged_df['READING'].str.replace(\"NaN\", \"\")\n", + "raw_merged_df['READING'] = raw_merged_df['READING'].str.replace(\" \", \"\")\n", + "raw_merged_df['READING'] = raw_merged_df['READING'].str.replace(\"88\", \"\")\n", + "raw_merged_df = raw_merged_df[raw_merged_df['READING'] != '']\n", + "raw_merged_df['READING'] = raw_merged_df['READING'].astype(float)\n", + "\n", + "# BLOCK\n", + "raw_merged_df['BLOCK'] = raw_merged_df['BLOCK'].str.replace(\"NaN\", \"\")\n", + "raw_merged_df['BLOCK'] = raw_merged_df['BLOCK'].str.replace(\" \", \"\")\n", + "raw_merged_df['BLOCK'] = raw_merged_df['BLOCK'].str.replace(\"88\", \"\")\n", + "raw_merged_df = raw_merged_df[raw_merged_df['BLOCK'] != '']\n", + "raw_merged_df['BLOCK'] = raw_merged_df['BLOCK'].astype(float)\n", + "\n", + "# DIGIT\n", + "raw_merged_df['DIGIT'] = raw_merged_df['DIGIT'].str.replace(\"NaN\", \"\")\n", + "raw_merged_df['DIGIT'] = raw_merged_df['DIGIT'].str.replace(\" \", \"\")\n", + "raw_merged_df['DIGIT'] = raw_merged_df['DIGIT'].str.replace(\"88\", \"\")\n", + "raw_merged_df = raw_merged_df[raw_merged_df['DIGIT'] != '']\n", + "raw_merged_df['DIGIT'] = raw_merged_df['DIGIT'].astype(float)\n", + "\n", + "# BLL\n", + "raw_merged_df = raw_merged_df[raw_merged_df['BLL'] != '']\n", + "raw_merged_df = raw_merged_df[raw_merged_df['BLL'] != 
'8888']\n", + "raw_merged_df['BLL'] = raw_merged_df['BLL'].str.replace(\"000\", \"\")\n", + "raw_merged_df['BLL'] = raw_merged_df['BLL'].str.replace(\"00\", \"0\")\n", + "raw_merged_df['BLL'] = raw_merged_df['BLL'].str.lstrip(\"0\")\n", + "raw_merged_df['BLL'] = raw_merged_df['BLL'].astype(float, errors = 'ignore') \n", + "raw_merged_df['BLL'] = raw_merged_df['BLL'].str.replace(\".7\", \"0.7\")\n", + "raw_merged_df['BLL'] = raw_merged_df['BLL'].str.replace(\" \", \"\")\n", + "\n", + "raw_merged_df['BLL'] = pd.to_numeric(raw_merged_df['BLL'], errors = 'coerce')\n", + "\n", + "# Once again, remove any 'None' values\n", + "raw_merged_df = raw_merged_df.dropna()\n", + "\n", + "raw_merged_df = raw_merged_df[\n", + " ((raw_merged_df['SMOKE_HOME'] != 'None') & (raw_merged_df['FOOD'] != 'None') & (raw_merged_df['SMOKE_PREG'] != 'None') & (raw_merged_df['BABY_NICU'] != 'None'))\n", + "]\n", + "\n", + "format_merged_df = raw_merged_df.drop(['FAMILY_SEQN', 'STATUS', 'RACE', 'ETH', 'EDU'], axis = 1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Making dummy vars, prepping for causal inference" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "final_df = pd.concat(\n", + " [\n", + " pd.get_dummies(format_merged_df[\"RACE_ETH\"], prefix='Race', drop_first=True),\n", + " pd.get_dummies(format_merged_df[\"EDU_CAT\"], prefix='Edu', drop_first=True),\n", + " pd.get_dummies(format_merged_df[\"SEX\"], prefix='Sex', drop_first=True),\n", + " format_merged_df['AGE'].rename('Age'),\n", + " format_merged_df['PIR'].rename('PIR'),\n", + " pd.get_dummies(format_merged_df[\"SMOKE_HOME\"], prefix='Smoke_Home', drop_first=True),\n", + " pd.get_dummies(format_merged_df[\"FOOD\"], prefix='Food', drop_first=True),\n", + " pd.get_dummies(format_merged_df[\"SMOKE_PREG\"], prefix='Smoke_Preg', drop_first=True),\n", + " pd.get_dummies(format_merged_df[\"BABY_NICU\"], prefix='Baby_NICU', 
drop_first=True),\n", + " format_merged_df['MATH'].rename('Math'),\n", + " format_merged_df['READING'].rename('Reading'),\n", + " format_merged_df['BLOCK'].rename('Block'),\n", + " format_merged_df['DIGIT'].rename('Digit'),\n", + " format_merged_df['BLL']\n", + " ]\n", + " , axis = 1\n", + ")\n", + "\n", + "\n", + "# Keep only BLLs of at most 25 ug/dL. Values of 5 ug/dL or higher are considered elevated.\n", + "final_df = final_df[final_df['BLL'] <= 25]\n", + "\n", + "# Reset index\n", + "final_df.reset_index(drop = True, inplace = True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exploring the key distributions" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Blood lead levels are log-normally distributed (an expected result...)\n", + "\n", + "ax = plt.subplot(111) \n", + "final_df['BLL'].plot.hist(bins = 30, rwidth=0.9, color = 'steelblue')\n", + "ax.spines[\"top\"].set_visible(False) \n", + "ax.spines[\"right\"].set_visible(False) \n", + "ax.get_xaxis().tick_bottom() \n", + "ax.get_yaxis().tick_left()\n", + "ax.set_ylabel('Frequency')\n", + "ax.set_xlabel('Blood lead (ug/dL)')\n", + "ax.set_title(\"Blood lead distribution\", fontsize = 11)\n", + "plt.tight_layout()\n", + "plt.savefig('BLL_dist.png', bbox_inches='tight', dpi = 300)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# See distribution of the scaled test scores\n", + "\n", + "fig, axs = plt.subplots(2, 2)\n", + "final_df['Math'].plot.hist(ax=axs[0,0], bins = 15, rwidth=0.9, color = 'steelblue')\n", + "final_df['Reading'].plot.hist(ax=axs[0,1], bins = 15, rwidth=0.9, color = 'steelblue')\n", + "final_df['Block'].plot.hist(ax=axs[1,0], bins = 15, rwidth=0.9, color = 'steelblue')\n", + "final_df['Digit'].plot.hist(ax=axs[1,1], bins = 15, rwidth=0.9, color = 'steelblue')\n", + "axs[0,0].set_ylabel('Frequency')\n", + "axs[0,1].set_ylabel('')\n", + "axs[1,0].set_ylabel('Frequency')\n", + "axs[1,1].set_ylabel('')\n", + "axs[1,0].set_xlabel('Blood Lead (ug/dL)')\n", + "axs[1,1].set_xlabel('Blood Lead (ug/dL)')\n", + "axs[0,0].set_title(\"Math\", fontsize = 8)\n", + "axs[0,1].set_title(\"Reading\", fontsize = 8)\n", + "axs[1,0].set_title(\"Block\", fontsize = 8)\n", + "axs[1,1].set_title(\"Digit\", fontsize = 8)\n", + "\n", + "for i in [0,1]:\n", + " for j in [0,1]:\n", + " axs[i,j].spines[\"top\"].set_visible(False)\n", + " axs[i,j].spines[\"right\"].set_visible(False)\n", + " axs[i,j].tick_params(axis='both', which='major', labelsize=6)\n", + "\n", + "fig.tight_layout(rect=[0, 0.03, 1, 0.95])\n", + "plt.suptitle(\"Distributions of scaled test scores\", fontsize = 10)\n", + "fig.savefig('test_dist.png', bbox_inches='tight', dpi = 300)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Perform causal inference" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [], + "source": [ + "# Dictionary to store model results\n", + "results_dict = {}\n", + "\n", + "# Potential confounders\n", + "potential_confounders = [\n", + " 'Age', 'Sex_Male', 'Race_NH_Black', 'Race_NH_White', 'Race_Other', \n", + " 'Edu_HS', 'Edu_LT_HS', 'Smoke_Home_Yes', 'Baby_NICU_Yes', 'Food_Often_bad', 'Food_Sometimes_bad'\n", + 
"]\n", + "\n", + "\n", + "# Try the MATH model\n", + "math_gps = GPS(gps_family='normal', lower_grid_constraint = 0.0, upper_grid_constraint = 0.99, n_splines=10, verbose=False)\n", + "math_gps.fit(\n", + " T=final_df['BLL'], \n", + " X=final_df[potential_confounders], \n", + " y=final_df['Math']\n", + ")\n", + "\n", + "results_dict['math_CDRC'] = math_gps.calculate_CDRC()\n", + "\n", + "\n", + "# Try the READING model\n", + "reading_gps = GPS(gps_family='normal', lower_grid_constraint = 0.0, upper_grid_constraint = 0.99, n_splines=10, verbose=False)\n", + "\n", + "reading_gps.fit(\n", + " T=final_df['BLL'], \n", + " X=final_df[potential_confounders], \n", + " y=final_df['Reading']\n", + ")\n", + "\n", + "results_dict['reading_CDRC'] = reading_gps.calculate_CDRC()\n", + "\n", + "\n", + "\n", + "# Try the Block model\n", + "block_gps = GPS(gps_family='normal', lower_grid_constraint = 0.0, upper_grid_constraint = 0.99, n_splines=10, verbose=False)\n", + "\n", + "block_gps.fit(\n", + " T=final_df['BLL'], \n", + " X=final_df[potential_confounders], \n", + " y=final_df['Block']\n", + ")\n", + "\n", + "results_dict['block_CDRC'] = block_gps.calculate_CDRC()\n", + "\n", + "\n", + "\n", + "# Try the Digit model\n", + "digit_gps = GPS(gps_family='normal', lower_grid_constraint = 0.0, upper_grid_constraint = 0.99, n_splines=10, verbose=False)\n", + "\n", + "digit_gps.fit(\n", + " T=final_df['BLL'], \n", + " X=final_df[potential_confounders], \n", + " y=final_df['Digit']\n", + ")\n", + "\n", + "results_dict['digit_CDRC'] = digit_gps.calculate_CDRC()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Plot causal inference results" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "result_class = [['math_CDRC', 'reading_CDRC'], ['block_CDRC', 'digit_CDRC']]\n", + "\n", + "result_name = [['Math', 'Reading'], ['Block Design', 'Digit Spanning']]\n", + "\n", + "def plot_mean_and_CI(axs, i, j, treatment, mean, lb, ub, color_mean=None, color_shading=None):\n", + " # plot the shaded range of the confidence intervals\n", + " axs[i,j].fill_between(treatment, lb, ub, color=color_shading, alpha=0.3)\n", + " # plot the mean on top\n", + " axs[i,j].plot(treatment, mean, color_mean, linewidth=0.75)\n", + "\n", + "plt.rcParams['figure.dpi'] = 200\n", + "plt.rcParams['figure.figsize'] = [6, 5]\n", + "\n", + "fig, axs = plt.subplots(2, 2)\n", + "\n", + "for i in [0,1]:\n", + " for j in [0,1]:\n", + "\n", + " # Plotting quantities\n", + " treat = results_dict[result_class[i][j]]['Treatment']\n", + " mean = results_dict[result_class[i][j]]['CDRC']\n", + " lb = results_dict[result_class[i][j]]['Lower_CI']\n", + " ub = results_dict[result_class[i][j]]['Upper_CI']\n", + " plot_mean_and_CI(axs, i, j, treat, mean, lb, ub, color_mean='b', color_shading='b')\n", + "\n", + " # Labels\n", + " axs[0,0].set_ylabel('Scaled Test Score', fontsize = 8)\n", + " axs[0,1].set_ylabel('')\n", + " axs[1,0].set_ylabel('Scaled Test Score', fontsize = 8)\n", + " axs[1,1].set_ylabel('')\n", + " axs[1,0].set_xlabel('Blood Lead (ug/dL)', fontsize = 8)\n", + " axs[1,1].set_xlabel('Blood Lead (ug/dL)', fontsize = 8)\n", + "\n", + " axs[i,j].set_title(result_name[i][j], fontsize = 8)\n", + " axs[i,j].set_title(result_name[i][j], fontsize = 8)\n", + " axs[i,j].set_title(result_name[i][j], fontsize = 8)\n", + " axs[i,j].set_title(result_name[i][j], fontsize = 8)\n", + "\n", + " axs[i,j].spines[\"top\"].set_visible(False)\n", + " axs[i,j].spines[\"right\"].set_visible(False)\n", + "\n", + " axs[i,j].set_xlim(0, 10)\n", + " axs[i,j].set_ylim(0, 15)\n", + " \n", + " 
axs[i,j].tick_params(axis='both', which='major', labelsize=6)\n", + "\n", + "\n", + "fig.tight_layout(rect=[0, 0.03, 1, 0.95])\n", + "plt.suptitle(\"Test Performance Causal Curves (with 95% CIs)\", fontsize = 10)\n", + "fig.savefig('test_causal_curves.png', bbox_inches='tight', dpi = 300)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exploring how BLLs mediate the relationship between income and cognitive outcomes" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Using the following params for the mediation analysis:\n", + "{ 'bootstrap_draws': 500,\n", + " 'bootstrap_replicates': 100,\n", + " 'lower_grid_constraint': 0.01,\n", + " 'max_iter': 100,\n", + " 'n_splines': 5,\n", + " 'random_seed': None,\n", + " 'spline_order': 3,\n", + " 'treatment_grid_num': 10,\n", + " 'upper_grid_constraint': 0.99,\n", + " 'verbose': True}\n", + "Beginning main loop through treatment bins...\n", + "***** Starting iteration 1 of 9 *****\n", + "***** Starting iteration 2 of 9 *****\n", + "***** Starting iteration 3 of 9 *****\n", + "***** Starting iteration 4 of 9 *****\n", + "***** Starting iteration 5 of 9 *****\n", + "***** Starting iteration 6 of 9 *****\n", + "***** Starting iteration 7 of 9 *****\n", + "***** Starting iteration 8 of 9 *****\n", + "***** Starting iteration 9 of 9 *****\n", + "\n", + "\n", + "Mean indirect effect proportion:\n", + " 0.1952 (0.1663 - 0.2295)\n", + " \n" + ] + } + ], + "source": [ + "med = Mediation(\n", + " bootstrap_draws=500,\n", + " bootstrap_replicates=100,\n", + " spline_order=3,\n", + " n_splines=5,\n", + " verbose=True,\n", + ")\n", + "\n", + "med.fit(\n", + " T=final_df[\"PIR\"],\n", + " M=final_df[\"BLL\"],\n", + " y=final_df[\"Math\"],\n", + ")\n", + "\n", + "med_results = med.calculate_mediation(ci = 0.95)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + 
"outputs": [], + "source": [ + "# Use cubic interpolation to create plot of relationship between poverty-income ratio and the proportion of mediation by BLLs\n", + "f = interp1d(med_results['Treatment_Value'], med_results['Proportion_Indirect_Effect'], kind='cubic')\n", + "PIR_grid = np.linspace(0.23, 3.10, num=100, endpoint=True)" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "ax = plt.subplot(111) \n", + "ax.plot(PIR_grid, f(PIR_grid), color = 'steelblue')\n", + "ax.spines[\"top\"].set_visible(False) \n", + "ax.spines[\"right\"].set_visible(False) \n", + "ax.get_xaxis().tick_bottom() \n", + "ax.get_yaxis().tick_left()\n", + "ax.set_ylabel('Proportion Indirect Effect', fontsize = 8)\n", + "ax.set_xlabel('Poverty to Income Ratio', fontsize = 8)\n", + "ax.set_ylim(0,1)\n", + "ax.set_title(\"Mediation effect of blood lead on PIR\", fontsize = 10)\n", + "ax.tick_params(axis='both', which='major', labelsize=8)\n", + "plt.savefig('mediation_curve.png', bbox_inches='tight', dpi = 300)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "ADRF", + "language": "python", + "name": "adrf" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/imgs/full_example/BLL_dist.png b/imgs/full_example/BLL_dist.png new file mode 100644 index 0000000..382a6db Binary files /dev/null and b/imgs/full_example/BLL_dist.png differ diff --git a/imgs/full_example/lead_paint_can.jpeg b/imgs/full_example/lead_paint_can.jpeg new file mode 100644 index 0000000..d72283a Binary files /dev/null and b/imgs/full_example/lead_paint_can.jpeg differ diff --git a/imgs/full_example/mediation_curve.png b/imgs/full_example/mediation_curve.png new file mode 100644 index 0000000..d2f6361 Binary files /dev/null and b/imgs/full_example/mediation_curve.png differ diff --git a/imgs/full_example/test_causal_curves.png b/imgs/full_example/test_causal_curves.png new file mode 100644 index 
0000000..62bde1f Binary files /dev/null and b/imgs/full_example/test_causal_curves.png differ diff --git a/imgs/full_example/test_dist.png b/imgs/full_example/test_dist.png new file mode 100644 index 0000000..b9a1c13 Binary files /dev/null and b/imgs/full_example/test_dist.png differ diff --git a/paper/paper.bib b/paper/paper.bib new file mode 100644 index 0000000..12c9f85 --- /dev/null +++ b/paper/paper.bib @@ -0,0 +1,61 @@ + + +@book{Galagate:2016, + Adsurl = {https://drum.lib.umd.edu/handle/1903/18170}, + Author = {{Galagate}, D.}, + Title = {Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response function with Applications.}, + Booktitle = {Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response function with Applications.}, + Publisher = {Digital Repository at the University of Maryland}, + Year = 2016 +} + +@article{Moodie:2010, + author = {{Moodie}, E. and {Stephen}, D.}, + title = "{Estimation of dose–response functions for longitudinal data using the generalised propensity score.}", + journal = {Statistical Methods in Medical Research}, + year = 2010, + volume = 21, + doi = {10.1177/0962280209340213}, +} + +@book{Hirano:2004, + Author = {{Hirano}, K. and {Imbens}, G.}, + Booktitle = {Applied bayesian modeling and causal inference from incomplete-data perspectives, by Gelman A and Meng XL. ~Published by Wiley, Oxford, UK.}, + Publisher = {Wiley}, + Title = {{The propensity score with continuous treatments}}, + Year = 2004 +} + +@article{van_der_Laan:2010, + author = {{van der Laan}, M. and {Gruber}, S.}, + title = "{Collaborative double robust penalized targeted maximum likelihood estimation.}", + journal = {The International Journal of Biostatistics}, + year = 2010, + volume = 6, + doi = {10.2202/1557-4679.1181}, +} + +@article{van_der_Laan:2006, + author = {{van der Laan}, M. 
and {Rubin}, D.}, + title = "{Targeted maximum likelihood learning.}", + journal = {The International Journal of Biostatistics}, + year = 2006, + volume = 2, +} + +@article{Imai:2010, + author = {{Imai}, K., {Keele}, L., and {Tingley}, D.}, + title = "{A General Approach to Causal Mediation Analysis.}", + journal = {Psychological Methods}, + year = 2010, + volume = 15, + doi = {10.1037/a0020761} +} + +@book{Hernán:2020, + Author = {{Hernán}, M. and {Robins}, J.}, + Booktitle = {Causal Inference: What If.}, + Publisher = {Chapman & Hall}, + Title = {{Causal Inference: What If.}}, + Year = 2020 +} diff --git a/paper/paper.md b/paper/paper.md new file mode 100644 index 0000000..7a4b647 --- /dev/null +++ b/paper/paper.md @@ -0,0 +1,123 @@ +--- +title: 'causal-curve: A Python Causal Inference Package to Estimate Causal Dose-Response Curves' +tags: + - Python + - causal inference + - causality + - machine learning + +authors: + - name: Roni W. Kobrosly^[Custom footnotes for e.g. denoting who the corresponding author is can be included like this.] + orcid: 0000-0003-0363-9662 + affiliation: "1, 2" # (Multiple affiliations must be quoted) +affiliations: + - name: Department of Environmental Medicine and Public Health, Icahn School of Medicine at Mount Sinai, New York, NY, USA + index: 1 + - name: Flowcast, 44 Tehama St, San Francisco, CA, USA + index: 2 +date: 1 July 2020 +bibliography: paper.bib + +--- + +# Summary + +In academia and industry, randomized controlled experiments (colloquially "A/B tests") +are considered the gold standard approach for assessing the impact of a treatment or intervention. +However, for ethical or financial reasons, these experiments may not always be feasible to carry out. +"Causal inference" methods are a set of approaches that attempt to estimate causal effects +from observational rather than experimental data, correcting for the biases that are inherent +to analyzing observational data (e.g.
confounding and selection bias) [@Hernán:2020]. + +Although significant research and implementation effort has gone towards methods in +causal inference to estimate the effects of binary treatments (e.g. did the population receive +treatment "A" or "B"), much less has gone towards estimating the effects of continuous treatments. +This is unfortunate because there are a large number of use cases in research +and industry that could benefit from tools to estimate the effect of +continuous treatments, such as estimating how: + +- the number of minutes per week of aerobic exercise causes positive health outcomes, +after controlling for confounding effects. +- increasing or decreasing the price of a product would impact demand (price elasticity). +- changing neighborhood income inequality (as measured by the continuous Gini index) +might or might not be causally related to the neighborhood crime rate. +- blood lead levels are causally related to neurodevelopmental delays in children. + +`causal-curve` is a Python package created to address this gap; it is designed to perform +causal inference when the treatment of interest is continuous in nature. +From the observational data that is provided by the user, it estimates the +"causal dose-response curve" (or simply the "causal curve"). + +In the current release of the package there are two unique model classes for +constructing the causal dose-response curve: the Generalized Propensity Score (GPS) and the +Targeted Maximum Likelihood Estimation (TMLE) tools. There is also a tool +to assess causal mediation effects in the presence of a continuous mediator and treatment. + +`causal-curve` attempts to make the user experience as painless as possible: + +- This package's API was designed to resemble that of `scikit-learn`, as this is a commonly +used Python predictive modeling framework that most machine learning practitioners are familiar with.
+- All of the major classes contained in `causal-curve` readily use Pandas DataFrames and Series as +inputs, to make this package integrate more easily with the standard Python data analysis tools. +- A full, end-to-end example of applying the package to a causal inference problem (the analysis of health data) +is provided. In addition to this, shorter tutorials for each of the three major classes are available online in the documentation, along with full documentation of all of their parameters, methods, and attributes. + +This package includes a suite of unit and integration tests made using the pytest framework. The +repo containing the latest project code is integrated with TravisCI for continuous integration. Code +coverage is monitored via codecov and is presently above 90%. + + +# Methods + +The `GPS` method was originally described by Hirano [@Hirano:2004], +and expanded by Moodie [@Moodie:2010] and more recently by Galagate [@Galagate:2016]. GPS is +an extension of the standard propensity score method. It is the treatment assignment density calculated +at a particular treatment (and covariate) value. Similar to the standard propensity score approach, +the GPS random variable is used to balance covariates. At the core of this tool, generalized linear +models are used to estimate the GPS, and generalized additive models are used to estimate the +final causal curve. Compared with the package’s TMLE method, +this GPS method is more computationally efficient and better suited for large datasets, +but produces significantly wider confidence intervals. + + +![Example of a causal curve generated by the GPS tool.\label{fig:example}](welcome_plot.png) + + +The `TMLE` method is based on van der Laan's work on an approach to causal inference that would +employ powerful machine learning approaches to estimate a causal effect [@van_der_Laan:2010] [@van_der_Laan:2006].
+TMLE involves predicting the outcome from the treatment and covariates using a machine learning model, +then predicting treatment assignment from the covariates, and finally employing a substitution “targeting” +step to correct for covariate imbalance and to estimate an unbiased causal effect. +Currently, there is no implementation of TMLE that is suitable for continuous treatments, so the +implementation in `causal-curve` constructs a series of binary treatment comparisons across the +user-specified range of treatment values, and then connects these binary estimates to construct +the final causal curve. Compared with the package’s GPS method, this TMLE method is doubly robust +against model misspecification, incorporates more powerful machine learning techniques internally, and produces significantly +smaller confidence intervals, but it is less computationally efficient. + +`causal-curve` allows for continuous mediation assessment with the `Mediation` tool. As described +by Imai, this method provides a general approach to mediation analysis that invokes the +potential outcomes / counterfactual framework [@Imai:2010]. While this approach can handle a +continuous mediator and outcome, as put forward by Imai, it only allows for a binary treatment. As +mentioned above with the `TMLE` approach, the tool creates a series of binary treatment comparisons +and connects them to show the user how mediation varies as a function of the treatment. An interpretable, +overall mediation percentage is provided as well. + + +# Statement of Need + +While there are a few established Python packages related to causal inference, to the best of +the author's knowledge, there is no Python package available that can provide support for +continuous treatments as `causal-curve` does. Similarly, the author isn't aware of any Python +implementation of a causal mediation analysis for continuous treatments and mediators.
Finally, +the tutorials available in the documentation introduce the concept of continuous treatments +and are instructive as to how the results of their analysis should be interpreted. + + +# Acknowledgements + +We acknowledge contributions from Miguel-Angel Luque, Erica Moodie, and Mark van der Laan +during the creation of this project. + + +# References diff --git a/paper/welcome_plot.png b/paper/welcome_plot.png new file mode 100644 index 0000000..ca5a59a Binary files /dev/null and b/paper/welcome_plot.png differ diff --git a/setup.py b/setup.py index 1c15218..b05d47c 100644 --- a/setup.py +++ b/setup.py @@ -5,7 +5,7 @@ setuptools.setup( name="causal-curve", - version="0.2.4", + version="0.3.0", author="Roni Kobrosly", author_email="roni.kobrosly@gmail.com", description="A python library with tools to perform causal inference using \