Merge pull request #11 from CyberAgentAILab/feat/get-started-doc

Update get-started guide
CyberAgentAILab · Jul 12, 2024 · 3f3dcbc
2 parents c820ee4 + f16e23c
commit 3f3dcbc
Showing 15 changed files with 284 additions and 253 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,56 @@
+
+# Contribution
+Thank you for considering contributing to this project! Here are some guidelines to help you get started.
+
+## How can I contribute to this project?
+### Reporting Bugs
+If you find a bug, please report it by opening an issue in the issue tracker. Provide as much detail as possible to help us understand and reproduce the issue:
+- A clear and descriptive title.
+- A detailed description of the problem.
+- Steps to reproduce the issue.
+- Any error messages or screenshots.
+
+### Suggesting Enhancements
+We welcome suggestions for improvements! To suggest an enhancement:
+- Check the issue tracker to see if someone else has already suggested it.
+- If not, open a new issue and describe your idea clearly.
+- Explain why you believe the enhancement would be beneficial.
+
+### Pull Requests
+Pull requests are welcome! If you plan to make significant changes, please open an issue first to discuss your idea. This helps us ensure that your contribution fits with the project's direction. Follow these steps for a smooth pull request process:
+
+- Fork the repository.
+- Clone your fork to your local machine.
+- Create a new branch: `git checkout -b my-feature-branch`.
+- Make your changes.
+- Commit your changes: `git commit -m 'Add some feature'`.
+- Push to the branch: `git push origin my-feature-branch`.
+- Open a pull request in the original repository.
+
+## Development
+Here are the basic commands you can use to develop this package.
+
+### Install Pipenv
+If you don't have `pipenv` installed, you can install it using `pip`:
+
+```sh
+pip install pipenv
+```
+
+### Linting
+We use `ruff` for linting the code. To run the linter, use the following command:
+```sh
+pipenv run lint
+```
+
+### Auto format
+We use `ruff` for formatting the code. To run the formatter, use the following command:
+```sh
+pipenv run format
+```
+
+### Unit test
+We use `unittest` for testing the code. To run the unit tests, use the following command:
+```sh
+pipenv run unittest
+```
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 ## Overview
 
-This an a Python package for building the regression adjusted distribution function estimator proposed in "Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction".
+This is a Python package for building the regression adjusted distribution function estimator proposed in "Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction". For the details of this package, see [the documentation](https://cyberagentailab.github.io/python-dte-adjustment/).
 
 ## Installation
 
@@ -17,29 +17,15 @@ This an a Python package for building the regression adjusted distribution funct
     pip install -e .
     ```
 
+## Basic Usage
+Examples of how to use this package are available in [this Get-started Guide](https://cyberagentailab.github.io/python-dte-adjustment/get_started.html).
+
 ## Development
+We welcome contributions to the project! Please review our [Contribution Guide](CONTRIBUTING.md) for details on how to get started.
+
+## License
+
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
-### Install Pipenv
-If you don't have `pipenv` installed, you can install it using `pip`:
-
-```sh
-pip install pipenv
-```
-
-### Linting
-We use `ruff` for linting the code. To run the linter, use the following command:
-```sh
-pipenv run lint
-```
-
-### Auto format
-We use `ruff` for formatting the code. To run the formatter, use the following command:
-```sh
-pipenv run format
-```
-
-### Unit test
-We use `unittest` for testing the code. To run the unit tests, use the following command:
-```sh
-pipenv run unittest
-```
+## Maintainers
+- [Tomu Hirata](https://github.com/TomeHirata)
diff --git a/docs/source/_static/dte_empirical.png b/docs/source/_static/dte_empirical.png
diff --git a/docs/source/_static/dte_moment.png b/docs/source/_static/dte_moment.png
diff --git a/docs/source/_static/dte_simple.png b/docs/source/_static/dte_simple.png
diff --git a/docs/source/_static/dte_uniform.png b/docs/source/_static/dte_uniform.png
diff --git a/docs/source/_static/pte_simple.png b/docs/source/_static/pte_simple.png
diff --git a/docs/source/_static/qte.png b/docs/source/_static/qte.png
diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst
@@ -0,0 +1,4 @@
+Contribution Guide
+==================
+
+For details on how to contribute to this package, see https://github.com/CyberAgentAILab/python-dte-adjustment/CONTRIBUTING.md.
diff --git a/docs/source/get_started.rst b/docs/source/get_started.rst
@@ -43,11 +43,11 @@ Generate data for training cumulative distribution function:
       quadratic_term = np.dot(X**2, gamma)
       
       # Outcome equation
-      Y = D + linear_term + quadratic_term + U
+      Y = 5 * D + linear_term + quadratic_term + U
       
       return X, D, Y
 
-  n = 100  # Sample size
+  n = 1000  # Sample size
   X, D, Y = generate_data(n)
 
 Then, let's build an empirical cumulative distribution function (CDF).
@@ -63,13 +63,14 @@ Distributional treatment effect (DTE) can be computed easily in the following co
 
 .. code-block:: python
 
-  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=np.sort(Y), variance_type="simple")
+  locations = np.linspace(Y.min(), Y.max(), 20)
+  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=locations, variance_type="simple")
 
 A convenience function is available to visualize distribution effects. This method can be used for other distribution parameters including Probability Treatment Effect (PTE) and Quantile Treatment Effect (QTE).
 
 .. code-block:: python
 
-  plot(np.sort(Y), dte, lower_bound, upper_bound, title="DTE of simple estimator")
+  plot(locations, dte, lower_bound, upper_bound, title="DTE of simple estimator")
 
 .. image:: _static/dte_empirical.png
    :alt: DTE of empirical estimator
@@ -92,8 +93,8 @@ DTE can be computed and visualized in the following code.
 
 .. code-block:: python
 
-  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=np.sort(Y), variance_type="simple")
-  plot(np.sort(Y), dte, lower_bound, upper_bound, title="DTE of adjusted estimator with simple confidence band")
+  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=locations, variance_type="simple")
+  plot(locations, dte, lower_bound, upper_bound, title="DTE of adjusted estimator with simple confidence band")
 
 .. image:: _static/dte_simple.png
    :alt: DTE of adjusted estimator with simple confidence band
@@ -105,8 +106,8 @@ Confidence bands can be computed in different ways. In the following code, we us
 
 .. code-block:: python
 
-  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=np.sort(Y), variance_type="moment")
-  plot(np.sort(Y), dte, lower_bound, upper_bound, title="DTE of adjusted estimator with moment confidence band")
+  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=locations, variance_type="moment")
+  plot(locations, dte, lower_bound, upper_bound, title="DTE of adjusted estimator with moment confidence band")
 
 .. image:: _static/dte_moment.png
    :alt: DTE of adjusted estimator with moment confidence band
@@ -118,8 +119,8 @@ Also, an uniform confidence band is used when "uniform" is specified for the "va
 
 .. code-block:: python
 
-  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=np.sort(Y), variance_type="uniform")
-  plot(np.sort(Y), dte, lower_bound, upper_bound, title="DTE of adjusted estimator with uniform confidence band")
+  dte, lower_bound, upper_bound = estimator.predict_dte(target_treatment_arm=1, control_treatment_arm=0, locations=locations, variance_type="uniform")
+  plot(locations, dte, lower_bound, upper_bound, title="DTE of adjusted estimator with uniform confidence band")
 
 .. image:: _static/dte_uniform.png
    :alt: DTE of adjusted estimator with uniform confidence band
@@ -131,7 +132,6 @@ To compute PTE, we can use "predict_pte" method.
 
 .. code-block:: python
 
-  locations = np.linspace(Y.min(), Y.max(), 20)
   pte, lower_bound, upper_bound = estimator.predict_pte(target_treatment_arm=1, control_treatment_arm=0, width=1, locations=locations, variance_type="simple")
   plot(locations, pte, lower_bound, upper_bound, chart_type="bar", title="PTE of adjusted estimator with simple confidence band")
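
As a rough illustration of what moving from `np.sort(Y)` to a 20-point `locations` grid means, the sketch below evaluates a plain empirical CDF for each arm on such a grid and takes the difference. It uses only NumPy. The helper `empirical_cdf` and the toy data are hypothetical and are not part of `dte_adj`; the package's own estimators and confidence bands are not reproduced here.

```python
import numpy as np


def empirical_cdf(samples, locations):
    """Share of samples that are less than or equal to each location."""
    sorted_samples = np.sort(samples)
    # side="right" counts how many samples are <= each location
    counts = np.searchsorted(sorted_samples, locations, side="right")
    return counts / sorted_samples.shape[0]


rng = np.random.default_rng(0)
n = 1000
D = rng.integers(0, 2, size=n)   # toy treatment assignment (assumption)
Y = 5 * D + rng.normal(size=n)   # toy outcome: treated units shifted up by 5

# Evaluate on a fixed grid instead of on every observed outcome
locations = np.linspace(Y.min(), Y.max(), 20)
dte = empirical_cdf(Y[D == 1], locations) - empirical_cdf(Y[D == 0], locations)
```

With a grid, the number of evaluation points (20) is decoupled from the sample size (1000), which is why the plotting calls in this diff pass `locations` rather than `np.sort(Y)`.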
 

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -3,20 +3,27 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
 
-dte_adj Documentation
+dte_adj
 ===================================
 
+This is a Python package for building the regression adjusted distribution function estimator proposed in "Estimating Distributional Treatment Effects in Randomized Experiments: Machine Learning for Variance Reduction".
+
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
    :caption: Contents:
 
    installation
    get_started
    modules
+   contributing
 
 Indices and tables
-==================
+~~~~~~~~~~~~~~~~~~
 
 * :ref:`genindex`
 * :ref:`modindex`
-* :ref:`search`
+* :ref:`search`
+
+License
+~~~~~~~
+MIT License
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -3,13 +3,25 @@ Installation Guide
 
 This package can be installed either through PyPI or source code.
 
+Requirement
+~~~~~~~~~~~
+
+This package requires Python 3.6 or higher.
+
+
 Install from PyPI
+~~~~~~~~~~~~~~~~~
+
+To install the package from PyPI, run the following command.
 
 .. code-block:: bash
 
    pip install dte_adj
 
 Install from source code
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+To install the package from the source code, run the following commands.
 
 .. code-block:: bash
 

diff --git a/dte_adj/__init__.py b/dte_adj/__init__.py
@@ -282,25 +282,32 @@ def _compute_qtes(
         outcomes: np.array,
     ) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
         """Compute expected QTEs."""
-        treatment_cumulative, _ = self._compute_cumulative_distribution(
-            np.full(outcomes.shape, target_treatment_arm),
-            outcomes,
-            confoundings,
-            treatment_arms,
-            outcomes,
-        )
-        control_cumulative, _ = self._compute_cumulative_distribution(
-            np.full(outcomes.shape, control_treatment_arm),
-            outcomes,
-            confoundings,
-            treatment_arms,
-            outcomes,
-        )
+        locations = np.sort(outcomes)
+
+        def find_quantile(quantile, arm):
+            low, high = 0, locations.shape[0] - 1
+            result = -1
+            while low <= high:
+                mid = (low + high) // 2
+                val, _ = self._compute_cumulative_distribution(
+                    np.full((1), arm),
+                    np.full((1), locations[mid]),
+                    confoundings,
+                    treatment_arms,
+                    outcomes,
+                )
+                if val[0] <= quantile:
+                    result = locations[mid]
+                    low = mid + 1
+                else:
+                    high = mid - 1
+            return result
+
         result = np.zeros(quantiles.shape)
         for i, q in enumerate(quantiles):
-            treatment_idx = find_le(treatment_cumulative, q)
-            control_idx = find_le(control_cumulative, q)
-            result[i] = outcomes[treatment_idx] - outcomes[control_idx]
+            result[i] = find_quantile(q, target_treatment_arm) - find_quantile(
+                q, control_treatment_arm
+            )
 
         return result
 
@@ -415,15 +422,15 @@ def _compute_cumulative_distribution(
         d_confounding = {}
         d_outcome = {}
         n_obs = outcomes.shape[0]
-        n_loc = outcomes.shape[0]
+        n_loc = locations.shape[0]
         for arm in unique_treatment_arm:
             selected_confounding = confoundings[treatment_arms == arm]
             selected_outcome = outcomes[treatment_arms == arm]
             sorted_indices = np.argsort(selected_outcome)
             d_confounding[arm] = selected_confounding[sorted_indices]
             d_outcome[arm] = selected_outcome[sorted_indices]
-        cumulative_distribution = np.zeros(outcomes.shape)
-        for i, (outcome, arm) in enumerate(zip(outcomes, target_treatment_arms)):
+        cumulative_distribution = np.zeros(locations.shape)
+        for i, (outcome, arm) in enumerate(zip(locations, target_treatment_arms)):
             cumulative_distribution[i] = (
                 find_le(d_outcome[arm], outcome) + 1
             ) / d_outcome[arm].shape[0]
@@ -518,10 +525,10 @@ def _compute_cumulative_distribution(
             np.ndarray: Estimated cumulative distribution values.
         """
         n_obs = outcomes.shape[0]
-        n_loc = outcomes.shape[0]
-        cumulative_distribution = np.zeros(outcomes.shape)
+        n_loc = locations.shape[0]
+        cumulative_distribution = np.zeros(locations.shape)
         superset_prediction = np.zeros((n_obs, n_loc))
-        for i, (location, arm) in enumerate(zip(outcomes, target_treatment_arms)):
+        for i, (location, arm) in enumerate(zip(locations, target_treatment_arms)):
             confounding_in_arm = confoundings[treatment_arms == arm]
             outcome_in_arm = outcomes[treatment_arms == arm]
             subset_prediction = np.zeros(outcome_in_arm.shape[0])
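
The binary search introduced in `_compute_qtes` relies on the CDF being non-decreasing in the location, so the largest location whose CDF value is at most `quantile` can be found in O(log n) CDF evaluations. The following is a minimal standalone sketch of that idea, assuming a plain empirical CDF over one arm's own samples; the diff instead searches over the pooled sorted outcomes and calls `self._compute_cumulative_distribution`. The helper names `empirical_cdf_at` and `find_quantile` below are illustrative, not the package API.

```python
import numpy as np


def empirical_cdf_at(samples_sorted, y):
    """Empirical CDF of pre-sorted samples evaluated at a single point y."""
    return np.searchsorted(samples_sorted, y, side="right") / samples_sorted.shape[0]


def find_quantile(samples, quantile):
    """Largest observed value whose empirical CDF is <= quantile."""
    locations = np.sort(samples)
    low, high = 0, locations.shape[0] - 1
    result = -1  # sentinel as in the diff: returned when no location qualifies
    while low <= high:
        mid = (low + high) // 2
        if empirical_cdf_at(locations, locations[mid]) <= quantile:
            result = locations[mid]  # candidate found, look for a larger one
            low = mid + 1
        else:
            high = mid - 1
    return result


treated = np.array([6.0, 7.0, 8.0, 9.0, 10.0])
control = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
qte_median = find_quantile(treated, 0.5) - find_quantile(control, 0.5)
print(qte_median)  # 5.0
```

Note that the `-1` sentinel is returned when even the smallest location has a CDF value above `quantile` (for example `find_quantile(control, 0.1)` here), so callers may want to treat very small quantiles with care.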