Skip to content

Commit

Permalink
add and modify documentation to reflect update in plotting
Browse files Browse the repository at this point in the history
  • Loading branch information
mbi6245 committed Sep 25, 2024
1 parent 2b6fe42 commit 409561b
Show file tree
Hide file tree
Showing 4 changed files with 53 additions and 37 deletions.
40 changes: 6 additions & 34 deletions docs/getting_started/quickstart.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,50 +10,22 @@ Example
import scipy.stats
from ensemble.ensemble_model import EnsembleModel, EnsembleFitter
# creates an ensemble distribution composed of the normal and gumbel distributions both sharing
# the same mean and variance; the normal distribution can be thought of as contributing a
# quarter of the "height" of the density curve to the ensemble's density, and the gumbel as
# contributing the remaining 3 quarters
# the same mean and variance
normal_gumbel = EnsembleModel(distributions=["normal", "gumbel"],
weights=[0.25, 0.75],
mean=4,
variance=1)
# fits an EnsembleModel object to standard normal draws. Here, the user has specified a
# distribution (the gumbel) that is not reflected in the truth. The model typically correctly
# identifies this and will give weights close to 1 for the normal, and 0 for the gumbel
# distribution (the gumbel) that is not reflected in the truth. Try on your own to see how the
# model reflects this!
std_norm_draws = scipy.stats.norm.rvs(0, 1, size=100)
model = EnsembleFitter(["normal", "gumbel"], "L2").fit(std_norm_draws)
fitted_weights = model.weights
fitted_distribution = model.ensemble_distribution
Using both the draws from the standard normal and the fitted :code:`EnsembleModel` object from
above, we can also plot the results with the help of the :code:`matplotlib` package. There are many
things that you may want to plot, but 2 useful plots that will be demonstrated below are a density
histogram overlaid with the ensemble PDF, as well as a comparison of the eCDF and the ensemble's
CDF.
# default plotting function for a demo visualization
normal_gumbel.plot()
Plotting
--------

.. code-block:: python
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 2)
support = np.linspace(np.min(std_norm_draws), np.max(std_norm_draws), 1000)
# plot histogram vs fitted PDF
ax[0].hist(std_norm_draws, density=True, bins=30)
ax[0].plot(support, fitted_distribution.pdf(support))
ax[0].set_xlabel("DATA VALUES (UNITS)")
ax[0].set_ylabel("density")
ax[0].set_title("DATA histogram w/ensemble PDF Overlay")
# plot eCDF vs fitted CDF
stats.ecdf(std_norm_draws).cdf.plot(ax[1])
ax[1].plot(support, fitted_distribution.cdf(support))
ax[1].set_xlabel("DATA VALUES (UNITS)")
ax[1].set_ylabel("density")
ax[1].set_title("Empirical vs Ensemble CDF Comparison")
**Please see :ref:`Plotting` for a practical guide on plotting with ensemble.**
6 changes: 3 additions & 3 deletions docs/user_guide/ensemble_fitting.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,9 @@ Finally, the function of interest for this use case is the :code:`fit()` functio
Example: Fitting an Ensemble
----------------------------

Suppose we have microdata for systolic blood pressure (SBP) from a certain population of young men
in Seattle. Since SBP must be positive, let's use all the distributions (except the exponential)
with a positive support to fit this data.
Suppose we have microdata for systolic blood pressure (SBP) from a certain population of young
people in Seattle. Since SBP must be positive, let's use all the distributions (except the
exponential) with a positive support to fit this data.

.. code-block:: python
Expand Down
1 change: 1 addition & 0 deletions docs/user_guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ User guide

ensemble_fitting
ensemble_model
plotting
concepts

.. note::
Expand Down
43 changes: 43 additions & 0 deletions docs/user_guide/plotting.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
Plotting
========

Let's reuse the example from :ref:`Example: Fitting an Ensemble` with the SBP. There are many
things that you may want to plot, but 2 useful plots that will be demonstrated below are:

* a density histogram of the SBP overlaid with the ensemble PDF
* a comparison between the eCDF of the SBP data and the ensemble's CDF

.. code-block:: python
import numpy as np
import matplotlib.pyplot as plt
# generate SBP data and fit model as before
data = stats.norm(loc=120, scale=7).rvs(size=100)
model = EnsembleFitter(
distributions=["gamma", "invgamma", "fisk", "lognormal"],
objective="L2"
)
res = model.fit(data)
# set up matplotlib plotting
fig, ax = plt.subplots(1, 2)
support = np.linspace(np.min(data), np.max(data), 1000)
# plot histogram w/fitted PDF
ax[0].hist(data, density=True, bins=30)
ax[0].plot(support, fitted_distribution.pdf(support))
ax[0].set_xlabel("SBP (mm/Hg)")
ax[0].set_ylabel("density")
ax[0].set_title("SBP histogram w/ensemble PDF Overlay")
# plot eCDF vs fitted CDF
stats.ecdf(std_norm_draws).cdf.plot(ax[1])
ax[1].plot(support, fitted_distribution.cdf(support))
ax[1].set_xlabel("SBP (mm/Hg)")
ax[1].set_ylabel("density")
ax[1].set_title("Empirical vs Ensemble CDF Comparison")
**What is** :code:`support` **?:** You can think of :code:`support` as the x values (in the space of the
data)for which we will calculate corresponding y values of density for, whether that be the PDF or
CDF.

0 comments on commit 409561b

Please sign in to comment.