generated from ihmeuw-msca/pypkg
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #3 from ihmeuw-msca/bugfix/plots-and-erroring
Bugfix/plots and erroring
Showing 27 changed files with 569 additions and 398 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -127,3 +127,8 @@ dmypy.json | |
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# Misc. | ||
.DS_Store | ||
*.csv | ||
*.parquet |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,4 +16,4 @@ h6 { | |
font-size: 1rem; | ||
font-weight: 500; | ||
margin: auto; | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,4 +12,4 @@ | |
</ul> | ||
<li> | ||
</ul> | ||
</div> | ||
</div> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,27 @@ | ||
================ | ||
Installing pypkg | ||
================ | ||
=================== | ||
Installing ensemble | ||
=================== | ||
|
||
Python version | ||
-------------- | ||
|
||
The package :code:`pypkg` is written in Python | ||
and requires Python 3.10 or later. | ||
The package :code:`ensemble` is written in Python | ||
and requires Python 3.12 or later. | ||
|
||
:code:`pypkg` package is distributed at | ||
`PyPI <https://pypi.org/project/pypkg/>`_. | ||
:code:`ensemble` package is distributed at | ||
.. `PyPI <https://pypi.org/project/ensemble/>`_. | ||
TBD | ||
To install the package: | ||
|
||
.. code:: | ||
pip install pypkg | ||
pip install ensemble | ||
Developers can clone the repository and install the package in | ||
development mode. | ||
|
||
.. code:: | ||
git clone https://github.com/ihmeuw-msca/pypkg.git | ||
cd pypkg | ||
pip install -e ".[test,docs]" | ||
git clone https://github.com/ihmeuw-msca/ensemble.git | ||
cd ensemble | ||
pip install -e ".[test,docs]" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
versions = [ | ||
"0.1.0", | ||
] | ||
] |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
======== | ||
Concepts | ||
======== | ||
|
||
Distributions | ||
------------- | ||
|
||
Each individual distribution in an ensemble is fit to the given mean and variance of the data. This | ||
process typically involves writing the mean and variance in terms of the distribution's parameters, | ||
setting them equal to the sample mean and variance, and solving the resulting two-parameter system. You | ||
may look within the :code:`create_scipy_dist()` function to find the equations used. The single exception | ||
is the Fisk distribution, where the form of the PDF necessitates the use of numerical minimization. | ||
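As an illustration of this moment-matching step, the sketch below solves the two-parameter system for the Gamma distribution. It is an assumption about the general approach rather than a copy of :code:`create_scipy_dist()`, and the helper name :code:`gamma_from_moments` is hypothetical.

```python
from scipy import stats


def gamma_from_moments(mean, variance):
    """Hypothetical helper: build a Gamma distribution matching a mean and variance.

    For Gamma(shape=a, scale=s): mean = a * s and variance = a * s**2,
    so a = mean**2 / variance and s = variance / mean.
    """
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive for the Gamma distribution")
    shape = mean**2 / variance
    scale = variance / mean
    return stats.gamma(a=shape, scale=scale)


# Example: match a sample mean of 120 and a sample variance of 49.
dist = gamma_from_moments(mean=120.0, variance=49.0)
print(dist.mean(), dist.var())
```

The same pattern applies to any two-parameter family whose mean and variance have closed forms in its parameters.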
|
||
EnsembleModel | ||
------------- | ||
|
||
PDF, CDF, PPF | ||
^^^^^^^^^^^^^ | ||
|
||
Methods used for creating the PDF, CDF, and PPF of the EnsembleDistribution object are relatively | ||
"off the shelf", generally following the structure and methodology of scipy's | ||
implementation `here <https://github.com/scipy/scipy/blob/v1.14.0/scipy/stats/_distn_infrastructure.py>`_. | ||
In summary, the PDF and CDF are weighted linear combinations of the component distributions' PDFs and | ||
CDFs, while the PPF requires the use of Brent's algorithm to solve for the quantile at which the ensemble | ||
CDF equals the requested probability. | ||
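A minimal sketch of these three functions for a two-component mixture follows. The component distributions, weights, and function names are illustrative assumptions and are not taken from the package; the root-finding uses :code:`scipy.optimize.brentq`.

```python
import numpy as np
from scipy import optimize, stats

# Illustrative components and weights (assumed for this sketch only).
components = [stats.gamma(a=4.0, scale=2.0), stats.lognorm(s=0.5, scale=8.0)]
weights = np.array([0.3, 0.7])


def mixture_pdf(x):
    # Weighted sum of the component PDFs.
    return sum(w * d.pdf(x) for w, d in zip(weights, components))


def mixture_cdf(x):
    # Weighted sum of the component CDFs.
    return sum(w * d.cdf(x) for w, d in zip(weights, components))


def mixture_ppf(q):
    # The mixture quantile lies between the smallest and largest
    # component quantiles, which gives a valid bracket for Brent's method.
    lo = min(d.ppf(q) for d in components)
    hi = max(d.ppf(q) for d in components)
    if lo == hi:
        return lo
    return optimize.brentq(lambda x: mixture_cdf(x) - q, lo, hi)


median = mixture_ppf(0.5)
print(median, mixture_cdf(median))
```

Bracketing with the component quantiles avoids having to guess search bounds, since the mixture CDF is at most :code:`q` at the lower bound and at least :code:`q` at the upper bound.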
|
||
rvs | ||
^^^ | ||
|
||
:code:`rvs`, scipy's function for generating draws, was not implemented by solving for the PPF, as is | ||
done in the source code linked above. Instead, since drawing from a weighted mixture of distributions is | ||
equivalent to first choosing a component with probability equal to its weight and then sampling from that | ||
component, the number of draws taken from each component is dictated by a multinomial distribution. This | ||
method has been chosen here for efficiency. | ||
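The sampling scheme can be sketched as follows. This is an illustration under assumed components and weights, not the package's implementation, and :code:`mixture_rvs` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def mixture_rvs(components, weights, size, seed=None):
    """Hypothetical helper: draw from a weighted mixture of frozen scipy distributions."""
    rng = np.random.default_rng(seed)
    # How many of the `size` draws come from each component.
    counts = rng.multinomial(size, weights)
    draws = np.concatenate(
        [d.rvs(size=n, random_state=rng) for d, n in zip(components, counts)]
    )
    # Shuffle so the draws are not grouped by component.
    rng.shuffle(draws)
    return draws


# Illustrative components and weights (assumed for this sketch only).
components = [stats.gamma(a=4.0, scale=2.0), stats.lognorm(s=0.5, scale=8.0)]
weights = [0.3, 0.7]
draws = mixture_rvs(components, weights, size=20000, seed=0)
print(draws.shape, draws.mean())
```

Each draw costs one call to a component sampler, whereas inverting the ensemble CDF would require a root-finding step per draw.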
|
||
stats_temp | ||
^^^^^^^^^^ | ||
|
||
A getter function for the mean and variance supplied to the EnsembleDistribution object. Unlike scipy's | ||
:code:`stats()`, it does not supply skewness and kurtosis. | ||
|
||
EnsembleFitter | ||
-------------- | ||
|
||
The :code:`fit()` function fits an ensemble distribution by minimizing the distance between the | ||
eCDF of the given microdata and the CDF of the ensemble distribution, as measured by a chosen penalty. | ||
Legacy code at IHME implements only the Kolmogorov-Smirnov distance, but the sum of squares and L1 | ||
norm distance metrics have been implemented here as well. |
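To make the three distance metrics concrete, the sketch below evaluates them between the eCDF of a sample and the CDF of a single candidate distribution. The function names and dictionary keys are assumptions for illustration, and the KS value here is a simplified version that compares the curves only at the sorted data points.

```python
import numpy as np
from scipy import stats


def ecdf(data):
    # Sorted data and the proportion of observations at or below each point.
    x = np.sort(np.asarray(data))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def cdf_distances(data, dist):
    """Hypothetical helper: distances between the eCDF of `data` and `dist.cdf`."""
    x, y = ecdf(data)
    resid = y - dist.cdf(x)
    return {
        "L1": np.sum(np.abs(resid)),      # L1 norm
        "sum_squares": np.sum(resid**2),  # sum of squares
        "KS": np.max(np.abs(resid)),      # infinity norm
    }


# Illustrative data and candidate distribution (assumed for this sketch only).
data = stats.norm(loc=120, scale=7).rvs(size=100, random_state=0)
distances = cdf_distances(data, stats.norm(loc=120, scale=7))
print(distances)
```

Fitting an ensemble then amounts to choosing the weights that minimize one of these quantities when :code:`dist.cdf` is replaced by the weighted sum of the component CDFs.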
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
================ | ||
Ensemble Fitting | ||
================ | ||
|
||
To fit an ensemble distribution to microdata, use the :code:`EnsembleFitter` object. The | ||
object must be initialized with two things: | ||
|
||
*A list of named distributions.* Different distributions can have different "supports". A | ||
support, for our purposes, can be thought of as the x values that are compatible with some given | ||
distribution. For example, the Normal distribution is supported on the entire real line, so it can | ||
take negative x values, but the Gamma is only supported on (0, :math:`\infty`), so it cannot take | ||
negative values. **Recall: you are not permitted to use distributions with differing supports in the | ||
same ensemble.** | ||
|
||
*A penalty function of choice.* In a nutshell, we are minimizing the distances between the empirical | ||
cumulative distribution function (eCDF) and the CDF of the ensemble subject to said chosen penalty. | ||
The penalties currently implemented are as follows: | ||
|
||
* :code:`"L1"`: L1 norm | ||
* :code:`"sum_squares"`: sum of squares | ||
* :code:`"KS"`: the Kolmogorov-Smirnov distance, A.K.A. infinity norm | ||
|
||
Finally, the function of interest for this use case is the :code:`fit()` function. | ||
|
||
Example: Fitting an Ensemble | ||
---------------------------- | ||
|
||
Suppose we have microdata for systolic blood pressure (SBP) from a certain population of young men | ||
in Seattle. Since SBP must be positive, let's use all the distributions (except the exponential) | ||
with a positive support to fit this data. | ||
|
||
.. code-block:: python | ||

    import scipy.stats as stats
    from ensemble.model import EnsembleFitter

    # simulate 100 SBP measurements (illustrative data only)
    SBP_vals = stats.norm(loc=120, scale=7).rvs(size=100)

    # set up the fitter with positive-support distributions and an objective
    model = EnsembleFitter(
        distributions=["gamma", "invgamma", "fisk", "lognormal"],
        objective="L2",
    )
    res = model.fit(SBP_vals)

:code:`res` contains an array of fitted weights as well as an :code:`EnsembleDistribution` object | ||
that has already been initialized with the distributions provided to :code:`model`. They can be | ||
accessed as follows: | ||
|
||
.. code-block:: python | ||

    # fitted weights
    fitted_weights = res.weights
    # fitted ensemble
    fitted_ensemble = res.ensemble_distribution