New inference 🎨 #203

Merged
merged 3 commits into from
Jun 7, 2024
Changes from 27 commits
1 change: 0 additions & 1 deletion build/conda_environment.yml
@@ -478,7 +478,6 @@ dependencies:
- conda-forge/linux-64::r-readxl==1.4.1=r42h3ebcfa7_1
- conda-forge/noarch::r-reprex==2.0.2=r42hc72bb7e_1
- conda-forge/linux-64::r-tidyr==1.3.0=r42h38f115c_0
- conda-forge/noarch::r-tigris==2.0.1=r42hc72bb7e_0
- conda-forge/noarch::r-waldo==0.4.0=r42hc72bb7e_1
- conda-forge/noarch::r-broom==1.0.3=r42hc72bb7e_0
- conda-forge/linux-64::r-gdtools==0.3.0=r42he0ce631_0
2 changes: 1 addition & 1 deletion build/local_install.R
@@ -8,7 +8,7 @@ local({r <- getOption("repos")

library(devtools)

install.packages(c("covidcast","data.table","vroom","dplyr"), quiet=TRUE, dependencies = TRUE)
install.packages(c("covidcast","data.table","vroom","dplyr"), quiet=TRUE)
# devtools::install_github("hrbrmstr/cdcfluview")

# To run if operating in the container -----
1 change: 0 additions & 1 deletion datasetup/build_US_setup.R
@@ -70,7 +70,6 @@ state_level <- ifelse(!is.null(config$subpop_setup$state_level) && config$subpop
# tidycensus::census_api_key(key = census_key)



filterUSPS <- c("WY","VT","DC","AK","ND","SD","DE","MT","RI","ME","NH","HI","ID","WV","NE","NM",
"KS","NV","MS","AR","UT","IA","CT","OK","OR","KY","LA","AL","SC","MN","CO","WI",
"MD","MO","IN","TN","MA","AZ","WA","VA","NJ","MI","NC","GA","OH","IL","PA","NY","FL","TX","CA")
60 changes: 60 additions & 0 deletions documentation/gitbook/README.md
@@ -0,0 +1,60 @@
# Home

Welcome to _flepiMoP_ documentation!

The “**FLexible EPIdemic MOdeling Pipeline**” (_flepiMoP_; formerly known as the _COVID Scenario Modeling Pipeline_ or _CSP_) is an open-source software suite designed by researchers in the [Johns Hopkins Infectious Disease Dynamics Group](http://www.iddynamics.jhsph.edu/) and at [UNC Chapel Hill](https://sph.unc.edu/epid/epidemiology-landing/) to simulate a wide range of compartmental models of infectious disease transmission. The disease transmission and observation models are defined by a no-code configuration file, which allows models of varying complexity to be specified quickly and consistently, from simple problems described by SIR-style models in a single population to more complicated models of multiple pathogen strains transmitting between thousands of connected spatial divisions and age groups.

It was initially designed in early 2020 and was routinely used to provide projections of the emerging COVID-19 epidemic to health authorities worldwide. Currently, _flepiMoP_ provides COVID-19 projections to the US CDC-funded model aggregation sites, the [COVID-19 Forecast Hub](https://covid19forecasthub.org/) and the [COVID-19 Scenario Modeling Hub](https://covid19scenariomodelinghub.org/), influenza projections to [FluSight](https://www.cdc.gov/flu/weekly/flusight/index.html) and to the [Flu Scenario Modeling Hub](https://fluscenariomodelinghub.org), and RSV projections to the [RSV Scenario Modeling Hub](https://rsvscenariomodelinghub.org/).

However, the pipeline is much more general and can be used to simulate the dynamics of any infection, or of any other system, that can be expressed as a [compartmental epidemic model](https://en.wikipedia.org/wiki/Compartmental\_models\_in\_epidemiology). Other possible applications include chemical reaction kinetics, pharmacokinetics, within-host disease dynamics, and problems in the social sciences.

In addition to producing forward simulations given a specified model and parameter values, the pipeline can also attempt to optimize unknown parameters (e.g., transmission rate, case detection rate, intervention efficacy) to fit the model to datasets the user provides (e.g., hospitalizations due to severe disease) using a Bayesian inference framework. This feature allows the pipeline to be utilized for short-term forecasting or longer-term scenario projections for ongoing epidemics, since it can simultaneously be fit to data for dates in the past and then use best-fit parameters to make projections into the future.

### General description of _flepiMoP_

The main features of _flepiMoP_ are:

* Open-source (GPL v3.0) infectious disease dynamics modeling software, written in R and Python
* Versatile, no-code design applicable for most compartmental models and outcome observation models, allowing for quick iteration in reaction to epidemic events (e.g., emergence of new variants, vaccines, non-pharmaceutical interventions (NPIs))
* Powerful, just-in-time compiled disease transmission model and distributed inference engine ready for large scale simulations on high-performance computing clusters or cloud workflows
* Adapted to small- and large-scale problems, from a simple SIR model to a complex model structure with hundreds of compartments on thousands of connected populations
* Strong emphasis on mechanistic processes, with a design aimed at leveraging domain knowledge in conjunction with statistical inference
* Portable across Windows (via WSL), macOS, and Linux with the provided Docker image and an Anaconda environment

<figure><img src=".gitbook/assets/CSP Overview.png" alt=""><figcaption><p>Overview of the pipeline organization</p></figcaption></figure>

The mathematical model within the pipeline is a _compartmental epidemic model_ embedded within a _well-mixed metapopulation_. A compartmental epidemic model is a model that divides all individuals in a population into a discrete set of states (e.g. “infected”, “recovered”) and tracks – over time – the number of individuals in each state and the rates at which individuals transition between these states. The well-known SIR model is a classic example of such a model, and much more complex versions of this model type have been simulated with this framework (for example, an SEIR-style model in which individuals are further subdivided into multiple age groups and vaccination statuses).
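As a rough illustration of what a compartmental model tracks, the sketch below integrates a single-population SIR model with small forward Euler steps. It is not code from the pipeline: the function name `simulate_sir`, the parameter values, and the choice of Euler integration are all assumptions made for this example.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate a single-population SIR model with forward Euler steps.

    beta  : transmission rate per day (illustrative value, not a fitted one)
    gamma : recovery rate per day
    Returns a list of (day, S, I, R) tuples, one per whole day.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps_per_day = round(1 / dt)
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # S -> I transitions
            new_recoveries = gamma * i * dt         # I -> R transitions
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((float(day), s, i, r))
    return trajectory


trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=160)
peak_day, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
print(f"peak of {peak_infected:.0f} infected on day {peak_day:.0f}")
```

Each compartment is just a number that changes according to the transition rates, and the total population stays constant because every individual leaving one state enters another.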

The structure of the desired model, as well as the parameter values and initial conditions, can be specified flexibly by the user in a no-code fashion. The pipeline allows for parameter values to change over time at discrete intervals, which can be used to specify time-dependent aspects of disease transmission and control (such as seasonality or vaccination campaigns).
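One way to picture parameters that change at discrete intervals is a piecewise-constant lookup, sketched below. The function `beta_on`, the `(start, end, reduction)` tuple layout, and the dates are hypothetical and chosen only for illustration; they do not reflect the pipeline's configuration syntax.

```python
from datetime import date


def beta_on(day, baseline, modifiers):
    """Return the transmission rate in effect on `day`.

    `modifiers` is a list of (start, end, reduction) tuples (a made-up
    layout). Every modifier whose period contains `day` scales the
    baseline by (1 - reduction); overlapping modifiers multiply.
    """
    value = baseline
    for start, end, reduction in modifiers:
        if start <= day <= end:
            value *= 1 - reduction
    return value


modifiers = [
    (date(2020, 3, 15), date(2020, 5, 1), 0.5),   # e.g., a distancing period
    (date(2020, 4, 1), date(2020, 4, 15), 0.2),   # e.g., an additional measure
]
for day in (date(2020, 3, 1), date(2020, 3, 20), date(2020, 4, 10)):
    print(day, round(beta_on(day, 0.3, modifiers), 3))
```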

The model is embedded within a meta-population structure, which consists of a series of distinct subpopulations (e.g. states, provinces, or other communities) in which the model structure is repeated, albeit with potentially different parameter values. The subpopulations can interact, either through the movement of individuals or the influence of individuals in one subpopulation on the transition rate of individuals in another.
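The second kind of interaction, where infections in one subpopulation raise the infection rate in another, can be sketched with a contact-share matrix. The function `force_of_infection` and the `mixing` matrix are assumptions made for this example, not the pipeline's own formulation.

```python
def force_of_infection(beta, mixing, infected, population):
    """Per-capita infection rate in each subpopulation.

    mixing[k][j] is the assumed share of contacts that residents of
    subpopulation k have with residents of subpopulation j (rows sum to 1).
    """
    prevalence = [i / n for i, n in zip(infected, population)]
    return [
        beta_k * sum(m_kj * p_j for m_kj, p_j in zip(row, prevalence))
        for beta_k, row in zip(beta, mixing)
    ]


beta = [0.3, 0.2]            # subpopulation-specific transmission rates
mixing = [[0.9, 0.1],
          [0.2, 0.8]]
infected = [50, 0]           # only the first subpopulation has infections
population = [1000, 500]
rates = force_of_infection(beta, mixing, infected, population)
print(rates)
```

In this toy setup the second subpopulation has no infections of its own, yet its infection rate is non-zero because a share of its contacts are with the first subpopulation.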

Within each subpopulation, the population is assumed to be well-mixed, meaning that interactions are assumed to be equally likely between any pair of individuals (since unique identities of individuals are not explicitly tracked). The same model structure can be simulated in a continuous-time deterministic or discrete-time stochastic manner.
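For contrast with a deterministic integration, a discrete-time stochastic step can be sketched by drawing the number of transitions from binomial distributions. This is a generic chain-binomial scheme and is an assumption of this example; the pipeline's actual numerical method is not shown here. The helper names are hypothetical.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def stochastic_sir_step(rng, s, i, r, beta, gamma, dt=1.0):
    """Advance integer SIR counts by one time step of length dt."""
    n = s + i + r
    p_infect = 1 - math.exp(-beta * i / n * dt)  # per-susceptible infection probability
    p_recover = 1 - math.exp(-gamma * dt)        # per-infected recovery probability
    new_infections = binomial(rng, s, p_infect)
    new_recoveries = binomial(rng, i, p_recover)
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


rng = random.Random(1)
state = (990, 10, 0)
for _ in range(100):
    state = stochastic_sir_step(rng, *state, beta=0.3, gamma=0.1)
print(state)
```

Because the draws are random, repeated runs with different seeds give different trajectories, including occasional early extinction of the outbreak.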

In addition to the variables described by the compartmental model, the model can track other observable variables (“outcomes”) that are functions of the basic model variables but do not themselves influence the dynamics (e.g., some portion of infections are reported as cases, depending on a testing rate). The model can be run iteratively to tune the values of certain parameters so that these outcome variables best match timeseries data provided by the user for a certain time period.
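A minimal sketch of an outcome variable: reported cases computed from daily new infections with a detection probability and a fixed reporting delay. The function name, the fixed delay, and the numbers are assumptions for illustration only.

```python
def reported_cases(incidence, detection_probability, delay_days):
    """Expected reported cases per day, given daily new infections.

    Cases are a function of the infection time series but do not feed
    back into it: each day's infections appear as cases `delay_days`
    later, scaled by the detection probability.
    """
    cases = [0.0] * len(incidence)
    for day, new_infections in enumerate(incidence):
        report_day = day + delay_days
        if report_day < len(cases):
            cases[report_day] = detection_probability * new_infections
    return cases


incidence = [10, 20, 40, 80, 60, 30]
cases = reported_cases(incidence, detection_probability=0.25, delay_days=2)
print(cases)  # [0.0, 0.0, 2.5, 5.0, 10.0, 20.0]
```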

Fitting is done using a Bayesian-like framework, where the user can specify the likelihood of observed outcomes in data given modeled outcomes, and the priors on any parameters to be fit. Multiple data streams (e.g., cases and deaths) can be fit simultaneously. A custom Markov Chain Monte Carlo method is used to sequentially propose and accept or reject parameter values based on the model fit to data, in a way that balances fit quality within each individual subpopulation with that of the total aggregate population, and that takes advantage of parallel computing environments.
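To make the propose-and-accept-or-reject loop concrete, here is a plain random-walk Metropolis sampler for a single parameter. It is deliberately not the pipeline's custom algorithm, which additionally balances subpopulation-level and aggregate fit and runs in parallel. The Poisson likelihood, the uniform prior, the toy data, and all names are assumptions made for this sketch.

```python
import math
import random


def log_poisson(observed, expected):
    """Log-probability of an observed count under a Poisson model."""
    return observed * math.log(expected) - expected - math.lgamma(observed + 1)


def log_posterior(detection, infections, observed_cases):
    """Uniform prior on (0, 1] plus Poisson log-likelihood of the cases."""
    if not 0.0 < detection <= 1.0:
        return -math.inf
    return sum(log_poisson(obs, detection * inf)
               for obs, inf in zip(observed_cases, infections))


def metropolis(infections, observed_cases, n_iter=5000, step=0.05, seed=0):
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, infections, observed_cases)
    samples = []
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)  # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal, infections, observed_cases)
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


infections = [100, 200, 400, 800]      # modeled infections (toy values)
observed_cases = [22, 38, 85, 160]     # observed case counts (toy values)
samples = metropolis(infections, observed_cases)
kept = samples[1000:]                  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean detection probability: {posterior_mean:.3f}")
```

With these toy data the samples should concentrate near the ratio of total observed cases to total modeled infections, which is what one would expect for a Poisson likelihood with a flat prior.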

The code is written in a combination of [R](https://www.r-project.org/) and [Python](https://www.python.org/), and the vast majority of users only need to interact with the pipeline via the components written in R. It is structured in a modular fashion, such that individual components – such as the epidemic model, the observable variables, the population structure, or the parameters – can be edited or completely replaced without any handling of other parts of the code.

When model simulation is combined with fitting to data, the code is designed to run most efficiently on a supercomputing cluster with many cores. We most commonly run the code on [Amazon Web Services](https://aws.amazon.com/) or on high-performance computers using SLURM. However, even relatively large models can be run efficiently on most personal computers. Typically, the memory of the machine will limit the number of compartments (i.e., variables) that can be included in the epidemic model, while the machine’s CPU will determine the speed at which each model run is completed and the number of iterations of the model that can be run during parameter searches when fitting the model to data. While the pipeline can be installed on any computer, it is sometimes easier to use an Anaconda environment or the provided [Docker](https://www.docker.com/) container, where all the software dependencies (e.g., standardized R and Python versions along with required packages) are included, independent of the user’s local machine. All the code is maintained on [our GitHub](https://github.com/HopkinsIDD/flepiMoP) and shared under the GNU General Public License v3.0. It is built on top of a fully open-source software stack.

This documentation is organized as follows. The [Model Description](gempyor/model-description.md) section describes the mathematical framework for the compartmental epidemic models that can be simulated forward in time by the pipeline. The [Model Inference](model-inference/inference-description.md) section describes the statistical framework for fitting the model to data. The [Data and Parameter](broken-reference) section describes the inputs the user must provide to the pipeline, in terms of the model structure and parameters, the population characteristics, the initial conditions, time-varying interventions, data to be fit, and more. The [How to Run](broken-reference) section provides concrete guidance on setting up and running the model and analyzing the output. The [Quick Start Guide](how-to-run/quick-start-guide.md) provides a simple example model setup. The [Advanced](how-to-run/advanced-run-guides/) section goes into more detail on specific features of the model and the code that are likely to only be of interest to users who want to run more complex models or data fitting routines or substantially edit the code. It includes a subsection describing each file and package used in the pipeline and their interactions during a model run.

Users who wish to jump straight to running the model themselves can see the [Quick Start Guide](how-to-run/quick-start-guide.md).

For questions about the pipeline or to report a bug, please use the “Issues” or “Discussions” features on [our GitHub](https://github.com/HopkinsIDD/flepiMoP).

### Acknowledgments

_flepiMoP_ is actively developed by its current contributors, including Joseph C Lemaitre, Sara L Loo, Emily Przykucki, Clifton McKee, Claire Smith, Sung-mok Jung, Koji Sato, Pengcheng Fang, Erica Carcelen, Alison Hill, Justin Lessler, and Shaun Truelove, affiliated with the:

* Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA (JCL, JL)
* Johns Hopkins University International Vaccine Access Center, Department of International Health, Baltimore, MD, USA (SLL, KJ, EC, ST)
* Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA (CM, CS, JL, ST)
* Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA (S-m.J, JL)
* Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD, USA (AH).

The development of this model was supported by funds from the National Science Foundation (`2127976`; ST, CPS, JK, ECL, AH), Centers for Disease Control and Prevention (`200-2016-91781`; ST, CPS, JK, AH, JL, JCL, SL, CM, EC, KS, S-m.J), US Department of Health and Human Services / Department of Homeland Security (ST, CPS, JK, ECL, AH, JL), California Department of Public Health (ST, CPS, JK, ECL, JL), Johns Hopkins University (ST, CPS, JK, ECL, JL), Amazon Web Services (ST, CPS, JK, ECL, AH, JL, JCL), National Institutes of Health (`R01GM140564`; JL, `5R01AI102939`; JCL), and the Swiss National Science Foundation (`200021-172578`; JCL).

We also acknowledge past contributions to the development of the COVID Scenario Pipeline, which evolved into _flepiMoP_. These include contributions by Heramb Gupta, Kyra H. Grantz, Hannah R. Meredith, Stephen A. Lauer, Lindsay T. Keegan, Sam Shah, Josh Wills, Kathryn Kaminsky, Javier Perez-Saez, Joshua Kaminsky, and Elizabeth C. Lee.
79 changes: 79 additions & 0 deletions documentation/gitbook/SUMMARY.md
@@ -0,0 +1,79 @@
# Table of contents

* [Home](README.md)

## 🦠 gempyor: modeling infectious disease dynamics <a href="#gempyor" id="gempyor"></a>

* [Modeling infectious disease dynamics](gempyor/model-description.md)
* [Model Implementation](gempyor/model-implementation/README.md)
  * [flepiMoP's configuration file](gempyor/model-implementation/introduction-to-configuration-files.md)
  * [Specifying population structure](gempyor/model-implementation/specifying-population-structure.md)
  * [Specifying compartmental model](gempyor/model-implementation/compartmental-model-structure.md)
  * [Specifying initial conditions and seeding](gempyor/model-implementation/specifying-initial-conditions-and-seeding.md)
  * [Specifying observational model](gempyor/model-implementation/outcomes-for-compartments.md)
  * [Specifying time-varying parameter modifications](gempyor/model-implementation/intervention-templates.md)
  * [Other configuration options](gempyor/model-implementation/other-configuration-options.md)
  * [Code structure](gempyor/model-implementation/code-structure.md)
* [Model Output](gempyor/output-files.md)

## 📈 Model Inference

* [Inference Description](model-inference/inference-description.md)
* [Inference Implementation](model-inference/inference-implementation/README.md)
  * [Specifying data source and fitted variables](model-inference/inference-implementation/specifying-data-source-and-fitted-variables.md)
  * [(OLD) Configuration options](model-inference/inference-implementation/configuration-options.md)
  * [(OLD) Configuration setup](model-inference/inference-implementation/old-configuration-setup.md)
  * [Code structure](model-inference/inference-implementation/code-structure.md)
* [Inference Model Output](model-inference/inference-model-output.md)

## 🖥️ More

* [Setting up the model and post-processing](more/setting-up-the-model-and-post-processing/README.md)
  * [Config writer](more/setting-up-the-model-and-post-processing/config-writer.md)
  * [Diagnostic plotting scripts](more/setting-up-the-model-and-post-processing/plotting-scripts.md)
  * [Create a post-processing script](more/setting-up-the-model-and-post-processing/create-a-post-processing-script.md)
  * [Reporting](more/setting-up-the-model-and-post-processing/reporting.md)
* [Advanced](more/advanced/README.md)
  * [File descriptions](more/advanced/file-descriptions.md)
  * [Numerical methods](more/advanced/numerical-methods.md)
  * [Additional parameter options](more/advanced/additional-parameter-options.md)
  * [Swapping model modules](more/advanced/swapping-model-modules.md)
  * [Resuming inference runs](more/advanced/resuming-inference-runs.md)
  * [Using plug-ins 🧩\[experimental\]](more/advanced/using-plug-ins-experimental.md)

## 🛠️ How To Run

* [Before any run](how-to-run/before-any-run.md)
* [Quick Start Guide](how-to-run/quick-start-guide.md)
* [Advanced run guides](how-to-run/advanced-run-guides/README.md)
  * [Running with Docker locally 🛳](how-to-run/advanced-run-guides/running-with-docker-locally.md)
  * [Running locally in a conda environment 🐍](how-to-run/advanced-run-guides/quick-start-guide-conda.md)
  * [Running on SLURM HPC](how-to-run/advanced-run-guides/slurm-submission-on-marcc.md)
  * [Running on AWS 🌳](how-to-run/advanced-run-guides/running-on-aws.md)
* [Common errors](how-to-run/common-errors.md)
* [Useful commands](how-to-run/useful-commands.md)

## 🗜️ Development

* [Python guidelines for developers](development/python-guidelines-for-developers.md)

## Deprecated pages

* [Running with RStudio Server on AWS EC2](deprecated-pages/running-with-rstudio-server-on-aws-ec2.md)
* [Running with docker on AWS - OLD probably outdated](deprecated-pages/running-with-docker-on-aws/README.md)
  * [Provisioning AWS EC2 instance](deprecated-pages/running-with-docker-on-aws/provisioning-aws-ec2-instance.md)
  * [AWS Submission Instructions: Influenza](deprecated-pages/running-with-docker-on-aws/aws-submission-instructions-influenza.md)
  * [AWS Submission Instructions: COVID-19](deprecated-pages/running-with-docker-on-aws/aws-submission-instructions-covid-19.md)
* [Module specification](deprecated-pages/module-specification.md)
* [Block that don't go anywhere](deprecated-pages/block-that-dont-go-anywhere.md)

## JHU Internal

* [US specific How to Run](jhu-internal/us-specific-how-to-run/README.md)
  * [Running with Docker locally (outdated/US specific) 🛳](jhu-internal/us-specific-how-to-run/running-with-docker-locally.md)
  * [Running on Rockfish/MARCC - JHU 🪨🐠](jhu-internal/us-specific-how-to-run/slurm-submission-on-marcc.md)
* [Inference scratch](jhu-internal/inference-scratch.md)

## Group 1

* [Page 1](group-1/page-1.md)
@@ -0,0 +1,5 @@
# Block that don't go anywhere



<figure><img src="../.gitbook/assets/pipeline-overview.png" alt=""><figcaption></figcaption></figure>