comoparams: R utility tool to extract, process and structure Philippines-specific datasets for use in the CoMo Consortium Model

The Oxford Modelling Group for Global Health (OMGH) is developing model structures to estimate the impact of potential mitigation strategies. By changing the input data and parameter values, the models can be adjusted to represent a national or subnational setting. In addition, a user-friendly application and interface are being developed to enable widespread use.

The CoMo Philippines (CoMo-PH) group is the Philippines country team that is supporting the use of the model specific to the Philippines as part of the COVID-19 International Modeling Consortium (CoMo Consortium).

This R package has been developed by the CoMo-PH group to support its efforts in the collection of Philippines-specific and Philippines-appropriate data that will define the parameters used in the CoMo Consortium Model as applied to the Philippines. This package facilitates the processing of Philippines national and/or subnational input data and parameter values into data structures appropriate for use with the modelling application and/or other packages developed for the CoMo Consortium Model.

Installation

comoparams is in active development and is currently only available to install via GitHub:

if(!require("remotes")) install.packages("remotes")
remotes::install_github("como-ph/comoparams")

To use, load the package:

library(comoparams)

Usage

comoparams currently has three main function sets: 1) pull data functions; 2) calculate values functions; and 3) parameter setting functions.

Pull data

comoparams incorporates functions that support the extraction of data relevant to the CoMo Model parameters and specific to the Philippines. Currently, these functions allow for 1) pulling of data from the Philippines Department of Health’s official COVID-19 DataDrop repository; 2) pulling of data from the Philippines Statistics Authority (PSA) 2015 Population Census (2015 POPCEN); and 3) pulling of population, births and deaths data from the World Population Prospects 2019.

Pulling data from the Philippines Department of Health COVID-19 DataDrop

On 14 April 2020, the Philippines Department of Health publicly released its COVID-19 data, initially in Google Sheets format and subsequently as comma-separated value (CSV) files stored in Google Drive. There are four functions in comoparams that interact with the Philippines COVID-19 DataDrop. These are wrapper functions built on the googledrive package in R. Each function pulls a specific type of data from the COVID-19 DataDrop repository.

  • fields - data describing the various fields or variables in the different datasets found in COVID-19 DataDrop.

  • cases - data on all COVID-19 cases recorded up to the date the specific dataset has been released.

  • tests - data on tests performed and recorded up to the date the specific dataset has been released.

  • daily - data on health facilities’ daily COVID-19 specific equipment status.

The syntax for this set of pull functions has a common prefix of ph_get_ followed by the short name of the specific dataset listed above.
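
As a small illustration of this naming rule, the function names can be built from the short dataset names listed above. This only constructs the names as text; it does not call the package.

```r
# Short dataset names listed above
datasets <- c("fields", "cases", "tests", "daily")

# Prepend the common prefix to obtain the pull function names
pull_functions <- paste0("ph_get_", datasets)

print(pull_functions)
```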

All four functions in this set have two arguments: version and date. The version argument specifies whether to pull the most recent data (current, the default) or an earlier release (archive). The date argument is ignored if version is set to current. If version is set to archive, date must be specified (in YYYY-MM-DD format, e.g. "2020-05-05") and determines the date up to which the extracted data reports. Only dates from 14 April 2020, when COVID-19 DataDrop was launched, are accepted.
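
The rules for version and date can be sketched as a small check in base R. The helper check_datadrop_args() is hypothetical and not part of comoparams; it also assumes dates are supplied as YYYY-MM-DD strings, as in the archive examples in this section.

```r
# Hypothetical helper (not part of comoparams): checks a version/date pair
# against the rules described above.
check_datadrop_args <- function(version = c("current", "archive"), date = NULL) {
  version <- match.arg(version)

  # The date is ignored when pulling the current data
  if (version == "current") {
    return(list(version = version, date = NULL))
  }

  # An archive pull needs a date
  if (is.null(date)) {
    stop("`date` must be specified when `version` is \"archive\".")
  }

  # Assumption: dates are given as YYYY-MM-DD
  parsed <- as.Date(date, format = "%Y-%m-%d")
  if (is.na(parsed)) {
    stop("`date` could not be read as a YYYY-MM-DD date.")
  }

  # COVID-19 DataDrop was launched on 14 April 2020
  if (parsed < as.Date("2020-04-14")) {
    stop("`date` must be on or after 2020-04-14.")
  }

  list(version = version, date = parsed)
}

check_datadrop_args(version = "archive", date = "2020-05-05")
```

A date before the launch of DataDrop, such as "2020-04-01", makes this sketch stop with an error.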

The output of these functions is a tibble.

To pull the most current data on cases, we use the function ph_get_cases() as follows:

ph_get_cases()
#> # A tibble: 31,825 x 21
#>    CaseCode   Age AgeGroup Sex   DateSpecimen DateResultRelea… DateRepConf
#>    <chr>    <dbl> <chr>    <chr> <chr>        <chr>            <chr>      
#>  1 C896932     30 30 to 34 Fema… 2020-06-14   ""               2020-06-19 
#>  2 C679761     29 25 to 29 Fema… 2020-06-14   "2020-06-16"     2020-06-19 
#>  3 C345251     38 35 to 39 Fema… 2020-06-11   "2020-06-16"     2020-06-19 
#>  4 C685818     25 25 to 29 Fema… 2020-06-14   "2020-06-16"     2020-06-19 
#>  5 C629593     26 25 to 29 Male  2020-06-15   "2020-06-17"     2020-06-19 
#>  6 C222820     48 45 to 49 Male  2020-06-15   "2020-06-17"     2020-06-19 
#>  7 C228086     31 30 to 34 Fema… 2020-06-09   "2020-06-11"     2020-06-19 
#>  8 C686978     27 25 to 29 Male  2020-06-14   "2020-06-16"     2020-06-19 
#>  9 C730849     34 30 to 34 Male  2020-06-14   "2020-06-16"     2020-06-19 
#> 10 C565692     33 30 to 34 Fema… 2020-06-15   "2020-06-17"     2020-06-19 
#> # … with 31,815 more rows, and 14 more variables: DateDied <chr>,
#> #   DateRecover <chr>, RemovalType <chr>, DateRepRem <chr>, Admitted <chr>,
#> #   RegionRes <chr>, ProvRes <chr>, CityMunRes <chr>, CityMuniPSGC <chr>,
#> #   HealthStatus <chr>, Quarantined <chr>, DateOnset <chr>, Pregnanttab <chr>,
#> #   ValidationStatus <chr>

To pull the data on cases for up to 5 May 2020, we use the function ph_get_cases() as follows:

ph_get_cases(version = "archive", date = "2020-05-05")
#> # A tibble: 9,684 x 18
#>    CaseCode   Age AgeGroup Sex   DateRepConf DateDied DateRecover RemovalType
#>    <chr>    <int> <chr>    <chr> <chr>       <chr>    <chr>       <chr>      
#>  1 C100119     31 30 to 34 Male  2020-04-12  ""       ""          ""         
#>  2 C100264     58 55 to 59 Male  2020-03-29  ""       ""          ""         
#>  3 C100648     34 30 to 34 Fema… 2020-04-16  ""       ""          ""         
#>  4 C100660     43 40 to 44 Fema… 2020-04-02  ""       "2020-04-2… ""         
#>  5 C100776     43 40 to 44 Male  2020-04-01  ""       ""          ""         
#>  6 C101015     79 75 to 79 Male  2020-04-03  ""       ""          ""         
#>  7 C101097     33 30 to 34 Male  2020-03-27  ""       ""          ""         
#>  8 C101232     31 30 to 34 Male  2020-03-21  ""       "2020-03-2… "Recovered"
#>  9 C101376     30 30 to 34 Male  2020-04-11  ""       ""          ""         
#> 10 C101483     40 40 to 44 Fema… 2020-04-14  ""       ""          ""         
#> # … with 9,674 more rows, and 10 more variables: DateRepRem <chr>,
#> #   Admitted <chr>, RegionRes <chr>, ProvRes <chr>, CityMunRes <chr>,
#> #   RegionPSGC <chr>, ProvPSGC <chr>, CityMuniPSGC <chr>, HealthStatus <chr>,
#> #   Quarantined <chr>

To pull the data on cases for up to 1 May 2020, we use the function ph_get_cases() as follows:

ph_get_cases(version = "archive", date = "2020-05-01")
#> # A tibble: 8,772 x 18
#>    CaseCode   Age AgeGroup Sex   DateRepConf DateRecover DateDied RemovalType
#>    <chr>    <int> <chr>    <chr> <chr>       <chr>       <chr>    <chr>      
#>  1 C100119     30 30 to 34 Male  12-Apr-20   ""          ""       ""         
#>  2 C100264     57 55 to 59 Male  29-Mar-20   ""          ""       ""         
#>  3 C100648     33 30 to 34 Fema… 16-Apr-20   ""          ""       ""         
#>  4 C100660     42 40 to 44 Fema… 02-Apr-20   ""          ""       ""         
#>  5 C100776     42 40 to 44 Male  01-Apr-20   ""          ""       ""         
#>  6 C101015     79 75 to 79 Male  03-Apr-20   ""          ""       ""         
#>  7 C101097     33 30 to 34 Male  27-Mar-20   ""          ""       ""         
#>  8 C101232     30 30 to 34 Male  21-Mar-20   "25-Mar-20" ""       "Recovered"
#>  9 C101376     29 25 to 29 Male  11-Apr-20   ""          ""       ""         
#> 10 C101483     40 40 to 44 Fema… 14-Apr-20   ""          ""       ""         
#> # … with 8,762 more rows, and 10 more variables: DateRepRem <chr>,
#> #   Admitted <chr>, RegionRes <chr>, ProvRes <chr>, CityMunRes <chr>,
#> #   RegionPSGC <chr>, ProvPSGC <chr>, CityMuniPSGC <chr>, HealthStatus <chr>,
#> #   Quarantined <chr>
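
The two archive outputs above store dates as text in different styles ("2020-04-12" for the 5 May release and "12-Apr-20" for the 1 May release), and missing dates appear as empty strings. A minimal base R sketch for reading both styles into Date values follows. The helper parse_datadrop_date() is hypothetical and not part of comoparams, and it assumes two-digit years refer to 20xx.

```r
# Hypothetical helper (not part of comoparams): parse the two date styles
# seen in the archive outputs above. Anything else, including "", becomes NA.
parse_datadrop_date <- function(x) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))

  # ISO style, e.g. "2020-04-12"
  iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  out[iso] <- as.Date(x[iso], format = "%Y-%m-%d")

  # Day-month-year style, e.g. "12-Apr-20". Month names are matched against
  # month.abb so the result does not depend on the session locale.
  dmy <- grepl("^[0-9]{2}-[A-Za-z]{3}-[0-9]{2}$", x)
  day <- as.integer(substr(x[dmy], 1, 2))
  month <- match(substr(x[dmy], 4, 6), month.abb)
  year <- 2000L + as.integer(substr(x[dmy], 8, 9)) # assumes years are 20xx
  out[dmy] <- as.Date(
    sprintf("%04d-%02d-%02d", year, month, day),
    format = "%Y-%m-%d"
  )

  out
}

# Both styles give the same date; the empty string gives NA
parse_datadrop_date(c("2020-04-12", "12-Apr-20", ""))
```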

Pulling data from the Philippines Statistics Authority (PSA) 2015 Population Census (2015 POPCEN)

The last Philippines census was in 2015. A 2020 census was planned but, because of the COVID-19 pandemic, will most likely be put on hold. The Philippines Statistics Authority (PSA) provides an XLSX file of its updated 2020 population projections based on the 2015 POPCEN. This file can be downloaded from the PSA website; the URL is given in the example below.

A helper function with the same ph_get_ prefix is provided by the comoparams package - ph_get_psa2015_pop() - which downloads the XLSX file from the PSA website, reads the file and then extracts and re-structures the population data in the file to long format (tidy format).

The function can be called as follows:

linkToFile <- "https://psa.gov.ph/sites/default/files/attachments/hsd/pressrelease/Updated%20Population%20Projections%20based%20on%202015%20POPCEN_0.xlsx"

ph_get_psa2015_pop(file = linkToFile)
#> # A tibble: 87,567 x 6
#>    area         year age_category    total    male  female
#>    <chr>       <dbl> <chr>           <dbl>   <dbl>   <dbl>
#>  1 Philippines  2015 0-4          10803297 5582405 5220892
#>  2 Philippines  2015 5-9          10827294 5588769 5238525
#>  3 Philippines  2015 10-14        10478834 5397638 5081196
#>  4 Philippines  2015 15-19        10176450 5194725 4981725
#>  5 Philippines  2015 20-24         9453737 4788812 4664925
#>  6 Philippines  2015 25-29         8348285 4246632 4101653
#>  7 Philippines  2015 30-34         7331213 3750503 3580710
#>  8 Philippines  2015 35-39         6732896 3442349 3290547
#>  9 Philippines  2015 40-44         5840861 2991059 2849802
#> 10 Philippines  2015 45-49         5276700 2676602 2600098
#> # … with 87,557 more rows

The main limitation of the PSA population projections based on the 2015 POPCEN is that the age grouping only goes up to 85+, whilst the CoMo Consortium Model requires population data with an age structure that goes up to 95+. However, the additional older age groupings can potentially be imputed if needed.
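
One simple way to approach such an imputation is to split the open-ended 85+ group in proportion to a reference age distribution for the older groups. The sketch below shows only the arithmetic. The helper split_open_age_group() is hypothetical, the numbers are made up for illustration, and this is not necessarily the method used by the CoMo-PH group.

```r
# Hypothetical helper (not part of comoparams): split the total of an
# open-ended age group according to the shares of a reference distribution.
split_open_age_group <- function(total, reference) {
  stopifnot(total >= 0, all(reference >= 0), sum(reference) > 0)
  shares <- reference / sum(reference)
  total * shares
}

# Illustrative numbers only, not actual PSA or UN figures
reference_counts <- c("85-89" = 60, "90-94" = 30, "95+" = 10)
imputed <- split_open_age_group(total = 1000, reference = reference_counts)

print(imputed)
```

The imputed groups sum back to the original 85+ total, so the overall population is unchanged.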

Pulling population, births and deaths data from the World Population Prospects 2019

The United Nations World Population Prospects 2019 provides population projections for 152 countries using the 5-year age grouping structure required by the CoMo Consortium Model. The data are available as either XLSX or CSV files. The comoparams package provides three functions to extract these datasets. These functions use the same ph_get_ prefix as the other pull data functions, followed by a descriptor of the data being pulled. The descriptors are:

  • wpp2019_pop - Five-year age group structured population by male and female

  • wpp2019_births - Number of births by age of mother in 5-year age groups

  • wpp2019_deaths - Number of deaths by 5-year age groups and by male and female

To pull the five-year age group structured population, we make a call for the following:

linkToFile <- "https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/CSV_FILES/WPP2019_PopulationByAgeSex_Medium.csv"

ph_get_wpp2019_pop(file = linkToFile, location = "Philippines")
#> # A tibble: 21 x 6
#>    area         year age_category    total     male  female
#>    <chr>       <int> <chr>           <dbl>    <dbl>   <dbl>
#>  1 Philippines  2020 0-4 y.o.     10616342 5450633  5165709
#>  2 Philippines  2020 5-9 y.o.     11397952 5846072  5551880
#>  3 Philippines  2020 10-14 y.o.   10906801 5578251  5328550
#>  4 Philippines  2020 15-19 y.o.   10462894 5407659  5055235
#>  5 Philippines  2020 20-24 y.o.   10104334 5191755  4912579
#>  6 Philippines  2020 25-29 y.o.    9479780 4807611  4672169
#>  7 Philippines  2020 30-34 y.o.    8247197 4166778. 4080419
#>  8 Philippines  2020 35-39 y.o.    7254730 3644549  3610181
#>  9 Philippines  2020 40-44 y.o.    6551963 3297341  3254622
#> 10 Philippines  2020 45-49 y.o.    5759264 2883722  2875542
#> # … with 11 more rows

To pull the number of births by age of mother in 5-year age groups, we make a call for the following:

linkToFile <- "https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/2_Fertility/WPP2019_FERT_F06_BIRTHS_BY_AGE_OF_MOTHER.xlsx"

ph_get_wpp2019_births(file = linkToFile, period = 2019)
#> # A tibble: 21 x 4
#>    area        year      age_category   birth
#>    <fct>       <fct>     <fct>          <dbl>
#>  1 Philippines 2015-2020 0-4 y.o.          NA
#>  2 Philippines 2015-2020 5-9 y.o           NA
#>  3 Philippines 2015-2020 10-14 y.o.        NA
#>  4 Philippines 2015-2020 15-19 y.o.   1358136
#>  5 Philippines 2015-2020 20-24 y.o.   2986917
#>  6 Philippines 2015-2020 25-29 y.o.   2594038
#>  7 Philippines 2015-2020 30-34 y.o.   2121630
#>  8 Philippines 2015-2020 35-39 y.o.   1239266
#>  9 Philippines 2015-2020 40-44 y.o.    488424
#> 10 Philippines 2015-2020 45-49 y.o.    100986
#> # … with 11 more rows

To pull the number of deaths by 5-year age groups and by male and female, we make a call for the following:

linkToFile <- "https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/3_Mortality/WPP2019_MORT_F04_1_DEATHS_BY_AGE_BOTH_SEXES.xlsx"

ph_get_wpp2019_deaths(file = linkToFile, period = 2019)
#> # A tibble: 21 x 4
#>    area        year      age_category  death
#>    <chr>       <chr>     <chr>         <dbl>
#>  1 Philippines 2015-2020 0-4 y.o.     306206
#>  2 Philippines 2015-2020 5-9 y.o.      34260
#>  3 Philippines 2015-2020 10-14 y.o.    28539
#>  4 Philippines 2015-2020 15-19 y.o.    59008
#>  5 Philippines 2015-2020 20-24 y.o.    78740
#>  6 Philippines 2015-2020 25-29 y.o.    77908
#>  7 Philippines 2015-2020 30-34 y.o.    81415
#>  8 Philippines 2015-2020 35-39 y.o.    96548
#>  9 Philippines 2015-2020 40-44 y.o.   121537
#> 10 Philippines 2015-2020 45-49 y.o.   160660
#> # … with 11 more rows

Calculate values

For some of the required parameters, further calculation and processing of the data are required. Specifically, the cases data needs to be processed into the number of cases and number of deaths per day since the start of the COVID-19 pandemic in the Philippines. The cases data also needs to be processed to calculate the infection fatality rate (IFR) and the infection hospitalisation rate (IHR) by 5-year age groups. Two calculate functions are available in comoparams for this purpose. Both use the prefix ph_calculate_ followed by a descriptor of the output produced: cases for daily cases and deaths since the first reported case in the Philippines, and rates for the IFR and IHR.

The daily cases, deaths and recoveries output can be produced as follows:

ph_get_cases() %>% ph_calculate_cases()
#> # A tibble: 175 x 4
#>    repDate    cases deaths recovered
#>    <date>     <dbl>  <dbl>     <dbl>
#>  1 2020-01-01     0      0         0
#>  2 2020-01-02     0      0         0
#>  3 2020-01-03     0      0         0
#>  4 2020-01-04     0      0         0
#>  5 2020-01-05     0      0         0
#>  6 2020-01-06     0      0         0
#>  7 2020-01-07     0      0         0
#>  8 2020-01-08     0      0         0
#>  9 2020-01-09     0      0         0
#> 10 2020-01-10     0      0         0
#> # … with 165 more rows
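
The idea of turning a case line list into one row per day, with zeros on days without reports, can be sketched in base R as follows. The helper count_by_day() is hypothetical and is not the package's implementation; the dates are made up for illustration, and the repDate and cases column names simply mirror the output above.

```r
# Hypothetical helper (not part of comoparams): count records per calendar
# day over a fixed date range, giving 0 for days with no records.
count_by_day <- function(dates, start, end) {
  days <- seq(from = as.Date(start), to = as.Date(end), by = "day")

  # Using the full range of days as factor levels keeps the empty days
  counts <- table(factor(as.character(dates), levels = as.character(days)))

  data.frame(repDate = days, cases = as.integer(counts))
}

# Illustrative confirmation dates, one entry per case
confirmed <- as.Date(c("2020-01-30", "2020-02-02", "2020-02-02"))

daily <- count_by_day(confirmed, start = "2020-01-29", end = "2020-02-02")
print(daily)
```

The same helper could be applied to the dates of death or recovery to obtain the other daily series.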

The IFR and IHR output can be produced as follows:

ph_get_cases() %>% ph_calculate_rates()
#> # A tibble: 21 x 8
#>    age_category deaths deathsAdmitted admissions cases     ifr    ihr     hfr
#>    <fct>         <dbl>          <dbl>      <dbl> <dbl>   <dbl>  <dbl>   <dbl>
#>  1 0-5 y.o.         14              7         93   579 0.0242  0.161  0.0753 
#>  2 5-10 y.o.         3              2         32   372 0.00806 0.0860 0.0625 
#>  3 10-15 y.o.        2              2         41   558 0.00358 0.0735 0.0488 
#>  4 15-20 y.o.        9              9        104   895 0.0101  0.116  0.0865 
#>  5 20-25 y.o.        4              3        355  2640 0.00152 0.134  0.00845
#>  6 25-30 y.o.       21              7        591  4097 0.00513 0.144  0.0118 
#>  7 30-35 y.o.       13              7        689  4175 0.00311 0.165  0.0102 
#>  8 35-40 y.o.       34             21        503  3127 0.0109  0.161  0.0417 
#>  9 40-45 y.o.       50             35        453  2804 0.0178  0.162  0.0773 
#> 10 45-50 y.o.       55             34        521  2674 0.0206  0.195  0.0653 
#> # … with 11 more rows
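
The figures in the output above are consistent with the rates being simple ratios of the counts: ifr as deaths over cases, ihr as admissions over cases, and hfr as deaths among admitted cases over admissions. This is a reading of the output, not a statement about the package's source code. The sketch below recomputes the rates for the first three age groups from the counts shown above.

```r
# Counts copied from the first three rows of the output above
counts <- data.frame(
  age_category   = c("0-5 y.o.", "5-10 y.o.", "10-15 y.o."),
  deaths         = c(14, 3, 2),
  deathsAdmitted = c(7, 2, 2),
  admissions     = c(93, 32, 41),
  cases          = c(579, 372, 558)
)

# Assumed definitions, inferred from the output rather than the package code
rates <- transform(
  counts,
  ifr = deaths / cases,               # deaths per reported case
  ihr = admissions / cases,           # admissions per reported case
  hfr = deathsAdmitted / admissions   # deaths per admitted case
)

print(rates)
```

Rounded to three significant figures, these ratios reproduce the ifr, ihr and hfr values shown for the same rows above.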

Parameters setting

The final set of functions in the comoparams package is the parameter setting functions. This set contains the majority of the functions within comoparams (a total of 19 functions).

The parameter setting functions again use a common prefix syntax of ph_set_ followed by the descriptor of the parameter that is being defined or set. This syntax applies to 18 of the 19 functions in the set. The descriptors are:

Descriptor      Definition
cases           Set the cases parameters using the ph_calculate_cases() function to process, calculate and output the cases data in the appropriate parameters format.
severe          Set the severity-mortality parameters using the ph_calculate_rates() function to process, calculate and output the cases data in the appropriate parameters format.
population      Set the populations parameters using the ph_get_population() function to process and output the population data in the appropriate parameters format.
general         Set the general and country parameters
virus           Set the virus parameters
hospital        Set the hospitalisation parameters
lockdown        Set the lockdown intervention parameters
isolation       Set the self-isolation intervention parameters
distancing      Set the social distancing intervention parameters
handwashing     Set the handwashing intervention parameters
work            Set the work from home intervention parameters
school          Set the schools closure intervention parameters
elderly         Set the shielding the elderly intervention parameters
travel          Set the travel ban intervention parameters
quarantine      Set the voluntary home quarantine parameters
vaccination     Set the vaccination intervention parameters
interventions   Set all the intervention parameters using all the single intervention parameter setting functions

The functions with this syntax are interactive command line functions that guide the user through the required parameters and prompt for their values. The set also includes a summary function, ph_set_params(), that pulls together all the interactive command line functions and guides the user in specifying all the parameters.

Finally, the 19th function in the parameter setting set, ph_create_params(), takes the output of ph_set_params(), converts it into the CoMo Consortium Model parameters template format and saves it as an XLSX workbook in the specified directory.

ph_create_params(params = ph_set_params(), path = ".")