Submission format
Projections should be stored as a parquet file in your model-output/team-model folder.
The parquet file must use a standardised file name, and contain specific variable names and values which identify the projections you are submitting. The automatic check validates both the filename and file contents to ensure the file is correct.
Each projection file within the subdirectory should have the following name format:
<round_id>-<team>-<model>.parquet
The <round_id> is defined uniquely for each submission round and disease. It is composed of the season_cycle, which identifies the season and the submission cycle, and the disease indicator.
The team and model in this file name must match the name of the model-output directory this file is in (and correspond to the team_abbr and model_abbr parameters in the metadata file).
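The naming rules above can be sketched in Python as follows. This is only an illustration, not the hub's automatic check: the helper names are hypothetical, the team and model abbreviations in the usage note are placeholders, and joining season_cycle and disease with an underscore is inferred from the example id '2024_2025_1_FLU'.

```python
from pathlib import PurePosixPath


def build_round_id(season_cycle, disease):
    # Assumption: the two parts are joined with "_", as in '2024_2025_1_FLU'.
    return f"{season_cycle}_{disease}"


def build_submission_path(round_id, team, model):
    # The folder is <team>-<model>; the file is <round_id>-<team>-<model>.parquet.
    folder = f"{team}-{model}"
    return PurePosixPath("model-output") / folder / f"{round_id}-{folder}.parquet"


def check_submission_path(path):
    """Return the round_id part of the file name, or raise ValueError."""
    path = PurePosixPath(path)
    folder = path.parent.name  # expected to be "<team>-<model>"
    suffix = f"-{folder}"
    if path.suffix != ".parquet":
        raise ValueError(f"expected a .parquet file, got {path.name!r}")
    if not path.stem.endswith(suffix):
        raise ValueError(f"file name {path.name!r} does not match folder {folder!r}")
    round_id = path.stem[: -len(suffix)]
    if not round_id:
        raise ValueError(f"file name {path.name!r} has no round_id part")
    return round_id
```

For instance, `build_submission_path(build_round_id("2024_2025_1", "FLU"), "exampleteam", "examplemodel")` yields a path inside `model-output/exampleteam-examplemodel/`, and `check_submission_path` applied to that path returns the round id.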
The parquet file must contain only the following columns (in any order). No additional columns are allowed.
column | data type | description |
---|---|---|
round_id | string | The id of the submission round, e.g. '2024_2025_1_FLU', composed of the season cycle ('2024_2025_1') plus the disease ('FLU'). Will be defined for each round. |
scenario_id | string | Id of the scenario as described in the round specifications (e.g. 'A', 'B', ...). |
target | string | One of the targets defined/allowed for the round. |
location | string | One of the ISO 3166-1 alpha-2 (ISO-2) geocodes for the European country. We provide a geocode file to convert between country names and ISO-2 codes or, if using R, you can use the countrycode package. |
pop_group | string | The age bin, or another population breakdown identifier, as defined in the round specs. |
horizon | integer | An integer indicating the number of weeks ahead of the origin date (*) to which the predicted value corresponds. Each week starts on Monday and ends on Sunday. For more details, check the template file for CSV files converting between dates and ISO weeks. |
target_end_date | date | Target date corresponding to the projected value. Values must be a date in the format YYYY-MM-DD. |
output_type | string | One of "quantile" or "sample". |
output_type_id | string | When output_type = "sample", a value from 1 to 300 identifying the stochastic run. When output_type = "quantile", one of the 23 accepted quantiles, given as a string: 0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500 0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990. |
value | double | The value of the prediction for the given target. |
(*): The origin date of the scenario simulations will be defined for each round and season_cycle and stated explicitly in the GitHub Wiki documentation.
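Before submitting, it may help to check the column names and output_type_id values locally. The sketch below uses only the Python standard library and hypothetical helper names. It is not the automatic check itself, and it assumes that quantile ids must use exactly the three-decimal spelling listed in the table and that sample ids are plain integer strings without leading zeros.

```python
# Column names taken from the table above.
ALLOWED_COLUMNS = {
    "round_id", "scenario_id", "target", "location", "pop_group",
    "horizon", "target_end_date", "output_type", "output_type_id", "value",
}

# The 23 accepted quantile levels, spelled as strings.
QUANTILE_IDS = (
    "0.010", "0.025", "0.050", "0.100", "0.150", "0.200", "0.250", "0.300",
    "0.350", "0.400", "0.450", "0.500", "0.550", "0.600", "0.650", "0.700",
    "0.750", "0.800", "0.850", "0.900", "0.950", "0.975", "0.990",
)


def check_columns(columns):
    """Return a list of problems; an empty list means the columns match."""
    found = set(columns)
    problems = []
    missing = sorted(ALLOWED_COLUMNS - found)
    extra = sorted(found - ALLOWED_COLUMNS)
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"columns not allowed: {extra}")
    return problems


def is_valid_output_type_id(output_type, output_type_id):
    """Check one output_type_id string against its output_type."""
    if output_type == "quantile":
        return output_type_id in QUANTILE_IDS
    if output_type == "sample":
        try:
            run = int(output_type_id)
        except ValueError:
            return False
        # Assumption: no leading zeros or signs, so the round trip must match.
        return str(run) == output_type_id and 1 <= run <= 300
    return False
```

With a pandas data frame, `check_columns(df.columns)` would report any missing or extra columns in one pass.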
The "arrow" library can be used to read and write parquet files in both R and Python. In Python, the "pandas" library can be used as well.
For example, in R you can load "arrow" and then:
library("arrow")
file_name <- "model-output/team-model/round_id-team-model.parquet"
# To read "parquet" file format
df <- arrow::read_parquet(file_name)
# To write "parquet" file format
arrow::write_parquet(df, file_name)
The following code does the same but using Python and "pandas":
import pandas as pd
file_name = 'model-output/team-model/round_id-team-model.parquet'
# To read "parquet" file format:
df = pd.read_parquet(file_name)
# Write "parquet" file format
df.to_parquet(file_name)
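Because output_type_id is a string column while its values look numeric, and target_end_date must be a date, it can be useful to cast the columns explicitly before writing. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the target and pop_group values are placeholders rather than real round values, and the three-decimal quantile spelling is assumed to be the expected string form.

```python
import pandas as pd


def format_output_type_id(output_type, output_type_id):
    # Quantiles become strings such as "0.500"; sample ids become "7".
    if output_type == "quantile":
        return f"{float(output_type_id):.3f}"
    return str(int(output_type_id))


def cast_submission_columns(df):
    """Return a copy of df with the column types listed in the table above."""
    out = df.copy()
    out["horizon"] = out["horizon"].astype("int64")
    # Python date objects, which pyarrow stores as a date column.
    out["target_end_date"] = pd.to_datetime(
        out["target_end_date"], format="%Y-%m-%d"
    ).dt.date
    out["output_type_id"] = [
        format_output_type_id(kind, value)
        for kind, value in zip(out["output_type"], out["output_type_id"])
    ]
    out["value"] = out["value"].astype("float64")
    return out


# Placeholder rows for illustration only.
example = pd.DataFrame({
    "round_id": ["2024_2025_1_FLU", "2024_2025_1_FLU"],
    "scenario_id": ["A", "A"],
    "target": ["hypothetical_target", "hypothetical_target"],
    "location": ["IT", "IT"],
    "pop_group": ["hypothetical_group", "hypothetical_group"],
    "horizon": [1, 1],
    "target_end_date": ["2024-10-13", "2024-10-13"],
    "output_type": ["quantile", "sample"],
    "output_type_id": [0.5, 7],
    "value": [120, 98.5],
})
casted = cast_submission_columns(example)
```

The result can then be written with `casted.to_parquet(file_name)`, which requires a parquet engine such as "pyarrow" to be installed.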
You can consult an example model output parquet file for further guidance.