Submission format

Projections should be stored as a parquet file in your model-output/team-model folder.

The parquet file must use a standardised file name, and contain specific variable names and values which identify the projections you are submitting. The automatic check validates both the filename and file contents to ensure the file is correct.

File name

Each projection file within the subdirectory should have the following name format:

<round_id>-<team>-<model>.parquet

The <round_id> is defined uniquely for each submission round and disease. It is composed of the season_cycle, which identifies the season and the submission cycle, and the disease indicator (e.g. 2024_2025_1_FLU). The team and model in the file name must match the name of the model-output directory that contains the file, and must correspond to the team_abbr and model_abbr parameters in the metadata file.
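The naming rule above can be checked locally before submitting. The following is a minimal sketch, not the automatic check itself: the function name `check_submission_path` is hypothetical, and the pattern assumes that none of `<round_id>`, `<team>` and `<model>` contains a hyphen.

```python
import re
from pathlib import PurePosixPath

# Assumption: round_id, team and model never contain a hyphen, so the
# three parts of the file name can be split unambiguously on "-".
FILE_NAME_RE = re.compile(
    r"^(?P<round_id>[^-/]+)-(?P<team>[^-/]+)-(?P<model>[^-/]+)\.parquet$"
)


def check_submission_path(path):
    """Return the parts of the file name, or raise ValueError if the
    name is malformed or does not match its parent directory."""
    p = PurePosixPath(path)
    match = FILE_NAME_RE.match(p.name)
    if match is None:
        raise ValueError(f"unexpected file name: {p.name}")
    parts = match.groupdict()
    # The directory holding the file must be named <team>-<model>.
    expected_dir = f"{parts['team']}-{parts['model']}"
    if p.parent.name != expected_dir:
        raise ValueError(
            f"directory '{p.parent.name}' does not match '{expected_dir}'"
        )
    return parts
```

For example, `check_submission_path("model-output/teamA-modelX/2024_2025_1_FLU-teamA-modelX.parquet")` returns the three parts as a dictionary, where `teamA` and `modelX` are placeholder abbreviations.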

File format

Required variables

The parquet file must contain exactly the following columns (in any order). No additional columns are allowed.

column	data type	description
`round_id`	string	The id of the submission round, e.g. '2024_2025_1_FLU', composed of the season cycle ('2024_2025_1') plus the disease ('FLU'). Will be defined for each round.
`scenario_id`	string	Id of the scenario as described in the round specifications (e.g. 'A', 'B', ...).
`target`	string	One of the targets defined/allowed for the round.
`location`	string	The ISO 3166-1 alpha-2 (ISO-2) geocode of the European country. We provide a geocode file to convert between country names and ISO-2 codes or, if using R, you can use the countrycode package.
`pop_group`	string	The age bin, or another population breakdown identifier, as defined in the round specs.
`horizon`	integer	The number of weeks ahead of the origin date (*) to which the predicted value corresponds. Each week starts on Monday and ends on Sunday. For more details, see the template CSV file that converts between dates and ISO weeks.
`target_end_date`	date	Target date corresponding to the projected value. Values must be a date in the format `YYYY-MM-DD`.
`output_type`	string	One of "quantile" or "sample".
`output_type_id`	string	When `output_type = "sample"`, a value from 1 to 300 identifying the stochastic run. When `output_type = "quantile"`, one of the 23 accepted quantiles, written as a string: 0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500 0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990.
`value`	double	The value of the prediction for the given target.

(*): The origin date of the scenario simulations will be defined for each round and season_cycle and stated explicitly in the GitHub wiki documentation.
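The column rules in the table can be turned into a quick local pre-check. The sketch below uses only the Python standard library; the helper names `check_columns` and `check_output_type_id` are hypothetical, and the checks are an approximation of the rules above rather than the automatic validation itself. In particular, it assumes that quantile ids must be written exactly as listed (three decimals) and that sample ids are plain digit strings.

```python
REQUIRED_COLUMNS = {
    "round_id", "scenario_id", "target", "location", "pop_group",
    "horizon", "target_end_date", "output_type", "output_type_id", "value",
}

# The 23 accepted quantiles, kept as strings exactly as listed in the table.
ACCEPTED_QUANTILES = (
    "0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 "
    "0.500 0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990"
).split()


def check_columns(columns):
    """Return (missing, extra) column names, each as a sorted list."""
    cols = set(columns)
    return sorted(REQUIRED_COLUMNS - cols), sorted(cols - REQUIRED_COLUMNS)


def check_output_type_id(output_type, output_type_id):
    """Return True if the id string is acceptable for the output type."""
    if output_type == "quantile":
        # Assumption: the string must match the listed form exactly.
        return output_type_id in ACCEPTED_QUANTILES
    if output_type == "sample":
        # Assumption: sample ids are digit strings for the integers 1..300.
        is_number = output_type_id.isascii() and output_type_id.isdigit()
        return is_number and 1 <= int(output_type_id) <= 300
    return False
```

A numeric quantile can be converted to the listed form with `f"{q:.3f}"`, so that `0.5` becomes `"0.500"`.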

Parquet file format

The "arrow" library can be used to read and write parquet files in both R and Python (where the package is called "pyarrow"). In Python, the "pandas" library can be used as well.

For example, in R you can load "arrow" and then:

library("arrow")

file_name <- "model-output/team-model/round_id-team-model.parquet"

# To read "parquet" file format
df <- arrow::read_parquet(file_name)

# To write "parquet" file format
arrow::write_parquet(df, file_name)

The following code does the same but using Python and "pandas":

import pandas as pd

file_name = 'model-output/team-model/round_id-team-model.parquet'

# To read "parquet" file format:
df = pd.read_parquet(file_name)

# Write "parquet" file format
df.to_parquet(file_name)

Contact us

If you encounter any problems at any stage, or have any questions, please get in touch:

Send us an email
Open a new Issue
