Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add v3.0.0 schema with initial sample schema #72

Merged
merged 20 commits into from
Jun 19, 2024
Merged

add v3.0.0 schema with initial sample schema #72

merged 20 commits into from
Jun 19, 2024

Conversation

annakrystalli
Copy link
Member

@annakrystalli annakrystalli commented Apr 29, 2024

This PR contains the implementation of sample output types detailed here.

In summary, this PR:

  • Introduces new sample output type id schema specification in tasks.json. The main breaking change is the removal of the output_type_id property in sample. Instead, the collection of samples is defined through a new output_type_id_params object.
  • The repository_url and repository_host properties in admin.json have been deprecated in favour of a sigle repository object with separate host, owner and name properties.

Replaces #71 as it does introduce breaking changes, therefore v3.0.0 is more approriate

@annakrystalli annakrystalli marked this pull request as draft April 29, 2024 12:40
@annakrystalli annakrystalli changed the title [WIP] add v3.0.0 schema with initial sample schema add v3.0.0 schema with initial sample schema Apr 29, 2024
@annakrystalli annakrystalli marked this pull request as ready for review April 29, 2024 14:02
@annakrystalli annakrystalli marked this pull request as draft April 29, 2024 14:02
@annakrystalli annakrystalli marked this pull request as ready for review April 30, 2024 16:03
@annakrystalli annakrystalli marked this pull request as draft April 30, 2024 16:03
@annakrystalli annakrystalli marked this pull request as ready for review May 1, 2024 08:40
@annakrystalli annakrystalli requested a review from bsweger May 1, 2024 08:40
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since seeing the changes can be tricky when reviewing new schemas, adding a diff output for other reviewers:

image

Text-version:

diff v2.0.1/admin-schema.json v3.0.0/admin-schema.json
3c3
<     "$id": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.1/admin-schema.json",
---
>     "$id": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v3.0.0/admin-schema.json",
10c10
<             "examples": ["https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v0.0.1/admin-schema.json"],
---
>             "examples": ["https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v3.0.0/admin-schema.json"],
37,47c37,57
<         "repository_host": {
<             "description": "The name of the web host for the hub repository",
<             "type": "string",
<             "enum": [
<                 "GitHub"
<             ]
<         },
<         "repository_url": {
<             "description": "The url for the hub repository.",
<             "type": "string",
<             "examples": ["https://github.com/reichlab/covid19-forecast-hub"]
---
>         "repository": {
>             "description": "Object containing details of the hub repository.",
>             "examples": [{
>                 "host": "github",
>                 "owner": "Infectious-Disease-Modeling-Hubs",
>                 "name": "example-simple-forecast-hub"
>             }],
>             "type": "object",
>             "properties": {
>                 "host": {
>                     "description": "The name of the web host for the hub repository",
>                     "type": "string",
>                     "enum": ["github"]
>                 },
>                 "owner": {
>                     "type": "string"
>                 },
>                 "name": {
>                     "type": "string"
>                 }
>             }

Copy link
Contributor

@bsweger bsweger May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diff for tasks-schema.json isn't quite as helpful, so posting in text only:

diff v2.0.1/tasks-schema.json v3.0.0/tasks-schema.json
3c3
<     "$id": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v2.0.1/tasks-schema.json",
---
>     "$id": "https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v3.0.0/tasks-schema.json",
10c10
<             "examples": ["https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v0.0.1/tasks-schema.json"],
---
>             "examples": ["https://raw.githubusercontent.com/Infectious-Disease-Modeling-Hubs/schemas/main/v3.0.0/tasks-schema.json"],
1102,1103c1102,1103
<                                                 "output_type_id": {
<                                                     "description": "Object containing required and optional arrays specifying valid sample values.",
---
>                                                 "output_type_id_params": {
>                                                     "description": "Object containing parameters specifying how samples where drawn.",
1105,1123c1105,1119
<                                                         "required": [
<                                                             1,
<                                                             2,
<                                                             3,
<                                                             4,
<                                                             5,
<                                                             6,
<                                                             7,
<                                                             8,
<                                                             9,
<                                                             10
<                                                         ],
<                                                         "optional": [
<                                                             11,
<                                                             12,
<                                                             13,
<                                                             14,
<                                                             15
<                                                         ]
---
>                                                         "output_type_id_params": {
>                                                             "is_required": true,
>                                                             "type": "integer",
>                                                             "min_samples_per_task": 100,
>                                                             "max_samples_per_task": 100
>                                                         }
>                                                     }, {
>                                                         "output_type_id_params": {
>                                                             "is_required": false,
>                                                             "type": "character",
>                                                             "max_length": 6,
>                                                             "min_samples_per_task": 100,
>                                                             "max_samples_per_task": 500,
>                                                             "compound_taskid_set" : ["origin_date", "horizon", "location", "variant"]
>                                                         }
1127,1137c1123,1125
<                                                         "required": {
<                                                             "description": "Array of unique sample indexes that must be present for submission to be valid. Can be null if no sample indexes are required and all valid sample indexes are specified in the optional property.",
<                                                             "type": [
<                                                                 "array",
<                                                                 "null"
<                                                             ],
<                                                             "uniqueItems": true,
<                                                             "items": {
<                                                                 "type": "integer",
<                                                                 "minimum": 1
<                                                             }
---
>                                                         "is_required": {
>                                                             "description": "Boolean. Whether inclusion of samples are required for the submission to be valid",
>                                                             "type": "boolean"
1139,1140c1127,1151
<                                                         "optional": {
<                                                             "description": "Array of valid but not required unique sample indexes. Can be null if all sample indexes are required and are specified in the required property.",
---
>                                                         "type": {
>                                                             "description": "Data type of sample indices.",
>                                                             "type": "string",
>                                                             "enum": [
>                                                                 "character",
>                                                                 "integer"
>                                                             ]
>                                                         },
>                                                         "max_length": {
>                                                             "description": "Required only if 'type' is 'character'. Positive integer representing the maximum number of characters in a sample index. Ignored if 'type' is 'integer'.",
>                                                             "type": "integer",
>                                                             "minimum": 1
>                                                         },
>                                                         "min_samples_per_task": {
>                                                             "description": "The minimum number of samples per individual task.",
>                                                             "type": "integer",
>                                                             "minimum": 1
>                                                         },
>                                                         "max_samples_per_task": {
>                                                             "description": "The maximum number of samples per individual task.",
>                                                             "type": "integer",
>                                                             "minimum": 1
>                                                         },
>                                                         "compound_taskid_set": {
>                                                             "description": "Optional. Specifies whether validation should factor in the presence of a compound modeling task. Each item of the array must be a task id variable name. If unspecified, defaults to all task ID variables.",
1142,1143c1153
<                                                                 "array",
<                                                                 "null"
---
>                                                                 "array"
1147,1148c1157
<                                                                 "type": "integer",
<                                                                 "minimum": 1
---
>                                                                 "type": "string"
1149a1159
>
1153,1155c1163,1175
<                                                         "required",
<                                                         "optional"
<                                                     ]
---
>                                                         "is_required",
>                                                         "type",
>                                                         "min_samples_per_task",
>                                                         "max_samples_per_task"
>                                                     ],
>                                                     "if": {
>                                                         "properties": {
>                                                             "type": { "const": "character" }
>                                                         }
>                                                     },
>                                                     "then": {
>                                                         "required": ["max_length"]
>                                                     }
1173c1193,1197
<                                                             "description": "The minimum inclusive valid sample value from the predictive distribution"
---
>                                                             "description": "The minimum inclusive valid sample value from the predictive distribution.",
>                                                             "type": [
>                                                                 "number",
>                                                                 "integer"
>                                                             ]
1176c1200,1204
<                                                             "description": "The maximum inclusive valid sample value from the predictive distribution"
---
>                                                             "description": "The maximum inclusive valid sample value from the predictive distribution.",
>                                                             "type": [
>                                                                 "number",
>                                                                 "integer"
>                                                             ]
1185c1213
<                                                 "output_type_id",
---
>                                                 "output_type_id_params",

Copy link
Contributor

@bsweger bsweger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, it's been a few weeks since I've been in the weeds with our proposed sample data changes, but this reflects what I remember. Thanks--very exciting!

v3.0.0/tasks-schema.json Outdated Show resolved Hide resolved
v3.0.0/tasks-schema.json Outdated Show resolved Hide resolved
@elray1
Copy link
Contributor

elray1 commented May 6, 2024

made one minor suggestion but this looks good to me. thanks!

"integer"
]
},
"max_length": {
Copy link
Contributor

@bsweger bsweger May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clear on the problem we're trying to solve with max_length, especially when hub admins who want to enforce a specific sample index format will need to write a custom validation anyway.

It feels a bit like we're over-optimizing on a guardrail when we don't know if it's truly needed. One more thing to think about, validate, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed!

@nickreich nickreich requested review from bsweger and elray1 May 29, 2024 13:32
@annakrystalli annakrystalli merged commit 0f35d3b into main Jun 19, 2024
2 checks passed
@annakrystalli annakrystalli deleted the br-v3.0.0 branch June 19, 2024 05:54
@annakrystalli annakrystalli restored the br-v3.0.0 branch June 19, 2024 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
4 participants