-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small corrections for metadata and partial data #480
Small corrections for metadata and partial data #480
Conversation
Renaming some of the field types to their OPTIMADE name rather than JSON name. Co-authored-by: Andrius Merkys <[email protected]>
This reverts commit 0c5f9cb.
optimade.rst
Outdated
@@ -3684,16 +3684,16 @@ The header object MUST contain the keys: | |||
|
|||
It MUST contain the following key: | |||
|
|||
- :field:`"format"`: String. | |||
- :field:`"version"`: String. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already used the field name "format" in section 3.10 Transmission of large property values. So I do not think we should use it here again with a different meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - it is indeed potentially confusing that these two fields are named the same. However, the use of "format" here reflects the same fieldname in the property definitions, and the design idea is the same: future versions of OPTIMADE may want to make this field more expressive than simply the OPTIMADE version (which is why the next line of the text says how to handle if the string cannot be parsed as a version number). For this, I still think "format" is the best name.
So, can we change the other "format" instead? I find it a bit tricky to come up with a good name. How about "protocol"? "data_protocol"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed "version" back into "format".
I will change the format field in section 3.10 into "file_format" as that seems to be what it describes in the example.
Protocol is more general than file format, so it would probably be better to use that for a method to share larger data items in a manner that is very different from the current partial data format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about "data_format" as an alternative name for the property in Section 3.10? This term is already used in the human-readable portion of the specification, e.g. <...> The default partial data format is named "jsonlines" and is described <...>
?
However, I also find the name "format" in the current context a little bit unintuitive (makes me think about file formats and data types instead of version-like strings). Maybe something like "format_version" would also be viable? (I understand that it might be too late to change it due to "format" already being used in property definitions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right that the naming may not be perfect here, and since we haven't released any of this yet, it isn't too late to change if we can find something that is consistently better. It may require a 'find + sed' through the not-yet-merged property definition source files, but that is not so much work.
data_format
seems more accurate than file_format
, but I think the other "format" field name in that case also needs a qualifier (which is why I tried to get rid of "format" completely, e.g., "protocol").
But, at some level I wonder if these things describing "what is the format of this thing" shouldn't just be the same? The structured way to go would be that format
is a dictionary with a name
and a version
, potentially with the version
part optional. However, version
is in that case also not the ideal name for that subfield, since we want to chop off the patch part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the format consists of two parts.
-
The part we define in Optimade, i.e. the names and definitions of the fields and how these are arranged in data structures
-
The container format, i.e. how data types and data structures are stored in a file / response.
The current format field under “partial_data_links” is most similar to the container format, so we should perhaps rename that field to “container_format”.
Elsewhere, we have however also used the term “response_format” so that is also an option.
The “format” field that we supply in the partial data response mostly corresponds with point 1.
It is meant to describe which version of the optimade specification was used to construct the partial data response. At the moment this field gives the optimade version but not which protocol was used.
Now, we have only one way of returning large data items, so it is not necessary to specify that the partial data protocol is used. If in the future we, however, come up with more methods to return data, it could be useful to have a field specifying the used method. So I think it would be better to keep the version number and the used method separate, like @rartino already suggested.
The structured way to go would be that format is a dictionary with a name and a version
I am also not happy with “property-format” as a field name in the property definitions. As it describes the version of the definition of the property and not how the property is formatted.
data_format
sound a bit vague to me, so I do not like that name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these fixes. The only thing I think needs to be handled before merging this is to agree what to do with the double "format" fields where - as you can see below - I don't agree with changing the name of the field embedded in the data. (But if others chime in and agree with plurals in the type declaration, I suggest that be reverted too).
suggestions Vaitkus Co-authored-by: Antanas Vaitkus <[email protected]>
…ger discussion about this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, but we need to figure out in a separate PR what to do with the double use of 'format'.
I still have some small corrections for the metadata PR #463 and partial data PR #467