Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small corrections for metadata and partial data #480

Merged

Conversation

JPBergsma
Copy link
Contributor

I still have some small corrections for the metadata PR #463 and partial data PR #467

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Show resolved Hide resolved
optimade.rst Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
JPBergsma and others added 4 commits June 26, 2023 14:44
optimade.rst Outdated
@@ -3684,16 +3684,16 @@ The header object MUST contain the keys:

It MUST contain the following key:

- :field:`"format"`: String.
- :field:`"version"`: String.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already used the field name "format" in section 3.10 Transmission of large property values. So I do not think we should use it here again with a different meaning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - it is indeed potentially confusing that these two fields are named the same. However, the use of "format" here reflects the same fieldname in the property definitions, and the design idea is the same: future versions of OPTIMADE may want to make this field more expressive than simply the OPTIMADE version (which is why the next line of the text says how to handle if the string cannot be parsed as a version number). For this, I still think "format" is the best name.

So, can we change the other "format" instead? I find it a bit tricky to come up with a good name. How about "protocol"? "data_protocol"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed "version" back into "format".
I will change the format field in section 3.10 into "file_format" as that seems to be what it describes in the example.
Protocol is more general than file format, so it would probably be better to use that for a method to share larger data items in a manner that is very different from the current partial data format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about "data_format" as an alternative name for the property in Section 3.10? This term is already used in the human-readable portion of the specification, e.g. <...> The default partial data format is named "jsonlines" and is described <...>?

However, I also find the name "format" in the current context a little bit unintuitive (makes me think about file formats and data types instead of version-like strings). Maybe something like "format_version" would also be viable? (I understand that it might be too late to change it due to "format" already being used in property definitions).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that the naming may not be perfect here, and since we haven't released any of this yet, it isn't too late to change if we can find something that is consistently better. It may require a 'find + sed' through the not-yet-merged property definition source files, but that is not so much work.

data_format seems more accurate than file_format, but I think the other "format" field name in that case also needs a qualifier (which is why I tried to get rid of "format" completely, e.g., "protocol").

But, at some level I wonder if these things describing "what is the format of this thing" shouldn't just be the same? The structured way to go would be that format is a dictionary with a name and a version, potentially with the version part optional. However, version is in that case also not the ideal name for that subfield, since we want to chop off the patch part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the format consists of two parts.

  1. The part we define in Optimade, i.e. the names and definitions of the fields and how these are arranged in data structures

  2. The container format, i.e. how data types and data structures are stored in a file / response.

The current format field under “partial_data_links” is most similar to the container format, so we should perhaps rename that field to “container_format”.
Elsewhere, we have however also used the term “response_format” so that is also an option.

The “format” field that we supply in the partial data response mostly corresponds with point 1.
It is meant to describe which version of the optimade specification was used to construct the partial data response. At the moment this field gives the optimade version but not which protocol was used.

Now, we have only one way of returning large data items, so it is not necessary to specify that the partial data protocol is used. If in the future we, however, come up with more methods to return data, it could be useful to have a field specifying the used method. So I think it would be better to keep the version number and the used method separate, like @rartino already suggested.

The structured way to go would be that format is a dictionary with a name and a version

I am also not happy with “property-format” as a field name in the property definitions. As it describes the version of the definition of the property and not how the property is formatted.

data_format sound a bit vague to me, so I do not like that name.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@rartino rartino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making these fixes. The only thing I think needs to be handled before merging this is to agree what to do with the double "format" fields where - as you can see below - I don't agree with changing the name of the field embedded in the data. (But if others chime in and agree with plurals in the type declaration, I suggest that be reverted too).

Copy link
Contributor

@rartino rartino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but we need to figure out in a separate PR what to do with the double use of 'format'.

@rartino rartino merged commit 4269815 into Materials-Consortia:develop Dec 19, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants