Narrow JSON type, ensure that `to_dict` always returns a dict, and v2 filter / compressor parsing #2179
Conversation
Given dictionaries are mutable and supported, is there another reason not to support lists? If you want to keep the typing simple, perhaps
I would love an immutable dict, but that's not practical in Python. So we have no choice but to include dicts in this type. And the goal here is not actually keeping the typing simple but rather keeping the runtime simple. Picking just one concrete type (tuple) for modelling JSON arrays removes complexity, and it's free because we control all the code that returns instances of the
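The "single concrete type for JSON arrays" idea above can be sketched as a small normalizer. This is an illustrative helper, not code from the PR; the name `freeze_json` is made up here:

```python
from collections.abc import Mapping
from typing import Any

def freeze_json(data: Any) -> Any:
    """Recursively convert JSON arrays (lists) to tuples, so that a JSON
    type alias only needs one concrete array type. Dicts are rebuilt with
    frozen values; scalars pass through unchanged."""
    if isinstance(data, list):
        return tuple(freeze_json(item) for item in data)
    if isinstance(data, Mapping):
        return {key: freeze_json(value) for key, value in data.items()}
    return data

print(freeze_json({"chunks": [10, 10], "filters": None}))
# → {'chunks': (10, 10), 'filters': None}
```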
Looks good overall to me - I'm not sure about widening the input types on the parsing functions though, as then the benefit of typing is lost. What was the motivation for setting these all to `object` instead of the narrower types that the functions are designed to parse?
The current model is that the parsing functions are designed to handle entirely unknown objects and parse those objects into something with the correct value and type. So these functions should accept any Python object, but only return a value when parsing is successful, and otherwise raise an exception. From that point of view, placing any constraints on the input type is stealing work from the body of the parsing function :)
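The parsing style described here can be sketched as follows. The function name and accepted values are illustrative, not the actual zarr-python functions:

```python
def parse_shuffle(data: object) -> int:
    """Accept any object; return it only if it is a valid shuffle code,
    otherwise raise. The input type is deliberately `object`: narrowing
    is the body's job, not the caller's."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(data, bool) or not isinstance(data, int):
        raise TypeError(f"Expected an int, got {type(data)}")
    if data not in (0, 1, 2):
        raise ValueError(f"Expected 0, 1, or 2, got {data}")
    return data
```

Callers then get a value that the type checker knows is an `int`, regardless of what was passed in.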
I'm not sure I understand what the downside is of "stealing work from the function"? The advantage of including the narrower types is finding errors before running code. That's kind of the point of typing - you can pass whatever you want in Python, but by adding type hints it's easier to see where you're passing the wrong objects.
And where these parsing functions are used, they are called with objects that already have a narrow type, e.g. here: zarr-python/src/zarr/codecs/blosc.py, lines 95 to 105 (commit 52d6849)
So given these aren't being used with objects that have a general type, I think it is better to give the parsing functions narrower types as input, to make sure we're defining the narrow types consistently throughout the codebase?
For context, these parsing functions are designed for handling the contents of JSON documents that we are reading from external sources. We have no idea what is inside those JSON documents. We can be pretty confident that the output of Python's JSON deserialization will give us dicts, lists, None, etc., but the particular structure is unknown. The job of the
Suppose we do
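The deserialization guarantee mentioned above can be checked directly; this snippet is illustrative, not code from the PR:

```python
import json

# json.loads only ever produces dict, list, str, int, float, bool, or None;
# the *structure* of the document is unknown until it is parsed.
doc = json.loads('{"zarr_format": 2, "chunks": [5, 5], "fill_value": null}')

print(type(doc).__name__)   # dict
print(doc["chunks"])        # [5, 5]  (a plain list until a parser narrows it)
print(doc["fill_value"])    # None
```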
Ah that makes sense, thanks for the patient explanation 👍 I didn't realise these were used to parse arbitrary JSON. |
no worries! thanks for looking things over carefully. In terms of the design of the |
Continuing this conversation in the main thread: I removed the inheritance from
@normanrz your eyes would be appreciated here
👏 @d-v-b - fantastic set of simplifications here!
This seems to be causing
```diff
@@ -315,15 +313,9 @@ async def _create_v2(
            chunks=chunks,
            order=order,
            dimension_separator=dimension_separator,
-           fill_value=0 if fill_value is None else fill_value,
```
Restoring the `0 if fill_value is None else fill_value` does fix the failure in tests/v3/test_array.py::test_serializable_sync_array. But that said, I like the change here... Setting a default fill value here seems wrong.
Oh, but this is just for v2 metadata, where the fill value is allowed to be None. Maybe this isn't the wrong spot to do that.
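The defaulting logic under discussion is small; a sketch (the function name is made up for illustration) makes the policy question concrete: zarr v2 metadata permits a null fill value, so substituting 0 is a choice, not a requirement:

```python
def resolve_fill_value(fill_value):
    """Substitute 0 for a missing (None) fill value, as the removed line did.
    The v2 spec allows fill_value to be null, so this default is a policy
    decision rather than something the format requires."""
    return 0 if fill_value is None else fill_value

print(resolve_fill_value(None))  # → 0
print(resolve_fill_value(-1))    # → -1
```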
A typing / metadata cleanup now that #2163 is in.

### narrowing the JSON type

In v3, the JSON type looks like this:

```python
JSON = None | str | int | float | Enum | dict[str, "JSON"] | list["JSON"] | tuple["JSON", ...]
```

I narrowed this type to

```python
JSON = None | str | int | float | Mapping[str, "JSON"] | tuple["JSON", ...]
```
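To make the narrowed type concrete, here is a runnable sketch of how it reads in practice. The `ChunkGridMetadata` class below is hypothetical, not a class from zarr-python; it just shows the intended style of array-valued fields being tuples rather than lists:

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Union

# The narrowed JSON type, spelled with Union so the recursive forward
# references resolve at runtime on older Python versions.
JSON = Union[None, str, int, float, Mapping[str, "JSON"], tuple["JSON", ...]]

@dataclass(frozen=True)
class ChunkGridMetadata:
    """Hypothetical metadata class: to_dict returns dict[str, JSON], with
    the array-valued field stored as an immutable tuple."""
    chunk_shape: tuple[int, ...]

    def to_dict(self) -> dict[str, JSON]:
        return {
            "name": "regular",
            "configuration": {"chunk_shape": self.chunk_shape},
        }

print(ChunkGridMetadata((2, 2)).to_dict())
```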
The original type modelled JSON arrays with both `list` AND `tuple`; as `tuple` is immutable, I got rid of the `list` variant. I changed `dict[str, JSON]` to `Mapping[str, JSON]` out of the same preference for immutability. `Enum` is a back door around type safety, so I removed it; instead of serializing enums, the relevant classes serialize the enum values. Maybe we could put `Enum` back in `JSON` if there's a way to tell the type checker that our enums only wrap JSON values.

### to_dict methods

Along with narrowing the JSON type, I ensured that all instances of `Metadata` return `dict[str, JSON]` from their `to_dict` methods.

### type hints
There were a lot of functions in `metadata` that accepted `Any`, when `object` is a better type hint. I adjusted these accordingly.

### v2 filters and codecs
We weren't doing any validation of filters and compressors for v2 metadata. This PR adds that missing validation, and I found a buggy test as a result of adding these checks. I also changed `ArrayV2Metadata.filters` and `ArrayV2Metadata.compressor` to be instances of `numcodecs.abc.Codec` instead of dicts, bringing `ArrayV2Metadata` closer to `ArrayV3Metadata` and simplifying some codec parsing code.

TODO: