Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplifying the getattribute mechanism #631

Open
AdrienVannson opened this issue Oct 18, 2024 · 0 comments
Open

Simplifying the getattribute mechanism #631

AdrienVannson opened this issue Oct 18, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@AdrienVannson
Copy link
Contributor

AdrienVannson commented Oct 18, 2024

Summary

Hello,

I have a suggestion that I can probably implement myself if you agree to do it. However, I would like to have your opinion on it before working on it.

Currently, messages redefine their getattribute function. This is used for two reasons:

  • It allows raising an AttributeError when a field that is part of a oneof but is not the active one is accessed
  • It allows replacing the Placeholder values by the default value when the field is accessed.

However, this mechanism has several important issues, and some of them cannot be fixed without switching to a different method:

  • Redefining getattribute is very slow. Each time a field of a message is accessed, this function is called. This is of course the case for the proto fields, but getattribute is also called when other attributes such as the ones containing the metadata are accessed. Plus, each call is slow since getattribute performs queries to dictionaries. In a small benchmark that I did, I found out that a very significant part of the time spent when decoding a message was spent in this function. This is also a problem when using the rust codec.
  • The lack of support of unset messages (as they are automatically replaced by empty messages) makes it impossible to distinguish between an empty message and no message at all. This should be possible since the protobuf documentation states that the presence of message fields should always be tracked, even when the message is not declared as optional (see https://protobuf.dev/programming-guides/field_presence/ ).
  • Match are difficult to type-check since the semantic used by betterproto differs from the one used by the type-checkers, who don’t consider the getattribute function One-of pattern matching not supported by static type analysis tools #601
  • More generally, the behaviour of this mechanism is not intuitive at all in my opinion: reading the value of an attribute may change the value of the message (by adding an empty file). I believe that a simple read should not have this behaviour.
    These issues are so important that they made someone I know prefer removing all these functionalities and deal with the placeholders by hands… which is far from being ideal.
    I would rather do more of less the following:
  • For primitive fields (int, str, float, …), nothing much changes apart from the default value: it becomes 0, “”, 0.0, … instead of PLACEHOLDER
  • Messages fields are always declared as optional (which is coherent with the protobuf documentation). When no value is provided, None is used instead. None and an empty message are therefore different values, which is expected. A consequence is that empty messages are not created on the fly when accessing unset fields, but I think it is better to have this behaviour anyway.
  • The uniqueness of one of fields is not enforced during the construction of the object (since getattribute is no longer used). It becomes the responsibility of the user to construct a valid message. The message can still be validated before being sent, but not during its construction since this can be really expensive. This would be a bit similar to what Pydantic does: the validators are applied in the constructor, but they are not applied on field assignments by default due to the overhead of redefining setattribute.
    The problem with these updates is that they would also require a modification of the rust codec (but it should be possible if handled correctly). Another small drawback is that it would break compatibility for the programs that rely on these functionalities. However, I think that as long as the change is properly documented, it would not be a lot of work to adapt the programs. Plus, it is the right time to make such a change as the v2 is not yet released.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants
@AdrienVannson and others