Incorrect Serialization and Deserialization of Union Types #584

Open
othmane099 opened this issue Mar 29, 2024 · 5 comments

@othmane099

I ran into an issue while using the dataclasses-avroschema package for Avro serialization in Python. When I serialize and then deserialize a dataclass that has a union-typed field, the deserialized object does not have the expected type.

from dataclasses_avroschema import AvroModel
from dataclasses import dataclass
import typing

@dataclass
class MessageTypeTwo(AvroModel):
    val: typing.Union[None, str]
    class Meta:
        namespace = "Messages.type.two"

@dataclass
class MessageTypeOne(AvroModel):
    class Meta:
        namespace = "Messages.type.one"

@dataclass
class CoreMessage(AvroModel):
    messageBody: typing.Union[
        MessageTypeOne,
        MessageTypeTwo,
    ]

Serialize and deserialize an instance of CoreMessage with an instance of MessageTypeTwo:

mt2 = MessageTypeTwo(val="val")
core_message = CoreMessage(messageBody=mt2)
serialized = core_message.serialize()
deserialized = CoreMessage.deserialize(serialized)
print(deserialized.messageBody)

Expected Result: The print statement should output MessageTypeTwo(val='val').

Actual Result: The print statement outputs MessageTypeOne().

@marcosschroh
Owner

@othmane099 This is the expected behavior. You need to set a different dacite config.
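A rough way to see why the first union member wins: the decoded payload carries only field values, and a lenient matcher that tries the union members in declaration order, ignoring keys it does not know, will accept the empty MessageTypeOne for any payload. The sketch below uses only the standard library. `first_match` is a hypothetical helper written for illustration, not dacite's or dataclasses-avroschema's actual implementation, and it assumes a field missing from the payload may be filled with None.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MessageTypeOne:
    pass


@dataclass
class MessageTypeTwo:
    val: Optional[str]


def first_match(candidates, data, strict=False):
    """Hypothetical helper: build the first candidate that accepts `data`."""
    for cls in candidates:
        names = {f.name for f in fields(cls)}
        # In strict mode, a key the class does not declare rules the class out.
        if strict and set(data) - names:
            continue
        # Unknown keys are dropped; fields absent from the payload become None
        # (an assumption that only holds here because every field is Optional).
        return cls(**{name: data.get(name) for name in names})
    raise ValueError("no union member matched")


union = [MessageTypeOne, MessageTypeTwo]
payload = {"val": "val"}  # what remains of MessageTypeTwo(val="val") after decoding

lenient = first_match(union, payload)
strict = first_match(union, payload, strict=True)
print(lenient)  # MessageTypeOne()
print(strict)   # MessageTypeTwo(val='val')
```

In this simplified model, turning on strict matching is enough to reject MessageTypeOne for a payload that contains `val`, which is consistent with the suggestion to change the dacite config.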

@othmane099
Author

By adding the following dacite_config settings to CoreMessage:

@dataclass
class CoreMessage(AvroModel):
    messageBody: typing.Union[
        MessageTypeOne,
        MessageTypeTwo,
    ]

    class Meta:
        dacite_config = {
            "strict_unions_match": True,
            "strict": True,
        }

The previous example (MessageTypeTwo) works as expected. However, in the case of MessageTypeOne:

mt1 = MessageTypeOne()
core_message = CoreMessage(messageBody=mt1)
serialized = core_message.serialize()
deserialized = CoreMessage.deserialize(serialized)
print(deserialized.messageBody)

this raises an exception: dacite.exceptions.StrictUnionMatchError: can not choose between possible Union matches for field "messageBody": MessageTypeOne, MessageTypeTwo

Could the issue be attributed to the absence of fields within the MessageTypeOne class?

@marcosschroh
Owner

marcosschroh commented Mar 30, 2024

Not really. In your case, the following config should work:

class Meta:
    dacite_config = {
        "strict": True,
    }

In any case, it is not always possible to cover every case, so you will have to experiment with different dacite configs.
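One possible reading of the StrictUnionMatchError reported earlier, sketched with the standard library only: an empty payload has no keys, so under a strict "no unknown keys" rule it is compatible with every union member, while a payload containing `val` is compatible with exactly one. `strict_matches` is a hypothetical helper, not dacite's code, and it assumes a missing Optional field is acceptable (filled with None).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MessageTypeOne:
    pass


@dataclass
class MessageTypeTwo:
    val: Optional[str]


def strict_matches(candidates, data):
    """Hypothetical helper: every candidate whose declared fields cover all keys in `data`."""
    matched = []
    for cls in candidates:
        names = {f.name for f in fields(cls)}
        if set(data) <= names:
            matched.append(cls)
    return matched


union = [MessageTypeOne, MessageTypeTwo]
empty = strict_matches(union, {})                  # payload of MessageTypeOne()
with_val = strict_matches(union, {"val": "val"})   # payload of MessageTypeTwo(val="val")

print([c.__name__ for c in empty])     # ['MessageTypeOne', 'MessageTypeTwo']
print([c.__name__ for c in with_val])  # ['MessageTypeTwo']
```

Under these assumptions, requiring a unique match fails for the empty payload because two members qualify, whereas taking the first strict match returns MessageTypeOne, the first member in declaration order.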

@othmane099
Author

othmane099 commented Mar 30, 2024

Thanks for your answer. I encountered another case, where the types share the same attribute name. For example:

from dataclasses_avroschema import AvroModel
from dataclasses import dataclass
import typing

@dataclass
class MessageTypeTwo(AvroModel):
    val: str

    class Meta:
        namespace = "Messages.type.two"

@dataclass
class MessageTypeOne(AvroModel):
    val: str

    class Meta:
        namespace = "Messages.type.one"

@dataclass
class CoreMessage(AvroModel):
    messageBody: typing.Union[
        MessageTypeOne,
        MessageTypeTwo,
    ]

    class Meta:
        dacite_config = {
            "strict": True,
        }

mt2 = MessageTypeTwo("Hello World")
core_message = CoreMessage(messageBody=mt2)
serialized = core_message.serialize()
deserialized = CoreMessage.deserialize(serialized)
print(deserialized.messageBody)

Expected: MessageTypeTwo(val='Hello World')
Actual: MessageTypeOne(val='Hello World')

@marcosschroh
Owner

Yes, it makes sense. Under the hood you get the JSON {"messageBody": {"val": "Hello World"}}, which carries no information about which class produced it, so it is impossible to determine which class should be created. You must define a different strategy or include extra data to distinguish among the objects.
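As an illustration of "include extra data", here is a minimal standard-library sketch of a discriminator field. The `kind` field, the `BY_KIND` registry, and `decode_body` are hypothetical names introduced for this example; they are not part of dataclasses-avroschema or dacite.

```python
from dataclasses import dataclass, asdict


@dataclass
class MessageTypeOne:
    val: str
    kind: str = "one"  # hypothetical discriminator field


@dataclass
class MessageTypeTwo:
    val: str
    kind: str = "two"  # hypothetical discriminator field


# Registry from discriminator value to the class it identifies.
BY_KIND = {"one": MessageTypeOne, "two": MessageTypeTwo}


def decode_body(data):
    """Pick the class from the `kind` key instead of guessing from field names."""
    cls = BY_KIND[data["kind"]]
    return cls(**data)


payload = asdict(MessageTypeTwo("Hello World"))
decoded = decode_body(payload)
print(payload)  # {'val': 'Hello World', 'kind': 'two'}
print(decoded)  # MessageTypeTwo(val='Hello World', kind='two')
```

Because the payload now names its own type, both classes can share the `val` attribute without ambiguity. How such a field should be typed so that it round-trips through the library's union handling is something to verify against its documentation.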
