Fix nested union issue in avro schema conversion #119
Merged
This PR fixes the conversion of the union type between Beam schema and Avro schema, e.g.:
If we first convert such a schema to a Beam schema and then back to Avro, we get the following exception:
Caused by: org.apache.avro.AvroRuntimeException: Nested union
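The Avro specification states that unions may not immediately contain other unions, and that is the rule this exception reports. The sketch below illustrates the rule on a hypothetical, simplified schema model (`AvroishSchema` is invented for illustration and is not the `org.apache.avro.Schema` API). It uses `IllegalArgumentException` as a stand-in for `AvroRuntimeException`.

```java
import java.util.List;

// Hypothetical, simplified model of an Avro schema: either a primitive type
// name or a union of branches. This is NOT the org.apache.avro.Schema API.
final class AvroishSchema {
    final String name;                  // e.g. "string", "int", "null", or "union"
    final List<AvroishSchema> branches; // non-empty only for unions

    private AvroishSchema(String name, List<AvroishSchema> branches) {
        this.name = name;
        this.branches = branches;
    }

    static AvroishSchema primitive(String name) {
        return new AvroishSchema(name, List.of());
    }

    // Mirrors the spec rule "unions may not immediately contain other unions":
    // building a union whose direct branch is itself a union is rejected.
    static AvroishSchema union(List<AvroishSchema> branches) {
        for (AvroishSchema b : branches) {
            if (b.isUnion()) {
                throw new IllegalArgumentException("Nested union: " + b);
            }
        }
        return new AvroishSchema("union", List.copyOf(branches));
    }

    boolean isUnion() {
        return name.equals("union");
    }

    // Renders the schema in Avro's JSON-like notation, e.g. ["string", "int"].
    @Override
    public String toString() {
        if (!isUnion()) {
            return "\"" + name + "\"";
        }
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < branches.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(branches.get(i));
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        AvroishSchema ok = union(List.of(primitive("string"), primitive("int")));
        System.out.println(ok); // ["string", "int"]
        try {
            union(List.of(primitive("null"), ok)); // a union directly inside a union
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Nested union: ["string", "int"]
        }
    }
}
```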
This is due to the following code path:

1. When converting an Avro `union` type in which every branch is an actual type (as opposed to a `[null, string]` union), Beam creates a `OneOfType`, and the sub types of the `OneOfType` are the inner types of the union. Note that during this conversion, Beam also marks each inner type as nullable, since it is part of a union.
2. When converting the `OneOfType` back to Avro, each sub type is nullable, so Beam creates an inner Avro union `[null, subtype]` for each of them. The outer type is then a union again, which is why Avro throws the nested union exception.

The fix is relatively simple: we remove the nullable flag from the sub types, since it was added by Beam during the first conversion rather than coming from the original Avro schema. After that, the Avro union schema is created correctly and is the same as the original schema.
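The two steps above, and the effect of the fix, can be sketched with a hypothetical, simplified model. `UnionRoundTrip`, its `FieldType`, `toOneOf`, `toAvro`, and `oneOfToAvro` are invented for illustration and are not Beam or Avro APIs; `withNullable` only loosely mirrors the method of the same name on Beam's `Schema.FieldType`. Schemas are rendered as plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the Avro -> Beam -> Avro round trip described above.
// None of these classes are Beam or Avro APIs; they only mimic the shapes involved.
final class UnionRoundTrip {

    // A Beam-like field type: a type name plus a nullable flag.
    static final class FieldType {
        final String typeName;
        final boolean nullable;

        FieldType(String typeName, boolean nullable) {
            this.typeName = typeName;
            this.nullable = nullable;
        }

        FieldType withNullable(boolean n) {
            return new FieldType(typeName, n);
        }
    }

    // Step 1: Avro union -> OneOfType sub types, each one marked nullable.
    static List<FieldType> toOneOf(List<String> avroUnionBranches) {
        List<FieldType> subTypes = new ArrayList<>();
        for (String branch : avroUnionBranches) {
            subTypes.add(new FieldType(branch, true));
        }
        return subTypes;
    }

    // A nullable field type becomes the Avro union ["null", T].
    static String toAvro(FieldType t) {
        String base = "\"" + t.typeName + "\"";
        return t.nullable ? "[\"null\", " + base + "]" : base;
    }

    // Step 2: OneOfType -> Avro union. With stripNullable == false every branch
    // is itself a union (the bug); with true the original union is restored (the fix).
    static String oneOfToAvro(List<FieldType> subTypes, boolean stripNullable) {
        List<String> branches = new ArrayList<>();
        for (FieldType t : subTypes) {
            FieldType effective = stripNullable ? t.withNullable(false) : t;
            branches.add(toAvro(effective));
        }
        return "[" + String.join(", ", branches) + "]";
    }

    public static void main(String[] args) {
        List<FieldType> oneOf = toOneOf(List.of("string", "int"));
        System.out.println(oneOfToAvro(oneOf, false)); // [["null", "string"], ["null", "int"]]
        System.out.println(oneOfToAvro(oneOf, true));  // ["string", "int"]
    }
}
```

The first output is exactly the nested union Avro rejects; stripping the Beam-added nullable flag yields the same union the round trip started from.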