JobEvent with outputs populated fails to write with nullPointerException #2925

Open
seanmullane opened this issue Oct 15, 2024 · 2 comments · May be fixed by #2974

@seanmullane

Emitting a JobEvent with input and/or output datasets causes an HTTP 500 error in the API, which results from a NullPointerException in Marquez.

Fixing this matters because it allows static lineage graphs to be generated without being tied to active runs. That is useful when no integration is yet available to consume pipeline runs for a given system, or when a pipeline is not yet fleshed out but we want to enter the job in Marquez to see how it would relate to other jobs.

The attached code includes a pure JSON version, generated by the OpenLineage client, that triggers the bug in Marquez. I have also included the Python code the JSON derives from and the Marquez error log.
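The attachments are not reproduced here. As a rough sketch only, a JobEvent carrying input and output datasets (and no run) might be assembled as below using just the standard library. The namespaces, job and dataset names, producer URL, and the spec version in schemaURL are placeholders assumed for illustration, not values taken from the attached files.

```python
import json
from datetime import datetime, timezone


def build_job_event() -> dict:
    """Build a static-lineage JobEvent: a job plus datasets, with no run section."""
    return {
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Hypothetical producer URL, for illustration only.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; use whatever your client actually emits.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/JobEvent",
        "job": {"namespace": "example-namespace", "name": "example-job"},
        "inputs": [
            {
                "namespace": "example-namespace",
                "name": "example_input_table",
                "facets": {},
                "inputFacets": {},
            }
        ],
        "outputs": [
            {
                "namespace": "example-namespace",
                "name": "example_output_table",
                "facets": {},
                # Empty object on the output dataset entry.
                "outputFacets": {},
            }
        ],
    }


# Serialize the event as it would be sent in a request body.
print(json.dumps(build_job_event(), indent=2))
```

Whether this exact shape triggers the error depends on the Marquez version; the attached JSON remains the authoritative reproduction.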

Environment:

Marquez 0.49.0 running via docker-compose per the Marquez example with --seed
openlineage-python 1.22.0
python 3.11.9

nullPointerException.txt
reproduce_bug.zip

More detail on this from phix on Slack:

It looks like we’re not processing the “outputFacets” on the IO fields without a runId provided. The event should save if you drop that field that’s the empty object for now… We should take a look at the OL spec for this
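The workaround described in the Slack note (dropping the empty facets object before sending) could be sketched as a small pre-processing step. The helper name drop_empty_io_facets and the sample event are hypothetical, and treating inputFacets the same way as outputFacets is an assumption made by analogy, not something the note confirms.

```python
import copy

# Keys that hold the input/output-specific facets on dataset entries.
IO_FACET_KEYS = ("inputFacets", "outputFacets")


def drop_empty_io_facets(event: dict) -> dict:
    """Return a copy of the event with empty inputFacets/outputFacets removed."""
    cleaned = copy.deepcopy(event)  # leave the caller's event untouched
    for side in ("inputs", "outputs"):
        for dataset in cleaned.get(side, []):
            for key in IO_FACET_KEYS:
                # Only drop the field when it is exactly an empty object.
                if dataset.get(key) == {}:
                    del dataset[key]
    return cleaned


# Hypothetical event fragment with an empty outputFacets object.
event = {
    "job": {"namespace": "example-namespace", "name": "example-job"},
    "outputs": [
        {
            "namespace": "example-namespace",
            "name": "example_output_table",
            "facets": {},
            "outputFacets": {},
        }
    ],
}

cleaned = drop_empty_io_facets(event)
print(cleaned["outputs"][0])
```

Non-empty facet objects are left in place, so this only sidesteps the empty-object case mentioned above; it is not a substitute for a server-side fix.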


boring-cyborg bot commented Oct 15, 2024

Thanks for opening your first issue in the Marquez project! Please be sure to follow the issue template!

@wslulciuc wslulciuc added the bug Something isn't working label Oct 23, 2024
@wslulciuc wslulciuc added this to the 0.51.0 milestone Oct 23, 2024
@davidsharp7 davidsharp7 self-assigned this Nov 2, 2024
@davidsharp7

davidsharp7 commented Nov 8, 2024

It looks like, in DatasetFacetsDao.java, the following method currently

  default void insertDatasetFacetsFor(
      @NonNull UUID datasetUuid,
      @NonNull UUID datasetVersionUuid,
      @Nullable UUID runUuid,
      @NonNull Instant lineageEventTime,
      @Nullable String lineageEventType,
      @NonNull LineageEvent.DatasetFacets datasetFacets) {

allows runUuid and lineageEventType to be null. The simplest solution would be to do the same for:

insertInputDatasetFacetsFor
insertOutputDatasetFacetsFor

@davidsharp7 davidsharp7 linked a pull request Nov 10, 2024 that will close this issue