
[Python] Log dependencies installed in submission environment #28564

Merged
merged 26 commits into apache:master from sdeps
Mar 7, 2024

Conversation

riteshghorse
Contributor

@riteshghorse riteshghorse commented Sep 20, 2023

Saves the submission environment dependencies and stages them, then logs them along with the runtime dependencies.


Fixes #28563
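For context, a minimal sketch of the idea (not Beam's actual implementation; the helper names and file name below are assumptions): collect the packages installed in the submission environment and write them to a file that could be staged alongside the other job artifacts.

```python
# Illustrative sketch only: list the packages installed in the
# submission environment and write them to a file for staging.
# Beam's real implementation differs; helper names are assumptions.
import importlib.metadata


def submission_environment_dependencies():
    """Return sorted 'name==version' lines for installed distributions."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in importlib.metadata.distributions()
        if dist.metadata['Name'])


def stage_dependency_listing(path='submission_environment_dependencies.txt'):
    # Write the listing so a stager could upload it as an extra artifact.
    with open(path, 'w') as f:
        f.write('\n'.join(submission_environment_dependencies()))
    return path
```

At runtime the staged file can then be read back and logged next to the dependencies found in the runtime environment, which is what the screenshot in the original description showed.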


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make the review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@codecov

codecov bot commented Sep 20, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 38.47%. Comparing base (0bbf2c3) to head (80d6d18).
Report is 487 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #28564      +/-   ##
==========================================
+ Coverage   38.23%   38.47%   +0.24%     
==========================================
  Files         696      698       +2     
  Lines      101878   102520     +642     
==========================================
+ Hits        38952    39449     +497     
- Misses      61309    61439     +130     
- Partials     1617     1632      +15     
Flag: go, Coverage Δ: 54.33% <ø> (+0.39%) ⬆️

Flags with carried forward coverage won't be shown.


@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@riteshghorse
Contributor Author

Run Python_Integration PreCommit

@riteshghorse
Contributor Author

Some unit tests are failing because there is an additional staging file now, which will always be present. I've got the solution and will update the PR; defer review until then.

@tvalentyn tvalentyn marked this pull request as draft September 21, 2023 16:18
@riteshghorse riteshghorse marked this pull request as ready for review September 22, 2023 14:49
@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label python.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Contributor

@jrmccluskey jrmccluskey left a comment


LGTM mod a logging nit

@riteshghorse
Contributor Author

riteshghorse commented Sep 27, 2023

R: @chamikaramj could you comment on the external transform environment? The external_transform environment tests would fail if there is an additional staging file by default.

It complains about no artifact service when it tries to resolve that artifact.

@github-actions
Contributor

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

@tvalentyn
Contributor

> The external_transform environment tests would fail if there is an additional staging file by default.

Can we stage job submission dependencies without including them in the runtime environment definition?

@riteshghorse riteshghorse changed the title [Python] Log dependencies at runtime and at submission environment [Python] Log dependencies installed in submission environment Dec 20, 2023
@riteshghorse
Contributor Author

riteshghorse commented Dec 20, 2023

R: @tvalentyn this is ready for review

Changes to note:

  1. Changed the artifact comparison logic to ignore the type payload field, since that field contains unique hashes.
  2. The external transform test failure was because the ExpansionServiceServicer did not have an artifact service method, so I added one. Confirmed this by running a multi-language pipeline successfully - Job Link
  3. The staging logic stays in stager.py since we ultimately call create_job_resources from python_sdk_dependencies(), which is invoked during environment creation.
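Point 1 above can be sketched as follows (a simplified illustration: the field and function names here are assumptions, not Beam's actual proto fields or comparison code). The idea is to compare artifact lists while dropping the field that carries a per-submission unique hash:

```python
# Simplified illustration of ignoring a unique-hash field when comparing
# artifact lists. Field and function names are assumptions; Beam's
# artifact protos and comparison logic differ.

def artifacts_equal_ignoring_payload(left, right):
    """Compare two artifact lists, ignoring the 'type_payload' field."""
    def strip(artifact):
        return {k: v for k, v in artifact.items() if k != 'type_payload'}
    return [strip(a) for a in left] == [strip(b) for b in right]
```

Without this, two otherwise identical environments would compare unequal on every submission, because the hash in the payload field changes each time.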

@riteshghorse
Contributor Author

R: @tvalentyn

Barring the lint failure, all tests pass.

Contributor

@tvalentyn tvalentyn left a comment


Thank you!


@kennknowles
Member

Is this ready to merge?

@tvalentyn
Contributor

I left one comment; after that it should be ready to merge.

@riteshghorse
Contributor Author

Done, I'll merge once the check passes

@riteshghorse riteshghorse merged commit 7497495 into apache:master Mar 7, 2024
76 checks passed
@riteshghorse riteshghorse deleted the sdeps branch March 7, 2024 21:22
hjtran pushed a commit to hjtran/beam that referenced this pull request Apr 4, 2024
…#28564)

* log runtime dependencies

* log submission env dependencies

* rm spare line

* update tests for staged files

* update equals test to ignore artifacts

* handle unit tests, refactor

* unit test sub env staging, convert to string

* change Log to Printf

* change log level to warning

* try env

* add artifact_service method

* correct urn

* fix artifact comparison, file reader

* add mock for python sdk dependencies and update artifact service method

* fix lint

* use magic mock instead of mocking entire function

* update dataflow runner test

* Update sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py

* use debug option to disable

* remove tmp directory mock
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

[Task]: Log dependencies at runtime and at submission environment for better debugging
5 participants