Skip to content

Commit

Permalink
Add a subsuite of unit tests for ML package and remove them from 'cov…
Browse files Browse the repository at this point in the history
…erage' GHA workflow. (#31255)

* Add a subsuite of unit tests for ML package

* Move ML compatibility tests to postcommits.

* Exclude ml tests requiring extra deps from coverage for now.

* Don't specify an unnecessary config given that tox task and gradle config already defines the Python version.
  • Loading branch information
tvalentyn authored May 13, 2024
1 parent 296c13a commit c88a863
Show file tree
Hide file tree
Showing 15 changed files with 165 additions and 22 deletions.
1 change: 1 addition & 0 deletions .github/workflows/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -261,6 +261,7 @@ PreCommit Jobs run in a schedule and also get triggered in a PR if relevant sour
| [ PreCommit Python Formatter ](https://github.com/apache/beam/actions/workflows/beam_PreCommit_PythonFormatter.yml) | N/A | `Run PythonFormatter PreCommit`| [![.github/workflows/beam_PreCommit_PythonFormatter.yml](https://github.com/apache/beam/actions/workflows/beam_PreCommit_PythonFormatter.yml/badge.svg?event=schedule)](https://github.com/apache/beam/actions/workflows/beam_PreCommit_PythonFormatter.yml?query=event%3Aschedule) |
| [ PreCommit Python Integration](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Integration.yml) | ['3.8','3.11'] | `Run Python_Integration PreCommit (matrix_element)` | [![.github/workflows/beam_PreCommit_Python_Integration.yml](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Integration.yml/badge.svg?event=schedule)](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Integration.yml?query=event%3Aschedule) |
| [ PreCommit Python Lint ](https://github.com/apache/beam/actions/workflows/beam_PreCommit_PythonLint.yml) | N/A | `Run PythonLint PreCommit` | [![.github/workflows/beam_PreCommit_PythonLint.yml](https://github.com/apache/beam/actions/workflows/beam_PreCommit_PythonLint.yml/badge.svg?event=schedule)](https://github.com/apache/beam/actions/workflows/beam_PreCommit_PythonLint.yml?query=event%3Aschedule) |
| [ PreCommit Python ML ](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_ML.yml) | ['3.8','3.9','3.10','3.11'] | `Run Python_ML PreCommit (matrix_element)`| [![.github/workflows/beam_PreCommit_Python_ML.yml](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_ML.yml/badge.svg?event=schedule)](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_ML.yml?query=event%3Aschedule) |
| [ PreCommit Python PVR Flink ](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_PVR_Flink.yml) | N/A | `Run Python_PVR_Flink PreCommit` | [![.github/workflows/beam_PreCommit_Python_PVR_Flink.yml](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_PVR_Flink.yml/badge.svg?event=schedule)](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_PVR_Flink.yml?query=event%3Aschedule) |
| [ PreCommit Python Runners ](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Runners.yml) | ['3.8','3.9','3.10','3.11'] | `Run Python_Runners PreCommit (matrix_element)`| [![.github/workflows/beam_PreCommit_Python_Runners.yml](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Runners.yml/badge.svg?event=schedule)](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Runners.yml?query=event%3Aschedule) |
| [ PreCommit Python Transforms ](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Transforms.yml) | ['3.8','3.9','3.10','3.11'] | `Run Python_Transforms PreCommit (matrix_element)`| [![.github/workflows/beam_PreCommit_Python_Transforms.yml](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Transforms.yml/badge.svg?event=schedule)](https://github.com/apache/beam/actions/workflows/beam_PreCommit_Python_Transforms.yml?query=event%3Aschedule) |
Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/beam_PreCommit_Python.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,8 +95,7 @@ jobs:
with:
gradle-command: :sdks:python:test-suites:tox:py${{steps.set_py_ver_clean.outputs.py_ver_clean}}:preCommitPy${{steps.set_py_ver_clean.outputs.py_ver_clean}}
arguments: |
-Pposargs="--ignore=apache_beam/dataframe/ --ignore=apache_beam/examples/ --ignore=apache_beam/runners/ --ignore=apache_beam/transforms/" \
-PpythonVersion=${{ matrix.python_version }}
-Pposargs="--ignore=apache_beam/dataframe/ --ignore=apache_beam/ml/ --ignore=apache_beam/examples/ --ignore=apache_beam/runners/ --ignore=apache_beam/transforms/"
- name: Archive Python Test Results
uses: actions/upload-artifact@v4
if: failure()
Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/beam_PreCommit_Python_Dataframes.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,8 +95,7 @@ jobs:
with:
gradle-command: :sdks:python:test-suites:tox:py${{steps.set_py_ver_clean.outputs.py_ver_clean}}:preCommitPy${{steps.set_py_ver_clean.outputs.py_ver_clean}}
arguments: |
-Pposargs=apache_beam/dataframe/ \
-PpythonVersion=${{ matrix.python_version }}
-Pposargs=apache_beam/dataframe/
- name: Archive Python Test Results
uses: actions/upload-artifact@v4
if: failure()
Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/beam_PreCommit_Python_Examples.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,8 +95,7 @@ jobs:
with:
gradle-command: :sdks:python:test-suites:tox:py${{steps.set_py_ver_clean.outputs.py_ver_clean}}:preCommitPy${{steps.set_py_ver_clean.outputs.py_ver_clean}}
arguments: |
-Pposargs=apache_beam/examples/ \
-PpythonVersion=${{ matrix.python_version }}
-Pposargs=apache_beam/examples/
- name: Archive Python Test Results
uses: actions/upload-artifact@v4
if: failure()
Expand Down
111 changes: 111 additions & 0 deletions .github/workflows/beam_PreCommit_Python_ML.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: PreCommit Python ML tests with ML deps installed
on:
pull_request_target:
branches: [ "master", "release-*" ]
paths: [ "model/**","sdks/python/**","release/**", 'release/trigger_all_tests.json', '.github/trigger_files/beam_PreCommit_Python_ML.json']
issue_comment:
types: [created]
push:
tags: ['v*']
branches: ['master', 'release-*']
paths: [ "model/**","sdks/python/**","release/**",".github/workflows/beam_PreCommit_Python_ML.yml"]
schedule:
- cron: '45 2/6 * * *'
workflow_dispatch:

#Setting explicit permissions for the action to avoid the default permissions which are `write-all` in case of pull_request_target event
permissions:
actions: write
pull-requests: write
checks: write
contents: read
deployments: read
id-token: none
issues: write
discussions: read
packages: read
pages: read
repository-projects: read
security-events: read
statuses: read

# This allows a subsequently queued workflow run to interrupt previous runs
concurrency:
group: '${{ github.workflow }} @ ${{ github.event.issue.number || github.event.pull_request.head.label || github.sha || github.head_ref || github.ref }}-${{ github.event.schedule || github.event.comment.id || github.event.sender.login }}'
cancel-in-progress: true

env:
GRADLE_ENTERPRISE_ACCESS_KEY: ${{ secrets.GE_ACCESS_TOKEN }}
GRADLE_ENTERPRISE_CACHE_USERNAME: ${{ secrets.GE_CACHE_USERNAME }}
GRADLE_ENTERPRISE_CACHE_PASSWORD: ${{ secrets.GE_CACHE_PASSWORD }}

jobs:
beam_PreCommit_Python_ML:
name: ${{ matrix.job_name }} (${{ matrix.job_phrase }} ${{ matrix.python_version }})
runs-on: [self-hosted, ubuntu-20.04, main]
timeout-minutes: 180
strategy:
fail-fast: false
matrix:
job_name: ['beam_PreCommit_Python_ML']
job_phrase: ['Run Python_ML PreCommit']
python_version: ['3.8','3.9','3.10','3.11']
if: |
github.event_name == 'push' ||
github.event_name == 'pull_request_target' ||
(github.event_name == 'schedule' && github.repository == 'apache/beam') ||
github.event_name == 'workflow_dispatch' ||
startsWith(github.event.comment.body, 'Run Python_ML PreCommit')
steps:
- uses: actions/checkout@v4
- name: Setup repository
uses: ./.github/actions/setup-action
with:
comment_phrase: ${{ matrix.job_phrase }} ${{ matrix.python_version }}
github_token: ${{ secrets.GITHUB_TOKEN }}
github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }} ${{ matrix.python_version }})
- name: Setup environment
uses: ./.github/actions/setup-environment-action
with:
java-version: default
python-version: ${{ matrix.python_version }}
- name: Set PY_VER_CLEAN
id: set_py_ver_clean
run: |
PY_VER=${{ matrix.python_version }}
PY_VER_CLEAN=${PY_VER//.}
echo "py_ver_clean=$PY_VER_CLEAN" >> $GITHUB_OUTPUT
- name: Run pythonPreCommit
uses: ./.github/actions/gradle-command-self-hosted-action
with:
gradle-command: :sdks:python:test-suites:tox:py${{steps.set_py_ver_clean.outputs.py_ver_clean}}:testPy${{steps.set_py_ver_clean.outputs.py_ver_clean}}ML
arguments: |
-Pposargs=apache_beam/ml/
- name: Archive Python Test Results
uses: actions/upload-artifact@v4
if: failure()
with:
name: Python ${{ matrix.python_version }} Test Results
path: '**/pytest*.xml'
- name: Publish Python Test Results
uses: EnricoMi/publish-unit-test-result-action@v2
if: always()
with:
commit: '${{ env.prsha || env.GITHUB_SHA }}'
comment_mode: ${{ github.event_name == 'issue_comment' && 'always' || 'off' }}
files: '**/pytest*.xml'
3 changes: 1 addition & 2 deletions .github/workflows/beam_PreCommit_Python_Runners.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,8 +95,7 @@ jobs:
with:
gradle-command: :sdks:python:test-suites:tox:py${{steps.set_py_ver_clean.outputs.py_ver_clean}}:preCommitPy${{steps.set_py_ver_clean.outputs.py_ver_clean}}
arguments: |
-Pposargs=apache_beam/runners/ \
-PpythonVersion=${{ matrix.python_version }}
-Pposargs=apache_beam/runners/
- name: Archive Python Test Results
uses: actions/upload-artifact@v4
if: failure()
Expand Down
3 changes: 1 addition & 2 deletions .github/workflows/beam_PreCommit_Python_Transforms.yml
Original file line number Diff line number Diff line change
Expand Up @@ -95,8 +95,7 @@ jobs:
with:
gradle-command: :sdks:python:test-suites:tox:py${{steps.set_py_ver_clean.outputs.py_ver_clean}}:preCommitPy${{steps.set_py_ver_clean.outputs.py_ver_clean}}
arguments: |
-Pposargs=apache_beam/transforms/ \
-PpythonVersion=${{ matrix.python_version }}
-Pposargs=apache_beam/transforms/
- name: Archive Python Test Results
uses: actions/upload-artifact@v4
if: failure()
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -649,7 +649,7 @@ def __init__(
self._inference_fn = inference_fn
self._load_pipeline_args = load_pipeline_args if load_pipeline_args else {}
self._batching_kwargs = {}
self._framework = "torch"
self._framework = "pt"
self._env_vars = kwargs.get('env_vars', {})
if min_batch_size is not None:
self._batching_kwargs['min_batch_size'] = min_batch_size
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ def test_framework_detection_torch(self):
inference_fn=fake_inference_fn_tensor)
batched_examples = [torch.tensor(1), torch.tensor(10), torch.tensor(100)]
inference_runner.run_inference(batched_examples, fake_model)
self.assertEqual(inference_runner._framework, "torch")
self.assertEqual(inference_runner._framework, "pt")

def test_framework_detection_tensorflow(self):
fake_model = FakeTFTensorModel()
Expand Down
4 changes: 4 additions & 0 deletions sdks/python/apache_beam/ml/inference/onnx_inference_test.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,6 +31,10 @@
from apache_beam.testing.util import assert_that
from apache_beam.testing.util import equal_to

if bool(1): # lint doesn't like an unconditional `raise`.
raise unittest.SkipTest(
'TODO: fix https://github.com/apache/beam/issues/31254')

# Protect against environments where onnx and pytorch library is not available.
# pylint: disable=wrong-import-order, wrong-import-position, ungrouped-imports
try:
Expand Down
19 changes: 19 additions & 0 deletions sdks/python/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -486,6 +486,25 @@ def get_portability_package_data():
# urllib 2.x is a breaking change for the headless chrome tests
'urllib3<2,>=1.21.1'
],
# Optional dependencies to unit-test ML functionality.
# We don't expect users to install this extra. Users should install
# necessary dependencies individually, or we should create targeted
# extras. Keeping the bounds open as much as possible so that we
# can find out early when using Beam with new versions doesn't work.
'ml_test': [
'datatable',
'embeddings',
'onnxruntime',
'sentence-transformers',
'skl2onnx',
'tensorflow',
'tensorflow-hub',
'tensorflow_transform',
'tf2onnx',
'torch',
'transformers',
'xgboost<2.0', # https://github.com/apache/beam/issues/31252
],
'aws': ['boto3>=1.9,<2'],
'azure': [
'azure-storage-blob>=12.3.2,<13',
Expand Down
3 changes: 3 additions & 0 deletions sdks/python/test-suites/tox/common.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,9 @@ test.dependsOn "testPython${pythonVersionSuffix}"
toxTask "testPy${pythonVersionSuffix}Cloud", "py${pythonVersionSuffix}-cloud", "${posargs}"
test.dependsOn "testPy${pythonVersionSuffix}Cloud"

toxTask "testPy${pythonVersionSuffix}ML", "py${pythonVersionSuffix}-ml", "${posargs}"
test.dependsOn "testPy${pythonVersionSuffix}ML"

// toxTask "testPy${pythonVersionSuffix}Dask", "py${pythonVersionSuffix}-dask", "${posargs}"
// test.dependsOn "testPy${pythonVersionSuffix}Dask"

Expand Down
18 changes: 10 additions & 8 deletions sdks/python/test-suites/tox/py38/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,9 @@ toxTask "testPy38pandas-20", "py38-pandas-20", "${posargs}"
test.dependsOn "testPy38pandas-20"
postCommitPyDep.dependsOn "testPy38pandas-20"


// TODO(https://github.com/apache/beam/issues/30908): Revise what are we testing

// Create a test task for each minor version of pytorch
toxTask "testPy38pytorch-19", "py38-pytorch-19", "${posargs}"
test.dependsOn "testPy38pytorch-19"
Expand All @@ -110,21 +113,22 @@ postCommitPyDep.dependsOn "testPy38pytorch-113"
// run on precommit
toxTask "testPy38pytorch-200", "py38-pytorch-200", "${posargs}"
test.dependsOn "testPy38pytorch-200"
preCommitPyCoverage.dependsOn "testPy38pytorch-200"
postCommitPyDep.dependsOn "testPy38pytorch-200"

toxTask "testPy38tft-113", "py38-tft-113", "${posargs}"
test.dependsOn "testPy38tft-113"
preCommitPyCoverage.dependsOn "testPy38tft-113"
postCommitPyDep.dependsOn "testPy38tft-113"

// TODO(https://github.com/apache/beam/issues/25796) - uncomment onnx tox task once onnx supports protobuf 4.x.x
// Create a test task for each minor version of onnx
// toxTask "testPy38onnx-113", "py38-onnx-113", "${posargs}"
// test.dependsOn "testPy38onnx-113"
// postCommitPyDep.dependsOn "testPy38onnx-113"

// Create a test task for each minor version of tensorflow
toxTask "testPy38tensorflow-212", "py38-tensorflow-212", "${posargs}"
test.dependsOn "testPy38tensorflow-212"
preCommitPyCoverage.dependsOn "testPy38tensorflow-212"
postCommitPyDep.dependsOn "testPy38tensorflow-212"

// Create a test task for each minor version of transformers
toxTask "testPy38transformers-428", "py38-transformers-428", "${posargs}"
Expand All @@ -135,25 +139,23 @@ toxTask "testPy38transformers-429", "py38-transformers-429", "${posargs}"
test.dependsOn "testPy38transformers-429"
postCommitPyDep.dependsOn "testPy38transformers-429"

// run on precommit
toxTask "testPy38transformers-430", "py38-transformers-430", "${posargs}"
test.dependsOn "testPy38transformers-430"
preCommitPyCoverage.dependsOn "testPy38transformers-430"
postCommitPyDep.dependsOn "testPy38transformers-430"

toxTask "testPy38embeddingsMLTransform", "py38-embeddings", "${posargs}"
test.dependsOn "testPy38embeddingsMLTransform"
preCommitPyCoverage.dependsOn "testPy38embeddingsMLTransform"
postCommitPyDep.dependsOn "testPy38embeddingsMLTransform"

// Part of MLTransform embeddings test suite but requires tensorflow hub, which we need to test on
// mutliple versions so keeping this suite separate.
toxTask "testPy38TensorflowHubEmbeddings-014", "py38-TFHubEmbeddings-014", "${posargs}"
test.dependsOn "testPy38TensorflowHubEmbeddings-014"
postCommitPyDep.dependsOn "testPy38TensorflowHubEmbeddings-014"

// run on precommit
toxTask "testPy38TensorflowHubEmbeddings-015", "py38-TFHubEmbeddings-015", "${posargs}"
test.dependsOn "testPy38TensorflowHubEmbeddings-015"
preCommitPyCoverage.dependsOn "testPy38TensorflowHubEmbeddings-015"
postCommitPyDep.dependsOn "testPy38TensorflowHubEmbeddings-015"

toxTask "whitespacelint", "whitespacelint", "${posargs}"

Expand Down
4 changes: 2 additions & 2 deletions sdks/python/test-suites/tox/pycommon/build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,10 @@
*/

/**
* Unit tests for commont Python components.
* Python gradle tasks that run using a default, typically lowest supported,
* Python version.
*/

// TODO(https://github.com/apache/beam/issues/20209): See if we can avoid hardcoding python version here.
plugins { id 'org.apache.beam.module' }
applyPythonNature()

Expand Down
8 changes: 8 additions & 0 deletions sdks/python/tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,13 @@ commands =
python apache_beam/examples/complete/autocomplete_test.py
bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}"

[testenv:py{38,39,310,311}-ml]
# Don't set TMPDIR to avoid "AF_UNIX path too long" errors in certain tests.
setenv =
extras = test,gcp,dataframe,ml_test
commands =
bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}"

[testenv:py{38,39,310,311}-dask]
extras = test,dask
commands =
Expand All @@ -99,6 +106,7 @@ setenv =
PYTHONPATH = {toxinidir}
platform = linux
passenv = GIT_*,BUILD_*,ghprb*,CHANGE_ID,BRANCH_NAME,JENKINS_*,CODECOV_*,GITHUB_*
# NOTE: we could add ml_test to increase the collected code coverage metrics, but it would make the suite slower.
extras = test,gcp,interactive,dataframe,aws
commands =
bash {toxinidir}/scripts/run_pytest.sh {envname} "{posargs}" "--cov-report=xml --cov=. --cov-append"
Expand Down

0 comments on commit c88a863

Please sign in to comment.