[ci] Introduce a test orchestrator written in bazel #23505
Conversation
Where do you think would be the best place to document this?
I can imagine confusion if somebody adds a new test or execution environment and wonders why it's not running in CI
How should the orchestrator interact with the
Yes, that's my thinking (although it's not implemented yet). Otherwise it would require a lot of manual fiddling for broken tests. Thanks for raising this point :)
LGTM, will approve when it is marked as Ready to review.
Force-pushed from 96a67c1 to 68c086b
Force-pushed from 8ca0bf3 to b0b60ab
One of the issues I am running into is that some jobs can end up having no tests to run due to the orchestration. Unfortunately, bazel will error out if the test tag filters produce 0 tests, and I don't think this behaviour can be turned off.
Is there a way to ask Bazel for a list of tests before running them? Then we can have a pre-step which exits early if there are none.
We can of course replace the tag filters by a query to create a list and then run it. The downside is that the test tag filter syntax is not supported by the query command (as far as I know), so it would require some parsing, which is not ideal. I can see one possible alternative: currently, even with the CI orchestrator, we still split the tests across different jobs, which is error-prone. We could make the CI orchestration a two-step process:
The advantage would be that each CI job now becomes trivial to filter: all tests with a specific tag, not skipped in CI, and not broken. It would also ensure that we don't run duplicate tests across jobs (or forget some tests) due to logic errors in the filtering.
That sounds good, but I had actually thought that was what your current implementation does. Each test is tagged by its execution environment, right? And the Azure jobs run the tests with that exec env tag plus `-broken,-skip_in_ci`. Wouldn't this still have the same problem, though? The job applies these filters but gets zero tests back, and the job fails?
Currently it is not quite the case that each job corresponds to an execution environment, because we have manufacturing tests that run in their own job. To make it clear what I am suggesting, there would be a bazel function:

```python
def ci_fpga_job(env, tags):
    """Return the FPGA job on which this test should run, based on its
    execution environment and its tags.
    """
    if "cw340_sival_rom_ext" in tags and "manuf" not in tags:
        return "fpga_job_blabla"
    # and so on
```

Or maybe we should just keep one job per exec_env and call it a day :)
What's the reason that we currently run manufacturing tests as separate jobs? Probably just because we want to split into more jobs for parallelism?
I think it might be because they have custom execution environments since they run at unusual OTP stages ( |
Force-pushed from 1b91773 to 5265492
Force-pushed from 036d729 to 1f4b9ba
This looks really good bar the hyperdebug update problem and the broken tests. Very nice work.
ci/fpga-job.yml (outdated):

```yaml
# never detected. But if we run the tool another time, the device list is queried again
# and opentitantool can finish the update. The device will now reboot in normal mode
# and work for the hyperdebug job.
# FIXME: only execute for hyper310 interface
```
I wonder if `opentitanlib` should accept `update-firmware` for all platforms but make it a no-op if not hyperdebug? Erroring seems a bit extreme. Even then, I'm not sure how to manually specify `--interface hyperdebug_dfu` for these...
If the update step is really quick at detecting when no update is required, maybe we can move the update step into the commands for the execution environments?
Yes, we discussed that in a previous PR and decided to do it later... This would be very convenient for this PR!
I have now implemented the update step: it only runs for the `hyper310` interface. I will let the CI run to make sure that it works.
Force-pushed from 5cfc401 to cd585a2
This commit fundamentally changes how tests are considered for running in CI. Previously, any test not marked as "manual", "broken" or "skip_in_ci" would be run in CI in all execution environments for which it is defined. We then came up with some clever bazel queries to try and avoid running tests in multiple execution environments when not necessary, but this is quite fragile.

The new approach is very different: when creating the tests using opentitan_test, we call a new function, called "ci_orchestrator", with the list of all exec_envs for which the test is defined. This function then returns the subset of that list that should be marked with skip_in_ci. In the current implementation, there is a list of mutually exclusive exec_envs (_ONLY_RUN_ONE_IN_CI_SORTED) and all but one will be skipped: the only one to run will be the highest one in the list. With this approach, we are guaranteed to never run duplicate tests. If more complicated logic is required, it can always be added to this function.

As a byproduct of this, we can significantly simplify the filtering logic when running FPGA jobs.

Signed-off-by: Amaury Pouly <[email protected]>
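The skip-list computation this commit message describes could look roughly like the sketch below. Only the `_ONLY_RUN_ONE_IN_CI_SORTED` name and the "highest in the list wins" rule come from the message itself; the exec_env names and the function signature are assumptions for illustration. Since Starlark is a subset of Python, essentially the same function could live in a `.bzl` file.

```python
# Hypothetical priority list: environments that are mutually exclusive in CI,
# ordered from highest priority (keep) to lowest (skip). Names are made up.
_ONLY_RUN_ONE_IN_CI_SORTED = [
    "fpga_cw340_sival_rom_ext",
    "fpga_cw340_sival",
    "fpga_cw310_sival_rom_ext",
    "fpga_cw310_sival",
]

def ci_orchestrator(test_name, exec_envs):
    """Return the subset of exec_envs that should be tagged skip_in_ci.

    Among the mutually exclusive environments, only the one appearing first
    (highest) in _ONLY_RUN_ONE_IN_CI_SORTED keeps running in CI; all other
    mutually exclusive environments are skipped. Environments not in the
    list (e.g. simulators) are left untouched.
    """
    exclusive = [e for e in _ONLY_RUN_ONE_IN_CI_SORTED if e in exec_envs]
    # Keep the highest-priority environment, skip the rest.
    return set(exclusive[1:])
```

`opentitan_test` would then tag every environment in the returned set with `skip_in_ci`, which is exactly why no duplicate can survive: at most one exclusive environment is ever left unskipped.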
The `--test_tag_filters` option of `bazel test` is very convenient but is only available for testing, not for querying. This script takes as input a list of tags in the same format as `--test_tag_filters` and runs an equivalent bazel query to list all tests that match.

Signed-off-by: Amaury Pouly <[email protected]>
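A minimal sketch of such a translation is shown below. It is an assumption about how the script might work, not its actual contents; it relies on the common `bazel query` trick of matching serialized tag lists with an `attr()` regex.

```python
def tag_filters_to_query(pattern, filters):
    """Build a `bazel query` expression roughly equivalent to
    `bazel test --test_tag_filters=<filters> <pattern>`.

    Mirroring --test_tag_filters semantics: a test is kept if it carries at
    least one positive tag (when any are given) and none of the negative ones.
    """
    base = "tests({})".format(pattern)

    def tag_attr(tag):
        # Tags are serialized as a list like "[fpga, broken]", so anchor the
        # regex on the surrounding brackets/spaces to avoid substring matches.
        return "attr(tags, '[\\[ ]{}[,\\]]', {})".format(tag, base)

    tags = [t.strip() for t in filters.split(",") if t.strip()]
    pos = [t for t in tags if not t.startswith("-")]
    neg = [t[1:] for t in tags if t.startswith("-")]
    # Positive tags are OR'd together; negative tags are subtracted.
    expr = " union ".join(tag_attr(t) for t in pos) or base
    for t in neg:
        expr = "({}) except {}".format(expr, tag_attr(t))
    return expr
```

The resulting expression can then be fed to `bazel query` to obtain the target list before any test job starts, which also solves the "zero tests" problem discussed above.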
Force-pushed from 1042296 to 394531d
Force-pushed from 394531d to fe8d7ac
```shell
ci/scripts/run-bazel-test-query.sh \
  "${{ parameters.target_pattern_file }}" \
  "${{ parameters.tag_filters }}",-manual,-broken,-skip_in_ci \
  //... @manufacturer_test_hooks//...
# Run FPGA tests.
if [ -s "${{ parameters.target_pattern_file }}" ]; then
  ci/scripts/run-fpga-tests.sh "${{ parameters.interface }}" "${{ parameters.target_pattern_file }}" || { res=$?; echo "To reproduce failures locally, follow the instructions at https://opentitan.org/book/doc/getting_started/setup_fpga.html#reproducing-fpga-ci-failures-locally"; exit "${res}"; }
else
  echo "No tests to run after filtering"
fi
```
Can this logic be part of run-fpga-tests.sh?
I would prefer not, because this way we have two scripts for two different purposes: run-bazel-test-query does the query, and run-fpga-tests runs the tests. You can more easily repurpose the scripts this way. Or I could make it a third script if you really want (run-fpga-tests-query).
Force-pushed from fe8d7ac to 1122705
Signed-off-by: Amaury Pouly <[email protected]>
Those tests can run in the ROM_EXT stage, which should be preferred. Signed-off-by: Amaury Pouly <[email protected]>
Previously we only tagged them with the exec_env name, but for the CI it is convenient to have a global way of querying all FPGA tests. Signed-off-by: Amaury Pouly <[email protected]>
With the tag filtering logic, it is very easy to create a situation where some tests are run in several jobs, or to forget to run tests because of a non-exhaustive set of filters. This commit adds a final CI job that checks for both conditions. This is achieved by downloading the list of tests run by every job and then making sure that there are no duplicates. We also check for missing tests by running a query on all non-broken, non-manual, non-skip_in_ci FPGA tests. Signed-off-by: Amaury Pouly <[email protected]>
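The check this commit message describes amounts to verifying that the per-job test lists partition the set of CI-eligible tests. A hedged sketch of that core check (the function and variable names are made up; the real job operates on test lists downloaded from the other CI jobs):

```python
def check_test_partition(tests_per_job, all_ci_tests):
    """Return (duplicates, missing): tests appearing in more than one job's
    list, and CI-eligible tests that appear in no job at all.

    tests_per_job maps a job name to the list of test labels it ran;
    all_ci_tests is the query result for all non-broken, non-manual,
    non-skip_in_ci tests.
    """
    seen = {}        # test label -> first job that ran it
    duplicates = set()
    for job, tests in tests_per_job.items():
        for test in tests:
            if test in seen:
                duplicates.add(test)
            seen[test] = job
    missing = set(all_ci_tests) - set(seen)
    return duplicates, missing
```

The CI job would fail whenever either returned set is non-empty, turning the "no duplicate, no forgotten test" property into an enforced invariant rather than a hope.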
Those tests require the host interface to support serial break, but the CW310 boards used in CI do not support it, so they always fail. These tests will need to be changed to the hyperdebug interface. Signed-off-by: Amaury Pouly <[email protected]>
The test rv_dm_mem_access_dev_rv_dm_delayed_enabled is broken. Signed-off-by: Amaury Pouly <[email protected]>
Force-pushed from 1122705 to 1c44deb
Successfully created backport PR for
Prototype of the test orchestrator as discussed in the software WG. The implementation follows what we have discussed: `opentitan_test` calls a `ci_orchestrator` function with the list of all execution environments. This function then decides which environment(s) to run in CI and returns this list. Then `opentitan_test` marks every test not in that list as `skip_in_ci`.

This PR also removes the old complicated logic in the azure pipeline and fpga runner script. This is done by moving the FPGA test job to its own template file to avoid duplication.