enable ASAN/UBSAN in pandas CI #55102

WillAyd · 2023-09-11T21:11:41Z

WillAyd · 2023-09-11T21:12:23Z

.github/actions/build_pandas/action.yml

@@ -25,8 +25,8 @@ runs:
    - name: Build Pandas
      run: |
        if [[ ${{ inputs.editable }} == "true" ]]; then
-          pip install -e . --no-build-isolation -v
+          pip install -e . --no-build-isolation -v  --config-settings=setup-args="-Db_sanitize=address,undefined"


Probably don't want to hard-code this. Is there a way with GHA to do this only for certain action invocations, @lithomas1?

You can define an input above in the workflow and pass a variable from the job when you want to enable these flags

I can push to your branch if you need any help with this, but it should be as Matt stated.

Feel free to push
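Here is a minimal sketch of the input-based approach, assuming a composite action. The input name `meson_args`, the descriptions, and the non-editable `else` branch are illustrative assumptions, not necessarily what the PR used. The action declares the input with an empty default, so only a job that explicitly passes the sanitizer flags gets them.

```yaml
# .github/actions/build_pandas/action.yml (sketch; `meson_args` is a hypothetical input)
name: Build pandas
description: Rebuilds the C extensions and installs pandas
inputs:
  editable:
    description: Whether to build pandas in editable mode
    default: true
  meson_args:
    description: Extra arguments forwarded to pip / meson-python (empty by default)
    default: ''
runs:
  using: composite
  steps:
    - name: Build Pandas
      run: |
        # ${{ inputs.meson_args }} is substituted before bash runs; empty for most jobs
        if [[ ${{ inputs.editable }} == "true" ]]; then
          pip install -e . --no-build-isolation -v ${{ inputs.meson_args }}
        else
          # assumed non-editable branch, for illustration only
          pip install . --no-build-isolation -v ${{ inputs.meson_args }}
        fi
      shell: bash -el {0}

# In the workflow, only the sanitizer job would set the input:
#   - name: Build Pandas
#     uses: ./.github/actions/build_pandas
#     with:
#       meson_args: --config-settings=setup-args="-Db_sanitize=address,undefined"
```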

WillAyd · 2023-09-11T21:14:27Z

.github/workflows/unit-tests.yml

@@ -157,19 +157,25 @@ jobs:
    - name: Build Pandas
      id: build
      uses: ./.github/actions/build_pandas
+      env:
+        CFLAGS: "$CFLAGS -fno-sanitize-recover=all"


Meson has the config option b_sanitize=address,undefined, but I don't think it sets this. UBSAN errors are non-fatal by default, so without this flag things just get printed to stderr and pytest still continues.

Looks like NumPy does something similar with halt_on_error=1, but that didn't seem to stop pytest from continuing when I tried this locally.

https://github.com/numpy/numpy/pull/24208/files

WillAyd · 2023-09-12T00:27:53Z

.github/workflows/unit-tests.yml

@@ -154,22 +154,36 @@ jobs:
      with:
        environment-file: ci/deps/${{ matrix.env_file }}

+    - name: Set sanitizer flags
+      run: |
+        echo "CFLAGS=$CFLAGS -fno-sanitize-recover=all" >> "$GITHUB_ENV"


Having trouble setting this with GHA. There may also be a way to pass the flag directly through meson-python? @lithomas1 any idea?

lithomas1 · 2023-09-12T00:43:40Z

.github/workflows/unit-tests.yml

    - name: Build Pandas
      id: build
      uses: ./.github/actions/build_pandas
+      env:
+        CFLAGS: "$CFLAGS"


IIRC, this doesn't set CFLAGS inside of the action. It just makes CFLAGS available under env.CFLAGS.

Why don't you try setting CFLAGS in action.yml directly if sanitize = true?

Ah great idea - much cleaner
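A sketch of setting CFLAGS directly in action.yml, assuming a hypothetical `sanitize` input. The append happens inside the `run:` script on purpose: GitHub Actions takes values under `env:` literally and does not pass them through a shell, so `CFLAGS: "$CFLAGS ..."` would hand the build the literal string `$CFLAGS` rather than the existing flags. `-fno-sanitize-recover=all` (accepted by both GCC and Clang) makes every enabled sanitizer check abort the process instead of printing a report and carrying on.

```yaml
# Fragment of .github/actions/build_pandas/action.yml (sketch; `sanitize` is a hypothetical input)
inputs:
  sanitize:
    description: Build with ASan/UBSan and make sanitizer errors fatal
    default: 'false'
runs:
  using: composite
  steps:
    - name: Build Pandas
      run: |
        extra_args=()
        if [[ "${{ inputs.sanitize }}" == "true" ]]; then
          # Expanded by bash here, so any CFLAGS already in the environment is kept
          export CFLAGS="${CFLAGS:-} -fno-sanitize-recover=all"
          # Ask Meson (via meson-python) for ASan + UBSan instrumentation
          extra_args+=(--config-settings=setup-args=-Db_sanitize=address,undefined)
        fi
        # An empty array expands to nothing, so non-sanitizer builds are unchanged
        pip install -e . --no-build-isolation -v "${extra_args[@]}"
      shell: bash -el {0}
```

The sanitizer job would then only need `with: sanitize: true` on its `uses: ./.github/actions/build_pandas` step.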

WillAyd · 2023-09-12T03:42:28Z

Somewhat working now. Guessing we need a way for pytest-xdist to fail and signal the test it failed on. Right now looks like a worker just crashed

rgommers · 2023-09-12T08:42:53Z

This looks pretty good to me. Let me Cc @ngoldbaum, who implemented the NumPy CI job and has more experience with these sanitizers than I have.

ngoldbaum · 2023-09-12T15:27:28Z

Guessing we need a way for pytest-xdist to fail and signal the test it failed on.

Ah that's probably why the halt_on_error=1 didn't work, I wasn't running with pytest-xdist locally when I was setting up the numpy config.

mroeschke · 2023-09-12T15:28:15Z

Somewhat working now. Guessing we need a way for pytest-xdist to fail and signal the test it failed on. Right now looks like a worker just crashed

IMO I would add one new testing job to the matrix (e.g. one that uses a 3.11 dependency file) that runs the tests with sanitize=True and pytest-xdist with -n 0.
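One way to express the dedicated job, assuming the workflow builds its jobs from a `matrix` with `include` entries and that the build action accepts a hypothetical `sanitize` input. The entry name, env file names, and the `sanitize`/`xdist_args` keys are illustrative. `-n 0` tells pytest-xdist not to start any workers, so the tests run in the main pytest process.

```yaml
# Fragment of .github/workflows/unit-tests.yml (sketch; keys and file names are hypothetical)
jobs:
  pytest:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        env_file: [actions-310.yaml, actions-311.yaml]
        include:
          # A distinct env_file makes `include` create a new job instead of
          # merging these keys into the existing actions-311.yaml job
          - name: "ASAN / UBSAN"
            env_file: actions-311-sanitizers.yaml
            sanitize: 'true'
            xdist_args: '-n 0'
    name: ${{ matrix.name || matrix.env_file }}
    steps:
      - uses: actions/checkout@v4
      # (environment setup from the matrix env_file omitted)
      - name: Build Pandas
        uses: ./.github/actions/build_pandas
        with:
          sanitize: ${{ matrix.sanitize || 'false' }}
      - name: Test
        # Jobs without xdist_args keep parallel workers
        run: python -m pytest pandas ${{ matrix.xdist_args || '-n auto' }}
        shell: bash -el {0}
```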

WillAyd · 2023-09-12T19:04:25Z

In that case are you still planning to run against the entire test base or a subset of modules? I think removing multiple workers would slow down our CI a good deal? But maybe this gets to a state where it only runs when C/Cython files are touched?

WillAyd · 2023-09-12T19:22:29Z

Worth noting I tried -n 0 locally a few times and it didn't make a difference. Not sure if the mere installation of pytest-xdist changes that. Needs further investigation

lithomas1 · 2023-09-12T20:16:33Z

In that case are you still planning to run against the entire test base or a subset of modules? I think removing multiple workers would slow down our CI a good deal? But maybe this gets to a state where it only runs when C/Cython files are touched?

We should do a minimal run, kind of like the npdev situation. So only us, numpy, and arrow installed.
Maybe that helps with the runtime?
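A rough sketch of such a minimal environment file for conda-forge. The file name and package list are assumptions (approximate and unpinned), not copied from the pandas repository. The idea is to install only what is needed to build pandas and run its tests, plus pyarrow, so that other compiled dependencies stay out of the picture (later in this thread an ASAN failure appears to come from matplotlib).

```yaml
# ci/deps/actions-311-sanitizers.yaml (hypothetical file name; approximate contents)
name: pandas-dev
channels:
  - conda-forge
dependencies:
  - python=3.11

  # build dependencies (needed because the build uses --no-build-isolation)
  - versioneer
  - cython
  - meson
  - meson-python
  - ninja

  # test dependencies
  - pytest
  - pytest-xdist
  - hypothesis

  # required runtime dependencies
  - numpy
  - python-dateutil
  - pytz
  - tzdata

  # the only optional dependency kept
  - pyarrow
```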

mroeschke · 2023-09-12T21:49:20Z

Since the GHA Ubuntu runners only have 2 cores, I think the cost of running the entire test suite (even with all the dependencies) with -n 0 won't be that significant, though I don't know the impact that will have on the debugger.

FWIW that's how the Windows tests run currently, and they take 15ish minutes longer than the non-xdist runs.

WillAyd · 2023-09-12T21:59:05Z

though I don't know the impact that will have on the debugger.

In theory ASAN roughly doubles the average runtime (see https://github.com/google/sanitizers/wiki/AddressSanitizer), though since we are not detecting leaks but are also adding UBSAN, I'm not sure how that all evens out.

WillAyd · 2023-09-14T18:24:21Z

A lot of the datetime stuff in this PR is hacked together just to appease UBSAN, but there are definitely quite a few code paths where datetime conversions can lead to undefined behavior.

The current ASAN failure looks like it comes from matplotlib, so @lithomas1 is probably right in that we need to pare this down to a smaller set of packages that we know can be clean

mroeschke · 2023-12-19T20:10:40Z

Gotcha. Or rather, if we introduce undefined behavior or address violations, this job will hopefully fail, correct? I just want to ensure there's a definitive job failure -> rectification -> job success path for this job.

WillAyd · 2023-12-19T20:16:32Z

Yes exactly - this will fail when either of those are detected

WillAyd · 2023-12-19T20:17:28Z

The error messaging you see in CI is something that could be improved. It just "fails" right now, but that feedback gets lost along the way from the crashed process. I think that can be tackled in a follow-up.

WillAyd · 2023-12-19T20:21:30Z

This is what happens today if either of these pops up:

https://github.com/pandas-dev/pandas/actions/runs/7066657149/job/19241456914#step:8:61

mroeschke · 2023-12-19T20:25:55Z

This is what happens today if either of these pops up:

https://github.com/pandas-dev/pandas/actions/runs/7066657149/job/19241456914#step:8:61

Ah OK. Could you at least include a test_args: "-v" in the job configuration in unit-tests.yml? At least then the last test should be printed before the job fails, so when it fails we don't have to rerun to figure out where it fails.

WillAyd · 2023-12-19T21:21:52Z

OK sure. Here is what that looks like:

https://github.com/pandas-dev/pandas/actions/runs/7267349987/job/19800967532?pr=55102#step:8:18053

Ends up being too much for GHA to show, but if you go to the raw logs you will see the error:

2023-12-19T21:11:17.6390164Z ../../pandas/_libs/src/vendored/ujson/python/objToJSON.c:2066:3: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
2023-12-19T21:12:44.8157270Z pandas/tests/io/test_common.py::TestCommonIOCapabilities::test_write_missing_parent_directory[to_json-os-OSError-json] 
2023-12-19T21:12:44.8158122Z [gw3] node down: Not properly terminated
2023-12-19T21:12:44.8158390Z 
2023-12-19T21:12:44.8158505Z replacing crashed worker gw3
2023-12-19T21:12:44.8158882Z INTERNALERROR> Traceback (most recent call last):
2023-12-19T21:12:44.8159868Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/_pytest/main.py", line 271, in wrap_session
2023-12-19T21:12:44.8160796Z INTERNALERROR>     session.exitstatus = doit(config, session) or 0
2023-12-19T21:12:44.8161413Z INTERNALERROR>                          ^^^^^^^^^^^^^^^^^^^^^
2023-12-19T21:12:44.8162455Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/_pytest/main.py", line 325, in _main
2023-12-19T21:12:44.8163512Z INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
2023-12-19T21:12:44.8164538Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/pluggy/_hooks.py", line 493, in __call__
2023-12-19T21:12:44.8165536Z INTERNALERROR>     return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
2023-12-19T21:12:44.8166476Z INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-12-19T21:12:44.8167481Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/pluggy/_manager.py", line 115, in _hookexec
2023-12-19T21:12:44.8168482Z INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
2023-12-19T21:12:44.8169138Z INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2023-12-19T21:12:44.8170131Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/pluggy/_callers.py", line 152, in _multicall
2023-12-19T21:12:44.8170977Z INTERNALERROR>     return outcome.get_result()
2023-12-19T21:12:44.8171378Z INTERNALERROR>            ^^^^^^^^^^^^^^^^^^^^
2023-12-19T21:12:44.8172287Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/pluggy/_result.py", line 114, in get_result
2023-12-19T21:12:44.8173179Z INTERNALERROR>     raise exc.with_traceback(exc.__traceback__)
2023-12-19T21:12:44.8174405Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/pluggy/_callers.py", line 77, in _multicall
2023-12-19T21:12:44.8175220Z INTERNALERROR>     res = hook_impl.function(*args)
2023-12-19T21:12:44.8175633Z INTERNALERROR>           ^^^^^^^^^^^^^^^^^^^^^^^^^
2023-12-19T21:12:44.8176555Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/xdist/dsession.py", line 123, in pytest_runtestloop
2023-12-19T21:12:44.8177342Z INTERNALERROR>     self.loop_once()
2023-12-19T21:12:44.8178296Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/xdist/dsession.py", line 148, in loop_once
2023-12-19T21:12:44.8179020Z INTERNALERROR>     call(**kwargs)
2023-12-19T21:12:44.8179889Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/xdist/dsession.py", line 273, in worker_collectionfinish
2023-12-19T21:12:44.8180703Z INTERNALERROR>     self.sched.schedule()
2023-12-19T21:12:44.8181588Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 339, in schedule
2023-12-19T21:12:44.8182381Z INTERNALERROR>     self._reschedule(node)
2023-12-19T21:12:44.8183278Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 321, in _reschedule
2023-12-19T21:12:44.8184099Z INTERNALERROR>     self._assign_work_unit(node)
2023-12-19T21:12:44.8185057Z INTERNALERROR>   File "/home/runner/micromamba/envs/test/lib/python3.11/site-packages/xdist/scheduler/loadscope.py", line 259, in _assign_work_unit
2023-12-19T21:12:44.8186005Z INTERNALERROR>     worker_collection = self.registered_collections[node]
2023-12-19T21:12:44.8186635Z INTERNALERROR>                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
2023-12-19T21:12:44.8187150Z INTERNALERROR> KeyError: <WorkerController gw7>

mroeschke · 2023-12-19T21:24:55Z

Great thanks!

lithomas1 · 2023-12-19T21:36:02Z

This is what happens today if either of these pops up:
https://github.com/pandas-dev/pandas/actions/runs/7066657149/job/19241456914#step:8:61

Ah OK. Could you at least include a test_args: "-v" in the job configuration in unit-tests.yml? At least then the last test should be printed before the job fails, so when it fails we don't have to rerun to figure out where it fails.

Hm, I wonder if there's a better way to do this.
(My main worry is that this makes the logs very hard to scroll through, e.g. if I had to look through the logs because of a flaky non-ASAN-related test.)

Do you know if e.g. something like PYTEST_CURRENT_TEST might help?
https://docs.pytest.org/en/7.1.x/example/simple.html#pytest-current-test-env

WillAyd · 2023-12-19T21:55:15Z

Is there anything particular to CI that we know of that keeps stderr from pytest-xdist workers out of the logs? If you run things locally, you get the error and a stack trace. While that doesn't map 1-to-1 to the test name, it is pretty helpful for figuring out what is going on, so it might be the best compromise.

mroeschke · 2023-12-19T22:03:21Z

Is there anything particular to CI that we know of that does not redirect stderr from pytest-xdist to the logs?

If I understand your question, this is a pytest xdist limitation: https://pytest-xdist.readthedocs.io/en/stable/known-limitations.html#output-stdout-and-stderr-from-workers

I know the -v isn't elegant, but I would prefer to have a way to narrow down what is causing the failure before merging this in.

lithomas1 · 2023-12-19T23:51:17Z

I know the -v isn't elegant, but I would prefer to have a way to narrow down what is causing the failure before merging this in.

I agree. I don't mean to block this PR, but maybe we should look into the reporting a little more?

At the very least, I think I could settle for a solution where we get pytest to print the filenames of the tests.
We could have this if we turned off pytest-xdist.
(There seems to be an issue with pytest-xdist where it swallows the filenames: pytest-dev/pytest-xdist#450.)

Would this work?

WillAyd · 2023-12-21T01:28:32Z

OK, here is what @lithomas1's suggestion looks like:

https://github.com/pandas-dev/pandas/actions/runs/7281808581/job/19843041899?pr=55102#step:8:590

Out of the two options so far I would prefer to go that route. It looks like turning off pytest-xdist for the ASAN build had little to no effect on the overall runtime
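A rough sketch of a sanitizer test step with pytest-xdist turned off. Several details are assumptions rather than something stated in the thread: the extensions are compiled with GCC, so `gcc -print-file-name=libasan.so` finds the matching ASan runtime (it has to be preloaded because the CPython interpreter itself is not built with ASan); `-p no:xdist` stops the plugin from loading, which presumes there are no xdist-only options in the project's `addopts`; and the marker expression is only an example. Without xdist, pytest's default progress output writes each test file name before that file's results, so the last file name printed before the abort points at the culprit, though not at the exact test.

```yaml
    - name: Test (ASAN / UBSAN)
      run: |
        # CPython is not built with ASan, so its runtime must be loaded first.
        # Assumes the pandas extensions were compiled with GCC.
        export LD_PRELOAD="$(gcc -print-file-name=libasan.so)"
        # Leaks are not being checked in this job
        export ASAN_OPTIONS=detect_leaks=0
        # Print a stack trace alongside each UBSan report
        export UBSAN_OPTIONS=print_stacktrace=1
        # -p no:xdist: run everything in one process so the abort shows up in the
        # main pytest output, right after the file name currently being run
        python -m pytest pandas -p no:xdist -m "not slow and not network and not single_cpu"
      shell: bash -el {0}
```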

pyproject.toml

lithomas1 · 2023-12-21T01:42:19Z

OK, here is what @lithomas1's suggestion looks like:

https://github.com/pandas-dev/pandas/actions/runs/7281808581/job/19843041899?pr=55102#step:8:590

Out of the two options so far I would prefer to go that route. It looks like turning off pytest-xdist for the ASAN build had little to no effect on the overall runtime

OK, this looks correct to me at first glance. Just to double-check, the failing test happens in test_common.py, right?
(and not in pandas/tests/extension/json/test_json.py)

WillAyd · 2023-12-21T02:00:00Z

Yeah, it does happen in test_common. I suppose the downside to this one is that you don't get exactly the test that failed, but the first one I see failing locally is pandas/tests/io/test_common.py::TestCommonIOCapabilities::test_write_missing_parent_directory[to_json-os-OSError-json]

lithomas1

LGTM (pending resolution of Matt's last comment)!

Excited to see this finally go in.

mroeschke · 2023-12-21T18:36:20Z

Awesome! Thanks @WillAyd

* enable ASAN/UBSAN in pandas CI
* try input
* try removing sanitize
* try no CFLAGS
* try GH string substituion
* change flags in build script
* quotes
* update script run
* single_cpu updates
* asan checks for datetime funcs
* try smaller config
* checkpoint
* bool fixup
* reverts
* known UB marker
* Finished marking tests with known UB
* dedicated CI job
* identifier fix
* fixes
* more test skip
* try quotes
* simplify ci
* try CFLAGS
* preload args
* skip single_cpu tests
* wording
* removed unneeded marker
* float set implementations
* Revert "float set implementations"

  This reverts commit 6266422.
* change marker name
* dedicated actions file
* consolidated into matrix
* fixup
* typos
* fixups
* add qt?
* intentional UB with verbose
* disable pytest-xdist
* original issue
* remove UB
* Revert "remove UB"

  This reverts commit 677da0e.
* merge fixup
* remove UB

---------

Co-authored-by: Thomas Li

enable ASAN/UBSAN in pandas CI

66d83d1

WillAyd requested a review from mroeschke as a code owner September 11, 2023 21:11

WillAyd commented Sep 11, 2023

lithomas1 added the Build Library building on various platforms label Sep 11, 2023

WillAyd added 4 commits September 11, 2023 19:44

try input

7aa2e7a

try removing sanitize

a5b3808

try no CFLAGS

7b58c6d

try GH string substituion

18111b0

WillAyd commented Sep 12, 2023

lithomas1 reviewed Sep 12, 2023

WillAyd added 4 commits September 11, 2023 21:32

change flags in build script

438cdfa

quotes

b18cf9d

update script run

69cb6f6

single_cpu updates

6f5fb11

WillAyd added 2 commits September 13, 2023 20:44

Merge branch 'main' into pandas-asan

eb258ca

asan checks for datetime funcs

663d6d4

WillAyd requested a review from MarcoGorelli as a code owner September 14, 2023 00:45

WillAyd added 3 commits September 15, 2023 08:01

try smaller config

466056d

Merge remote-tracking branch 'upstream/main' into pandas-asan

91f2e17

checkpoint

d4074ca

WillAyd added 2 commits December 19, 2023 15:36

Merge branch 'main' into pandas-asan

c59703d

intentional UB with verbose

02bf20d

WillAyd added 4 commits December 20, 2023 18:00

disable pytest-xdist

01070f3

Merge remote-tracking branch 'upstream/main' into pandas-asan

9f1adbc

original issue

57ed286

remove UB

677da0e

mroeschke reviewed Dec 21, 2023

pyproject.toml (outdated, resolved)

Revert "remove UB"

af0150a

This reverts commit 677da0e.

lithomas1 approved these changes Dec 21, 2023

WillAyd added 2 commits December 20, 2023 22:11

merge fixup

4647f12

remove UB

cba79f6

mroeschke approved these changes Dec 21, 2023

mroeschke merged commit 8f32ea5 into pandas-dev:main Dec 21, 2023
83 checks passed

WillAyd deleted the pandas-asan branch January 2, 2024 19:50
