enable ASAN/UBSAN in pandas CI (#55102)
* enable ASAN/UBSAN in pandas CI

* try input

* try removing sanitize

* try no CFLAGS

* try GH string substitution

* change flags in build script

* quotes

* update script run

* single_cpu updates

* asan checks for datetime funcs

* try smaller config

* checkpoint

* bool fixup

* reverts

* known UB marker

* Finished marking tests with known UB

* dedicated CI job

* identifier fix

* fixes

* more test skip

* try quotes

* simplify ci

* try CFLAGS

* preload args

* skip single_cpu tests

* wording

* removed unneeded marker

* float set implementations

* Revert "float set implementations"

This reverts commit 6266422.

* change marker name

* dedicated actions file

* consolidated into matrix

* fixup

* typos

* fixups

* add qt?

* intentional UB with verbose

* disable pytest-xdist

* original issue

* remove UB

* Revert "remove UB"

This reverts commit 677da0e.

* merge fixup

* remove UB

---------

Co-authored-by: Thomas Li <[email protected]>
WillAyd and lithomas1 authored Dec 21, 2023
1 parent 8ce6740 commit 8f32ea5
Showing 15 changed files with 88 additions and 5 deletions.
11 changes: 9 additions & 2 deletions .github/actions/build_pandas/action.yml
@@ -4,6 +4,12 @@ inputs:
editable:
description: Whether to build pandas in editable mode (default true)
default: true
+meson_args:
+description: Extra flags to pass to meson
+required: false
+cflags_adds:
+description: Items to append to the CFLAGS variable
+required: false
runs:
using: composite
steps:
@@ -24,11 +30,12 @@ runs:

- name: Build Pandas
run: |
+export CFLAGS="$CFLAGS ${{ inputs.cflags_adds }}"
if [[ ${{ inputs.editable }} == "true" ]]; then
-pip install -e . --no-build-isolation -v --no-deps \
+pip install -e . --no-build-isolation -v --no-deps ${{ inputs.meson_args }} \
--config-settings=setup-args="--werror"
else
-pip install . --no-build-isolation -v --no-deps \
+pip install . --no-build-isolation -v --no-deps ${{ inputs.meson_args }} \
--config-settings=setup-args="--werror"
fi
shell: bash -el {0}
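The two new inputs are substituted into the step's shell text before bash runs it: `cflags_adds` is appended to `CFLAGS`, and `meson_args` is word-split into extra `pip install` arguments, so an empty input leaves the original command unchanged. The Python sketch below models that composition for illustration only; `compose_build` is a hypothetical helper (not part of pandas or GitHub Actions), and `shlex.split` stands in for bash's word splitting and quote removal.

```python
import shlex


def compose_build(
    editable: bool = True,
    meson_args: str = "",
    cflags: str = "",
    cflags_adds: str = "",
) -> tuple[str, list[str]]:
    """Hypothetical model of the "Build Pandas" step: returns (CFLAGS, argv)."""
    # export CFLAGS="$CFLAGS ${{ inputs.cflags_adds }}"
    new_cflags = f"{cflags} {cflags_adds}"
    argv = ["pip", "install"]
    if editable:
        argv.append("-e")
    argv += [".", "--no-build-isolation", "-v", "--no-deps"]
    # The substituted meson_args text is word-split; an empty string adds nothing.
    argv += shlex.split(meson_args)
    # setup-args="--werror" as it looks after bash quote removal
    argv.append("--config-settings=setup-args=--werror")
    return new_cflags, argv


cflags, argv = compose_build(
    editable=True,
    meson_args='--config-settings=setup-args="-Db_sanitize=address,undefined"',
    cflags_adds="-fno-sanitize-recover=all",
)
print(repr(cflags))
print(argv)
```

With the sanitizer matrix values, the meson flag reaches pip as the single argument `--config-settings=setup-args=-Db_sanitize=address,undefined`, and `CFLAGS` gains `-fno-sanitize-recover=all`, which makes sanitizer findings abort the process instead of only printing a report.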
9 changes: 8 additions & 1 deletion .github/actions/run-tests/action.yml
@@ -1,9 +1,16 @@
name: Run tests and report results
+inputs:
+preload:
+description: Preload arguments for sanitizer
+required: false
+asan_options:
+description: Arguments for Address Sanitizer (ASAN)
+required: false
runs:
using: composite
steps:
- name: Test
-run: ci/run_tests.sh
+run: ${{ inputs.asan_options }} ${{ inputs.preload }} ci/run_tests.sh
shell: bash -el {0}

- name: Publish test results
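With the sanitizer matrix values, the `Test` command expands to roughly `ASAN_OPTIONS=detect_leaks=0 LD_PRELOAD=$(gcc -print-file-name=libasan.so) ci/run_tests.sh`. Leading `NAME=value` words are bash assignments scoped to that single command, so the test script and the Python interpreter it launches start with leak detection off and the ASAN runtime preloaded (presumably because the `python` executable itself is not ASAN-instrumented, so the runtime must be loaded before the instrumented pandas extensions). For other matrix entries both inputs are empty and the line reduces to `ci/run_tests.sh`. The sketch below is a simplified, hypothetical model of that prefix handling (`split_env_prefix` is not a real API) and assumes the `$(...)` substitution has already been resolved to a path.

```python
import re
import shlex

# A bash assignment word: an identifier immediately followed by "=".
_ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=")


def split_env_prefix(command: str) -> tuple[dict[str, str], list[str]]:
    """Split leading NAME=value words off a simple command line."""
    words = shlex.split(command)
    env: dict[str, str] = {}
    # Only assignments that precede the command name set its environment.
    while words and _ASSIGNMENT.match(words[0]):
        name, _, value = words.pop(0).partition("=")
        env[name] = value
    return env, words


# The libasan.so path is a made-up example of what
# `gcc -print-file-name=libasan.so` might print on some runner.
env, argv = split_env_prefix(
    "ASAN_OPTIONS=detect_leaks=0 "
    "LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/11/libasan.so "
    "ci/run_tests.sh"
)
print(env)
print(argv)
```

In a real runner the resulting `env` would be merged over `os.environ` before launching `argv`; with both inputs empty, `env` stays empty and only the script runs.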
19 changes: 18 additions & 1 deletion .github/workflows/unit-tests.yml
@@ -96,6 +96,14 @@ jobs:
- name: "Pyarrow Nightly"
env_file: actions-311-pyarrownightly.yaml
pattern: "not slow and not network and not single_cpu"
+- name: "ASAN / UBSAN"
+env_file: actions-311-sanitizers.yaml
+pattern: "not slow and not network and not single_cpu and not skip_ubsan"
+asan_options: "ASAN_OPTIONS=detect_leaks=0"
+preload: LD_PRELOAD=$(gcc -print-file-name=libasan.so)
+meson_args: --config-settings=setup-args="-Db_sanitize=address,undefined"
+cflags_adds: -fno-sanitize-recover=all
+pytest_workers: -1 # disable pytest-xdist as it swallows stderr from ASAN
fail-fast: false
name: ${{ matrix.name || format('ubuntu-latest {0}', matrix.env_file) }}
env:
@@ -105,7 +113,7 @@ jobs:
PANDAS_COPY_ON_WRITE: ${{ matrix.pandas_copy_on_write || '0' }}
PANDAS_CI: ${{ matrix.pandas_ci || '1' }}
TEST_ARGS: ${{ matrix.test_args || '' }}
-PYTEST_WORKERS: 'auto'
+PYTEST_WORKERS: ${{ matrix.pytest_workers || 'auto' }}
PYTEST_TARGET: ${{ matrix.pytest_target || 'pandas' }}
# Clipboard tests
QT_QPA_PLATFORM: offscreen
@@ -174,16 +182,25 @@ jobs:
- name: Build Pandas
id: build
uses: ./.github/actions/build_pandas
+with:
+meson_args: ${{ matrix.meson_args }}
+cflags_adds: ${{ matrix.cflags_adds }}

- name: Test (not single_cpu)
uses: ./.github/actions/run-tests
if: ${{ matrix.name != 'Pypy' }}
+with:
+preload: ${{ matrix.preload }}
+asan_options: ${{ matrix.asan_options }}
env:
# Set pattern to not single_cpu if not already set
PATTERN: ${{ env.PATTERN == '' && 'not single_cpu' || matrix.pattern }}

- name: Test (single_cpu)
uses: ./.github/actions/run-tests
+with:
+preload: ${{ matrix.preload }}
+asan_options: ${{ matrix.asan_options }}
env:
PATTERN: 'single_cpu'
PYTEST_WORKERS: 0
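The `PATTERN` expression in the `Test (not single_cpu)` step relies on GitHub Actions' `&&` and `||` returning one of their operands rather than a strict boolean, so `cond && a || b` acts like a ternary as long as `a` is non-empty. Python's `and`/`or` short-circuit the same way, which allows a one-line model; `pattern_for` is a hypothetical name, and the `env.PATTERN` and `matrix.pattern` values are passed in as plain strings.

```python
def pattern_for(env_pattern: str, matrix_pattern: str) -> str:
    """Model of: env.PATTERN == '' && 'not single_cpu' || matrix.pattern"""
    # `and` returns its right operand when the left is truthy; `or` falls
    # back to matrix_pattern whenever the left-hand result is falsy.
    return (env_pattern == "" and "not single_cpu") or matrix_pattern


print(pattern_for("", ""))
print(pattern_for("not slow and not network", "not slow and not network"))
```

One caveat of the idiom: if the middle operand were an empty string, the expression would always fall through to `matrix.pattern`.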
32 changes: 32 additions & 0 deletions ci/deps/actions-311-sanitizers.yaml
@@ -0,0 +1,32 @@
name: pandas-dev
channels:
- conda-forge
dependencies:
- python=3.11

# build dependencies
- versioneer[toml]
- cython>=0.29.33
- meson[ninja]=1.2.1
- meson-python=0.13.1

# test dependencies
- pytest>=7.3.2
- pytest-cov
- pytest-xdist>=2.2.0
- pytest-localserver>=0.7.1
- pytest-qt>=4.2.0
- boto3
- hypothesis>=6.46.1
- pyqt>=5.15.9

# required dependencies
- python-dateutil
- numpy<2
- pytz

# pandas dependencies
- pip

- pip:
- "tzdata>=2022.7"
2 changes: 2 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -3206,6 +3206,7 @@ def test_from_out_of_bounds_ns_datetime(
assert item.asm8.dtype == exp_dtype
assert dtype == exp_dtype

+@pytest.mark.skip_ubsan
def test_out_of_s_bounds_datetime64(self, constructor):
scalar = np.datetime64(np.iinfo(np.int64).max, "D")
result = constructor(scalar)
@@ -3241,6 +3242,7 @@ def test_from_out_of_bounds_ns_timedelta(
assert item.asm8.dtype == exp_dtype
assert dtype == exp_dtype

+@pytest.mark.skip_ubsan
@pytest.mark.parametrize("cls", [np.datetime64, np.timedelta64])
def test_out_of_s_bounds_timedelta64(self, constructor, cls):
scalar = cls(np.iinfo(np.int64).max, "D")
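The constructor tests above start from `np.datetime64(np.iinfo(np.int64).max, "D")`, a day count that no longer fits in a signed 64-bit integer once it is scaled to seconds. In C, overflowing a signed multiplication is undefined behavior, which is what the `undefined` part of `b_sanitize=address,undefined` instruments and, with `-fno-sanitize-recover=all`, aborts on. A minimal sketch of a checked conversion follows, using Python's unbounded integers to detect what fixed-width code would overflow; `days_to_seconds_checked` is hypothetical and is not how pandas implements the conversion.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
SECONDS_PER_DAY = 86_400


def days_to_seconds_checked(days: int) -> int:
    """Scale a day count to seconds, rejecting results outside int64."""
    # Python ints are unbounded, so this product is exact; fixed-width C
    # code would have to check before multiplying to avoid UB.
    result = days * SECONDS_PER_DAY
    if not INT64_MIN <= result <= INT64_MAX:
        raise OverflowError(f"{days} days does not fit in int64 seconds")
    return result


print(days_to_seconds_checked(INT64_MAX // SECONDS_PER_DAY))
try:
    days_to_seconds_checked(INT64_MAX)  # the day count the tests construct
except OverflowError as exc:
    print("rejected:", exc)
```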
1 change: 1 addition & 0 deletions pandas/tests/groupby/test_cumulative.py
@@ -60,6 +60,7 @@ def test_groupby_cumprod():
tm.assert_series_equal(actual, expected)


+@pytest.mark.skip_ubsan
def test_groupby_cumprod_overflow():
# GH#37493 if we overflow we return garbage consistent with numpy
df = DataFrame({"key": ["b"] * 4, "value": 100_000})
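`test_groupby_cumprod_overflow` overflows on purpose: the cumulative product of four values of 100,000 reaches 10**20, beyond the int64 maximum of about 9.22 * 10**18, and the GH#37493 comment says the result is "garbage consistent with numpy". Assuming that garbage is two's-complement wraparound, which is what int64 array arithmetic produces in practice, the same operation in C is still undefined behavior for signed types, hence the `skip_ubsan` marker. The sketch emulates the wraparound in pure Python; `wrap_int64` and `cumprod_int64` are hypothetical helpers.

```python
def wrap_int64(value: int) -> int:
    """Reduce an arbitrary integer to int64 using two's-complement wraparound."""
    return (value + 2**63) % 2**64 - 2**63


def cumprod_int64(values: list[int]) -> list[int]:
    """Cumulative product that wraps to int64 after every multiplication."""
    out: list[int] = []
    acc = 1
    for v in values:
        acc = wrap_int64(acc * v)
        out.append(acc)
    return out


print(cumprod_int64([100_000] * 4))
# -> [100000, 10000000000, 1000000000000000, 7766279631452241920]
```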
10 changes: 9 additions & 1 deletion pandas/tests/io/parser/common/test_float.py
@@ -40,7 +40,14 @@ def test_scientific_no_exponent(all_parsers_all_precisions):
tm.assert_frame_equal(df_roundtrip, df)


-@pytest.mark.parametrize("neg_exp", [-617, -100000, -99999999999999999])
+@pytest.mark.parametrize(
+"neg_exp",
+[
+-617,
+-100000,
+pytest.param(-99999999999999999, marks=pytest.mark.skip_ubsan),
+],
+)
def test_very_negative_exponent(all_parsers_all_precisions, neg_exp):
# GH#38753
parser, precision = all_parsers_all_precisions
@@ -51,6 +58,7 @@ def test_very_negative_exponent(all_parsers_all_precisions, neg_exp):
tm.assert_frame_equal(result, expected)


+@pytest.mark.skip_ubsan
@xfail_pyarrow # AssertionError: Attributes of DataFrame.iloc[:, 0] are different
@pytest.mark.parametrize("exp", [999999999999999999, -999999999999999999])
def test_too_many_exponent_digits(all_parsers_all_precisions, exp, request):
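The `-99999999999999999` case, now marked `skip_ubsan`, feeds the parsers a 17-digit exponent. A C routine that accumulates exponent digits into a fixed-width signed integer (for example with `n = n * 10 + digit`) overflows long before the last digit, which is undefined behavior. The hypothetical sketch below shows one common defensive pattern, saturating the accumulator at a cap; it illustrates the technique and is not the pandas tokenizer's actual code. The cap of 100,000 is an arbitrary assumption, chosen because it is far past the range where a double underflows to 0 or overflows to infinity.

```python
EXPONENT_CAP = 100_000  # hypothetical saturation limit


def parse_exponent(text: str) -> int:
    """Parse an optionally signed decimal exponent, saturating its magnitude."""
    sign = 1
    digits = text
    if digits[:1] in ("+", "-"):
        sign = -1 if digits[0] == "-" else 1
        digits = digits[1:]
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError(f"invalid exponent: {text!r}")
    magnitude = 0
    for ch in digits:
        # Once the cap is reached, stop accumulating: the value stays below
        # 10 * EXPONENT_CAP, so a 32-bit C int would never overflow here.
        if magnitude < EXPONENT_CAP:
            magnitude = magnitude * 10 + int(ch)
    return sign * min(magnitude, EXPONENT_CAP)


for exp in ("-617", "-100000", "-99999999999999999"):
    print(exp, parse_exponent(exp), float(f"1e{parse_exponent(exp)}"))
```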
2 changes: 2 additions & 0 deletions pandas/tests/scalar/timedelta/methods/test_round.py
@@ -61,6 +61,7 @@ def test_round_invalid(self):
with pytest.raises(ValueError, match=msg):
t1.round(freq)

+@pytest.mark.skip_ubsan
def test_round_implementation_bounds(self):
# See also: analogous test for Timestamp
# GH#38964
@@ -86,6 +87,7 @@ def test_round_implementation_bounds(self):
with pytest.raises(OutOfBoundsTimedelta, match=msg):
Timedelta.max.round("s")

+@pytest.mark.skip_ubsan
@given(val=st.integers(min_value=iNaT + 1, max_value=lib.i8max))
@pytest.mark.parametrize(
"method", [Timedelta.round, Timedelta.floor, Timedelta.ceil]
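The `Timedelta` rounding tests probe values up to `lib.i8max`, and `Timedelta.max.round("s")` is expected to raise `OutOfBoundsTimedelta` rather than wrap. The underlying hazard is that the nearest multiple of the unit can lie beyond the int64 maximum. A minimal, hypothetical sketch of floor and ceiling rounding on nanosecond integers with an explicit bounds check follows; pandas' own implementation differs and raises its own exception types.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
NS_PER_SECOND = 10**9


def _checked(value: int) -> int:
    """Return value unchanged, or raise if it does not fit in int64."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError(f"{value} is outside the int64 range")
    return value


def floor_to(value_ns: int, unit_ns: int) -> int:
    """Largest multiple of unit_ns that is <= value_ns."""
    return _checked(value_ns // unit_ns * unit_ns)


def ceil_to(value_ns: int, unit_ns: int) -> int:
    """Smallest multiple of unit_ns that is >= value_ns."""
    return _checked(-(-value_ns // unit_ns) * unit_ns)


print(floor_to(INT64_MAX, NS_PER_SECOND))  # fits in int64
try:
    ceil_to(INT64_MAX, NS_PER_SECOND)  # next whole second is past INT64_MAX
except OverflowError as exc:
    print("rejected:", exc)
```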
1 change: 1 addition & 0 deletions pandas/tests/scalar/timedelta/test_arithmetic.py
@@ -966,6 +966,7 @@ def test_td_op_timedelta_timedeltalike_array(self, op, arr):


class TestTimedeltaComparison:
+@pytest.mark.skip_ubsan
def test_compare_pytimedelta_bounds(self):
# GH#49021 don't overflow on comparison with very large pytimedeltas

1 change: 1 addition & 0 deletions pandas/tests/scalar/timedelta/test_timedelta.py
@@ -551,6 +551,7 @@ def test_timedelta_hash_equality(self):
ns_td = Timedelta(1, "ns")
assert hash(ns_td) != hash(ns_td.to_pytimedelta())

+@pytest.mark.skip_ubsan
@pytest.mark.xfail(
reason="pd.Timedelta violates the Python hash invariant (GH#44504).",
)
1 change: 1 addition & 0 deletions pandas/tests/scalar/timestamp/methods/test_tz_localize.py
@@ -25,6 +25,7 @@


class TestTimestampTZLocalize:
+@pytest.mark.skip_ubsan
def test_tz_localize_pushes_out_of_bounds(self):
# GH#12677
# tz_localize that pushes away from the boundary is OK
1 change: 1 addition & 0 deletions pandas/tests/scalar/timestamp/test_constructors.py
@@ -822,6 +822,7 @@ def test_barely_out_of_bounds(self):
with pytest.raises(OutOfBoundsDatetime, match=msg):
Timestamp("2262-04-11 23:47:16.854775808")

+@pytest.mark.skip_ubsan
def test_bounds_with_different_units(self):
out_of_bounds_dates = ("1677-09-21", "2262-04-12")

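`test_bounds_with_different_units` uses "1677-09-21" and "2262-04-12", the nearest calendar dates just outside the range of nanosecond timestamps stored as int64 offsets from the Unix epoch. The bounds can be derived with the standard library alone, as sketched below. The only approximation is truncation to microseconds, since `datetime` has no nanosecond field; pandas additionally reserves the int64 minimum for `NaT`, which this sketch ignores.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

# datetime resolves only microseconds, so the nanosecond counts are
# floor-divided by 1000; the sub-microsecond digits are dropped.
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
NS_MIN = EPOCH + timedelta(microseconds=INT64_MIN // 1000)

print(NS_MIN)  # 1677-09-21 00:12:43.145224
print(NS_MAX)  # 2262-04-11 23:47:16.854775

for day in ("1677-09-21", "2262-04-12"):
    dt = datetime.fromisoformat(day)
    print(day, "in ns range:", NS_MIN <= dt <= NS_MAX)
```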
1 change: 1 addition & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -1140,6 +1140,7 @@ def test_to_datetime_dt64s_out_of_ns_bounds(self, cache, dt, errors):
assert ts.unit == "s"
assert ts.asm8 == dt

+@pytest.mark.skip_ubsan
def test_to_datetime_dt64d_out_of_bounds(self, cache):
dt64 = np.datetime64(np.iinfo(np.int64).max, "D")

1 change: 1 addition & 0 deletions pyproject.toml
@@ -523,6 +523,7 @@ markers = [
"db: tests requiring a database (mysql or postgres)",
"clipboard: mark a pd.read_clipboard test",
"arm_slow: mark a test as slow for arm64 architecture",
+"skip_ubsan: Tests known to fail UBSAN check",
]

[tool.mypy]
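Registering `skip_ubsan` under `markers` is what lets the decorator be used when pytest runs with `--strict-markers`, and the sanitizer matrix entry then excludes those tests through the pattern `not slow and not network and not single_cpu and not skip_ubsan`, which `ci/run_tests.sh` presumably hands to pytest's `-m` option. pytest implements its own expression grammar (with `or`, parentheses, and more); the sketch below models only the conjunction-of-`not` form these CI patterns use, and `is_selected` is a hypothetical helper.

```python
SANITIZER_PATTERN = (
    "not slow and not network and not single_cpu and not skip_ubsan"
)


def is_selected(pattern: str, test_markers: set[str]) -> bool:
    """Evaluate a pattern of the form "not a and not b and ..." for one test."""
    for term in pattern.split(" and "):
        words = term.split()
        # Anything other than "not <marker>" is outside this sketch's scope.
        if len(words) != 2 or words[0] != "not":
            raise ValueError(f"unsupported term: {term!r}")
        if words[1] in test_markers:
            return False
    return True


print(is_selected(SANITIZER_PATTERN, set()))           # True
print(is_selected(SANITIZER_PATTERN, {"skip_ubsan"}))  # False
```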
1 change: 1 addition & 0 deletions scripts/tests/data/deps_minimum.toml
@@ -382,6 +382,7 @@ markers = [
"db: tests requiring a database (mysql or postgres)",
"clipboard: mark a pd.read_clipboard test",
"arm_slow: mark a test as slow for arm64 architecture",
+"skip_ubsan: tests known to invoke undefined behavior",
]

[tool.mypy]
