enable ASAN/UBSAN in pandas CI #55102

Merged
merged 57 commits into from
Dec 21, 2023
Changes from 31 commits
66d83d1
enable ASAN/UBSAN in pandas CI
WillAyd Sep 11, 2023
7aa2e7a
try input
WillAyd Sep 11, 2023
a5b3808
try removing sanitize
WillAyd Sep 12, 2023
7b58c6d
try no CFLAGS
WillAyd Sep 12, 2023
18111b0
try GH string substituion
WillAyd Sep 12, 2023
438cdfa
change flags in build script
WillAyd Sep 12, 2023
b18cf9d
quotes
WillAyd Sep 12, 2023
69cb6f6
update script run
WillAyd Sep 12, 2023
6f5fb11
single_cpu updates
WillAyd Sep 12, 2023
eb258ca
Merge branch 'main' into pandas-asan
WillAyd Sep 14, 2023
663d6d4
asan checks for datetime funcs
WillAyd Sep 14, 2023
466056d
try smaller config
WillAyd Sep 15, 2023
91f2e17
Merge remote-tracking branch 'upstream/main' into pandas-asan
WillAyd Sep 15, 2023
d4074ca
checkpoint
WillAyd Sep 15, 2023
aeff50e
Merge branch 'main' into pandas-asan
WillAyd Oct 26, 2023
e303ba1
bool fixup
WillAyd Oct 27, 2023
4220d82
Merge branch 'main' into pandas-asan
WillAyd Nov 16, 2023
46d1034
reverts
WillAyd Nov 16, 2023
89706a4
Merge branch 'main' into pandas-asan
WillAyd Nov 17, 2023
929c731
known UB marker
WillAyd Nov 17, 2023
b01242b
Merge branch 'main' into pandas-asan
lithomas1 Nov 28, 2023
6483e07
Finished marking tests with known UB
WillAyd Dec 2, 2023
de13605
Merge remote-tracking branch 'upstream/main' into pandas-asan
WillAyd Dec 2, 2023
b87a210
dedicated CI job
WillAyd Dec 2, 2023
77d1e00
Merge remote-tracking branch 'upstream/main' into pandas-asan
WillAyd Dec 2, 2023
46ec023
identifier fix
WillAyd Dec 2, 2023
8695dca
fixes
WillAyd Dec 2, 2023
05319ae
more test skip
WillAyd Dec 2, 2023
6d76a57
try quotes
WillAyd Dec 2, 2023
f5dd440
simplify ci
WillAyd Dec 2, 2023
12aa1d1
try CFLAGS
WillAyd Dec 2, 2023
628d1c2
preload args
WillAyd Dec 2, 2023
1de633e
skip single_cpu tests
WillAyd Dec 2, 2023
3e295c5
wording
WillAyd Dec 2, 2023
252197e
Merge remote-tracking branch 'upstream/main' into pandas-asan
WillAyd Dec 5, 2023
d5809b8
removed unneeded marker
WillAyd Dec 5, 2023
6266422
float set implementations
WillAyd Dec 5, 2023
b68a533
Revert "float set implementations"
WillAyd Dec 5, 2023
47dc305
Merge branch 'main' into pandas-asan
WillAyd Dec 6, 2023
636b8dd
Merge remote-tracking branch 'upstream/main' into pandas-asan
WillAyd Dec 13, 2023
a03ad1e
change marker name
WillAyd Dec 15, 2023
656edb1
dedicated actions file
WillAyd Dec 15, 2023
2aabda1
consolidated into matrix
WillAyd Dec 15, 2023
a9f2419
Merge remote-tracking branch 'upstream/main' into pandas-asan
WillAyd Dec 15, 2023
3056e5f
fixup
WillAyd Dec 15, 2023
89b2b80
typos
WillAyd Dec 15, 2023
d591b78
fixups
WillAyd Dec 16, 2023
6442066
add qt?
WillAyd Dec 16, 2023
c59703d
Merge branch 'main' into pandas-asan
WillAyd Dec 19, 2023
02bf20d
intentional UB with verbose
WillAyd Dec 19, 2023
01070f3
disable pytest-xdist
WillAyd Dec 20, 2023
9f1adbc
Merge remote-tracking branch 'upstream/main' into pandas-asan
WillAyd Dec 20, 2023
57ed286
original issue
WillAyd Dec 20, 2023
677da0e
remove UB
WillAyd Dec 20, 2023
af0150a
Revert "remove UB"
WillAyd Dec 21, 2023
4647f12
merge fixup
WillAyd Dec 21, 2023
cba79f6
remove UB
WillAyd Dec 21, 2023
11 changes: 9 additions & 2 deletions .github/actions/build_pandas/action.yml
@@ -4,6 +4,12 @@ inputs:
   editable:
     description: Whether to build pandas in editable mode (default true)
     default: true
+  meson_args:
+    description: Extra flags to pass to meson
+    required: false
+  cflags_adds:
+    description: Items to append to the CFLAGS variable
+    required: false
 runs:
   using: composite
   steps:
@@ -24,9 +30,10 @@ runs:
 
     - name: Build Pandas
       run: |
+        export CFLAGS="$CFLAGS ${{ inputs.cflags_adds }}"
         if [[ ${{ inputs.editable }} == "true" ]]; then
-          pip install -e . --no-build-isolation -v --no-deps
+          pip install -e . --no-build-isolation -v --no-deps ${{ inputs.meson_args }}
         else
-          pip install . --no-build-isolation -v --no-deps
+          pip install . --no-build-isolation -v --no-deps ${{ inputs.meson_args }}
         fi
       shell: bash -el {0}
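The two new inputs are spliced into the build step as plain text: `cflags_adds` is appended to `CFLAGS`, and `meson_args` is appended to the `pip install` command line. The following is a minimal Python sketch of that composition, not part of the PR; the helper names `build_env` and `build_command` are hypothetical and exist only for illustration.

```python
import shlex


def build_env(base_env, cflags_adds=""):
    """Return a copy of base_env with cflags_adds appended to CFLAGS."""
    env = dict(base_env)
    # Mirrors: export CFLAGS="$CFLAGS ${{ inputs.cflags_adds }}"
    env["CFLAGS"] = f"{env.get('CFLAGS', '')} {cflags_adds}"
    return env


def build_command(editable=True, meson_args=""):
    """Return the pip argv that the action's if/else branch would run."""
    cmd = ["pip", "install"]
    if editable:
        cmd.append("-e")
    cmd += [".", "--no-build-isolation", "-v", "--no-deps"]
    # meson_args is pasted into the script text, so the shell word-splits it
    # and strips the quotes; shlex.split approximates that behavior.
    cmd += shlex.split(meson_args)
    return cmd


env = build_env({"CFLAGS": "-O2"}, "-fno-sanitize-recover=all")
cmd = build_command(
    editable=True,
    meson_args='--config-settings=setup-args="-Db_sanitize=address,undefined"',
)
print(env["CFLAGS"])
print(cmd)
```

Because both inputs default to empty, callers that pass neither get the same `pip install` command as before, plus a trailing space in `CFLAGS`.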
39 changes: 39 additions & 0 deletions .github/workflows/unit-tests.yml
@@ -305,6 +305,45 @@ jobs:
       group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-musl
       cancel-in-progress: true
 
+  ASAN_UBSAN:
+    runs-on: ubuntu-22.04
+    timeout-minutes: 90
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Set up Conda
+        uses: ./.github/actions/setup-conda
+        with:
+          environment-file: ci/deps/actions-311-numpydev.yaml
+
+      - name: Build Pandas
+        id: build
+        uses: ./.github/actions/build_pandas
+        with:
+          meson_args: --config-settings=setup-args="-Db_sanitize=address,undefined"
+          cflags_adds: -fno-sanitize-recover=all

Review comment (Member, Author) on `cflags_adds`: Without this switch ASAN will in some cases recover and continue, which doesn't help CI to actually error out when issues occur (at least not with pytest-xdist)

Review comment (Member): If we get more predictable/consistent results without xdist, I think we should not run the tests in parallel

+
+      - name: Test (not single_cpu)
+        uses: ./.github/actions/run-tests
+        env:
+          PATTERN: "not slow and not network and not single_cpu and not known_ub"
+          PYTEST_WORKERS: 'auto'
+          PYTEST_TARGET: 'pandas'
+          ASAN_OPTIONS: detect_leaks=0
+          LD_PRELOAD: $(gcc -print-file-name=libasan.so)
+
+      - name: Test (single_cpu)
+        uses: ./.github/actions/run-tests
+        env:
+          PATTERN: "single_cpu and not known_ub"
+          PYTEST_WORKERS: 0
+          PYTEST_TARGET: 'pandas'
+          ASAN_OPTIONS: detect_leaks=0
+          LD_PRELOAD: $(gcc -print-file-name=libasan.so)
+
   python-dev:
     # This job may or may not run depending on the state of the next
     # unreleased Python version. DO NOT DELETE IT.
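The test steps in the ASAN_UBSAN job rely on two environment variables: `LD_PRELOAD` pointing at the ASan runtime (needed because the Python interpreter itself is not built with the sanitizer) and `ASAN_OPTIONS=detect_leaks=0`. Since `$(...)` is shell command substitution, the library path has to be resolved by something that actually runs `gcc`. The sketch below shows one way to reproduce that setup locally from Python. It is an illustration only: `sanitizer_env` and its `run` parameter are hypothetical names, not part of the PR.

```python
import os
import subprocess


def sanitizer_env(base_env=None, run=subprocess.run):
    """Build an environment for running tests against ASan-instrumented extensions.

    `run` is injectable so the lookup can be exercised without gcc installed.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Same query as in the workflow: ask gcc where its libasan.so lives.
    result = run(
        ["gcc", "-print-file-name=libasan.so"],
        capture_output=True,
        text=True,
        check=True,
    )
    env["LD_PRELOAD"] = result.stdout.strip()
    # Leak detection is switched off, matching the job definition.
    env["ASAN_OPTIONS"] = "detect_leaks=0"
    return env


# Example use (not executed here):
# subprocess.run(
#     ["python", "-m", "pytest", "pandas", "-m", "not known_ub"],
#     env=sanitizer_env(),
# )
```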
2 changes: 2 additions & 0 deletions pandas/tests/frame/test_constructors.py
@@ -3197,6 +3197,7 @@ def test_from_out_of_bounds_ns_datetime(
         assert item.asm8.dtype == exp_dtype
         assert dtype == exp_dtype
 
+    @pytest.mark.known_ub
     def test_out_of_s_bounds_datetime64(self, constructor):
         scalar = np.datetime64(np.iinfo(np.int64).max, "D")
         result = constructor(scalar)
@@ -3232,6 +3233,7 @@ def test_from_out_of_bounds_ns_timedelta(
         assert item.asm8.dtype == exp_dtype
         assert dtype == exp_dtype
 
+    @pytest.mark.known_ub
     @pytest.mark.parametrize("cls", [np.datetime64, np.timedelta64])
     def test_out_of_s_bounds_timedelta64(self, constructor, cls):
         scalar = cls(np.iinfo(np.int64).max, "D")
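Both tests build a scalar holding the largest int64 value in units of days. Converting such a value to a finer unit means multiplying by a positive factor, and in C a signed 64-bit multiplication that overflows is undefined behavior, which is what UBSan reports. The diff does not show where in the pandas or NumPy code the overflow happens, so the following is only a pure-Python sketch of the arithmetic, with a hypothetical helper `days_to_seconds_checked` showing what a checked conversion looks like.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
SECONDS_PER_DAY = 86_400


def fits_int64(value):
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def days_to_seconds_checked(days):
    """Convert days to seconds, raising instead of overflowing int64."""
    # Python ints are arbitrary precision, so the product itself is exact.
    seconds = days * SECONDS_PER_DAY
    if not fits_int64(seconds):
        raise OverflowError(f"{days} days does not fit in int64 seconds")
    return seconds


print(days_to_seconds_checked(1))
try:
    days_to_seconds_checked(INT64_MAX)
except OverflowError as exc:
    print("overflow:", exc)
```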
1 change: 1 addition & 0 deletions pandas/tests/groupby/test_apply.py
@@ -1391,6 +1391,7 @@ def test_groupby_apply_to_series_name():
     tm.assert_series_equal(result, expected)
 
 
+@pytest.mark.known_ub
 @pytest.mark.parametrize("dropna", [True, False])
 def test_apply_na(dropna):
     # GH#28984
1 change: 1 addition & 0 deletions pandas/tests/groupby/test_cumulative.py
@@ -60,6 +60,7 @@ def test_groupby_cumprod():
     tm.assert_series_equal(actual, expected)
 
 
+@pytest.mark.known_ub
 def test_groupby_cumprod_overflow():
     # GH#37493 if we overflow we return garbage consistent with numpy
     df = DataFrame({"key": ["b"] * 4, "value": 100_000})
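The cumulative product in this test reaches 100_000 ** 4 = 10 ** 20 on the fourth row, which is larger than the int64 maximum of 2 ** 63 - 1. The "garbage" mentioned in the test comment is what two's complement wraparound typically produces, although the C standard does not guarantee that outcome for signed overflow, hence the `known_ub` mark. The following pure-Python sketch simulates the wraparound; `wrap_int64` and `cumprod_int64` are hypothetical helpers, not pandas code.

```python
def wrap_int64(value):
    """Reduce an arbitrary Python int to the two's complement int64 range."""
    return (value + 2**63) % 2**64 - 2**63


def cumprod_int64(values):
    """Cumulative product with each intermediate result wrapped to int64."""
    out, acc = [], 1
    for v in values:
        acc = wrap_int64(acc * v)
        out.append(acc)
    return out


# Four rows of 100_000 in one group, as in the test's DataFrame.
print(cumprod_int64([100_000] * 4))
```

The first three entries are exact; the fourth is the wrapped value rather than 10 ** 20.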
10 changes: 9 additions & 1 deletion pandas/tests/io/parser/common/test_float.py
@@ -40,7 +40,14 @@ def test_scientific_no_exponent(all_parsers_all_precisions):
     tm.assert_frame_equal(df_roundtrip, df)
 
 
-@pytest.mark.parametrize("neg_exp", [-617, -100000, -99999999999999999])
+@pytest.mark.parametrize(
+    "neg_exp",
+    [
+        -617,
+        -100000,
+        pytest.param(-99999999999999999, marks=pytest.mark.known_ub),
+    ],
+)
 def test_very_negative_exponent(all_parsers_all_precisions, neg_exp):
     # GH#38753
     parser, precision = all_parsers_all_precisions
@@ -51,6 +58,7 @@ def test_very_negative_exponent(all_parsers_all_precisions, neg_exp):
     tm.assert_frame_equal(result, expected)
 
 
+@pytest.mark.known_ub
 @xfail_pyarrow  # AssertionError: Attributes of DataFrame.iloc[:, 0] are different
 @pytest.mark.parametrize("exp", [999999999999999999, -999999999999999999])
 def test_too_many_exponent_digits(all_parsers_all_precisions, exp, request):
2 changes: 2 additions & 0 deletions pandas/tests/scalar/timedelta/methods/test_round.py
@@ -61,6 +61,7 @@ def test_round_invalid(self):
         with pytest.raises(ValueError, match=msg):
             t1.round(freq)
 
+    @pytest.mark.known_ub
     def test_round_implementation_bounds(self):
         # See also: analogous test for Timestamp
         # GH#38964
@@ -86,6 +87,7 @@ def test_round_implementation_bounds(self):
         with pytest.raises(OutOfBoundsTimedelta, match=msg):
             Timedelta.max.round("s")
 
+    @pytest.mark.known_ub
     @given(val=st.integers(min_value=iNaT + 1, max_value=lib.i8max))
     @pytest.mark.parametrize(
         "method", [Timedelta.round, Timedelta.floor, Timedelta.ceil]
1 change: 1 addition & 0 deletions pandas/tests/scalar/timedelta/test_arithmetic.py
@@ -966,6 +966,7 @@ def test_td_op_timedelta_timedeltalike_array(self, op, arr):
 
 
 class TestTimedeltaComparison:
+    @pytest.mark.known_ub
     def test_compare_pytimedelta_bounds(self):
         # GH#49021 don't overflow on comparison with very large pytimedeltas
 
1 change: 1 addition & 0 deletions pandas/tests/scalar/timedelta/test_timedelta.py
@@ -551,6 +551,7 @@ def test_timedelta_hash_equality(self):
         ns_td = Timedelta(1, "ns")
         assert hash(ns_td) != hash(ns_td.to_pytimedelta())
 
+    @pytest.mark.known_ub
     @pytest.mark.xfail(
         reason="pd.Timedelta violates the Python hash invariant (GH#44504).",
     )
1 change: 1 addition & 0 deletions pandas/tests/scalar/timestamp/methods/test_tz_localize.py
@@ -25,6 +25,7 @@
 
 
 class TestTimestampTZLocalize:
+    @pytest.mark.known_ub
     def test_tz_localize_pushes_out_of_bounds(self):
         # GH#12677
         # tz_localize that pushes away from the boundary is OK
1 change: 1 addition & 0 deletions pandas/tests/scalar/timestamp/test_constructors.py
@@ -815,6 +815,7 @@ def test_barely_out_of_bounds(self):
         with pytest.raises(OutOfBoundsDatetime, match=msg):
             Timestamp("2262-04-11 23:47:16.854775808")
 
+    @pytest.mark.known_ub
     def test_bounds_with_different_units(self):
         out_of_bounds_dates = ("1677-09-21", "2262-04-12")
 
1 change: 1 addition & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -1140,6 +1140,7 @@ def test_to_datetime_dt64s_out_of_ns_bounds(self, cache, dt, errors):
         assert ts.unit == "s"
         assert ts.asm8 == dt
 
+    @pytest.mark.known_ub
     def test_to_datetime_dt64d_out_of_bounds(self, cache):
         dt64 = np.datetime64(np.iinfo(np.int64).max, "D")
 
1 change: 1 addition & 0 deletions pyproject.toml
@@ -524,6 +524,7 @@ markers = [
   "clipboard: mark a pd.read_clipboard test",
   "arm_slow: mark a test as slow for arm64 architecture",
   "arraymanager: mark a test to run with ArrayManager enabled",
+  "known_ub: tests known to invoke undefined behavior",
 ]
 
 [tool.mypy]
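Registering `known_ub` as a marker lets the sanitizer job deselect the marked tests through its `PATTERN` values, which are presumably forwarded to pytest's `-m` option. The sketch below is a simplified stand-in for how such an expression selects tests by their marks. It is not pytest's implementation, and the function name `selected` is hypothetical.

```python
import re

_KEYWORDS = {"and", "or", "not"}


def selected(expression, marks):
    """Return True if a test carrying `marks` matches a -m style expression."""
    # Every identifier in the expression becomes a boolean: is the mark present?
    names = set(re.findall(r"[A-Za-z_]\w*", expression)) - _KEYWORDS
    namespace = {name: name in marks for name in names}
    # eval is used only because the expression is a trusted literal here.
    return bool(eval(expression, {"__builtins__": {}}, namespace))


PATTERN = "not slow and not network and not single_cpu and not known_ub"

print(selected(PATTERN, set()))                    # unmarked test
print(selected(PATTERN, {"known_ub"}))             # test marked known_ub
print(selected("single_cpu and not known_ub", {"single_cpu"}))
```

An unmarked test matches the first pattern, while any test carrying `known_ub` is excluded by both patterns used in the job.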
1 change: 1 addition & 0 deletions scripts/tests/data/deps_minimum.toml
@@ -383,6 +383,7 @@ markers = [
   "clipboard: mark a pd.read_clipboard test",
   "arm_slow: mark a test as slow for arm64 architecture",
   "arraymanager: mark a test to run with ArrayManager enabled",
+  "known_ub: tests that trigger known undefined behavior",
 ]
 
 [tool.mypy]