Segmentation fault error in tests #65

Closed
SarahAlidoost opened this issue Nov 22, 2024 · 5 comments · Fixed by #66


SarahAlidoost commented Nov 22, 2024

This is related to #62.
We found that using parallel=True in xarray.open_mfdataset in fapar_lai.py causes segmentation faults on Linux and macOS; see the GA action runs on ubuntu and mac.

Example log of the GA action on Linux
Run hatch run fast-test
Creating environment: default
Installing project in development mode
Checking dependencies
============================= test session starts ==============================
platform linux -- Python 3.11.10, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/runner/work/zampy/zampy
configfile: pyproject.toml
testpaths: tests
plugins: anyio-4.6.2.post1, cov-6.0.0, mock-3.14.0
collected 85 items / 6 deselected / 79 selected

tests/test_cds_utils.py ..............                                   [ 17%]
tests/test_converter.py ......                                           [ 25%]
tests/test_dataset_protocol.py ......                                    [ 32%]
tests/test_datasets/test_cams.py ....                                    [ 37%]
tests/test_datasets/test_era5.py ....                                    [ 43%]
Fatal Python error: Segmentation fault

Current thread 0x00007f947e3fe640 (most recent call first):
  Garbage-collecting
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/backends/netCDF4_.py", line 83 in __setitem__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/array/core.py", line 4557 in load_store_chunk
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/array/core.py", line 4575 in store_chunk
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/_task_spec.py", line 651 in __call__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 229 in execute_task
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 243 in <listcomp>
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 243 in batch_execute_tasks
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/concurrent/futures/thread.py", line 58 in run
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/concurrent/futures/thread.py", line 83 in _worker
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 982 in run
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f947f3ff640 (most recent call first):
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/backends/locks.py", line 64 in __enter__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/backends/locks.py", line 231 in __enter__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/backends/netCDF4_.py", line 115 in _getitem
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/core/indexing.py", line 1018 in explicit_indexing_adapter
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/backends/netCDF4_.py", line 104 in __getitem__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/core/indexing.py", line 657 in get_duck_array
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/core/indexing.py", line 794 in get_duck_array
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/core/indexing.py", line 583 in get_duck_array
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/core/indexing.py", line 578 in __array__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/array/core.py", line 122 in getter
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/_task_spec.py", line 651 in __call__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 229 in execute_task
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 243 in <listcomp>
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 243 in batch_execute_tasks
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/concurrent/futures/thread.py", line 58 in run
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/concurrent/futures/thread.py", line 83 in _worker
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 982 in run
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f9486921640 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 331 in wait
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 629 in wait
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f9487922640 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 331 in wait
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 629 in wait
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f94adc7cb80 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/threading.py", line 327 in wait
  File "/opt/hostedtoolcache/Python/3.11.10/x64/lib/python3.11/queue.py", line 171 in get
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 140 in queue_get
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/local.py", line 505 in get_async
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/threaded.py", line 91 in get
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/base.py", line 397 in compute_as_if_collection
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/dask/array/core.py", line 1282 in store
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/namedarray/daskmanager.py", line 249 in store
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/backends/common.py", line 277 in sync
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/backends/api.py", line 1881 in to_netcdf
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/xarray/core/dataset.py", line 2346 in to_netcdf
  File "/home/runner/work/zampy/zampy/src/zampy/datasets/cds_utils.py", line 394 in convert_to_zampy
  File "/home/runner/work/zampy/zampy/src/zampy/datasets/ecmwf_dataset.py", line 99 in ingest
  File "/home/runner/work/zampy/zampy/tests/test_datasets/test_era5_land.py", line 90 in ingest_dummy_data
  File "/home/runner/work/zampy/zampy/tests/test_datasets/test_era5_land.py", line 135 in test_convert
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/main.py", line 337 in _main
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/lib/python3.11/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/runner/.local/share/hatch/env/virtual/zampy/_HbVvQu8/zampy/bin/pytest", line 8 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, cftime._cftime, yaml._yaml, psutil._psutil_linux, psutil._psutil_posix, markupsafe._speedups, scipy._lib._ccallback_c, charset_normalizer.md, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, 
scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.fftpack.convolve, zstandard.backend_c, requests.packages.charset_normalizer.md, requests.packages.chardet.md, PIL._imaging, kiwisolver._cext, rasterio._err, rasterio._filepath, rasterio._version, rasterio._env, rasterio.crs, rasterio._transform, rasterio._base, rasterio._features, rasterio._warp, rasterio._io, rasterio._vsiopener, tornado.speedups, msgpack._cmsgpack, pyproj._compat, pyproj._context, pyproj._network, pyproj._version, pyproj._geod, pyproj.list, pyproj._crs, pyproj.database, pyproj._transformer, pyproj._sync, netCDF4._netCDF4 (total: 120)
Segmentation fault (core dumped)
tests/test_datasets/test_era5_land.py ...
Error: Process completed with exit code 139.

SarahAlidoost commented Nov 22, 2024

Check https://docs.xarray.dev/en/stable/whats-new.html for issues related to segmentation faults. Also check pydata/xarray#4100.

Regarding parallel=True in xarray.open_mfdataset in fapar_lai.py: the default scheduler in dask.config should be checked, since the threaded scheduler does not work reliably with NetCDF files. Additionally, when addressing this issue, check this comment.
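
A minimal sketch of how the configured scheduler could be inspected and overridden, assuming dask is installed. The helper `current_thread_name` is hypothetical and only shows where a task ends up running; whether the synchronous scheduler is the right choice for fapar_lai.py is not established in this thread.

```python
import threading

import dask
from dask import delayed


def current_thread_name() -> str:
    """Hypothetical helper: report which thread executes the task."""
    return threading.current_thread().name


# None means nothing is configured, so each dask collection falls back to
# its own default (the threaded scheduler for dask arrays and delayed).
configured = dask.config.get("scheduler", None)
print("configured scheduler:", configured)

task = delayed(current_thread_name)()

# Temporarily force the single-threaded scheduler for this block only.
with dask.config.set(scheduler="synchronous"):
    name_in_sync = task.compute()

# With the synchronous scheduler the task runs in the calling thread.
ran_in_calling_thread = name_in_sync == threading.current_thread().name
print("ran in calling thread:", ran_in_calling_thread)
```

Using `dask.config.set` as a context manager keeps the override local, so other code that relies on the default scheduler is not affected.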

SarahAlidoost changed the title from "Using option parallel in xarray.open_mfdataset" to "Segmentation fault error in tests" on Nov 25, 2024
@BSchilperoort
Contributor

Switching to the h5netcdf engine would probably avoid this issue; see pydata/xarray#9779.
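
A small sketch of what switching engines might look like, assuming xarray, dask, pandas, numpy, and h5netcdf are installed. The file names, the `lai` variable, and both helper functions are made up for illustration and are not taken from zampy.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr


def write_dummy_files(folder: Path) -> list[Path]:
    """Write two tiny files covering consecutive days (illustrative data)."""
    paths = []
    for i, day in enumerate(["2020-01-01", "2020-01-02"]):
        ds = xr.Dataset(
            {"lai": (("time", "x"), np.full((1, 3), float(i)))},
            coords={"time": pd.to_datetime([day]), "x": [0, 1, 2]},
        )
        path = folder / f"dummy_{i}.nc"
        # Write through h5netcdf instead of the netCDF4 backend.
        ds.to_netcdf(path, engine="h5netcdf")
        paths.append(path)
    return paths


def open_with_h5netcdf(paths: list[Path]) -> xr.Dataset:
    """Open all files with the h5netcdf engine and load them into memory."""
    with xr.open_mfdataset(paths, engine="h5netcdf", combine="by_coords") as ds:
        return ds.load()


with tempfile.TemporaryDirectory() as tmp:
    combined = open_with_h5netcdf(write_dummy_files(Path(tmp)))

print(combined.sizes["time"])
```

The `engine` argument is the only change compared with the default backend; whether this actually removes the segfault in the zampy tests would still need to be confirmed in CI.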

@BSchilperoort
Contributor

Also, if it only happens during the loading step, moving to Zarr instead of netCDF would help as well, although that would require a bit of an overhaul of the code.

@SarahAlidoost
Member Author

Also, check this GA action run, which failed on macos-latest due to a segfault.


BSchilperoort commented Nov 26, 2024

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x000000017024b000 (most recent call first):
  File "/Users/runner/Library/Application Support/hatch/env/virtual/zampy/L-k9lf1A/zampy/lib/python3.11/site-packages/xarray/backends/netCDF4_.py", line 82 in __setitem__

Yeah, this is quite clear. It's using thread-based concurrency, and the netCDF4 backend is not thread-safe :/

The problem is that this will not always cause errors, only when certain conditions align, e.g. two threads calling into netCDF4 and trying to access the same (part of the) file at the same time.
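
To illustrate why such failures are intermittent, here is a sketch that uses only the standard library. `FakeUnsafeFile` is a hypothetical stand-in for a handle that must not be used by two threads at once (it is not the netCDF4 API), and `write_chunks` is likewise invented for this example. Without a lock, overlapping calls may or may not occur on a given run; with a shared lock they cannot.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeUnsafeFile:
    """Hypothetical stand-in for a non-thread-safe file handle."""

    def __init__(self) -> None:
        self._busy = False
        self.overlaps = 0  # number of calls that started while another was active

    def write(self, chunk: int) -> None:
        if self._busy:
            self.overlaps += 1
        self._busy = True
        time.sleep(0.001)  # pretend to do I/O
        self._busy = False


def write_chunks(handle: FakeUnsafeFile, n_chunks: int, lock=None) -> int:
    """Write chunks from four worker threads, optionally serialized by a lock."""

    def task(i: int) -> None:
        if lock is None:
            handle.write(i)
        else:
            with lock:
                handle.write(i)

    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(task, range(n_chunks)))
    return handle.overlaps


unlocked = write_chunks(FakeUnsafeFile(), 20)
locked = write_chunks(FakeUnsafeFile(), 20, lock=threading.Lock())
print("overlaps without lock:", unlocked)  # varies between runs
print("overlaps with lock:", locked)
```

The unlocked count depends on thread timing, which mirrors why the segfault shows up only in some CI runs.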
