Removed dask pinning (#570)
* removed dask pinning

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed top level error tests

* wip

* testing against 3.12

* fix _get_backing_files; wip fix .attrs

* wip

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip

* wip

* all tests should be fixed (dask-expr installed, but backend disabled)

* attempt fix python3.12 tests

* improved search for dask backing files

* fix test backing files

* add comment

* fix ruff

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* wip pickled dask graphs for tests

* removing pickled tests; requiring minimum version of dask instead

* forgot to re-disable dask-expr

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
LucaMarconato and pre-commit-ci[bot] authored Jun 25, 2024
1 parent 279f817 commit 04d7782
Showing 28 changed files with 217 additions and 162 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/build.yaml

@@ -11,10 +11,10 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v2
-      - name: Set up Python 3.10
+      - name: Set up Python 3.12
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.12"
           cache: pip
       - name: Install build dependencies
         run: python -m pip install --upgrade pip wheel twine build
4 changes: 2 additions & 2 deletions .github/workflows/release.yaml

@@ -14,10 +14,10 @@ jobs:
       - name: Checkout code
         uses: actions/checkout@v3

-      - name: Set up Python 3.10
+      - name: Set up Python 3.12
         uses: actions/setup-python@v4
         with:
-          python-version: "3.10"
+          python-version: "3.12"

       - name: Install hatch
         run: pip install hatch
6 changes: 3 additions & 3 deletions .github/workflows/test.yaml

@@ -18,15 +18,15 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python: ["3.9", "3.10"]
+        python: ["3.9", "3.12"]
         os: [ubuntu-latest]
         include:
           - os: macos-latest
             python: "3.9"
           - os: macos-latest
-            python: "3.10"
+            python: "3.12"
             pip-flags: "--pre"
-            name: "Python 3.10 (pre-release)"
+            name: "Python 3.12 (pre-release)"

     env:
       OS: ${{ matrix.os }}
2 changes: 1 addition & 1 deletion docs/tutorials/notebooks
Submodule notebooks updated 46 files
+4 −0 .gitignore
+8 −8 .pre-commit-config.yaml
+1 −1 README.md
+ _static/img/interchangeability_labels_polygons.png
+ _static/img/models1.png
+ _static/img/models2.png
+ _static/img/speed.png
+ _static/img/table.png
+ _static/img/visium_hd.jpg
+ _static/img/xenium.png
+5 −1 conf.py
+59 −24 datasets/README.md
+99 −6 notebooks.md
+2 −2 notebooks/developers_resources/storage_format/multiple_elements.ipynb
+186 −138 notebooks/examples/aggregation.ipynb
+48 −39 notebooks/examples/alignment_using_landmarks.ipynb
+ notebooks/examples/attachments/joins_small.png
+ notebooks/examples/attachments/multi_table_example.png
+ notebooks/examples/attachments/napari_visium_hd.png
+ notebooks/examples/attachments/table_slide.png
+216 −142 notebooks/examples/densenet.ipynb
+1 −1 notebooks/examples/densenet_utils.py
+45 −0 notebooks/examples/generate_toc.py
+505 −0 notebooks/examples/labels_shapes_interchangeability.ipynb
+1,049 −0 notebooks/examples/models1.ipynb
+5,498 −0 notebooks/examples/models2.ipynb
+368 −70 notebooks/examples/napari_rois.ipynb
+120 −116 notebooks/examples/spatial_query.ipynb
+664 −0 notebooks/examples/speed_up_illustration.ipynb
+266 −59 notebooks/examples/squidpy_integration.ipynb
+2,544 −0 notebooks/examples/tables.ipynb
+75 −82 notebooks/examples/technology_cosmx.ipynb
+172 −0 notebooks/examples/technology_curio.ipynb
+1 −1 notebooks/examples/technology_merfish.ipynb
+126 −258 notebooks/examples/technology_mibitof.ipynb
+415 −0 notebooks/examples/technology_stereoseq.ipynb
+144 −87 notebooks/examples/technology_visium.ipynb
+1,080 −0 notebooks/examples/technology_visium_hd.ipynb
+750 −0 notebooks/examples/technology_xenium.ipynb
+246 −220 notebooks/paper_reproducibility/00_xenium_and_visium.ipynb
+638 −487 notebooks/paper_reproducibility/01_xenium_and_visium.ipynb
+16 −8 notebooks/paper_reproducibility/02_xenium_and_visium.ipynb
+1 −1 notebooks/paper_reproducibility/03_annotate_visium_cell2location.ipynb
+20 −20 notebooks/paper_reproducibility/lundeberg.ipynb
+1 −1 notebooks/paper_reproducibility/nsclc_cosmx.ipynb
+3 −0 pyproject.toml
3 changes: 2 additions & 1 deletion pyproject.toml

@@ -25,7 +25,7 @@ dependencies = [
     "anndata>=0.9.1",
     "click",
     "dask-image",
-    "dask<=2024.2.1",
+    "dask>=2024.2.1",
     "fsspec<=2023.6",
     "geopandas>=0.14",
     "multiscale_spatial_image>=1.0.0",
@@ -36,6 +36,7 @@ dependencies = [
     "pooch",
     "pyarrow",
     "rich",
+    "setuptools",
     "shapely>=2.0.1",
     "spatial_image>=1.1.0",
     "scikit-image",
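The pin flip from `dask<=2024.2.1` to `dask>=2024.2.1` turns a version ceiling into a floor, matching the commit message ("requiring minimum version of dask instead"). The ">=" semantics can be sketched with the standard library alone; the helper names below are illustrative and not part of spatialdata (real resolution is done by pip/packaging):

```python
def parse_release(v: str) -> tuple[int, ...]:
    """Best-effort parse of a release segment like '2024.2.1' into an int tuple.
    Stops at non-numeric suffixes such as 'rc1'; pre-release ordering is ignored."""
    parts: list[int] = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """True when the installed release is at least the required minimum."""
    return parse_release(installed) >= parse_release(minimum)
```

For example, `satisfies_minimum("2024.5.0", "2024.2.1")` holds because the tuples compare element-wise, the same way version floors behave in the dependency specification.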
15 changes: 15 additions & 0 deletions src/spatialdata/__init__.py

@@ -1,5 +1,20 @@
 from __future__ import annotations

+import dask
+
+dask.config.set({"dataframe.query-planning": False})
+from dask.dataframe import DASK_EXPR_ENABLED
+
+# Setting `dataframe.query-planning` to False is effective only if run before `dask.dataframe` is initialized. In
+# the case in which the user had initialized `dask.dataframe` before, we would have DASK_EXPR_ENABLED set to `True`.
+# Here we check that this does not happen.
+if DASK_EXPR_ENABLED:
+    raise RuntimeError(
+        "Unsupported backend: dask-expr has been detected as the backend of dask.dataframe. Please "
+        "use:\nimport dask\ndask.config.set({'dataframe.query-planning': False})\nbefore importing "
+        "dask.dataframe to disable dask-expr. The support is being worked on; for more information please see "
+        "https://github.com/scverse/spatialdata/pull/570"
+    )
+
 from importlib.metadata import version

 __version__ = version("spatialdata")
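The guard in `__init__.py` hinges on import order: the `dataframe.query-planning` flag only takes effect if it is set before `dask.dataframe` is first imported. A stdlib-only sketch of this "configure before import" pattern (the function name is illustrative, not spatialdata API):

```python
import sys


def assert_not_yet_imported(module_name: str) -> None:
    """Fail fast if a module is already loaded, because configuration set now
    would come too late to influence that module's import-time behavior."""
    if module_name in sys.modules:
        raise RuntimeError(
            f"{module_name} is already imported; set the configuration "
            "before its first import"
        )


# A never-imported name passes silently; calling this with "sys" would raise.
assert_not_yet_imported("a_module_nobody_imported")
```

This mirrors why spatialdata performs the check at package import time: it is the earliest point at which it can still refuse to run with the wrong backend.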
8 changes: 4 additions & 4 deletions src/spatialdata/_core/_deepcopy.py

@@ -6,7 +6,7 @@
 from anndata import AnnData
 from dask.array.core import Array as DaskArray
 from dask.array.core import from_array
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from xarray import DataArray
@@ -64,9 +64,9 @@ def _(element: DataArray) -> DataArray:

 @deepcopy.register(DataTree)
 def _(element: DataTree) -> DataTree:
-    # the complexity here is due to the fact that the parsers don't accept MultiscaleSpatialImage types and that we need
-    # to convert the DataTree to a MultiscaleSpatialImage. This will be simplified once we support
-    # multiscale_spatial_image 1.0.0
+    # TODO: now that multiscale_spatial_image 1.0.0 is supported, this code can probably be simplified. Check
+    # https://github.com/scverse/spatialdata/pull/587/files#diff-c74ebf49cb8cbddcfaec213defae041010f2043cfddbded24175025b6764ef79
+    # to understand the original motivation.
     model = get_model(element)
     for key in element:
         ds = element[key].ds
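The `@deepcopy.register(DataTree)` decorator in this hunk is the `functools.singledispatch` pattern: one generic function with per-type implementations registered against it. A minimal stand-alone sketch, with stdlib types standing in for the spatialdata element types:

```python
from functools import singledispatch


@singledispatch
def deepcopy(element):
    # Fallback for types with no registered implementation.
    raise NotImplementedError(f"no deepcopy implementation for {type(element)}")


@deepcopy.register(list)
def _(element: list) -> list:
    # Recurse into nested containers, copy scalars by value.
    return [deepcopy(x) if isinstance(x, (list, dict)) else x for x in element]


@deepcopy.register(dict)
def _(element: dict) -> dict:
    return {k: deepcopy(v) if isinstance(v, (list, dict)) else v for k, v in element.items()}
```

Dispatch happens on the type of the first argument, which is why spatialdata can add one `register` block per element type (`DataArray`, `DataTree`, `DaskDataFrame`, ...) without a chain of isinstance checks.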
2 changes: 1 addition & 1 deletion src/spatialdata/_core/_elements.py

@@ -8,7 +8,7 @@
 from warnings import warn

 from anndata import AnnData
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from geopandas import GeoDataFrame

 from spatialdata._types import Raster_T
2 changes: 1 addition & 1 deletion src/spatialdata/_core/centroids.py

@@ -6,7 +6,7 @@
 import dask.array as da
 import pandas as pd
 import xarray as xr
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from shapely import MultiPolygon, Point, Polygon
2 changes: 1 addition & 1 deletion src/spatialdata/_core/data_extent.py

@@ -7,7 +7,7 @@

 import numpy as np
 import pandas as pd
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from shapely import MultiPolygon, Point, Polygon
2 changes: 1 addition & 1 deletion src/spatialdata/_core/operations/_utils.py

@@ -62,7 +62,7 @@ def transform_to_data_extent(
     Notes
     -----
     - The data extent is the smallest rectangle that contains all the images and geometries.
-    - MultiscaleSpatialImage objects will be converted to SpatialImage objects.
+    - DataTree objects (multiscale images) will be converted to DataArray (single-scale images) objects.
     - This helper function will be deprecated when https://github.com/scverse/spatialdata/issues/308 is closed,
       as this function will be easily recovered by `transform_to_coordinate_system()`
     """
2 changes: 1 addition & 1 deletion src/spatialdata/_core/operations/aggregate.py

@@ -9,7 +9,7 @@
 import geopandas as gpd
 import numpy as np
 import pandas as pd
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from scipy import sparse
4 changes: 2 additions & 2 deletions src/spatialdata/_core/operations/rasterize.py

@@ -3,8 +3,8 @@
 import dask_image.ndinterp
 import datashader as ds
 import numpy as np
-from dask.array.core import Array as DaskArray
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.array import Array as DaskArray
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from shapely import Point
4 changes: 2 additions & 2 deletions src/spatialdata/_core/operations/rasterize_bins.py

@@ -10,7 +10,7 @@
 from numpy.random import default_rng
 from scipy.sparse import csc_matrix
 from skimage.transform import estimate_transform
-from spatial_image import SpatialImage
+from xarray import DataArray

 from spatialdata._core.query.relational_query import get_values
 from spatialdata.models import Image2DModel, get_table_keys
@@ -30,7 +30,7 @@ def rasterize_bins(
     col_key: str,
     row_key: str,
     value_key: str | list[str] | None = None,
-) -> SpatialImage:
+) -> DataArray:
     """
     Rasterizes grid-like binned shapes/points annotated by a table (e.g. Visium HD data).
13 changes: 8 additions & 5 deletions src/spatialdata/_core/operations/transform.py

@@ -9,11 +9,10 @@
 import dask_image.ndinterp
 import numpy as np
 from dask.array.core import Array as DaskArray
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from shapely import Point
-from spatial_image import SpatialImage
 from xarray import DataArray

 from spatialdata._core.spatialdata import SpatialData
@@ -177,6 +176,9 @@ def _set_transformation_for_transformed_elements(

     d = get_transformation(element, get_all=True)
     assert isinstance(d, dict)
+    if DEFAULT_COORDINATE_SYSTEM not in d:
+        raise RuntimeError(f"Coordinate system {DEFAULT_COORDINATE_SYSTEM} not found in element")
+    assert isinstance(d, dict)
     assert len(d) == 1
     assert isinstance(d[DEFAULT_COORDINATE_SYSTEM], Identity)
     remove_transformation(element, remove_all=True)
@@ -389,9 +391,7 @@ def _(
             raster_translation = raster_translation_single_scale
         # we set a dummy empty dict for the transformation that will be replaced with the correct transformation for
         # each scale later in this function, when calling set_transformation()
-        transformed_dict[k] = SpatialImage(
-            transformed_dask, dims=xdata.dims, name=xdata.name, attrs={TRANSFORM_KEY: {}}
-        )
+        transformed_dict[k] = DataArray(transformed_dask, dims=xdata.dims, name=xdata.name, attrs={TRANSFORM_KEY: {}})

     # mypy thinks that schema could be ShapesModel, PointsModel, ...
     transformed_data = DataTree.from_dict(transformed_dict)
@@ -435,6 +435,9 @@ def _(
     transformed = data.drop(columns=list(axes)).copy()
     # dummy transformation that will be replaced by _adjust_transformation()
     transformed.attrs[TRANSFORM_KEY] = {DEFAULT_COORDINATE_SYSTEM: Identity()}
+    # TODO: the following line, used in place of the line before, leads to an incorrect aggregation result. Look into
+    # this! Reported here: ...
+    # transformed.attrs = {TRANSFORM_KEY: {DEFAULT_COORDINATE_SYSTEM: Identity()}}
     assert isinstance(transformed, DaskDataFrame)
     for ax in axes:
         indices = xtransformed["dim"] == ax
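The TODO in the last hunk distinguishes `transformed.attrs[TRANSFORM_KEY] = ...` from `transformed.attrs = {...}`. A plain-Python sketch of why the two can differ (the class and names are illustrative, not spatialdata code): mutating the existing attrs dict is seen by every other holder of that dict, while rebinding `.attrs` silently detaches the object from it.

```python
class Element:
    """Stand-in for an object carrying metadata in a plain dict."""

    def __init__(self, attrs: dict) -> None:
        self.attrs = attrs


shared: dict = {}
e = Element(shared)

# Mutation: the shared dict sees the update.
e.attrs["transform"] = "identity"
assert shared == {"transform": "identity"}

# Rebinding: e.attrs now points at a brand-new dict; `shared` no longer updates.
e.attrs = {"transform": "identity"}
e.attrs["transform"] = "affine"
assert shared == {"transform": "identity"}
```

Whether this aliasing is what causes the incorrect aggregation result mentioned in the TODO is an open question in the commit itself; the sketch only shows the mechanical difference between the two assignments.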
2 changes: 1 addition & 1 deletion src/spatialdata/_core/query/relational_query.py

@@ -12,7 +12,7 @@
 import numpy as np
 import pandas as pd
 from anndata import AnnData
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from xarray import DataArray
2 changes: 1 addition & 1 deletion src/spatialdata/_core/query/spatial_query.py

@@ -9,7 +9,7 @@
 import dask.array as da
 import dask.dataframe as dd
 import numpy as np
-from dask.dataframe.core import DataFrame as DaskDataFrame
+from dask.dataframe import DataFrame as DaskDataFrame
 from datatree import DataTree
 from geopandas import GeoDataFrame
 from shapely.geometry import MultiPolygon, Polygon
18 changes: 7 additions & 11 deletions src/spatialdata/_core/spatialdata.py

@@ -11,16 +11,14 @@
 import pandas as pd
 import zarr
 from anndata import AnnData
+from dask.dataframe import DataFrame as DaskDataFrame
 from dask.dataframe import read_parquet
-from dask.dataframe.core import DataFrame as DaskDataFrame
 from dask.delayed import Delayed
 from datatree import DataTree
 from geopandas import GeoDataFrame
-from multiscale_spatial_image.multiscale_spatial_image import MultiscaleSpatialImage
 from ome_zarr.io import parse_url
 from ome_zarr.types import JSONDict
 from shapely import MultiPolygon, Polygon
-from spatial_image import SpatialImage
 from xarray import DataArray

 from spatialdata._core._elements import Images, Labels, Points, Shapes, Tables
@@ -97,9 +95,7 @@ class SpatialData:
     -----
     The SpatialElements are stored with standard types:

-    - images and labels are stored as :class:`spatial_image.SpatialImage` or
-      :class:`multiscale_spatial_image.MultiscaleSpatialImage` objects, which are respectively equivalent to
-      :class:`xarray.DataArray` and to a :class:`datatree.DataTree` of :class:`xarray.DataArray` objects.
+    - images and labels are stored as :class:`xarray.DataArray` or :class:`datatree.DataTree` objects.
     - points are stored as :class:`dask.dataframe.DataFrame` objects.
     - shapes are stored as :class:`geopandas.GeoDataFrame`.
     - the table are stored as :class:`anndata.AnnData` objects, with the spatial coordinates stored in the obsm
@@ -856,8 +852,8 @@ def transform_element_to_coordinate_system(
         else:
             # When maintaining positioning is true, and if the element has a transformation to target_coordinate_system
             # (this may not be the case because it could be that the element is not directly mapped to that coordinate
-            # system), then the transformation to the target coordinate system is not needed # because the data is now
-            # already transformed; here we remove such transformation.
+            # system), then the transformation to the target coordinate system is not needed
+            # because the data is now already transformed; here we remove such transformation.
             d = get_transformation(transformed, get_all=True)
             assert isinstance(d, dict)
             if target_coordinate_system in d:
@@ -1595,7 +1591,7 @@ def add_image(
     def add_labels(
         self,
         name: str,
-        labels: SpatialImage | MultiscaleSpatialImage,
+        labels: DataArray | DataTree,
         storage_options: JSONDict | list[JSONDict] | None = None,
         overwrite: bool = False,
     ) -> None:
@@ -1743,8 +1739,8 @@ def h(s: str) -> str:
                     descr += f"{h(attr + 'level1.1')}{k!r}: {descr_class} " f"shape: {v.shape} (2D shapes)"
                 elif attr == "points":
                     length: int | None = None
-                    if len(v.dask.layers) == 1:
-                        name, layer = v.dask.layers.items().__iter__().__next__()
+                    if len(v.dask) == 1:
+                        name, layer = v.dask.items().__iter__().__next__()
                         if "read-parquet" in name:
                             t = layer.creation_info["args"]
                             assert isinstance(t, tuple)
