Add data tree and dataset formats to linear regression #566

veni-vidi-vici-dormivi · 2024-11-21T09:05:22Z

I implement handling of DataTrees and xr.Datasets as predictor formats.

Closes stack predictors and targets for xarray objects #355
Tests added
Fully documented, including CHANGELOG.rst

veni-vidi-vici-dormivi · 2024-11-21T09:06:00Z

Includes #361

mathause

Looks good - some suggestions.

mathause · 2024-11-21T09:12:42Z

mesmer/stats/_linear_regression.py


        if "intercept" in exclude:
            prediction = xr.zeros_like(params.intercept)
        else:
            prediction = params.intercept

+        # if predictors is a DataTree, rename all data variables to "pred" to avoid conflicts
+        if isinstance(predictors, DataTree) and not predictors.equals(DataTree()):


Suggested change

-        if isinstance(predictors, DataTree) and not predictors.equals(DataTree()):
+        if isinstance(predictors, DataTree) and not predictors.is_empty:

?

veni-vidi-vici-dormivi · 2024-11-22

No, because `is_empty` only checks a single node (here the root), which can be empty while other nodes in the DataTree still hold datasets. We still need to check whether the whole tree is empty for the test without any predictors.

mathause · 2024-11-21T09:13:34Z

mesmer/stats/_linear_regression.py

+        # if predictors is a DataTree, rename all data variables to "pred" to avoid conflicts
+        if isinstance(predictors, DataTree) and not predictors.equals(DataTree()):
+            predictors = map_over_subtree(
+                lambda ds: ds.rename({var: "pred" for var in ds.data_vars})


Are we already sure there is only one DataArray on each node? Or will this give a cryptic error message?
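
One possible way to get a readable error instead is an explicit check before renaming. The sketch below is only an assumption about how such a guard could look: `check_single_data_var` is a hypothetical helper, not part of mesmer, and it works on any mapping of variable names (such as `ds.data_vars`) for a node that holds data.

```python
from collections.abc import Mapping


def check_single_data_var(data_vars: Mapping, node_path: str = "/") -> str:
    """Return the name of the only data variable or raise a descriptive error.

    Hypothetical helper for illustration; `data_vars` can be any mapping
    from variable names to data, e.g. `ds.data_vars` of an xarray Dataset.
    """
    names = list(data_vars)
    if len(names) != 1:
        raise ValueError(
            f"Node '{node_path}' must hold exactly one data variable, "
            f"found {len(names)}: {names}"
        )
    return names[0]


# A node with a single variable passes and yields the variable name.
single_name = check_single_data_var({"tas": [0, 1, 2]})
```

Calling the helper inside the mapped function, before `ds.rename(...)`, would surface the offending node and its variables in the message.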

mathause · 2024-11-21T09:16:10Z

mesmer/stats/_linear_regression.py

+        prediction = (
+            _extract_single_dataarray_from_dt(prediction)
+            if isinstance(prediction, DataTree)
+            else prediction
+        )


Maybe don't make this a ternary operation if it does not fit nicely on one line.

Suggested change

-        prediction = (
-            _extract_single_dataarray_from_dt(prediction)
-            if isinstance(prediction, DataTree)
-            else prediction
-        )
+        if isinstance(prediction, DataTree):
+            prediction = _extract_single_dataarray_from_dt(prediction)

mathause · 2024-11-21T09:18:52Z

mesmer/stats/_linear_regression.py

@@ -229,14 +260,32 @@ def _fit_linear_regression_xr(
        raise ValueError("dim cannot currently be 'predictor'.")

    for key, pred in predictors.items():
+        pred = (


mathause · 2024-11-21T09:21:15Z

tests/unit/test_linear_regression.py

+def to_dict(data_dict):
+    return data_dict


Is that doing anything? Maybe add a comment, e.g. `# no-op so all three options have a conversion function` (or so).

mathause · 2024-11-21T09:22:56Z

tests/unit/test_linear_regression.py

+    return DataTree.from_dict(data_dict)
+
+
+def to_xr_dataset(data_dict):


Suggested change

-def to_xr_dataset(data_dict):
+def to_dataset(data_dict):

?

mathause · 2024-11-21T10:02:49Z

tests/unit/test_linear_regression.py

@@ -79,24 +93,74 @@ def test_lr_params():


 @pytest.mark.parametrize("as_2D", [True, False])
-def test_lr_predict(as_2D):
+@pytest.mark.parametrize("data_structure", [to_dict, to_datatree, to_xr_dataset])


Good approach!

Actually, it's less clear in the code with `data_structure`. Maybe rename it to `to_data_type`? Or maybe `data_type` is too close to `dtype` - could use `data_cls`?

Alternatively you could write

    def convert_to(dct, data_type):
        if data_type == "dict":
            return dct
        ...

    # and

    @pytest.mark.parametrize("data_type", ["dict", "datatree", "dataset"])
    def test_(...):
        ...
        pred = convert_to({"tas": tas}, data_type)

mathause · 2024-11-21T10:05:15Z

tests/unit/test_linear_regression.py

+    )
+    lr.params = params if as_2D else params.squeeze()
+
+    tas = xr.DataArray([0, 1, 2], dims="time").rename("tas")


Suggested change

-    tas = xr.DataArray([0, 1, 2], dims="time").rename("tas")
+    tas = xr.DataArray([0, 1, 2], dims="time", name="tas")
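
As a quick check that the two spellings produce the same object, the following small sketch compares them with `DataArray.identical`, which also compares the name; the variable names are only illustrative.

```python
import xarray as xr

# Naming the array after construction ...
via_rename = xr.DataArray([0, 1, 2], dims="time").rename("tas")

# ... versus naming it directly in the constructor.
via_kwarg = xr.DataArray([0, 1, 2], dims="time", name="tas")

# identical() compares values, dimensions, attributes and the name.
same = via_rename.identical(via_kwarg)
```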

codecov · 2024-11-21T11:20:15Z

Codecov Report

Attention: Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.83%. Comparing base (b80d031) to head (05eb6cc).

Files with missing lines	Patch %	Lines
mesmer/stats/_linear_regression.py	95.65%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
+ Coverage   77.75%   77.83%   +0.08%     
==========================================
  Files          49       49              
  Lines        2967     2978      +11     
==========================================
+ Hits         2307     2318      +11     
  Misses        660      660

Flag	Coverage Δ
unittests	`77.83% <96.42%> (+0.08%)`	⬆️

veni-vidi-vici-dormivi added 4 commits November 20, 2024 21:17

implement datatree and dataset in linear regression

8a707fa

tests

88b6b75

nits

d8740fe

linting

546236d

veni-vidi-vici-dormivi requested a review from mathause November 21, 2024 09:09

mathause reviewed Nov 21, 2024

veni-vidi-vici-dormivi added 2 commits November 21, 2024 12:01

nits

521067e

fixes

1e101e5

Update mesmer/stats/_linear_regression.py

05eb6cc

Co-authored-by: Mathias Hauser <[email protected]>
