SNOW-1805842: Add plotly integ tests and Interoperability doc (#2725)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1805842

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my
new code
- [ ] If this test skips Local Testing mode, I'm requesting review from
@snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing
parity changes.
- [x] I acknowledge that I have ensured my changes to be thread-safe.
Follow the link for more information: [Thread-safe Developer
Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development)

3. Please describe how your code solves the related issue.

Adding Plotly express interoperability tests and a docs page for
guaranteed interoperable APIs.
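
The testing idea behind these interoperability tests can be sketched with the standard library alone: run the same operation on a reference input and a candidate input, then recursively compare the resulting JSON-like structures. This is a minimal illustration, not the code in this commit. The names `assert_nested_equal`, `check_interoperable`, and `fake_plot` are hypothetical, and `fake_plot` is only a stand-in for a call such as `px.scatter(df, ...).to_plotly_json()`.

```python
import math


def assert_nested_equal(expect, got, rel_tol=1e-7):
    """Recursively compare two JSON-like structures (dicts, lists, scalars)."""
    assert type(expect) is type(got), f"type mismatch: {type(expect)} vs {type(got)}"
    if isinstance(expect, dict):
        assert expect.keys() == got.keys()
        for key in expect:
            assert_nested_equal(expect[key], got[key], rel_tol)
    elif isinstance(expect, list):
        assert len(expect) == len(got)
        for e, g in zip(expect, got):
            assert_nested_equal(e, g, rel_tol)
    elif isinstance(expect, float):
        # Floats are compared with a relative tolerance instead of exact equality.
        assert math.isclose(expect, got, rel_tol=rel_tol)
    else:
        assert expect == got


def check_interoperable(candidate_input, reference_input, operation, comparator):
    """Apply `operation` to both inputs and compare the two results."""
    comparator(operation(reference_input), operation(candidate_input))


def fake_plot(rows):
    """Hypothetical stand-in for a plotting call that returns a JSON-like dict."""
    return {
        "data": [
            {
                "type": "scatter",
                "x": [row[0] for row in rows],
                "y": [row[1] for row in rows],
            }
        ]
    }


# Two containers of different types holding the same data, standing in for
# a native pandas dataframe and a Snowpark pandas dataframe.
reference = [(0.1, 1.0), (0.2, 2.0)]
candidate = ((0.1, 1.0), (0.2, 2.0))
check_interoperable(candidate, reference, fake_plot, assert_nested_equal)
```

Comparing the serialized figure rather than the figure object keeps the check independent of how the plotting library stores its inputs internally.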

---------

Signed-off-by: Labanya Mukhopadhyay <[email protected]>
Co-authored-by: Mahesh Vashishtha <[email protected]>
Co-authored-by: Hazem Elmeleegy <[email protected]>
3 people authored Dec 14, 2024
1 parent 412f80c commit d6653f5
Showing 4 changed files with 272 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/source/modin/index.rst
@@ -19,5 +19,6 @@ For your convenience, here is all the :doc:`Supported APIs <supported/index>`
window
groupby
resampling
interoperability
numpy
performance
58 changes: 58 additions & 0 deletions docs/source/modin/interoperability.rst
@@ -0,0 +1,58 @@
Interoperability with third party libraries
=============================================

Many third party libraries are interoperable with pandas, for example by accepting pandas dataframe objects as function
inputs. This page gives a non-exhaustive list of third party library use cases with pandas and notes whether each method
also works in Snowpark pandas.

Snowpark pandas supports the `dataframe interchange protocol <https://data-apis.org/dataframe-protocol/latest/>`_.
Libraries that consume dataframes through this protocol can work with Snowpark pandas at the same level of support as
with pandas.

In the table below, the first column contains a method name, the second column flags whether interoperability with
Snowpark pandas is guaranteed, and the third column holds notes on the current implementation. For each of these
methods, we validate that passing in a Snowpark pandas dataframe as the dataframe input parameter behaves equivalently
to passing in a pandas dataframe.

.. note::
    ``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.

Plotly.express module methods

.. note::
    Currently, only plotly versions <6.0.0 are supported through the dataframe interchange protocol.

+-------------------------+---------------------------------------------+--------------------------------------------+
| Method name             | Interoperable with Snowpark pandas? (Y/N)   | Notes for current implementation           |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``scatter``             | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``line``                | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``area``                | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``timeline``            | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``violin``              | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``bar``                 | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``histogram``           | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``pie``                 | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``treemap``             | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``sunburst``            | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``icicle``              | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``scatter_matrix``      | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``funnel``              | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``density_heatmap``     | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``box``                 | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``imshow``              | Y                                           |                                            |
+-------------------------+---------------------------------------------+--------------------------------------------+
2 changes: 2 additions & 0 deletions setup.py
@@ -200,6 +200,8 @@ def run(self):
"scipy", # Snowpark pandas 3rd party library testing
"statsmodels", # Snowpark pandas 3rd party library testing
"scikit-learn==1.5.2", # Snowpark pandas scikit-learn tests
# plotly version restricted due to foreseen change in query counts in version 6.0.0+
"plotly<6.0.0", # Snowpark pandas 3rd party library testing
],
"localtest": [
"pandas",
211 changes: 211 additions & 0 deletions tests/integ/modin/interoperability/plotly/test_plotly.py
@@ -0,0 +1,211 @@
#
# Copyright (c) 2012-2024 Snowflake Computing Inc. All rights reserved.
#

import modin.pandas as pd
import numpy as np
import plotly.express as px
import pytest
import pandas as native_pd

import snowflake.snowpark.modin.plugin # noqa: F401
from tests.integ.utils.sql_counter import sql_count_checker
from tests.integ.modin.utils import eval_snowpark_pandas_result

# Integration tests for plotly.express module (https://plotly.com/python-api-reference/plotly.express.html).
# To add tests for additional APIs,
# - Call the method with Snowpark pandas and native pandas df input and get the JSON representation with
# `to_plotly_json()`.
# - Assert correctness of the plot produced using `assert_plotly_equal` function defined below.


def assert_plotly_equal(expect, got):
    # referenced from cudf plotly integration test
    # https://github.com/rapidsai/cudf/blob/main/python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_plotly.py#L10

    assert type(expect) == type(got)
    if isinstance(expect, dict):
        assert expect.keys() == got.keys()
        for k in expect.keys():
            assert_plotly_equal(expect[k], got[k])
    elif isinstance(expect, list):
        assert len(expect) == len(got)
        for i in range(len(expect)):
            assert_plotly_equal(expect[i], got[i])
    elif isinstance(expect, np.ndarray):
        # Use a tolerance for float arrays; the size check avoids indexing an empty array.
        if expect.size > 0 and isinstance(expect[0], float):
            np.testing.assert_allclose(expect, got)
        else:
            assert (expect == got).all()
    else:
        assert expect == got


@pytest.fixture()
def test_dfs():
    nsamps = 50
    rng = np.random.default_rng(seed=42)
    data = {
        "x": rng.random(nsamps),
        "y": rng.random(nsamps),
        "category": rng.integers(0, 5, nsamps),
        "category2": rng.integers(0, 5, nsamps),
    }
    snow_df = pd.DataFrame(data)
    native_df = native_pd.DataFrame(data)
    return snow_df, native_df


@sql_count_checker(query_count=1)
def test_scatter(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.scatter(df, x="x", y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_line(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.line(df, x="category", y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_area(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.area(df, x="category", y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_timeline():
    native_df = native_pd.DataFrame(
        [
            dict(Task="Job A", Start="2009-01-01", Finish="2009-02-28"),
            dict(Task="Job B", Start="2009-03-05", Finish="2009-04-15"),
            dict(Task="Job C", Start="2009-02-20", Finish="2009-05-30"),
        ]
    )
    snow_df = pd.DataFrame(native_df)
    eval_snowpark_pandas_result(
        snow_df,
        native_df,
        lambda df: px.timeline(
            df, x_start="Start", x_end="Finish", y="Task"
        ).to_plotly_json(),
        comparator=assert_plotly_equal,
    )


@sql_count_checker(query_count=1)
def test_violin(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.violin(df, y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_bar(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.bar(df, x="category", y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_histogram(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.histogram(df, x="category").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_pie(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.pie(df, values="category", names="category2").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_treemap(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.treemap(df, names="category", values="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_sunburst(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.sunburst(df, names="category", values="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_icicle(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.icicle(df, names="category", values="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_scatter_matrix(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.scatter_matrix(df, dimensions=["category"]).to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_funnel(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.funnel(df, x="x", y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_density_heatmap(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.density_heatmap(df, x="x", y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=1)
def test_box(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.box(df, x="category", y="y").to_plotly_json(),
        comparator=assert_plotly_equal
    )


@sql_count_checker(query_count=4)
def test_imshow(test_dfs):
    eval_snowpark_pandas_result(
        *test_dfs,
        lambda df: px.imshow(df, x=df.columns, y=df.index).to_plotly_json(),
        comparator=assert_plotly_equal
    )
