Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
SNOW-1805842: Add plotly integ tests and Interoperability doc (#2725)
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1805842

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
      - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [x] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://github.com/snowflakedb/snowpark-python/blob/main/CONTRIBUTING.md#thread-safe-development)

3. Please describe how your code solves the related issue.

   Adding Plotly express interoperability tests and a docs page for guaranteed interoperable APIs.

---------

Signed-off-by: Labanya Mukhopadhyay <[email protected]>
Co-authored-by: Mahesh Vashishtha <[email protected]>
Co-authored-by: Hazem Elmeleegy <[email protected]>
1 parent 412f80c, commit d6653f5
Showing 4 changed files with 272 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
Interoperability with third-party libraries | ||
============================================= | ||
|
||
Many third-party libraries interoperate with pandas, for example by accepting pandas dataframe objects as function | ||
inputs. Below is a non-exhaustive list of third-party library methods used with pandas, with a note on whether each | ||
method also works with Snowpark pandas. | ||
|
||
Snowpark pandas supports the `dataframe interchange protocol <https://data-apis.org/dataframe-protocol/latest/>`_. Some | ||
libraries use this protocol to interoperate with Snowpark pandas at the same level of support as pandas. | ||
|
||
The table below is structured as follows: the first column contains the method name, the second column flags whether | ||
interoperability with Snowpark pandas is guaranteed, and the third column holds notes on the current implementation. | ||
For each of these methods, we validate that passing a Snowpark pandas dataframe as the dataframe input parameter | ||
behaves equivalently to passing a pandas dataframe. | ||
|
||
.. note:: | ||
    ``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no. | ||
|
||
Plotly.express module methods | ||
|
||
.. note:: | ||
    Currently only plotly versions <6.0.0 are supported through the dataframe interchange protocol. | ||
|
||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| Method name | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``scatter`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``line`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``area`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``timeline`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``violin`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``bar`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``histogram`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``pie`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``treemap`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``sunburst`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``icicle`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``scatter_matrix`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``funnel`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``density_heatmap`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``box``                 | Y                                           |                                            | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ | ||
| ``imshow`` | Y | | | ||
+-------------------------+---------------------------------------------+--------------------------------------------+ |
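The version note above can be turned into a small guard. The following is a minimal sketch using only the standard library; the helper names (`plotly_major_version`, `is_supported_plotly`, `installed_plotly_is_supported`) are hypothetical and are not part of Snowpark pandas or plotly. It assumes that comparing the major version number is enough to express the `<6.0.0` constraint.

```python
from importlib import metadata


def plotly_major_version(version: str) -> int:
    """Return the leading integer of a version string such as '5.24.1'."""
    return int(version.split(".")[0])


def is_supported_plotly(version: str) -> bool:
    """Return True when the version is below 6.0.0, as stated in the note above."""
    return plotly_major_version(version) < 6


def installed_plotly_is_supported() -> bool:
    """Check the installed plotly distribution; returns False when plotly is absent."""
    try:
        return is_supported_plotly(metadata.version("plotly"))
    except metadata.PackageNotFoundError:
        return False
```

A caller could run `installed_plotly_is_supported()` before passing a Snowpark pandas dataframe to a `plotly.express` method and fall back to converting the dataframe explicitly otherwise.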
211 changes: 211 additions & 0 deletions
tests/integ/modin/interoperability/plotly/test_plotly.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,211 @@ | ||
# | ||
# Copyright (c) 2012-2024 Snowflake Computing Inc. All rights reserved. | ||
# | ||
|
||
import modin.pandas as pd | ||
import numpy as np | ||
import plotly.express as px | ||
import pytest | ||
import pandas as native_pd | ||
|
||
import snowflake.snowpark.modin.plugin # noqa: F401 | ||
from tests.integ.utils.sql_counter import sql_count_checker | ||
from tests.integ.modin.utils import eval_snowpark_pandas_result | ||
|
||
# Integration tests for plotly.express module (https://plotly.com/python-api-reference/plotly.express.html). | ||
# To add tests for additional APIs, | ||
# - Call the method with Snowpark pandas and native pandas df input and get the JSON representation with | ||
# `to_plotly_json()`. | ||
# - Assert correctness of the plot produced using `assert_plotly_equal` function defined below. | ||
|
||
|
||
def assert_plotly_equal(expect, got): | ||
    # referenced from cudf plotly integration test | ||
    # https://github.com/rapidsai/cudf/blob/main/python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_plotly.py#L10 | ||
|
||
    # Recursively compare two `to_plotly_json()` results. | ||
    assert type(expect) == type(got) | ||
    if isinstance(expect, dict): | ||
        assert expect.keys() == got.keys() | ||
        for k in expect.keys(): | ||
            assert_plotly_equal(expect[k], got[k]) | ||
    elif isinstance(expect, list): | ||
        assert len(expect) == len(got) | ||
        for i in range(len(expect)): | ||
            assert_plotly_equal(expect[i], got[i]) | ||
    elif isinstance(expect, np.ndarray): | ||
        # Compare float arrays with a tolerance; checking the dtype also handles empty arrays. | ||
        if np.issubdtype(expect.dtype, np.floating): | ||
            np.testing.assert_allclose(expect, got) | ||
        else: | ||
            assert (expect == got).all() | ||
    else: | ||
        assert expect == got | ||
|
||
|
||
@pytest.fixture() | ||
def test_dfs(): | ||
    nsamps = 50 | ||
    rng = np.random.default_rng(seed=42) | ||
    data = { | ||
        "x": rng.random(nsamps), | ||
        "y": rng.random(nsamps), | ||
        "category": rng.integers(0, 5, nsamps), | ||
        "category2": rng.integers(0, 5, nsamps), | ||
    } | ||
    snow_df = pd.DataFrame(data) | ||
    native_df = native_pd.DataFrame(data) | ||
    return snow_df, native_df | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_scatter(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.scatter(df, x="x", y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_line(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.line(df, x="category", y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_area(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.area(df, x="category", y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_timeline(): | ||
    native_df = native_pd.DataFrame( | ||
        [ | ||
            dict(Task="Job A", Start="2009-01-01", Finish="2009-02-28"), | ||
            dict(Task="Job B", Start="2009-03-05", Finish="2009-04-15"), | ||
            dict(Task="Job C", Start="2009-02-20", Finish="2009-05-30"), | ||
        ] | ||
    ) | ||
    snow_df = pd.DataFrame(native_df) | ||
    eval_snowpark_pandas_result( | ||
        snow_df, | ||
        native_df, | ||
        lambda df: px.timeline( | ||
            df, x_start="Start", x_end="Finish", y="Task" | ||
        ).to_plotly_json(), | ||
        comparator=assert_plotly_equal, | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_violin(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.violin(df, y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_bar(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.bar(df, x="category", y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_histogram(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.histogram(df, x="category").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_pie(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.pie(df, values="category", names="category2").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_treemap(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.treemap(df, names="category", values="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_sunburst(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.sunburst(df, names="category", values="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_icicle(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.icicle(df, names="category", values="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_scatter_matrix(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.scatter_matrix(df, dimensions=["category"]).to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_funnel(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.funnel(df, x="x", y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_density_heatmap(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.density_heatmap(df, x="x", y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=1) | ||
def test_box(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.box(df, x="category", y="y").to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) | ||
|
||
|
||
@sql_count_checker(query_count=4) | ||
def test_imshow(test_dfs): | ||
    eval_snowpark_pandas_result( | ||
        *test_dfs, | ||
        lambda df: px.imshow(df, x=df.columns, y=df.index).to_plotly_json(), | ||
        comparator=assert_plotly_equal | ||
    ) |
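The recursive comparison that `assert_plotly_equal` performs on the `to_plotly_json()` output can be illustrated without plotly or NumPy. The following is a minimal standard-library sketch, not the test suite's helper: `assert_nested_equal` and the sample dictionaries are hypothetical, and `math.isclose` stands in for `np.testing.assert_allclose` on float values.

```python
import math


def assert_nested_equal(expect, got, rel_tol=1e-9):
    """Recursively compare two JSON-like structures, tolerating small float noise."""
    assert type(expect) is type(got), f"type mismatch: {type(expect)} vs {type(got)}"
    if isinstance(expect, dict):
        # Same keys on both sides, then compare value by value.
        assert expect.keys() == got.keys()
        for key in expect:
            assert_nested_equal(expect[key], got[key], rel_tol)
    elif isinstance(expect, (list, tuple)):
        # Same length, then compare element by element.
        assert len(expect) == len(got)
        for e, g in zip(expect, got):
            assert_nested_equal(e, g, rel_tol)
    elif isinstance(expect, float):
        # Floats are compared with a relative tolerance rather than exactly.
        assert math.isclose(expect, got, rel_tol=rel_tol)
    else:
        assert expect == got


# Hypothetical figure dictionaries shaped like a plot's JSON representation.
expected = {"data": [{"type": "scatter", "x": [0.1, 0.2], "y": [1.0, 2.0]}], "layout": {"title": None}}
actual = {"data": [{"type": "scatter", "x": [0.1, 0.2], "y": [1.0, 2.0 + 1e-12]}], "layout": {"title": None}}
assert_nested_equal(expected, actual)
```

Comparing the nested structure rather than a rendered image keeps the check deterministic, and the float tolerance absorbs tiny numeric differences between the two dataframe backends.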