Merge branch 'main' into SNOW-1830033-dataframe-api-bcrs
sfc-gh-aalam authored Dec 25, 2024
2 parents 196aefb + 69c41a2 commit fe98abd
Showing 27 changed files with 3,389 additions and 1,295 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -7,9 +7,14 @@
#### New Features

- Added support for the following functions in `functions.py`
- `array_reverse`
- `divnull`
- `map_cat`
- `map_contains_key`
- `map_keys`
- `nullifzero`
- `snowflake_cortex_sentiment`
- Added `Catalog` class to manage Snowflake objects. It can be accessed via `Session.catalog`.
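The NULL-handling behind two of the new functions can be sketched in plain Python. This is a hedged illustration of the expected SQL semantics, not the Snowpark API itself (the real `nullifzero` and `divnull` operate on `Column` expressions server-side):

```python
def nullifzero(x):
    """Return None when x equals 0, otherwise x (mirrors SQL NULLIFZERO)."""
    return None if x == 0 else x

def divnull(a, b):
    """Divide a by b, returning None when the divisor is 0 or either input is None."""
    if a is None or b is None or b == 0:
        return None
    return a / b

print(nullifzero(0))   # None
print(divnull(10, 4))  # 2.5
print(divnull(1, 0))   # None
```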

#### Improvements

@@ -38,6 +43,7 @@
- %j: Day of the year as a zero-padded decimal number.
- %X: Locale’s appropriate time representation.
- %%: A literal '%' character.
- Added support for `Series.between`.
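The behavior that `Series.between` is expected to match is shown below with native pandas (a sketch of the semantics; Snowpark pandas aims to mirror this while executing in Snowflake):

```python
import pandas as pd

s = pd.Series([1, 5, 10])
# between() is inclusive on both ends by default
mask = s.between(2, 8)
print(mask.tolist())  # [False, True, False]
```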

#### Bug Fixes

@@ -48,6 +54,7 @@
- Updated integration testing for `session.lineage.trace` to exclude deleted objects.
- Added documentation for `DataFrame.map`.
- Improved performance of `DataFrame.apply` by mapping NumPy functions to Snowpark functions where possible.
- Added documentation on the extent of Snowpark pandas interoperability with scikit-learn.

## 1.26.0 (2024-12-05)

105 changes: 100 additions & 5 deletions docs/source/modin/interoperability.rst
@@ -1,5 +1,6 @@
===========================================
Interoperability with third party libraries
=============================================
===========================================

Many third party libraries are interoperable with pandas, for example by accepting pandas DataFrame objects as function
inputs. Below is a non-exhaustive list of third party library use cases with pandas, noting whether each method
@@ -8,15 +9,17 @@ works in Snowpark pandas as well.
Snowpark pandas supports the `dataframe interchange protocol <https://data-apis.org/dataframe-protocol/latest/>`_, which
some libraries use to interoperate with Snowpark pandas to the same level of support as pandas.

The following table is structured as follows: The first column contains a method name.
plotly.express
==============

The following table is structured as follows: The first column contains the name of a method in the ``plotly.express`` module.
The second column is a flag for whether or not interoperability is guaranteed with Snowpark pandas. For each of these
methods, we validate that passing in a Snowpark pandas dataframe as the dataframe input parameter behaves equivalently
to passing in a pandas dataframe.
operations, we validate that passing in Snowpark pandas dataframes or series as the data inputs behaves equivalently
to passing in pandas dataframes or series.

.. note::
``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.

Plotly.express module methods

.. note::
Currently only plotly versions <6.0.0 are supported through the dataframe interchange protocol.
@@ -56,3 +59,95 @@ Plotly.express module methods
+-------------------------+---------------------------------------------+--------------------------------------------+
| ``imshow`` | Y | |
+-------------------------+---------------------------------------------+--------------------------------------------+


scikit-learn
============

We break down scikit-learn interoperability by categories of scikit-learn
operations.

For each category, we provide a table of interoperability with the following
structure: The first column describes a scikit-learn operation that may include
multiple method calls. The second column is a flag for whether or not
interoperability is guaranteed with Snowpark pandas. For each of these methods,
we validate that passing in Snowpark pandas objects behaves equivalently to
passing in pandas objects.

.. note::
``Y`` stands for yes, i.e., interoperability is guaranteed with this method, and ``N`` stands for no.

.. note::
While some scikit-learn methods accept Snowpark pandas inputs, their
performance with Snowpark pandas inputs is often much worse than their
performance with native pandas inputs. Generally we recommend converting
Snowpark pandas inputs to pandas with ``to_pandas()`` before passing them
to scikit-learn.


Classification
--------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LinearDiscriminantAnalysis`` | Y | |
| classifier with the ``fit()`` method and | | |
| classifying data with the ``predict()`` | | |
| method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+
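The classification pattern that the table above validates looks like the following with native scikit-learn (the toy data and values are illustrative assumptions; the docs' claim is that the same `fit()`/`predict()` calls also accept Snowpark pandas inputs):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Two well-separated classes in 2D
X = [[0, 0], [1, 1], [0, 1], [8, 8], [9, 9], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

clf = LinearDiscriminantAnalysis()
clf.fit(X, y)  # with Snowpark pandas, X and y could be DataFrame/Series inputs
print(clf.predict([[1, 0], [9, 8]]))
```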


Regression
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Fitting a ``LogisticRegression`` model | Y | |
| with the ``fit()`` method and predicting | | |
| results with the ``predict()`` method. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+
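A minimal sketch of the `LogisticRegression` fit-and-predict flow from the table above, using native scikit-learn with illustrative data (Snowpark pandas objects would be passed in place of the plain lists):

```python
from sklearn.linear_model import LogisticRegression

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression()
model.fit(X, y)
print(model.predict([[0.5], [2.5]]).tolist())  # [0, 1]
```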

Clustering
----------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Clustering method | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| ``KMeans.fit()`` | Y | |
+--------------------------------------------+---------------------------------------------+---------------------------------+
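The `KMeans.fit()` usage validated above can be sketched as follows (illustrative data; with Snowpark pandas the input would be a DataFrame instead of a list):

```python
from sklearn.cluster import KMeans

X = [[0.0], [1.0], [10.0], [11.0]]
km = KMeans(n_clusters=2, random_state=0, n_init=10)
labels = km.fit_predict(X)
# the two low points share one cluster, the two high points the other
print(labels)
```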


Dimensionality reduction
------------------------

+--------------------------------------------+---------------------------------------------+---------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation|
+--------------------------------------------+---------------------------------------------+---------------------------------+
| Getting the principal components of a | Y | |
| numerical dataset with ``PCA.fit()``. | | |
+--------------------------------------------+---------------------------------------------+---------------------------------+
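A sketch of the `PCA.fit()` operation from the table above, on illustrative rank-1 data (a Snowpark pandas DataFrame would take the place of the list input):

```python
from sklearn.decomposition import PCA

X = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0], [7.0, 14.0]]  # points on a line
pca = PCA(n_components=1)
reduced = pca.fit_transform(X)
# a single component captures essentially all variance for collinear data
print(pca.explained_variance_ratio_)
```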


Model selection
------------------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Choosing parameters for a | Y | ``RandomizedSearchCV`` causes Snowpark pandas |
| ``LogisticRegression`` model with | | to issue many queries. We strongly recommend |
| ``RandomizedSearchCV.fit()``. | | converting Snowpark pandas inputs to pandas |
| | | before using ``RandomizedSearchCV`` |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
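The `RandomizedSearchCV.fit()` pattern referenced above, sketched with native scikit-learn and illustrative data. Each cross-validation fit re-reads the inputs, which is why the note recommends converting Snowpark pandas inputs to pandas first (every re-read would otherwise issue Snowflake queries):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

search = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions={"C": [0.01, 0.1, 1.0, 10.0]},
    n_iter=3, cv=2, random_state=0,
)
search.fit(X, y)  # with Snowpark pandas inputs, convert via to_pandas() first
print(search.best_params_)
```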

Preprocessing
-------------

+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
| Scaling training data with | Y | |
| ``MaxAbsScaler.fit_transform()``. | | |
+--------------------------------------------+---------------------------------------------+-----------------------------------------------+
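The preprocessing operation validated above can be sketched as follows with native scikit-learn (illustrative data; a Snowpark pandas DataFrame would be the real input):

```python
from sklearn.preprocessing import MaxAbsScaler

X = [[1.0, -2.0], [2.0, 4.0]]
# each column is divided by its maximum absolute value (2.0 and 4.0 here)
scaled = MaxAbsScaler().fit_transform(X)
print(scaled.tolist())  # [[0.5, -0.5], [1.0, 1.0]]
```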
2 changes: 1 addition & 1 deletion docs/source/modin/supported/series_supported.rst
@@ -116,7 +116,7 @@ Methods
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``backfill`` | P | | ``N`` if param ``downcast`` is set. |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``between`` | N | | |
| ``between`` | Y | | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
| ``between_time`` | N | | |
+-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
67 changes: 67 additions & 0 deletions docs/source/snowpark/catalog.rst
@@ -0,0 +1,67 @@
=============
Catalog
=============
Catalog module for Snowpark.

.. currentmodule:: snowflake.snowpark.catalog

.. rubric:: Catalog

.. autosummary::
:toctree: api/

Catalog.databaseExists
Catalog.database_exists
Catalog.dropDatabase
Catalog.dropSchema
Catalog.dropTable
Catalog.dropView
Catalog.drop_database
Catalog.drop_schema
Catalog.drop_table
Catalog.drop_view
Catalog.getCurrentDatabase
Catalog.getCurrentSchema
Catalog.getDatabase
Catalog.getProcedure
Catalog.getSchema
Catalog.getTable
Catalog.getUserDefinedFunction
Catalog.getView
Catalog.get_current_database
Catalog.get_current_schema
Catalog.get_database
Catalog.get_procedure
Catalog.get_schema
Catalog.get_table
Catalog.get_user_defined_function
Catalog.get_view
Catalog.listColumns
Catalog.listDatabases
Catalog.listProcedures
Catalog.listSchemas
Catalog.listTables
Catalog.listUserDefinedFunctions
Catalog.listViews
Catalog.list_columns
Catalog.list_databases
Catalog.list_procedures
Catalog.list_schemas
Catalog.list_tables
Catalog.list_user_defined_functions
Catalog.list_views
Catalog.procedureExists
Catalog.procedure_exists
Catalog.schemaExists
Catalog.schema_exists
Catalog.setCurrentDatabase
Catalog.setCurrentSchema
Catalog.set_current_database
Catalog.set_current_schema
Catalog.tableExists
Catalog.table_exists
Catalog.userDefinedFunctionExists
Catalog.user_defined_function_exists
Catalog.viewExists
Catalog.view_exists

9 changes: 9 additions & 0 deletions docs/source/snowpark/functions.rst
@@ -35,21 +35,26 @@ Functions
array_construct_compact
array_contains
array_distinct
array_except
array_flatten
array_generate_range
array_insert
array_intersection
array_join
array_max
array_min
array_position
array_prepend
array_remove
array_reverse
array_size
array_slice
array_sort
array_to_string
array_union
array_unique_agg
arrays_overlap
arrays_zip
as_array
as_binary
as_char
@@ -205,6 +210,10 @@ Functions
lpad
ltrim
make_interval
map_cat
map_concat
map_contains_key
map_keys
max
md5
mean
7 changes: 4 additions & 3 deletions docs/source/snowpark/index.rst
@@ -9,9 +9,9 @@ Snowpark APIs
column
types
row
functions
window
grouping
functions
window
grouping
table_function
table
async_job
@@ -21,6 +21,7 @@ Snowpark APIs
udtf
observability
files
catalog
lineage
context
exceptions
1 change: 1 addition & 0 deletions docs/source/snowpark/session.rst
@@ -38,6 +38,7 @@ Snowpark Session
Session.append_query_tag
Session.call
Session.cancel_all
Session.catalog
Session.clear_imports
Session.clear_packages
Session.close
1 change: 1 addition & 0 deletions recipe/meta.yaml
@@ -43,6 +43,7 @@ requirements:
- protobuf >=3.20,<6
- python-dateutil
- tzlocal
- snowflake.core >=1.0.0,<2

test:
imports:
3 changes: 2 additions & 1 deletion setup.py
@@ -29,6 +29,7 @@
"protobuf>=3.20, <6", # Snowpark IR
"python-dateutil", # Snowpark IR
"tzlocal", # Snowpark IR
"snowflake.core>=1.0.0, <2", # Catalog
]
REQUIRED_PYTHON_VERSION = ">=3.8, <3.12"

@@ -199,7 +200,7 @@ def run(self):
*DEVELOPMENT_REQUIREMENTS,
"scipy", # Snowpark pandas 3rd party library testing
"statsmodels", # Snowpark pandas 3rd party library testing
"scikit-learn==1.5.2", # Snowpark pandas scikit-learn tests
"scikit-learn", # Snowpark pandas 3rd party library testing
# plotly version restricted due to foreseen change in query counts in version 6.0.0+
"plotly<6.0.0", # Snowpark pandas 3rd party library testing
],