- Model Development: log_loss metric calculation is now distributed.
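The changelog does not say how `log_loss` is distributed. As a hedged illustration only, a mean-based metric such as binary log loss can be computed in parallel: each partition returns a partial (sum of per-row losses, row count) pair, and the pairs are combined at the end. The function names and the clipping constant `EPS` below are assumptions for this sketch, not the library's API or implementation.

```python
import math
from typing import Iterable, List, Tuple

# Assumed clipping bound so that log(0) is never evaluated.
EPS = 1e-15


def partial_log_loss(labels: Iterable[int], probs: Iterable[float]) -> Tuple[float, int]:
    """Return (sum of per-row binary log loss, row count) for one partition.

    `labels` are 0/1 ground-truth values; `probs` are predicted P(label == 1).
    """
    total, count = 0.0, 0
    for y, p in zip(labels, probs):
        p = min(max(p, EPS), 1.0 - EPS)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total, count


def combine_log_loss(partials: List[Tuple[float, int]]) -> float:
    """Merge per-partition partials into the overall mean log loss."""
    total = sum(t for t, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Two partitions give the same result as a single pass over all rows.
split = combine_log_loss([partial_log_loss([1, 0], [0.9, 0.2]), partial_log_loss([1], [0.6])])
whole = combine_log_loss([partial_log_loss([1, 0, 1], [0.9, 0.2, 0.6])])
print(split, whole)
```

Because only a sum and a count cross partition boundaries, the combine step is cheap regardless of how many rows each partition holds.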
- Model Registry: Fix an issue where building images fails with certain Docker setups.
- Model Registry: Fix an issue where the local ML library cannot be embedded when the library is imported by `zipimport`.
- Model Registry: Fix out-of-date documentation about the `platform` argument in the `deploy` function.
- Model Registry: Fix an issue where a GPU-trained PyTorch model cannot be deployed to a platform where GPU is not available.
- Model Development: Ordinal encoder can be used with mixed input column types.
- Model Development: Fix an issue when the sklearn default value is `np.nan`.
- Model Registry: Fix an issue where an incorrect Docker executable is used when building images.
- Model Registry: Fix an issue where specifying the `token` argument when using `snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel` with `transformers < 4.32.0` has no effect.
- Model Registry: Fix an issue where an incorrect system function call is used when deploying to SPCS.
- Model Registry: Fix an issue when using a `transformers.pipeline` that does not have a `tokenizer`.
- Model Registry: Fix an incorrectly inferred image repository name during model deployment to SPCS.
- Model Registry: Fix GPU resource retention issue caused by failed or stuck previous deployments in SPCS.
- Model Development & Model Registry: Fix an error related to `pandas.io.json.json_normalize`.
- Allow disabling telemetry.
- Model Registry: Added a `create_if_not_exists` parameter to the constructor.
- Model Registry: Added `get_or_create_model_registry` API.
- Model Registry: Added support for GPU inference when deploying XGBoost (`xgboost.XGBModel` and `xgboost.Booster`), PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models to Snowpark Container Services.
- Model Registry: When inferring a model signature, a `Sequence` of built-in types, of `numpy.ndarray`, of `torch.Tensor`, or of `tensorflow.Tensor` can now be used instead of only a `List` of them.
- Model Registry: Added `get_training_dataset` API.
- Model Development: The size of the metrics result can now exceed the previous 8MB limit.
- Model Registry: Added support for saving/loading/deploying HuggingFace pipeline objects (`transformers.Pipeline`) and our wrapper (`snowflake.ml.model.models.huggingface_pipeline.HuggingFacePipelineModel`). When the wrapper is used to specify configurations, the model for the pipeline is loaded dynamically at deployment time. Currently, the following tasks can be logged without manually specifying model signatures:
  - "conversational"
  - "fill-mask"
  - "question-answering"
  - "summarization"
  - "table-question-answering"
  - "text2text-generation"
  - "text-classification" (alias "sentiment-analysis" available)
  - "text-generation"
  - "token-classification" (alias "ner" available)
  - "translation"
  - "translation_xx_to_yy"
  - "zero-shot-classification"
- Model Development: Fixed a bug when using simple imputer with numpy >= 1.25.
- Model Development: Fixed a bug when inferring the type of label columns.
- Model Registry: `log_model()` now returns a `ModelReference` object instead of a model ID.
- Model Registry: When deploying a model with only one target method, the `target_method` argument can be omitted.
- Model Registry: When using a snowflake-ml-python version newer than the one available in the Snowflake Anaconda Channel, the `embed_local_ml_library` option is automatically set to `True` if it is not already set.
- Model Registry: When deploying a model to Snowpark Container Services with GPU, the default value of `num_workers` is 1.
- Model Registry: `keep_order` and `output_with_input_features` have been removed from the deploy options. The behavior is now controlled by the type of the input passed to `model.predict()`. If the input is a `pandas.DataFrame`, the behavior is the same as the previous `keep_order=True` and `output_with_input_features=False`. If the input is a `snowpark.DataFrame`, the behavior is the same as the previous `keep_order=False` and `output_with_input_features=True`.
- Model Registry: When logging and deploying PyTorch (`torch.nn.Module` and `torch.jit.ScriptModule`) and TensorFlow (`tensorflow.Module` and `tensorflow.keras.Model`) models, we no longer accept models whose input is a list of tensors and whose output is a list of tensors. Instead, we now accept models that take one or more tensors as positional arguments and return a tensor or a tuple of tensors. The input and output dataframes used for prediction are unchanged: every column is an array feature containing a tensor.
- Model Registry: Added support for saving/loading/deploying XGBoost Booster models.
- Model Registry: Added support to get the model name and the model version from model references.
- Model Registry: Restore the session's original database/schema after `create_model_registry()`.
- Model Registry: Fixed an issue where the UDF name created when deploying a model is not identical to the one provided, so the UDF cannot be correctly dropped when the deployment is dropped.
- `connection_params.SnowflakeLoginOptions()`: Added support for `private_key_path`.
- Model Registry: Added support for saving/loading/deploying TensorFlow models (`tensorflow.Module`).
- Model Registry: Added support for saving/loading/deploying MLflow PyFunc models (`mlflow.pyfunc.PyFuncModel`).
- Model Development: Input dataframes can now be joined against data loaded from staged files.
- Model Development: Added support for non-English languages.
- Model Registry: Fix an issue where model dependencies are incorrectly reported as unresolvable on certain platforms.
- Model Registry: When predicting with a model whose output is a list of NumPy ndarrays, the output is no longer flattened; instead, each ndarray acts as a feature (column) in the output.
- Model Registry: Added support for saving/loading/deploying PyTorch models (`torch.nn.Module` and `torch.jit.ScriptModule`).
- Model Registry: Fix an issue where the model registry cannot be created when the database or schema name provided to `create_model_registry` contains special characters.
- Model Registry: Fix an issue where `get_model_description` returns the description with additional quotes.
- Model Registry: Fix an incorrect error message when attempting to remove an unset tag of a model.
- Model Registry: Fix a typo in the default deployment table name.
- Model Registry: A Snowpark dataframe used as sample input, or as input to the `predict` method, that contains a column with the Snowflake `NUMBER(precision, scale)` data type where `scale = 0` no longer leads to an error; such a column is now correctly recognized as the `INT64` data type in the model signature.
- Model Registry: Fix an issue that prevented models logged on a system whose default encoding is not UTF-8 compatible from being deployed.
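The `NUMBER(precision, scale)` rule above can be summarized as: a column with `scale = 0` holds integers only, so it maps to an integer signature type. The helper below is hypothetical; mapping non-zero scales to `DOUBLE` is an assumption of this sketch, and it ignores that `NUMBER(38, 0)` can hold values wider than 64 bits.

```python
import re

# Matches e.g. "NUMBER(38,0)" or "number(10, 2)"; other Snowflake types are out of scope here.
_NUMBER_TYPE = re.compile(r"NUMBER\(\s*(\d+)\s*,\s*(\d+)\s*\)", re.IGNORECASE)


def signature_type_for_number(snowflake_type: str) -> str:
    """Hypothetical mapping from a Snowflake NUMBER type to a signature type name."""
    match = _NUMBER_TYPE.fullmatch(snowflake_type.strip())
    if match is None:
        raise ValueError(f"not a NUMBER(precision, scale) type: {snowflake_type!r}")
    scale = int(match.group(2))
    # scale == 0 means no digits after the decimal point, i.e. integer values only.
    return "INT64" if scale == 0 else "DOUBLE"


print(signature_type_for_number("NUMBER(38, 0)"))
```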
- Model Registry: Added an earlier and clearer error message when any file name in the model, or the file name of the model itself, contains characters that cannot be encoded as ASCII. Deploying such a model is currently not supported.
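A pre-flight check of this kind can be sketched with the standard library. The helper name below is hypothetical and not part of snowflake-ml-python; it tests the model's own name and every file or directory name beneath it with `str.isascii()` (Python 3.7+).

```python
import os
from typing import List


def find_non_ascii_names(model_path: str) -> List[str]:
    """Hypothetical helper: list names under `model_path` (and its own name) that are not ASCII."""
    names = [os.path.basename(os.path.normpath(model_path))]
    # os.walk yields nothing for a missing path, so only the model's own name is checked then.
    for _dirpath, dirnames, filenames in os.walk(model_path):
        names.extend(dirnames)
        names.extend(filenames)
    return [name for name in names if not name.isascii()]
```

Running such a check before logging surfaces the problem at the point where it can still be fixed by renaming files.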
- Model Registry: Prohibit non-snowflake-native models from being logged.
- Model Registry: The `_use_local_snowml` parameter in the options of `deploy()` has been removed.
- Model Registry: An `embed_local_ml_library` parameter, defaulting to `False`, has been added to the options of `log_model()`. When it is `False` (the default), the version of the local snowflake-ml-python library is recorded and used when deploying the model. When it is `True`, the local snowflake-ml-python library is embedded into the logged model and used when you load or deploy the model.
- Model Registry: A new optional argument named `code_paths` has been added to `log_model()` for specifying additional code paths to be imported when loading and deploying the model.
- Model Registry: A new optional argument named `options` has been added to `log_model()` for specifying additional options when saving the model.
- Model Development: Added metrics:
  - d2_absolute_error_score
  - d2_pinball_score
  - explained_variance_score
  - mean_absolute_error
  - mean_absolute_percentage_error
  - mean_squared_error
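For reference, the definitions behind several of these regression metrics fit in a few lines of pure Python. This is a hedged sketch following the scikit-learn definitions (unweighted, single output), not the Snowpark implementation, which runs the computation in Snowflake. `d2_absolute_error_score` compares the absolute error of the predictions with that of always predicting the median of `y_true`.

```python
import statistics
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def explained_variance_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # 1 - Var(residuals) / Var(y_true), using population variance.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - statistics.pvariance(residuals) / statistics.pvariance(y_true)


def d2_absolute_error_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    # Baseline model: always predict the median of y_true.
    median = statistics.median(y_true)
    model_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    baseline_error = sum(abs(t - median) for t in y_true)
    return 1 - model_error / baseline_error


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(mean_absolute_error(y_true, y_pred), d2_absolute_error_score(y_true, y_pred))
```

A constant `y_true` makes the denominators of the last two functions zero; this sketch does not special-case that.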
- Model Development: `accuracy_score()` now works when the given label column names are lists of a single value.
- Model Development: Changed Metrics APIs to imitate sklearn metrics modules: the `accuracy_score()`, `confusion_matrix()`, `precision_recall_fscore_support()`, and `precision_score()` methods moved from their respective modules to `metrics.classification`.
- Model Registry: The default table/stage created by the Registry now uses "SYSTEM" as a prefix.
- Model Registry: The `get_model_history()` method has been enhanced to include the history of model deployment.
- Model Registry: A `replace_udf` flag, defaulting to `False`, has been added to the options of `deploy()`. Setting it to `True` allows overwriting an existing UDF with the same name when deploying.
- Model Development: Added metrics:
  - f1_score
  - fbeta_score
  - recall_score
  - roc_auc_score
  - roc_curve
  - log_loss
  - precision_recall_curve
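Two of the metrics above can be stated compactly. The sketch below follows the scikit-learn definitions for the binary case: `fbeta_score` treats recall as `beta` times as important as precision, and `roc_auc_score` equals the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half). It is an illustration only; the O(P*N) pairwise loop is chosen for clarity, not efficiency.

```python
from typing import Sequence


def fbeta_score(y_true: Sequence[int], y_pred: Sequence[int], beta: float) -> float:
    """Binary F-beta score with 1 as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # define F as 0 when both precision and recall are 0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def roc_auc_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Binary ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```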
- Model Registry: A new argument named `permanent` has been added to `deploy()`. Setting it to `True` creates a permanent deployment without needing to specify the UDF location.
- Model Registry: A new method `list_deployments()` has been added to enumerate all permanent deployments originating from a specific model.
- Model Registry: A new method `get_deployment()` has been added to fetch a deployment by its deployment name.
- Model Registry: A new method `delete_deployment()` has been added to remove an existing permanent deployment.
- Model Registry: The `predict()` method moved from Registry to ModelReference.
- Model Registry: The `_snowml_wheel_path` parameter in the options of `deploy()` is replaced with `_use_local_snowml`, which defaults to `False`. Setting it to `True` has the same effect: the local SnowML code is uploaded when executing the model in the warehouse.
- Model Registry: Removed the `id` field from the `ModelReference` constructor.
- Model Development: Preprocessing and Metrics moved to the modeling package: `snowflake.ml.modeling.preprocessing` and `snowflake.ml.modeling.metrics`.
- Model Development: The `get_sklearn_object()` method is renamed to `to_sklearn()`, `to_xgboost()`, and `to_lightgbm()` for the respective native models.
- Added PolynomialFeatures transformer to the snowflake.ml.modeling.preprocessing module.
- Added metrics:
  - accuracy_score
  - confusion_matrix
  - precision_recall_fscore_support
  - precision_score
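The classification metrics listed above follow their scikit-learn counterparts. A minimal pure-Python sketch of three of them, assuming a single label column and, for `precision_score`, a binary problem with `pos_label=1`. Rows of the confusion matrix are true labels and columns are predicted labels, both in sorted label order, as in scikit-learn. This illustrates the definitions only and is not the package's implementation.

```python
from typing import Hashable, List, Sequence


def accuracy_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> List[List[int]]:
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1  # row: true label, column: predicted label
    return matrix


def precision_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], pos_label: Hashable = 1) -> float:
    # Among rows predicted as pos_label, the fraction that truly are pos_label.
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == pos_label]
    if not predicted_pos:
        return 0.0
    return sum(t == pos_label for t in predicted_pos) / len(predicted_pos)


print(confusion_matrix([2, 0, 2, 2, 0, 1], [0, 0, 2, 2, 0, 2]))
```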
- Model Registry: Model version can now be any string (not required to be a valid identifier)
- Model Deployment: `deploy()` & `predict()` methods now correctly escape identifiers
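Identifier escaping follows Snowflake's SQL rules: an unquoted identifier must start with a letter or underscore and contain only letters, digits, underscores, and dollar signs; anything else must be enclosed in double quotes, with embedded double quotes doubled. The helper below is a hypothetical sketch of that rule, not the library's code. It leaves valid unquoted names untouched, so Snowflake resolves them case-insensitively (as uppercase), and it does not handle reserved keywords.

```python
import re

# Snowflake unquoted identifier: letter or underscore first, then letters, digits, _ or $.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def escape_identifier(name: str) -> str:
    """Hypothetical helper: return `name` in a form that is safe to splice into SQL."""
    if _UNQUOTED_IDENTIFIER.fullmatch(name):
        return name  # valid as-is; Snowflake resolves it case-insensitively
    # Otherwise quote it, doubling any embedded double quotes.
    return '"' + name.replace('"', '""') + '"'


print(escape_identifier("my-model"))
```

Using `fullmatch` rather than `match` with a `$` anchor avoids accepting a name with a trailing newline.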
- Used cloudpickle to serialize and deserialize models throughout the codebase and removed the dependency on joblib.
- Model Deployment: Added support for snowflake.ml models.
- Standardized registry API with the following changes:
  - Create & open registry take the same set of arguments
  - Create & open can choose the schema to use
  - `set_tag`, `set_metric`, etc. now explicitly name their arguments as `tag_name`, `metric_name`, etc.
- Added support for Python 3.9 and 3.10
- Added KBinsDiscretizer
- Support for deployment of XGBoost models & int8 types of data
- Big Model Registry Refresh
  - Fixed API discrepancies between `register_model` & `log_model`.
  - A model can be referred to by name + version (no opaque internal ID is required)
- Model Registry: Added support for saving/loading/deploying scikit-learn and XGBoost models
- Allow using OneHotEncoder along with sklearn style estimators in a pipeline.
- Model Registry: Added support for `delete_model`. Set `delete_artifact=False` to only unregister the model without deleting the underlying model data.
- Initial version of snowflake-ml modeling package.
- Provide support for training most scikit-learn and XGBoost estimators and transformers.
- Minor fixes in preprocessing package.
- New in Preprocessing:
  - SimpleImputer
  - Covariance Matrix
- Optimization of Ordinal Encoder client computations.
- Minor fixes in OneHotEncoder.
- Model Registry
- PyTorch & Tensorflow connector file generic FileSet API
- New to Preprocessing:
  - Binarizer
  - Normalizer
  - Pearson correlation Matrix
- Optimization in Ordinal Encoder to cache vocabulary in temp tables.
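The Pearson correlation matrix listed among the preprocessing additions contains, for each pair of columns, their covariance divided by the product of their standard deviations. A minimal pure-Python sketch, assuming complete (no-null) numeric columns of equal length; it is not the package's implementation, and a constant column has zero variance and raises `ZeroDivisionError` here.

```python
import math
from typing import List, Sequence


def pearson_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length columns with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) normalization factors cancel, so plain sums suffice.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    return [[pearson_correlation(a, b) for b in columns] for a in columns]


print(correlation_matrix([[1, 2, 3], [3, 2, 1]]))
```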
- Initial version of transformers including:
  - Label Encoder
  - Max Abs Scaler
  - Min Max Scaler
  - One Hot Encoder
  - Ordinal Encoder
  - Robust Scaler
  - Standard Scaler
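Two of the initial transformers reduce to simple per-column formulas. This sketch assumes scikit-learn's defaults (a `[0, 1]` feature range for Min Max Scaler and the population standard deviation for Standard Scaler) and, like scikit-learn, divides by 1 instead of 0 for a constant column. It is an illustration of the math, not the package's implementation.

```python
import math
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant column: avoid division by zero
    return [(v - lo) / span for v in values]


def standard_scale(values: Sequence[float]) -> List[float]:
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # population std (ddof=0)
    std = std or 1.0  # constant column: avoid division by zero
    return [(v - mean) / std for v in values]


print(min_max_scale([2, 4, 6]), standard_scale([1, 3]))
```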