Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

preload pyfunc v2 model to avoid pydantic schema generation error #535

Merged
merged 1 commit into from
Feb 14, 2024

Conversation

khorshuheng
Copy link
Collaborator

@khorshuheng khorshuheng commented Feb 9, 2024

Description

Due to a bug that was resolved in https://github.com/mlflow/mlflow/pull/8722/files , the current version of Mlflow used by Merlin can potentially introduce some side effect by removing package from sys.modules.

One such side effect occurs when the user's Mlflow Pyfunc code happens to contain a subdirectory named typing . When appending the Pyfunc code to sys.path, Mlflow also unintentionally remove the typing standard module due to name clash, which clear some module level cache.

We have recently migrated from swagger to OpenAPI for code generation, and as a result, the Merlin Pyfunc model (in merlin.model) is now depending on Pydantic model. The removal of typing standard module from sys.modules affect pydantic ability to resolve some field types correctly, which results in schema generation error.

Ideally, we should upgrade Mlflow version to resolve this. But the fix is only introduce is version >2.0, which makes migration difficult. Hence, the alternative is to import Merlin pyfunc model prior to calling pyfunc.load_model. Importing the Pyfunc model allows pydantic to cache the schema for the class, which allows Pyfunc model to be instantiated correctly even though the typing modules is removed from sys.modules.

Modifications

Import Pyfunc models prior to calling pyfunc.load_model.

Tests

Checklist

  • Added PR label
  • Added unit test, integration, and/or e2e tests
  • Tested locally
  • Updated documentation
  • Update Swagger spec if the PR introduce API changes
  • Regenerated Golang and Python client if the PR introduces API changes

Release Notes


@khorshuheng khorshuheng added the bug Something isn't working label Feb 9, 2024
@ghost
Copy link

ghost commented Feb 9, 2024

👇 Click on the image for a new way to code review

Review these changes using an interactive CodeSee Map

Legend

CodeSee Map legend

@khorshuheng khorshuheng force-pushed the preload-model branch 2 times, most recently from 4b32231 to f65cf7b Compare February 9, 2024 11:20
Copy link
Contributor

@leonlnj leonlnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow thanks for catching this obscure one.

@khorshuheng khorshuheng merged commit 1269dbf into main Feb 14, 2024
24 of 25 checks passed
@khorshuheng khorshuheng deleted the preload-model branch February 14, 2024 04:04
@khorshuheng
Copy link
Collaborator Author

Update: while this resolves the issue of image building, it still doesn't prevent the batch prediction job spark executor from failing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants