[PECO-1803] Splitting the PySql connector into the core and the non core part #443

Closed: jprakash-db wants to merge 6 commits

jprakash-db
Contributor

PECO-1803

Related Links

The databricks_sqlalchemy split is in a separate PR: databricks/databricks-sqlalchemy#1

Description

The databricks-sql-python library is split so that end users can install a smaller package that matches their requirements.
In particular, pyarrow is the heaviest dependency, and the plan is to make it optional.
The existing library is split into:

  • databricks-sql-connector (kept so that the import flow for existing users does not change)
  • databricks-sql-connector-core (the lightweight library that contains only the core part)

Tasks Completed

  • Refactored the code into its respective folders, following the proposed design doc
  • Updated pyproject.toml to declare the correct dependencies for each part of the split
  • Verified that all existing e2e and unit tests pass both before and after the split, ensuring parity
  • Added benchmarking queries to compare performance before and after the split, and created a dashboard to visualize the results
  • Added dependency tests that check how the library behaves when an optional library is not installed and the user calls functionality that needs it
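
The optional-dependency behaviour described in the last task can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `require_optional`, `fetch_as_arrow`, `fetch_as_list`, and `simulate_missing` are hypothetical names, and the error message wording is made up. The idea is that pyarrow is imported lazily, only inside the code path that needs it, and that a test can hide the module to confirm the core path still works.

```python
import importlib
import sys
from unittest import mock


def require_optional(module_name: str, feature: str):
    """Import an optional dependency, or raise a clear error naming the feature."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}', "
            "which is not installed."
        ) from exc


def fetch_as_arrow(rows):
    """Hypothetical Arrow-backed path: pyarrow is imported only when this is called."""
    pyarrow = require_optional("pyarrow", "Arrow result fetching")
    return pyarrow.table({"value": list(rows)})


def fetch_as_list(rows):
    """Hypothetical core path that never touches pyarrow."""
    return list(rows)


def simulate_missing(module_name: str):
    """Context manager that makes importing `module_name` fail, for dependency tests."""
    # A None entry in sys.modules makes the import machinery raise ImportError.
    return mock.patch.dict(sys.modules, {module_name: None})
```

Inside `with simulate_missing("pyarrow"):`, calling `fetch_as_list` still succeeds, while `fetch_as_arrow` raises an `ImportError` that names the missing package, which is the kind of behaviour a dependency test would assert on.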

How to Test

The testing pipeline is the same as before the split.
Both the integration and unit tests can be run directly with pytest: pytest [directory_name or file_name]

Addition of dist folder in this repo

GitHub Actions have been set up in the databricks_sqlalchemy repo to run its tests against databricks_sql_connector_core. Running those tests currently requires the .whl file in the dist folder, so it has been added to this PR for temporary testing.

Once the library is published to a public repository such as PyPI, databricks_sqlalchemy will download it from there automatically and run the tests using GitHub Actions.

Performance Comparison - Benchmarking

Pre-split and post-split performance was compared using both large and small queries to make sure there is no performance regression.
A dashboard has been created so that every time the benchmark is run, the results are stored in the benchfood and comparisons can be made easily.
[Screenshot, 2024-09-03]
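
A pre/post comparison of this kind could be sketched as below. This is a minimal illustration, not the benchmarking code added in this PR: `time_query` and `compare` are hypothetical helpers, the choice of the median as the summary statistic is an assumption, and storing results in the dashboard is not shown.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run `run_query` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def compare(pre: List[float], post: List[float]) -> Dict[str, float]:
    """Summarize pre-split vs post-split timings by their medians."""
    pre_median = statistics.median(pre)
    post_median = statistics.median(post)
    return {
        "pre_median_s": pre_median,
        "post_median_s": post_median,
        # A ratio above 1.0 means the post-split build was slower.
        "post_over_pre": post_median / pre_median,
    }
```

In practice `run_query` would wrap the same large or small query executed once against the pre-split build and once against the post-split build, and a ratio close to 1.0 would indicate no regression.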

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@jprakash-db jprakash-db self-assigned this Sep 18, 2024