Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added section to Python models doc to discuss third party packages following this the thread: https://dbt-labs.slack.com/archives/C05FWBP9X1U/p1730272033637189 #6418

Merged
merged 16 commits into from
Nov 12, 2024

Conversation

nataliefiann
Copy link
Contributor

@nataliefiann nataliefiann commented Nov 5, 2024

What are you changing in this pull request and why?

I have added a section to the Python models doc to discuss third party packages following this the thread: https://dbt-labs.slack.com/archives/C05FWBP9X1U/p1730272033637189

Checklist

  • I have reviewed the Content style guide so my content adheres to these guidelines.
  • The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the version a whole page and/or version a block of content guidelines.
  • I have added checklist item(s) to this list for anything anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."
  • The content in this PR requires a dbt release note, so I added one to the release notes page.

🚀 Deployment available! Here are the direct links to the updated files:

@nataliefiann nataliefiann requested a review from a team as a code owner November 5, 2024 13:52
Copy link

vercel bot commented Nov 5, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
docs-getdbt-com ✅ Ready (Inspect) Visit Preview Nov 12, 2024 1:09pm

@github-actions github-actions bot added content Improvements or additions to content size: small This change will take 1 to 2 days to address labels Nov 5, 2024
@b-per
Copy link
Contributor

b-per commented Nov 5, 2024

Thanks! I think that overall this is the way we should show it and point to the relevant docs but I am just wondering if the example we show is the best one.

Most people will want to upload Python packages so that they can call their favorite functions. In this example we are currently loading a CSV file. It works but it is not going to be the reason why people will want to use the imports config and load it from a Snowflake stage.

@nataliefiann
Copy link
Contributor Author

Hiya @mirnawong1 and @b-per

Thank you for reviewing this. It's much appreciated.

@b-per with ref to your feedback, I found the below example in the Snowflake doc which gives an example of installing a package with native code via importing from Stage. Would you recommend using this instead:

create or replace function native_module_test_zip()
returns string
language python
runtime_version=3.9
handler='compute'
imports=('@mystage/mycustompackage.zip')
as
$$
import os
import sys
def compute():
   from zipfile import ZipFile
   if os.path.exists('/tmp/mycustompackage'):
      shutil.rmtree('/tmp/mycustompackage')
   os.mkdir('/tmp/mycustompackage')
   zippath = os.path.join(sys._xoptions["snowflake_import_directory"], 'mycustompackage.zip')
   ZipFile(zippath).extractall('/tmp/mycustompackage')
   sys.path.insert(0, '/tmp/mycustompackage')
   #return os.listdir('/tmp/mycustompackage')
   import mycustompackage
   return mycustompackage.mycustompackage()
$$;

Kind Regards
Natalie

@b-per
Copy link
Contributor

b-per commented Nov 5, 2024

Yes, this looks much closer to what people might do! We just need to update it to make it a valid dbt model.

@nataliefiann
Copy link
Contributor Author

Thanks @b-per

I did a little research and came upon this example:

{{ config(
    materialized = 'view'
) }}

create or replace function native_module_test_zip()
returns string
language python
runtime_version=3.9
handler='compute'
imports=('@mystage/mycustompackage.zip')
as
$$
import os
import sys
from zipfile import ZipFile
import shutil

def compute():
   if os.path.exists('/tmp/mycustompackage'):
      shutil.rmtree('/tmp/mycustompackage')
   os.mkdir('/tmp/mycustompackage')
   zippath = os.path.join(sys._xoptions["snowflake_import_directory"], 'mycustompackage.zip')
   ZipFile(zippath).extractall('/tmp/mycustompackage')
   sys.path.insert(0, '/tmp/mycustompackage')
   
   # Import and call the function from mycustompackage
   import mycustompackage
   return mycustompackage.mycustompackage()
$$;

Would that suffice?

Kind Regards

@b-per
Copy link
Contributor

b-per commented Nov 6, 2024

I don't think so, no. This is not a valid dbt Python model.

There are a few things to change. Our docs describe what a Python model needs to have (for example a def model())

@nataliefiann
Copy link
Contributor Author

@nataliefiann
Copy link
Contributor Author

Hiya @mirnawong1 @b-per

I'm currently querying this in ask-adapters-support. I found this example:

def model(dbt, session):
    # Configure the model
    dbt.config(
        materialized="table",
        imports=["@mystage/mycustompackage.zip"],  # Specify the external package location
    )
    
    # Example data transformation using the imported package
    # (Assuming `some_external_package` has a function we can call)
    data = {
        "name": ["Alice", "Bob", "Charlie"],
        "score": [85, 90, 88]
    }
    df = pd.DataFrame(data)

    # Process data with the external package
    df["adjusted_score"] = df["score"].apply(lambda x: some_external_package.adjust_score(x))
    
    # Return the DataFrame as the model output
    return df

Mike Alfare advised: "I think mycustompackage.zip might need to be some_external_package.py. To be honest, this is not something I have seen before, but reviewing our functional tests, it seems like this could work. I'll see if I can find someone more familiar with this particular feature."

I thought I'd share here incase you had any ideas.

Kind Regards
Natalie

@b-per
Copy link
Contributor

b-per commented Nov 11, 2024

I think that both zip and py files would technically work. but people wanting to use external libraries would use the zip approach.

Here are other examples in the internet with the zip option

This snowflake page as well mentions the zip

Rather than waiting for more and more feedback I think that the best is to push the current version and update it later if need be. As it is, it is already providing some guidance on using the imports config which we didn't have before.

@b-per
Copy link
Contributor

b-per commented Nov 11, 2024

Was the snippet completely changed from the one in this thread? The formatting doesn't look as good as the snippet you shared before.

@nataliefiann
Copy link
Contributor Author

Hiya @b-per

Yep, I used the one you ref'd here. I'll rework it.

Kind Regards

website/docs/docs/build/python-models.md Outdated Show resolved Hide resolved
Copy link
Contributor

@mirnawong1 mirnawong1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm to me!

@nataliefiann nataliefiann merged commit 7cb5f09 into current Nov 12, 2024
13 checks passed
@nataliefiann nataliefiann deleted the nfiann-sf-thirdparty branch November 12, 2024 13:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
content Improvements or additions to content size: small This change will take 1 to 2 days to address
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants