Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2991] [SPIKE] Investigate minimal-snowplow-tracker #8409

Closed
1 task done
emmyoop opened this issue Aug 15, 2023 · 8 comments · Fixed by #10530
Closed
1 task done

[CT-2991] [SPIKE] Investigate minimal-snowplow-tracker #8409

emmyoop opened this issue Aug 15, 2023 · 8 comments · Fixed by #10530
Assignees
Labels
performance tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality

Comments

@emmyoop
Copy link
Member

emmyoop commented Aug 15, 2023

Housekeeping

  • I am a maintainer of dbt-core

Short description

minimal-snowplow-tracker is used in dbt-core for anonymous usage tracking. It is an internally managed fork of snowplow-python-tracker. Currently it is 100+ commits behind and only officially supports up to Python 3.7.

Acceptance criteria

Explore if there are performance hits for switching from our fork (minimal-snowplow-tracker) to the Snowplow maintained project (snowplow-python-tracker).

Impact to Adapters

None

Context

If there is a performance hit from using the full package maintained by Snowplow, we should look into rolling this code into the code codebase. Will need to verify if any other internal projects are depending on this project.

The minimal-snowplow-tracker project is considered a critical project by PyPI meaning it's in the top 1% of downloads on PyPI. I'm guessing that's because it's a dependency of core which is also a critical project. When we remove the project we should not remove it from PyPI for 6+ months to ensure we don't accidentally break other projects.

@github-actions github-actions bot changed the title [SPIKE] Investigate minimal-snowplow-tracker [CT-2991] [SPIKE] Investigate minimal-snowplow-tracker Aug 15, 2023
@emmyoop emmyoop added the triage label Aug 15, 2023
@jtcohen6 jtcohen6 added tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality and removed triage labels Aug 17, 2023
@akurdyukov
Copy link
Contributor

I bumped into a issue with minimal showplow after installing meltano using this guide: https://dagster.io/blog/dagster-meltano-integration-tutorial

The guide is a little bit inaccurate because it does not say that you need to install meltano package into venv the dagster is running in. And after installing meltano (and doing nothing with the code) I ran into

Stack Trace:
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_grpc/server.py", line 295, in __init__
    self._loaded_repositories: Optional[LoadedRepositories] = LoadedRepositories(
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_grpc/server.py", line 139, in __init__
    loadable_targets = get_loadable_targets(
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_grpc/utils.py", line 47, in get_loadable_targets
    else loadable_targets_from_python_module(module_name, working_directory)
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_core/workspace/autodiscovery.py", line 35, in loadable_targets_from_python_module
    module = load_python_module(
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster/_core/code_pointer.py", line 135, in load_python_module
    return importlib.import_module(module_name)
  File "/opt/homebrew/Cellar/[email protected]/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/Users/alik/Development/dagster_analytics/dagster_analytics/definitions.py", line 4, in <module>
    from dagster_dbt import DbtCliResource
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/__init__.py", line 2, in <module>
    from .asset_defs import (
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/asset_defs.py", line 62, in <module>
    from dagster_dbt.core.resources import DbtCliClient
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/core/__init__.py", line 5, in <module>
    from .resources_v2 import (
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dagster_dbt/core/resources_v2.py", line 35, in <module>
    from dbt.contracts.results import NodeStatus, TestStatus
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/contracts/results.py", line 1, in <module>
    from dbt.contracts.graph.unparsed import FreshnessThreshold
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/contracts/graph/unparsed.py", line 3, in <module>
    from dbt import deprecations
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/deprecations.py", line 6, in <module>
    import dbt.tracking
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/tracking.py", line 104, in <module>
    emitter = TimeoutEmitter()
  File "/Users/alik/Development/dagster-env-10/lib/python3.10/site-packages/dbt/tracking.py", line 54, in __init__
    super().__init__(

This happens because meltano depends on snowplow-tracker [required: >=1.0.1,<2.0.0, installed: 1.0.1]. And the contract for snowplow_tracker.Emitter does not match between snowplow-tracker and minimal-snowplow-tracker.

I propose to update contract of minimal-snowplow-tracker to match snowplow-tracker. I think I can do it, but I need the sources. Can you please make it public?

@jtcohen6
Copy link
Contributor

I believe it is public: https://github.com/dbt-labs/snowplow-python-tracker

@jaceksan
Copy link

Do you happen to have any news here?

The issue with Meltano persists, I created an issue in their Github:
meltano/meltano#8256

Should I invest my time to work around it in Meltano, or do you plan to solve it here in the near future?

@martynydbt
Copy link

We don't have a timeline for working on this ticket yet @jaceksan

@jaceksan
Copy link

Edgar created a pull request which is already under review and IMO it relates to this issue (if it even does not fix it?):

#9528

@faultymajority
Copy link

faultymajority commented Apr 8, 2024

Why not just release a version 0.1.0 of minimal-snowplow-tracker, and install it to site-packages/minimal_snowplow_tracker, instead of clobbering the real site-packages/snowplow_tracker? The current way breaks every other package that uses the upstream snowplow tracker, which you forked and no longer (needed for your own purposes to) maintain.

@edgarrmondragon
Copy link
Contributor

Why not just release a version 0.1.0 of minimal-snowplow-tracker, and install it to site-packages/minimal_snowplow_tracker, instead of clobbering the real site-packages/snowplow_tracker? The current way is breaks every other package that uses the upstream snowplow tracker, which you forked and no longer (needed for your own purposes to) maintain.

That'd be ideal, but it'd still require changes in dbt to update the imports.

@faultymajority
Copy link

faultymajority commented Apr 8, 2024

At least in dbt-core, this is contained to two lines in one file (a simpler fix than your #9680). Is there telemetry in other massively used dbt packages? Personally I only use dbt-postgres and dbt-bigquery, which don't seem to have a snowplow dep, similarly it seems dbt-utils doesn't.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance tech_debt Behind-the-scenes changes, with little direct impact on end-user functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants