Skip to content

Commit

Permalink
Begin warning people about spaces in model names (#9886)
Browse files Browse the repository at this point in the history
* Add event type for deprecation of spaces in model names

* Begin emitting deprecation warning for spaces in model names

* Only warn on first model name with spaces unless `--debug` is specified

For projects with a lot of models that have spaces in their names, the
warning about this deprecation would be incredibly annoying. Now we instead
only log the first model name issue and then a count of how many models
have the issue, unless `--debug` is specified.

* Refactor `EventCatcher` so that the event to catch is setable

We want to be able to catch more than just `SpacesInModelNameDeprecation`
events, and in the next commit we will alter our tests to do so. Thus
instead of writing a new catcher for each event type, a slight modification
to the existing `EventCatcher` makes this much easier.

* Add project flag to control whether spaces are allowed in model names

* Log errors and raise exception when `allow_spaces_in_model_names` is `False`

* Use custom event for output invalid name counts instead of `Note` events

Using `Note` events was causing test flakiness when run in a multi
worker environment using `pytest -nauto`. This is because the event
manager is currently a global. So in a situation where test `A` starts
and test `tests_debug_when_spaces_in_name` starts shortly there after,
the event manager for both tests will have the callbacks set in
`tests_debug_when_spaces_in_name`. Then if something in test `A` fired
a `Note` event, this would affect the count of `Note` events that
`tests_debug_when_spaces_in_name` sees, causing assertion failures. By
creating a custom event, `TotalModelNamesWithSpacesDeprecation`, we limit
the possible flakiness to only tests that fire the custom event. Thus
we didn't _eliminate_ all possibility of flakiness, but realistically
only the tests in `test_check_for_spaces_in_model_names.py` can now
interfere with each other. Which still isn't great, but to fully
resolve the problem we need to work on how the event manager is
handled (preferably not globally).

* Always log total invalid model names if at least one

Previously we only logged out the count of how many invalid model names
there were if there was two or more invalid names (and not in debug mode).
However this message is important if there is even one invalid model
name and regardless of whether you are running debug mode. That is because
automated tools might be looking for the event type to track if anything
is wrong.

A related change in this commit is that we now only output the debug hint
if it wasn't run with debug mode. The idea being that if they are already
running it in debug mode, the hint could come accross as somewhat
patronizing.

* Reduce duplicate `if` logic in `check_for_spaces_in_model_names`

* Improve readability of logs related to problematic model names

We want people running dbt to be able to at a glance see warnings/errors
with running their project. In this case we are focused specifically on
errors/warnings in regards to model names containing spaces. Previously
we were only ever emitting the `warning_tag` in the message even if the
event itself was being emitted at an `ERROR` level. We now properly have
`[ERROR]` or `[WARNING]` in the message depending on the level. Unfortunately
we couldn't just look what level the event was being fired at, because that
information doesn't exist on the event itself.

Additionally, we're using events that base off of `DynamicEvents` which
unfortunately hard coded to `DEBUG`. Changing this would involve still
having a `level` property on the definition in `core_types.proto` and
then having `DynamicEvent`s look to `self.level` in the `level_tag`
method. Then we could change how firing events works based on the an
event's `level_tag` return value. This all sounds like a bit of tech
debt suited for PR, possibly multiple, and thus is not being done here.

* Alter `TotalModelNamesWithSpacesDeprecation` message to handle singular and plural

* Remove duplicate import in `test_graph.py` introduced from merging in main

---------

Co-authored-by: Emily Rockman <[email protected]>
  • Loading branch information
QMalcolm and emmyoop authored Apr 12, 2024
1 parent 99d033f commit f15e128
Show file tree
Hide file tree
Showing 9 changed files with 820 additions and 575 deletions.
6 changes: 6 additions & 0 deletions .changes/unreleased/Fixes-20240409-233347.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
kind: Fixes
body: Begin warning people about spaces in model names
time: 2024-04-09T23:33:47.850166-07:00
custom:
Author: QMalcolm
Issue: "9397"
6 changes: 5 additions & 1 deletion core/dbt/contracts/project.py
Original file line number Diff line number Diff line change
Expand Up @@ -296,6 +296,7 @@ def validate(cls, data):

@dataclass
class ProjectFlags(ExtensibleDbtClassMixin):
allow_spaces_in_model_names: Optional[bool] = True
cache_selected_only: Optional[bool] = None
debug: Optional[bool] = None
fail_fast: Optional[bool] = None
Expand All @@ -320,7 +321,10 @@ class ProjectFlags(ExtensibleDbtClassMixin):

@property
def project_only_flags(self) -> Dict[str, Any]:
return {"source_freshness_run_project_hooks": self.source_freshness_run_project_hooks}
return {
"source_freshness_run_project_hooks": self.source_freshness_run_project_hooks,
"allow_spaces_in_model_names": self.allow_spaces_in_model_names,
}


@dataclass
Expand Down
24 changes: 24 additions & 0 deletions core/dbt/events/core_types.proto
Original file line number Diff line number Diff line change
Expand Up @@ -403,6 +403,30 @@ message ProjectFlagsMovedDeprecationMsg {
ProjectFlagsMovedDeprecation data = 2;
}

// D014
message SpacesInModelNameDeprecation {
string model_name = 1;
string model_version = 2;
string level = 3;
}

message SpacesInModelNameDeprecationMsg {
CoreEventInfo info = 1;
SpacesInModelNameDeprecation data = 2;
}

// D015
message TotalModelNamesWithSpacesDeprecation {
int32 count_invalid_names = 1;
bool show_debug_hint = 2;
string level = 3;
}

message TotalModelNamesWithSpacesDeprecationMsg {
CoreEventInfo info = 1;
TotalModelNamesWithSpacesDeprecation data = 2;
}

// I065
message DeprecatedModel {
string model_name = 1;
Expand Down
1,146 changes: 577 additions & 569 deletions core/dbt/events/core_types_pb2.py

Large diffs are not rendered by default.

42 changes: 42 additions & 0 deletions core/dbt/events/types.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,11 @@
from dbt.events.base_types import WarnLevel, InfoLevel, DebugLevel, ErrorLevel, DynamicLevel


# TODO Move this to dbt_common.ui
def _error_tag(msg: str) -> str:
return f'[{red("ERROR")}]: {msg}'


# Event codes have prefixes which follow this table
#
# | Code | Description |
Expand Down Expand Up @@ -413,6 +418,43 @@ def message(self) -> str:
return warning_tag(f"Deprecated functionality\n\n{description}")


class SpacesInModelNameDeprecation(DynamicLevel):
def code(self) -> str:
return "D014"

def message(self) -> str:
version = ".v" + self.model_version if self.model_version else ""
description = (
f"Model `{self.model_name}{version}` has spaces in its name. This is deprecated and "
"may cause errors when using dbt."
)

if self.level == EventLevel.ERROR.value:
description = _error_tag(description)
elif self.level == EventLevel.WARN.value:
description = warning_tag(description)

return line_wrap_message(description)


class TotalModelNamesWithSpacesDeprecation(DynamicLevel):
def code(self) -> str:
return "D015"

def message(self) -> str:
description = f"Spaces in model names found in {self.count_invalid_names} model(s), which is deprecated."

if self.show_debug_hint:
description += " Run again with `--debug` to see them all."

if self.level == EventLevel.ERROR.value:
description = _error_tag(description)
elif self.level == EventLevel.WARN.value:
description = warning_tag(description)

return line_wrap_message(description)


# =======================================================
# I - Project parsing
# =======================================================
Expand Down
45 changes: 45 additions & 0 deletions core/dbt/parser/manifest.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@
from dbt.context.manifest import generate_query_header_context
from dbt.contracts.graph.semantic_manifest import SemanticManifest
from dbt_common.events.base_types import EventLevel
from dbt_common.exceptions.base import DbtValidationError
import dbt_common.utils
import json
import pprint
Expand Down Expand Up @@ -62,6 +63,8 @@
StateCheckVarsHash,
DeprecatedModel,
DeprecatedReference,
SpacesInModelNameDeprecation,
TotalModelNamesWithSpacesDeprecation,
UpcomingReferenceDeprecation,
)
from dbt.logger import DbtProcessState
Expand Down Expand Up @@ -520,6 +523,7 @@ def load(self) -> Manifest:
self.write_manifest_for_partial_parse()

self.check_for_model_deprecations()
self.check_for_spaces_in_model_names()

return self.manifest

Expand Down Expand Up @@ -621,6 +625,47 @@ def check_for_model_deprecations(self):
)
)

def check_for_spaces_in_model_names(self):
"""Validates that model names do not contain spaces
If `DEBUG` flag is `False`, logs only first bad model name
If `DEBUG` flag is `True`, logs every bad model name
If `ALLOW_SPACES_IN_MODEL_NAMES` is `False`, logs are `ERROR` level and an exception is raised if any names are bad
If `ALLOW_SPACES_IN_MODEL_NAMES` is `True`, logs are `WARN` level
"""
improper_model_names = 0
level = (
EventLevel.WARN
if self.root_project.args.ALLOW_SPACES_IN_MODEL_NAMES
else EventLevel.ERROR
)

for node in self.manifest.nodes.values():
if isinstance(node, ModelNode) and " " in node.name:
if improper_model_names == 0 or self.root_project.args.DEBUG:
fire_event(
SpacesInModelNameDeprecation(
model_name=node.name,
model_version=version_to_str(node.version),
level=level.value,
),
level=level,
)
improper_model_names += 1

if improper_model_names > 0:
fire_event(
TotalModelNamesWithSpacesDeprecation(
count_invalid_names=improper_model_names,
show_debug_hint=(not self.root_project.args.DEBUG),
level=level.value,
),
level=level,
)

if level == EventLevel.ERROR:
raise DbtValidationError("Model names cannot contain spaces")

def load_and_parse_macros(self, project_parser_files):
for project in self.all_projects.values():
if project.project_name not in project_parser_files:
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
import pytest

from dataclasses import dataclass, field
from dbt.cli.main import dbtRunner
from dbt_common.events.base_types import BaseEvent, EventLevel, EventMsg
from dbt.events.types import SpacesInModelNameDeprecation, TotalModelNamesWithSpacesDeprecation
from dbt.tests.util import update_config_file
from typing import Dict, List


@dataclass
class EventCatcher:
event_to_catch: BaseEvent
caught_events: List[EventMsg] = field(default_factory=list)

def catch(self, event: EventMsg):
if event.info.name == self.event_to_catch.__name__:
self.caught_events.append(event)


class TestSpacesInModelNamesHappyPath:
def test_no_warnings_when_no_spaces_in_name(self, project) -> None:
event_catcher = EventCatcher(SpacesInModelNameDeprecation)
runner = dbtRunner(callbacks=[event_catcher.catch])
runner.invoke(["parse"])
assert len(event_catcher.caught_events) == 0


class TestSpacesInModelNamesSadPath:
@pytest.fixture(scope="class")
def models(self) -> Dict[str, str]:
return {
"my model.sql": "select 1 as id",
}

def tests_warning_when_spaces_in_name(self, project) -> None:
event_catcher = EventCatcher(SpacesInModelNameDeprecation)
total_catcher = EventCatcher(TotalModelNamesWithSpacesDeprecation)
runner = dbtRunner(callbacks=[event_catcher.catch, total_catcher.catch])
runner.invoke(["parse"])

assert len(total_catcher.caught_events) == 1
assert len(event_catcher.caught_events) == 1
event = event_catcher.caught_events[0]
assert "Model `my model` has spaces in its name. This is deprecated" in event.info.msg
assert event.info.level == EventLevel.WARN


class TestSpaceInModelNamesWithDebug:
@pytest.fixture(scope="class")
def models(self) -> Dict[str, str]:
return {
"my model.sql": "select 1 as id",
"my model2.sql": "select 1 as id",
}

def tests_debug_when_spaces_in_name(self, project) -> None:
spaces_check_catcher = EventCatcher(SpacesInModelNameDeprecation)
total_catcher = EventCatcher(TotalModelNamesWithSpacesDeprecation)
runner = dbtRunner(callbacks=[spaces_check_catcher.catch, total_catcher.catch])
runner.invoke(["parse"])
assert len(spaces_check_catcher.caught_events) == 1
assert len(total_catcher.caught_events) == 1
assert (
"Spaces in model names found in 2 model(s)" in total_catcher.caught_events[0].info.msg
)
assert (
"Run again with `--debug` to see them all." in total_catcher.caught_events[0].info.msg
)

spaces_check_catcher = EventCatcher(SpacesInModelNameDeprecation)
total_catcher = EventCatcher(TotalModelNamesWithSpacesDeprecation)
runner = dbtRunner(callbacks=[spaces_check_catcher.catch, total_catcher.catch])
runner.invoke(["parse", "--debug"])
assert len(spaces_check_catcher.caught_events) == 2
assert len(total_catcher.caught_events) == 1
assert (
"Run again with `--debug` to see them all."
not in total_catcher.caught_events[0].info.msg
)


class TestAllowSpacesInModelNamesFalse:
@pytest.fixture(scope="class")
def models(self) -> Dict[str, str]:
return {
"my model.sql": "select 1 as id",
}

def test_dont_allow_spaces_in_model_names(self, project):
spaces_check_catcher = EventCatcher(SpacesInModelNameDeprecation)
runner = dbtRunner(callbacks=[spaces_check_catcher.catch])
runner.invoke(["parse"])
assert len(spaces_check_catcher.caught_events) == 1
assert spaces_check_catcher.caught_events[0].info.level == EventLevel.WARN

config_patch = {"flags": {"allow_spaces_in_model_names": False}}
update_config_file(config_patch, project.project_root, "dbt_project.yml")

spaces_check_catcher = EventCatcher(SpacesInModelNameDeprecation)
runner = dbtRunner(callbacks=[spaces_check_catcher.catch])
result = runner.invoke(["parse"])
assert not result.success
assert "Model names cannot contain spaces" in result.exception.__str__()
assert len(spaces_check_catcher.caught_events) == 1
assert spaces_check_catcher.caught_events[0].info.level == EventLevel.ERROR
4 changes: 4 additions & 0 deletions tests/unit/test_events.py
Original file line number Diff line number Diff line change
Expand Up @@ -154,6 +154,10 @@ def test_event_codes(self):
adapter_types.CollectFreshnessReturnSignature(),
core_types.TestsConfigDeprecation(deprecated_path="", exp_path=""),
core_types.ProjectFlagsMovedDeprecation(),
core_types.SpacesInModelNameDeprecation(model_name="", model_version="", level=""),
core_types.TotalModelNamesWithSpacesDeprecation(
count_invalid_names=1, show_debug_hint=True, level=""
),
# E - DB Adapter ======================
adapter_types.AdapterEventDebug(),
adapter_types.AdapterEventInfo(),
Expand Down
16 changes: 11 additions & 5 deletions tests/unit/test_graph.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,22 +7,22 @@
from dbt.adapters.factory import reset_adapters, register_adapter
import dbt.compilation
import dbt.exceptions
import dbt.flags
import dbt.parser
import dbt.config
import dbt.utils
import dbt.parser.manifest
from dbt import tracking
from dbt.cli.flags import convert_config
from dbt.contracts.files import SourceFile, FileHash, FilePath
from dbt.contracts.graph.manifest import MacroManifest, ManifestStateCheck
from dbt.contracts.project import ProjectFlags
from dbt.flags import get_flags, set_from_args
from dbt.graph import NodeSelector, parse_difference
from dbt.events.logging import setup_event_logger
from dbt.mp_context import get_mp_context
from queue import Empty
from .utils import config_from_parts_or_dicts, generate_name_macros, inject_plugin

from dbt.flags import set_from_args
from argparse import Namespace

set_from_args(Namespace(WARN_ERROR=False), None)
Expand Down Expand Up @@ -127,9 +127,15 @@ def get_config(self, extra_cfg=None):
cfg.update(extra_cfg)

config = config_from_parts_or_dicts(project=cfg, profile=self.profile)
dbt.flags.set_from_args(Namespace(), ProjectFlags())
setup_event_logger(dbt.flags.get_flags())
object.__setattr__(dbt.flags.get_flags(), "PARTIAL_PARSE", False)
set_from_args(Namespace(), ProjectFlags())
flags = get_flags()
setup_event_logger(flags)
object.__setattr__(flags, "PARTIAL_PARSE", False)
for arg_name, args_param_value in vars(flags).items():
args_param_value = convert_config(arg_name, args_param_value)
object.__setattr__(config.args, arg_name.upper(), args_param_value)
object.__setattr__(config.args, arg_name.lower(), args_param_value)

return config

def get_compiler(self, project):
Expand Down

0 comments on commit f15e128

Please sign in to comment.