Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CT 2251 model yaml frontmatter #7100

Draft
wants to merge 25 commits into
base: main
Choose a base branch
from
Draft

Conversation

gshank
Copy link
Contributor

@gshank gshank commented Mar 1, 2023

resolves #7099

Description

This is a work in progress, not at all ready for code review. It's here for commenting and further development.

The changes in core/dbt/clients/yaml_helper.py came from a pull request by jakebiesinger (jakebiesinger#1)

Checklist

@gshank gshank requested review from a team as code owners March 1, 2023 20:28
@cla-bot cla-bot bot added the cla:yes label Mar 1, 2023
@gshank gshank marked this pull request as draft March 1, 2023 20:28
@github-actions
Copy link
Contributor

github-actions bot commented Mar 1, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@joellabes
Copy link
Contributor

@gshank A drive-by comment, based on this message:

There's still a fair amount of work to do [...] dealing with config in both the SQL files and a schema file, etc

I don't think we should support this! I think you should only be allowed to have YAML in the SQL file itself or a schema file:

It feels good to me that there is not a new layer of inheritance and override order of operations for people to get their heads around.

☝️ This is provided that if they try, they get a warning that some properties are being ignored, and the winner is chosen in a deterministic way. I don't really care which one wins if one is technically easier than the other, but all else being equal I'd vote for YFM being prioritised over schema.yml

(To reiterate, this is just what I reckon, not an official spec)

@gshank
Copy link
Contributor Author

gshank commented Mar 2, 2023

I wasn't clear enough in my comment... I didn't mean that we'd support doing config in both places, I mean figuring out how to detect that it was happening and issue an error....

@gshank gshank force-pushed the ct-2251-model_yaml_frontmatter branch from 4c1f814 to dc2b73a Compare March 3, 2023 04:34
Copy link
Member

@aranke aranke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use https://github.com/eyeseast/python-frontmatter instead of parsing front matter on our own?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Mar 3, 2023

@aranke It sounds like that package isn't doing a whole lot, and has some undesirable behaviors. Quoting @jakebiesinger in #6853 (reply in thread):

There is https://pypi.org/project/python-frontmatter/ which I'm using in my PR above, but I looked around at the internals and it's literally just regex parsing combined with a documentString.split(regex). This unfortunately accepts frontmatter from anywhere in the doc (not something we can configure, even by implementing a custom handler in that world. While there aren't many deps for that package, the latest release is March 2021...

Honestly, I think I'd rather provide our own stricter and simpler parser, given the above's flexibility + complexity. A few dozen lines of code is not much of a price to pay for parsing exactly how you want IMHO.

@gshank
Copy link
Contributor Author

gshank commented Mar 3, 2023

That was the same conclusion that I came to after looking at it. It's doing more than we want and the actually relevant code is pretty small.

@gshank
Copy link
Contributor Author

gshank commented Mar 4, 2023

This only implements yaml frontmatter for model nodes. Would we want to also support it for other node types?

Copy link
Contributor

@jtcohen6 jtcohen6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only implements yaml frontmatter for model nodes. Would we want to also support it for other node types?

If we're thinking about this as a proof of concept, model is definitely the node type most worth proving out :)

We could extend this to other node types that are 1:1 with a file (singular tests, analyses), mostly for the sake of consistency. I think resources that are defined as Jinja blocks (snapshots, macros), where you can have multiple in a file, would be tricky. Even though they're one-to-a-file, I think seeds would also be tricky, and probably not the right UX.

Not sure if you also tested with Python models... but it (mostly) "just works"! Which is pretty darn cool.

---
description: "This is my dbt-py model"
config:
  alias: my_cool_alias
---

def model(dbt, session):
    df = dbt.ref("my_model")
    return df

# with the same name as the model file, which won't work. So change the
# directory to model file name + .yaml.
file_path = self.original_file_path
if file_path.endswith(".sql"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Need to check for ".py" here as well, for Python models
  • Maybe it should just write to the same directory that the model is defined in? Need to think a bit more about this. I don't think it's a high-stakes decision one way or the other

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if we put it in the same directory that the model is in we could have collisions with other tests from other models in the same directory.

@jtcohen6
Copy link
Contributor

Assigning myself to take this for another spin :)

@gshank If we decide this is good-to-go from a user/product perspective, how much reworking would you want to do before putting this up for formal review of the code/implementation?

@gshank
Copy link
Contributor Author

gshank commented Mar 22, 2023

If we want to start out by just supporting models, I think it's good to review. There might eventually be some additional work related to some of the upcoming version project relating to checking that this is the only yaml for a group of versions.

@gshank
Copy link
Contributor Author

gshank commented Mar 22, 2023

Oh, I was wondering about the location of the new "yaml_config_dict" key. I put it in the node, but it could go somewhere else like the file object.

def has_yaml_frontmatter(content: str) -> bool:
"""Check first line for yaml frontmatter"""

return content.startswith(FRONTMATTER_CHECK[0]) or content.startswith(FRONTMATTER_CHECK[1])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think content could start with whitespace (including newlines) and this method would return False, while split_yaml_frontmatter would skip over leading whitespace.

Perhaps this would be better?

Suggested change
return content.startswith(FRONTMATTER_CHECK[0]) or content.startswith(FRONTMATTER_CHECK[1])
stripped = content.lstrip()
return stripped.startswith(FRONTMATTER_CHECK[0]) or stripped.startswith(FRONTMATTER_CHECK[1])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content should be stripped already, I think, when it's loaded in read_files. Did you actually see an error? Wouldn't hurt to double up on that though.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope! I honestly just drove by and looked over your approach. Cool that it's already stripped. It might be worth a test case?

@gshank
Copy link
Contributor Author

gshank commented May 25, 2023

This pull request is currently waiting for a new mashumaro release, in order to filter out the yaml_config_dict attribute in artifacts.

@jtcohen6 jtcohen6 added the Refinement Maintainer input needed label Jun 25, 2023
@callum-mcdata
Copy link
Contributor

@gshank is there any chance this might slip in with 1.7? Been writing a lot more dbt-sql these days and keep coming back to how much I would love yaml frontmatter!

@gshank
Copy link
Contributor Author

gshank commented Oct 10, 2023

Unfortunately we are right on top of code freeze for 1.7. I'll update the branch though, so we can see what's involved.

@codecov
Copy link

codecov bot commented Oct 10, 2023

Codecov Report

Attention: 38 lines in your changes are missing coverage. Please review.

Comparison is base (549dbf3) 86.43% compared to head (825738a) 85.14%.
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7100      +/-   ##
==========================================
- Coverage   86.43%   85.14%   -1.29%     
==========================================
  Files         176      176              
  Lines       26009    26089      +80     
==========================================
- Hits        22480    22213     -267     
- Misses       3529     3876     +347     
Flag Coverage Δ
integration 81.75% <50.00%> (-1.48%) ⬇️
unit 64.77% <36.84%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
core/dbt/parser/partial.py 93.42% <100.00%> (ø)
core/dbt/contracts/graph/nodes.py 93.49% <88.88%> (-1.61%) ⬇️
core/dbt/parser/base.py 92.82% <80.00%> (-0.76%) ⬇️
core/dbt/parser/schemas.py 88.01% <69.23%> (-4.04%) ⬇️
core/dbt/clients/yaml_helper.py 66.07% <36.00%> (-24.56%) ⬇️
core/dbt/parser/models.py 83.13% <30.43%> (-3.93%) ⬇️

... and 33 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@guizsantos
Copy link

Just for the record, I was just thinking about this, came up with exactly the idea to implement YFM and eventually found the disucssion, issue and finally, very gladly, this PR! I would really love to see this getting released and for that, I'd be happy to contribute!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes Refinement Maintainer input needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-2251] Inline model documentation + tests
7 participants