Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dashboard] Support arbitrary non-aggregate expressions in dimensions #3702

Merged
merged 15 commits into from
Dec 22, 2023

Conversation

AdityaHegde
Copy link
Collaborator

@AdityaHegde AdityaHegde commented Dec 14, 2023

closes #3009

  • Add expression to dashboard yaml.
  • Update all queries and test for dimension expression.
  • Support unnest with dimension expressions.
  • Test on driud.
  • Introduce proper validation.

@AdityaHegde AdityaHegde self-assigned this Dec 14, 2023
@AdityaHegde AdityaHegde changed the title Adityahegde/dimension expressions [Dashboard] Support arbitrary non-aggregate expressions in dimensions Dec 14, 2023
@AdityaHegde AdityaHegde marked this pull request as ready for review December 20, 2023 12:42
@@ -297,7 +297,7 @@ func columnIdentifierExpression(mv *runtimev1.MetricsViewSpec, aliases []*runtim
// check if identifier is a dimension
for _, dim := range mv.Dimensions {
if dim.Name == name {
return safeName(metricsViewDimensionColumn(dim)), true
return metricsViewDimensionExpression(dim), true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get templated into the where clause:

  1. Might it be problematic to put a free form expression in here? I think at the least it will need to be wrapped in parentheses
  2. Would it be better to template in just the dimension name (which the expression is aliased to in the SELECT clause)? But not sure if that works on Druid

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this is for queries where the dimension is not selected we would have issues with the column not being there. The other option is to only add it if is in the select caluse. That made these builder functions too complex.

As for adding parentheses, it gets added when the identifier is actually used.

I guess we are missing a validation to make sure there isnt sql injection happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The case where the dim is not selected makes sense. Agree on not making it conditional on the select clause, seems to complicated.
  2. About the parentheses, imagine if the expression is col1 = 10 and the filter is dim1 = true, then you're saying the parenthesis will be added such that the where clause becomes (col1 = 10) = true (correct) and not col1 = 10 = true (which would error)?
  3. How could SQL injection happen? For the where filters, I thought we were doing lookups to resolve identifiers and using ? and args to pass values?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes all binary expressions have it. like and in do not seem to have it. Lemme add that.
  2. It could happen in the dimension expression, count(*)";drop table blah; -- or something like that.

runtime/queries/metricsview.go Outdated Show resolved Hide resolved
runtime/queries/metricsview.go Outdated Show resolved Hide resolved
runtime/reconcilers/metrics_view.go Outdated Show resolved Hide resolved
runtime/reconcilers/metrics_view.go Outdated Show resolved Hide resolved
runtime/reconcilers/metrics_view.go Outdated Show resolved Hide resolved
runtime/reconcilers/metrics_view.go Outdated Show resolved Hide resolved
@AdityaHegde AdityaHegde force-pushed the adityahegde/dimension-expressions branch from dea5101 to b2a89f5 Compare December 21, 2023 09:20
Comment on lines 36 to +37
Column string
Expression string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't have validation to ensure that only one of these are set (exactly one of them should be non-nil)

@@ -297,7 +297,7 @@ func columnIdentifierExpression(mv *runtimev1.MetricsViewSpec, aliases []*runtim
// check if identifier is a dimension
for _, dim := range mv.Dimensions {
if dim.Name == name {
return safeName(metricsViewDimensionColumn(dim)), true
return metricsViewDimensionExpression(dim), true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The case where the dim is not selected makes sense. Agree on not making it conditional on the select clause, seems to complicated.
  2. About the parentheses, imagine if the expression is col1 = 10 and the filter is dim1 = true, then you're saying the parenthesis will be added such that the where clause becomes (col1 = 10) = true (correct) and not col1 = 10 = true (which would error)?
  3. How could SQL injection happen? For the where filters, I thought we were doing lookups to resolve identifiers and using ? and args to pass values?

Comment on lines 157 to 159
if d.Expression == "" {
if d.Column == "" {
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be an error? (And ideally checked also at parser level)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. I guess we will never hit this without the parser running and dont need backwards compatibility right?

}

err := olap.Exec(ctx, &drivers.Statement{
Query: fmt.Sprintf("SELECT %s FROM %s GROUP BY 1", d.Expression, safeSQLName(t.Name)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put parentheses around the expression, i.e. SELECT (%s) ...? In case the expression is a sub-query

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might this also be needed elsewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes adding them

@AdityaHegde AdityaHegde force-pushed the adityahegde/dimension-expressions branch from 2c8d821 to 82c2e40 Compare December 21, 2023 12:06
Copy link
Contributor

@begelundmuller begelundmuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@begelundmuller begelundmuller merged commit 018ae8a into main Dec 22, 2023
7 checks passed
@begelundmuller begelundmuller deleted the adityahegde/dimension-expressions branch December 22, 2023 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Runtime: Investigate changing column to expression for dashboard dimensions
2 participants