Add new planner, JDBC driver, and DDL machinery #72

Merged
ryannedolan merged 3 commits into main from bfdb on Dec 5, 2024
Conversation

ryannedolan
Collaborator

Summary

This PR adds a SQL/DDL layer on top of hoptimator-operator, along with new plugin APIs. For now, we are keeping the existing "adapter" APIs in place, but they will eventually get removed.

  • Added new CatalogProvider SPI and K8sCatalogProvider plugin.
  • Added new DeployerProvider SPI for deploying arbitrary objects, including tables and views.
  • Added new K8sViewDeployer and K8sMaterializedViewDeployer plugins.
  • Added K8sYamlApi and related machinery.
  • Added Source/Sink for tables used in a pipeline.
  • Added TableTemplates and related machinery.
  • Added K8s Pipeline CRD and related mechanisms.
  • Added ConnectorProvider SPI.

Motivation

Hoptimator's original architecture involved embedding a SQL planner (Apache Calcite) inside the control plane (hoptimator-operator). In practice, we've found several issues with this architecture:

  • Validating SQL inside the control plane makes authoring SQL difficult. Errors are only reported in the operator's logs, or in the data plane (e.g. Flink), rather than directly to the author.
  • The current architecture works well for one tenant, but serving multiple tenants means building and deploying entirely separate instances of the operator.
  • Extending Hoptimator with "adapters" is difficult. Adapters bundle together both "catalog" and "operator" logic.

To address these concerns, we've pulled the SQL engine out of the control plane into a standalone JDBC driver. The driver is capable of talking directly to Kubernetes, doing most of what hoptimator-operator did. This drastically improves the authoring experience. In particular, most errors can be immediately shown to the author.

Additionally, the JDBC driver is aware of Kubernetes namespaces, enabling true multi-tenancy. Each namespace can have different databases installed, different data plane configuration, and so on. The driver automatically uses whatever Kubernetes namespace it is running in, or whatever namespace kubectl is configured to use.
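
For illustration, here is a minimal sketch of what authoring looks like through the driver. The JDBC URL scheme, table name, and view name below are placeholders assumed for the sketch, not taken from this PR:

// Minimal sketch only: the "jdbc:hoptimator://" URL and the table/view names
// are hypothetical placeholders, not the driver's documented interface.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class AuthoringSketch {
  public static void main(String[] args) throws SQLException {
    // The driver talks to Kubernetes directly and picks up the namespace it
    // runs in (or the namespace kubectl is configured to use).
    try (Connection conn = DriverManager.getConnection("jdbc:hoptimator://");
         Statement stmt = conn.createStatement()) {
      // Planning and validation errors surface here, directly to the author,
      // rather than in operator logs or the data plane.
      stmt.execute("CREATE MATERIALIZED VIEW my_view AS SELECT * FROM source_table");
    }
  }
}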

Details

The core innovation here is the concept of TableTemplates, which are K8s objects that the JDBC driver references when planning a pipeline. This mechanism lets you configure data plane connectors (e.g. Flink connectors) for each data source, and lets you attach physical objects that should be deployed as part of a pipeline. For example, you can install a TableTemplate that creates a Kafka READ ACL any time a Kafka topic is accessed.

This new mechanism replaces the "adapter" model we previously used. Rather than writing, building, and deploying new code, you can create, update, and delete TableTemplates with kubectl.
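
As a toy illustration only (these class names are invented for the sketch and are not the project's API), a TableTemplate can be thought of as a per-source bundle of connector options plus extra objects to deploy alongside the pipeline:

// Toy model, not the project's API: a template attaches data-plane connector
// options and side objects (e.g. a Kafka READ ACL) to a data source.
import java.util.List;
import java.util.Map;

record ToyTableTemplate(Map<String, String> connectorOptions, List<String> extraObjects) {}

class ToyPlanner {
  // When a pipeline reads the source, the planner wires the template's options
  // into the connector config and deploys the extra objects with the pipeline.
  Map<String, String> connectorConfigFor(ToyTableTemplate template) {
    return template.connectorOptions(); // e.g. {connector=kafka, topic=my-topic}
  }
}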

Migration

We plan to keep the "subscription controller" as-is for now. This means we need to leave the old "adapter" APIs in place as well. The "subscription controller" will be replaced with a new, much simpler "pipeline controller". Once the new controller reaches parity with the old one, we'll delete the old one, along with all related APIs.

Testing

This PR removes all integration tests, for now. Similar functionality is now unit-tested via "quidem" scripts.

Comment on lines +52 to +55
// TODO support map types
// Appears to require a Calcite version bump
// case MAP:
// return createAvroSchemaWithNullability(Schema.createMap(avroPrimitive(dataType.getValueType())), dataType.isNullable());
Collaborator

What does it take to update this? Is there risk?

Collaborator Author

I think I held back the Calcite upgrade so that the Calcite version would match the version Flink is using. When we upgrade Flink, we should be able to upgrade Calcite and uncomment this. It doesn't really matter whether they're using the same version or not; I just wanted to avoid having two versions of the same dependency.

validate(table, originalTable, issues.child(x));
}
} catch (ClassCastException e) {
// nop
Collaborator

I see this ClassCastException being swallowed in a few spots; why is this?

Collaborator Author

schema.unwrap(CalciteSchema.class) will throw a ClassCastException if the schema is not a CalciteSchema. It's hard to tell in advance whether it will throw, so I just try and catch. Incidentally, java.sql.Wrapper (which, I assume, is what Calcite's Wrapper is based on) has an isWrapperFor(clazz) method, but I don't think Calcite's has such a test.
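
For reference, a minimal sketch of that try-and-catch pattern, assuming the handle is Calcite's SchemaPlus (which exposes unwrap(Class) but no isWrapperFor-style probe); the helper name is made up:

// Sketch of the pattern described above; not code from this PR.
import org.apache.calcite.jdbc.CalciteSchema;
import org.apache.calcite.schema.SchemaPlus;

class SchemaUtil {
  static CalciteSchema calciteSchemaOrNull(SchemaPlus schema) {
    try {
      return schema.unwrap(CalciteSchema.class);
    } catch (ClassCastException e) {
      return null; // not backed by a CalciteSchema; the caller skips validation here
    }
  }
}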

RelDataTypeSystem.DEFAULT));
MaterializedView hook = new MaterializedView(context, database, viewPath, rowType, sql,
Collections.emptyMap()); // TODO support CREATE ... WITH (options...)
ValidationService.validateOrThrow(hook, MaterializedView.class);
Collaborator

Is this where schema validation comes into play? Is there a reason to check this if it isn't an update?

Collaborator Author

Right. I guess our current validations are only for changes to existing tables/views, but the validation API is open-ended. You can validate anything, theoretically. This hook probably isn't doing anything yet, but we might want to drop in materialized view validation logic at some point.
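
Purely as a hypothetical sketch with placeholder types (the actual validation SPI in this repo may look quite different), such a drop-in check might eventually resemble:

// Hypothetical sketch; every type and method here is a placeholder.
class ToyMaterializedView { String sql; }
class ToyIssues { void error(String msg) { /* collect for reporting to the author */ } }

class ToyMaterializedViewValidator {
  // The idea: the same open-ended hook that validates table changes could also
  // validate materialized views, whether they are being created or updated.
  void validate(ToyMaterializedView view, ToyIssues issues) {
    if (view.sql == null || view.sql.isEmpty()) {
      issues.error("materialized view has no SELECT statement");
    }
  }
}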

…wardCompatibilityValidator.java

Co-authored-by: Joseph Grogan <[email protected]>
ryannedolan merged commit e12144b into main on Dec 5, 2024
1 check passed
ryannedolan deleted the bfdb branch on December 5, 2024 at 15:40