Add new planner, JDBC driver, and DDL machinery #72

Merged
ryannedolan merged 3 commits into main from bfdb on Dec 5, 2024
Conversation

ryannedolan
Collaborator

Summary

This PR adds a SQL/DDL layer on top of hoptimator-operator, along with new plugin APIs. For now, we are keeping the existing "adapter" APIs in place, but they will eventually get removed.

  • Added new CatalogProvider SPI and K8sCatalogProvider plugin.
  • Added new DeployerProvider SPI for deploying arbitrary objects, including tables and views.
  • Added new K8sViewDeployer and K8sMaterializedViewDeployer plugins.
  • Added K8sYamlApi and related machinery.
  • Added Source/Sink for tables used in a pipeline.
  • Added TableTemplates and related machinery.
  • Added K8s Pipeline CRD and related mechanisms.
  • Added ConnectorProvider SPI.

Motivation

Hoptimator's original architecture involved embedding a SQL planner (Apache Calcite) inside the control plane (hoptimator-operator). In practice, we've found several issues with this architecture:

  • Validating SQL inside the control plane makes authoring SQL difficult. Errors are only reported in the operator's logs, or in the data plane (e.g. Flink), rather than directly to the author.
  • The current architecture works well for one tenant, but serving multiple tenants means building and deploying entirely separate instances of the operator.
  • Extending Hoptimator with "adapters" is difficult. Adapters bundle together both "catalog" and "operator" logic.

To address these concerns, we've pulled the SQL engine out of the control plane into a standalone JDBC driver. The driver is capable of talking directly to Kubernetes, doing most of what hoptimator-operator did. This drastically improves the authoring experience. In particular, most errors can be immediately shown to the author.

Additionally, the JDBC driver is aware of Kubernetes namespaces, enabling true multi-tenancy. Each namespace can have different databases installed, different data plane configuration, and so on. The driver automatically uses whatever Kubernetes namespace it is running in, or whatever namespace kubectl is configured to use.
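
For illustration, here is a minimal sketch of what authoring looks like through the driver. The JDBC URL scheme, table name, and view name below are placeholders assumed for the sketch, not taken from this PR:

// Minimal sketch only: the "jdbc:hoptimator://" URL and the table/view names
// are hypothetical placeholders, not the driver's documented interface.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class AuthoringSketch {
  public static void main(String[] args) throws SQLException {
    // The driver talks to Kubernetes directly and picks up the namespace it
    // runs in (or the namespace kubectl is configured to use).
    try (Connection conn = DriverManager.getConnection("jdbc:hoptimator://");
         Statement stmt = conn.createStatement()) {
      // Planning and validation errors surface here, directly to the author,
      // rather than in operator logs or the data plane.
      stmt.execute("CREATE MATERIALIZED VIEW my_view AS SELECT * FROM source_table");
    }
  }
}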

Details

The core innovation here is the concept of TableTemplates, which are K8s objects that the JDBC driver references when planning a pipeline. This mechanism lets you configure data plane connectors (e.g. Flink connectors) for each data source, and lets you attach physical objects that should be deployed as part of a pipeline. For example, you can install a TableTemplate that creates a Kafka READ ACL any time a Kafka topic is accessed.

This new mechanism replaces the "adapter" model we previously used. Rather than writing, building, and deploying new code, you can create, update, and delete TableTemplates with kubectl.
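
As a toy illustration only (these class names are invented for the sketch and are not the project's API), a TableTemplate can be thought of as a per-source bundle of connector options plus extra objects to deploy alongside the pipeline:

// Toy model, not the project's API: a template attaches data-plane connector
// options and side objects (e.g. a Kafka READ ACL) to a data source.
import java.util.List;
import java.util.Map;

record ToyTableTemplate(Map<String, String> connectorOptions, List<String> extraObjects) {}

class ToyPlanner {
  // When a pipeline reads the source, the planner wires the template's options
  // into the connector config and deploys the extra objects with the pipeline.
  Map<String, String> connectorConfigFor(ToyTableTemplate template) {
    return template.connectorOptions(); // e.g. {connector=kafka, topic=my-topic}
  }
}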

Migration

We plan to keep the "subscription controller" as-is for now. This means we need to leave the old "adapter" APIs in place as well. The "subscription controller" will be replaced with a new, much simpler "pipeline controller". Once the new controller reaches parity with the old one, we'll delete the old one, along with all related APIs.

Testing

This PR removes all integration tests, for now. Similar functionality is now unit-tested via "quidem" scripts.

Comment on lines +52 to +55
// TODO support map types
// Appears to require a Calcite version bump
// case MAP:
// return createAvroSchemaWithNullability(Schema.createMap(avroPrimitive(dataType.getValueType())), dataType.isNullable());
Collaborator

What does it take to update this? Is there risk?

Collaborator Author

I think I held back the Calcite upgrade so that the Calcite version would match the version Flink is using. When we upgrade Flink, we should be able to upgrade Calcite and uncomment this. It doesn't really matter whether they're using the same version or not; I just wanted to avoid having two versions of the same dependency.

validate(table, originalTable, issues.child(x));
}
} catch (ClassCastException e) {
// nop
Collaborator

I see this ClassCastException being swallowed in a few spots; why is this?

Collaborator Author

schema.unwrap(CalciteSchema.class) will throw a ClassCastException if the schema is not a CalciteSchema. It's hard to tell in advance whether it will throw, so I just try and catch. Incidentally, java.sql.Wrapper (which, I assume, is what Calcite's Wrapper is based on) has an isWrapperFor(clazz) method, but I don't think Calcite's has such a test.
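
For reference, a minimal sketch of that try-and-catch pattern, assuming the handle is Calcite's SchemaPlus (which exposes unwrap(Class) but no isWrapperFor-style probe); the helper name is made up:

// Sketch of the pattern described above; not code from this PR.
import org.apache.calcite.jdbc.CalciteSchema;
import org.apache.calcite.schema.SchemaPlus;

class SchemaUtil {
  static CalciteSchema calciteSchemaOrNull(SchemaPlus schema) {
    try {
      return schema.unwrap(CalciteSchema.class);
    } catch (ClassCastException e) {
      return null; // not backed by a CalciteSchema; the caller skips validation here
    }
  }
}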

RelDataTypeSystem.DEFAULT));
MaterializedView hook = new MaterializedView(context, database, viewPath, rowType, sql,
Collections.emptyMap()); // TODO support CREATE ... WITH (options...)
ValidationService.validateOrThrow(hook, MaterializedView.class);
Collaborator

Is this where schema validation comes into play? Is there a reason to check this if it isn't an update?

Collaborator Author

Right. I guess our current validations are only for changes to existing tables/views, but the validation API is open-ended. You can validate anything, theoretically. This hook probably isn't doing anything yet, but we might want to drop in materialized view validation logic at some point.
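
Purely as a hypothetical sketch with placeholder types (the actual validation SPI in this repo may look quite different), such a drop-in check might eventually resemble:

// Hypothetical sketch; every type and method here is a placeholder.
class ToyMaterializedView { String sql; }
class ToyIssues { void error(String msg) { /* collect for reporting to the author */ } }

class ToyMaterializedViewValidator {
  // The idea: the same open-ended hook that validates table changes could also
  // validate materialized views, whether they are being created or updated.
  void validate(ToyMaterializedView view, ToyIssues issues) {
    if (view.sql == null || view.sql.isEmpty()) {
      issues.error("materialized view has no SELECT statement");
    }
  }
}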

…wardCompatibilityValidator.java

Co-authored-by: Joseph Grogan <[email protected]>
ryannedolan merged commit e12144b into main on Dec 5, 2024
1 check passed
ryannedolan deleted the bfdb branch on December 5, 2024 at 15:40