[PROPOSAL] Spec to Protobuf Conversion Rules and Tooling #677

Open
karenyrx opened this issue Nov 15, 2024 · 0 comments
Labels: enhancement, question


What/Why

A critical problem to solve on the path to gRPC support in OpenSearch is publishing Protobuf definitions that are fully aligned with the spec. Because the two formats differ in many ways, converting an OpenAPI spec to Protobuf is not always straightforward. It often requires manual intervention and careful consideration of how best to map REST concepts and data structures to the more constrained, RPC-oriented protobuf/gRPC model. We hope to avoid tedious and error-prone manual conversions by establishing a standard set of conversion rules that adhere to proto best practices, can be integrated into automation, and can be reused over time.

What are you proposing?

  • A standard set of rules for converting the existing OpenAPI specs to Protobufs
  • Tooling to automate this conversion

Spec to Proto Conversion Rules

API Paths and Operations Translation (namespaces)

OpenAPI is designed to describe JSON/HTTP APIs, so the mapping to protobuf/gRPC will have some high-level differences:

  1. Endpoints that support more than one HTTP verb (such as /_search, which supports both GET and POST) will have only one corresponding RPC in gRPC (e.g. rpc Search(..)).
  2. The spec can define fields in multiple parts of the HTTP request: the HTTP endpoint path, the HTTP URL params, or the HTTP requestBody. For gRPC, all three will be put into the same top-level proto message for that request.
  3. HTTP response status codes do not apply to gRPC, but we can follow these standard mappings from Google to map HTTP status codes to gRPC status codes. Detailed mappings are summarized here.
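
As a rough illustration of item 3, the sketch below picks one gRPC status code per HTTP status, based on the HTTP mappings documented alongside google.rpc.Code. The function name `http_to_grpc` and the choice of a single code where several gRPC codes share one HTTP status (for example 409) are assumptions made for this example, not decisions taken by the proposal.

```python
from enum import IntEnum


class GrpcCode(IntEnum):
    """Subset of the canonical gRPC status codes (numeric values from google.rpc.Code)."""
    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    UNIMPLEMENTED = 12
    INTERNAL = 13
    UNAVAILABLE = 14
    UNAUTHENTICATED = 16


# The HTTP -> gRPC direction is lossy: several gRPC codes share one HTTP status,
# so each entry below is one illustrative choice, not the only valid one.
HTTP_TO_GRPC = {
    400: GrpcCode.INVALID_ARGUMENT,
    401: GrpcCode.UNAUTHENTICATED,
    403: GrpcCode.PERMISSION_DENIED,
    404: GrpcCode.NOT_FOUND,
    409: GrpcCode.ALREADY_EXISTS,  # ABORTED also corresponds to 409
    429: GrpcCode.RESOURCE_EXHAUSTED,
    499: GrpcCode.CANCELLED,
    500: GrpcCode.INTERNAL,
    501: GrpcCode.UNIMPLEMENTED,
    503: GrpcCode.UNAVAILABLE,
    504: GrpcCode.DEADLINE_EXCEEDED,
}


def http_to_grpc(status: int) -> GrpcCode:
    """Map an HTTP status to a gRPC code, falling back to UNKNOWN."""
    if 200 <= status < 300:
        return GrpcCode.OK
    return HTTP_TO_GRPC.get(status, GrpcCode.UNKNOWN)
```

A table like this would likely need per-endpoint review, since OpenSearch responses may carry more specific error information than the HTTP status alone.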

Data Model Translation (schemas)

To translate the data models that describe the API inputs and outputs, we follow the protobuf style guide and best practices. We additionally require some custom rules to address ambiguities and edge cases of OpenAPI-to-proto conversion, which are proposed below.

There are 4 main categories of custom proto conversion rules:

  1. Type translations: How to translate primitives, unstructured objects, null, regex, additionalProperties (map), and other types from OpenAPI specs to proto.
  2. Wrapping conventions: Determining when enums and oneofs should be nested within a message, when an extra wrapper message should be created (e.g. because proto does not support "oneof repeated"), and when types within allOf or anyOf should be flattened and merged into one message.
  3. Naming conventions: Standardizing the names used for protobuf messages, enums, maps, and any extra wrapper layers, and handling invalid or untranslatable field names (e.g. names with leading underscores).
  4. Metadata: Determining how to carry over annotations (e.g. x-version-added), descriptions, titles, and other metadata from the spec to the protobufs, for example by leveraging proto features such as fieldOptions or adding [required]/[optional] comments to the protobufs.

For more details, refer to the custom proto conversion rules doc.
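
To make the naming rule concrete, here is a minimal sketch of how a spec field name with a leading underscore might be turned into a valid proto field declaration while preserving the original name through `json_name`. The helper names `to_proto_field` and `field_declaration`, the camelCase handling, and the `field_` prefix for names starting with a digit are assumptions for illustration; the actual rules are the ones in the conversion rules doc.

```python
import re


def to_proto_field(spec_name: str) -> str:
    """Derive a snake_case proto field name from a spec property name."""
    # Insert an underscore at camelCase boundaries, e.g. maxScore -> max_Score.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", spec_name)
    # Collapse anything that is not a letter or digit, then trim the edges.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_").lower()
    # Hypothetical fallback: proto identifiers cannot start with a digit.
    if not name or name[0].isdigit():
        name = "field_" + name
    return name


def field_declaration(proto_type: str, spec_name: str, number: int) -> str:
    """Render one field line, adding json_name only when the name changed."""
    name = to_proto_field(spec_name)
    option = f' [json_name = "{spec_name}"]' if name != spec_name else ""
    return f"{proto_type} {name} = {number}{option};"


print(field_declaration("repeated string", "_source_excludes", 3))
print(field_declaration("repeated string", "index", 1))
```

Emitting `json_name` only when the name was altered keeps the generated protos free of redundant options.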

Example

A simplified example for the IndexSearch endpoint (with some fields omitted for brevity), using the proto conversion rules, would look something like:

service OpenSearch {
  rpc IndexSearch(IndexSearchRequest) returns (IndexSearchResponse);
}

message IndexSearchRequest {
  // [required] A list of indices to search for documents. Allowing targeted searches within one or more specified indices.
  repeated string index = 1;

  // [optional] Whether to include the _source field in the response.
  SourceConfigParam source = 2 [json_name = "_source"];

  // [optional] A list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in `source_includes` query parameter. If the `source` parameter is `false`, this parameter is ignored.
  repeated string source_excludes = 3 [json_name = "_source_excludes"];

  // [optional] A list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the `source_excludes` query parameter. If the `source` parameter is `false`, this parameter is ignored.
  repeated string source_includes = 4 [json_name = "_source_includes"];

  // [optional] Search Request body
  IndexSearchRequestBody request_body = 5;
}

// The response from index search request.
message IndexSearchResponse {
  oneof response {
    OkResponseBody ok_response_body = 1;
    ErrorCancelledResponseBody err_cancelled = 2;
    ErrorUnknownResponseBody err_unknown = 3;
    ErrorInvalidArgumentResponseBody err_invalid_arg = 4;
    //...
  }
}

Here, field 1 is a path parameter corresponding to {index} in the endpoint GET /{index}/_search, fields 2-4 are URL parameters, and field 5 is the HTTP request body.

More comprehensive examples, which leverage more of the proto conversion rules, will be provided in a future PR that will be open to feedback and initial comments.
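
The field layout described here (path parameters first, then URL parameters, then the request body) can be sketched as a small function over a simplified OpenAPI path item. The function names, the input shape (a dict of HTTP verb to operation, each with OpenAPI-style `parameters` entries), and the fixed `request_body` field name are assumptions for this example; field renaming is deliberately left out.

```python
from __future__ import annotations


def merge_operations(operations: dict[str, dict]) -> tuple[list[dict], bool]:
    """Union the parameters of every HTTP verb on one path, keeping first-seen order."""
    merged: list[dict] = []
    seen: set[tuple[str, str]] = set()
    has_body = False
    for operation in operations.values():
        for param in operation.get("parameters", []):
            key = (param["in"], param["name"])
            if key not in seen:
                seen.add(key)
                merged.append(param)
        has_body = has_body or "requestBody" in operation
    return merged, has_body


def request_fields(operations: dict[str, dict]) -> list[tuple[int, str]]:
    """Number the fields of the single request message shared by all verbs."""
    parameters, has_body = merge_operations(operations)
    names = [
        param["name"]
        for location in ("path", "query")
        for param in parameters
        if param["in"] == location
    ]
    if has_body:
        names.append("request_body")
    return list(enumerate(names, start=1))


# Simplified stand-in for GET and POST on /{index}/_search.
shared_params = [
    {"name": "index", "in": "path"},
    {"name": "_source", "in": "query"},
    {"name": "_source_excludes", "in": "query"},
    {"name": "_source_includes", "in": "query"},
]
search_operations = {
    "get": {"parameters": shared_params},
    "post": {"parameters": shared_params, "requestBody": {}},
}

for number, name in request_fields(search_operations):
    print(number, name)
```

Both verbs collapse into one field list, which mirrors how GET and POST on the same path would share a single RPC and request message.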

Tooling

At a high level, the tooling to be built should:

  1. Adhere to the custom proto conversion rules above.
  2. Maintain backward compatibility with existing protobuf definitions (e.g. append-only field numbering).
  3. Split the output into different files, both for organization and to break up circular dependencies within proto files.
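
One way to read the append-only numbering requirement in item 2 is sketched below: fields that already have a number keep it, new fields receive the next unused number, and numbers of fields that disappeared from the spec are never reused. The function name and the idea of persisting the previous name-to-number mapping between runs are assumptions for illustration.

```python
from __future__ import annotations


def assign_field_numbers(existing: dict[str, int], current_fields: list[str]) -> dict[str, int]:
    """Keep previously assigned numbers and append new fields after the highest one."""
    # Copy so that removed fields stay in the mapping and their numbers stay reserved.
    numbers = dict(existing)
    next_number = max(existing.values(), default=0) + 1
    for name in current_fields:
        if name not in numbers:
            numbers[name] = next_number
            next_number += 1
    return numbers


previous = {"index": 1, "source": 2}
regenerated = assign_field_numbers(previous, ["source", "timeout", "index", "size"])
print(regenerated)
```

Because the result depends only on the stored mapping and not on the order of fields in the spec, reordering properties in the spec would not change wire compatibility.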

No off-the-shelf solution fulfills all of our requirements. To avoid fully reinventing the wheel, we propose to reuse OpenAPI Generator as much as we can, adding custom improvements and contributions on top of it as well as extra pre- and post-processing steps. We propose to start by maintaining a fork of this tool within the opensearch-project organization, which balances the flexibility to add customizations and improvements with the potential to contribute back to the upstream project (which uses an Apache license). Its protobuf generator is currently in beta and lacks support for many common proto features, such as converting 'allOf' or 'anyOf' to protos and avoiding circular dependencies in the generated proto files.
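
Since protoc rejects circular imports between files while messages in the same file may reference each other freely, one possible post-processing step is to group mutually dependent messages so they end up in one file. The sketch below uses Tarjan's strongly connected components algorithm for that grouping. It is not part of OpenAPI Generator, and the message names in the sample graph are purely illustrative.

```python
from __future__ import annotations


def strongly_connected_components(graph: dict[str, list[str]]) -> list[list[str]]:
    """Group nodes that reference each other in a cycle (Tarjan's algorithm)."""
    index_of: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    on_stack: set[str] = set()
    stack: list[str] = []
    components: list[list[str]] = []

    def visit(node: str) -> None:
        index_of[node] = lowlink[node] = len(index_of)
        stack.append(node)
        on_stack.add(node)
        for neighbor in graph.get(node, ()):
            if neighbor not in index_of:
                visit(neighbor)
                lowlink[node] = min(lowlink[node], lowlink[neighbor])
            elif neighbor in on_stack:
                lowlink[node] = min(lowlink[node], index_of[neighbor])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop its members off the stack.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in graph:
        if node not in index_of:
            visit(node)
    return components


# Hypothetical message dependency graph: message -> messages it references.
message_refs = {
    "QueryContainer": ["BoolQuery"],
    "BoolQuery": ["QueryContainer"],
    "SearchRequest": ["QueryContainer"],
}
print(strongly_connected_components(message_refs))
```

Each component with more than one member would be emitted into a single proto file; singleton components can be split across files as organization dictates. The recursive form is fine for a sketch, though a very deep schema graph might call for an iterative version.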

Current steps in the tool look something like this:
[Screenshot: current steps in the tool]

We can additionally add pre-processing and post-processing steps, as well as introduce customizations to the tool itself in order to automate the proto conversion:
[Screenshot: the tool with added pre-processing, post-processing, and customization steps]

Other Considerations

  1. Versioning: Protobuf versioning will follow the spec version. Until the spec is in a stable state, the proto version will be 0.[spec-major-version].[spec-minor-version]. Only the major version will be put in the proto package name; the minor version will be carried forward from the spec as annotations in the proto.

  2. CI workflows for auto-conversion: GHA workflows can be built to run the tooling and automate the spec-to-proto conversion. How to keep core<>spec<>protobufs in sync will be detailed in a separate issue.
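
The versioning rule in item 1 can be sketched as below. Two things are assumptions here and are not settled by the proposal: that "major version" in the package name refers to the spec's major version, and that the package takes the hypothetical form `opensearch.v<major>`.

```python
def proto_version(spec_version: str) -> str:
    """Build the pre-stable proto version 0.[spec-major].[spec-minor]."""
    major, minor = spec_version.split(".")[:2]
    return f"0.{int(major)}.{int(minor)}"


def proto_package(spec_version: str, base: str = "opensearch") -> str:
    """Hypothetical package name carrying only the major version."""
    major = int(spec_version.split(".")[0])
    return f"{base}.v{major}"


print(proto_version("1.2.0"))
print(proto_package("1.2.0"))
```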

Planned Milestones / Next steps

The goal is to publish a proto package version in this repo along with its matching spec version. Planned steps to achieve that include:

  1. PRs illustrating some examples of the manually converted protos for initial comments
  2. PRs for the basic proto tooling implementation
  3. PRs with fully regenerated protos using the tooling
dblock added the enhancement and question labels and removed the untriaged label on Nov 19, 2024.