Motivation
Apache Kyuubi is a distributed multi-tenant JDBC server for large-scale data processing and analytics, built on top of Apache Spark.
Kyuubi implements the Hive Service RPC module, which provides the same way of accessing data as HiveServer2 and Spark Thrift Server. On the client side, you can build business reports, BI applications, or even ETL jobs using only the Hive JDBC module.
At present, however, Kyuubi's API layer lacks a RESTful API component, which hinders its extensibility. Once the implementations of these RPC interfaces are exposed as HTTP APIs, Kyuubi can be extended more easily through a more general protocol. A RESTful API also makes it easier to build a Web UI for Kyuubi.
The goal of this proposal is to introduce our design of the Kyuubi RESTful API. It has three sections. First, we identify the core domain objects that the RESTful API manages; then we define the RESTful APIs that manage these objects; finally, we discuss how to implement them.
Core domain objects and relationships
The core domain objects in Kyuubi are Session and Operation.
Session
A Session plays the same role as in a conventional system: it maintains a valid life cycle for a specific user. Each Session corresponds to a SessionHandle, which uniquely identifies that Session and can be thought of as its index. All Sessions are managed by the SessionManager, which holds every connected session and provides APIs to manipulate Session objects.
The three objects above (Session, SessionHandle, and SessionManager) are the basic abstractions of Kyuubi. Their concrete implementations in Kyuubi are KyuubiSessionImpl and KyuubiSessionManager.
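To make the relationship between these abstractions concrete, here is a minimal Python sketch. It is not Kyuubi's code (Kyuubi itself is written in Scala); the UUID-based identifier, the `user` field, and the method names `open_session`, `get_session`, `close_session`, and `all_sessions` are assumptions chosen for illustration only.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SessionHandle:
    """Uniquely identifies a Session; frozen, so it is hashable and usable as an index key."""
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Session:
    """Per-user state that lives for the duration of the session's life cycle."""
    handle: SessionHandle
    user: str


class SessionManager:
    """Holds every open session and exposes the operations that manipulate them."""

    def __init__(self) -> None:
        self._sessions: Dict[SessionHandle, Session] = {}

    def open_session(self, user: str) -> SessionHandle:
        handle = SessionHandle()
        self._sessions[handle] = Session(handle, user)
        return handle

    def get_session(self, handle: SessionHandle) -> Session:
        # Raises KeyError when the handle does not belong to an open session.
        return self._sessions[handle]

    def close_session(self, handle: SessionHandle) -> None:
        del self._sessions[handle]

    def all_sessions(self) -> List[Session]:
        return list(self._sessions.values())
```

A client only ever keeps the handle; every later call goes through the manager, which is what lets the REST layer expose sessions by identifier.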
Operation
Operation is the other core object of Kyuubi. As the name suggests, it provides a basic encapsulation of the "compatible operations" supported by Hive Service RPC, such as GetCatalogs and GetColumns. All supported operations are listed in the OperationType enumeration. As with Session, Kyuubi provides an OperationHandle to uniquely identify an Operation, and all Operations are managed by the OperationManager.
Kyuubi's concrete implementation of OperationManager is KyuubiOperationManager.
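The Operation abstractions follow the same pattern as the Session ones. The Python sketch below is illustrative only: the enumeration lists just the two operations named in the text, and the `Operation` fields and the `new_operation` / `get_operation` method names are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class OperationType(Enum):
    # Only the two operations named in the text; the real enumeration has more.
    GET_CATALOGS = "GetCatalogs"
    GET_COLUMNS = "GetColumns"


@dataclass(frozen=True)
class OperationHandle:
    """Uniquely identifies one Operation and records which type it is."""
    op_type: OperationType
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class Operation:
    handle: OperationHandle
    session_id: str  # identifier of the session that owns this operation (assumed field)


class OperationManager:
    """Creates operations and looks them up by handle."""

    def __init__(self) -> None:
        self._operations: Dict[OperationHandle, Operation] = {}

    def new_operation(self, op_type: OperationType, session_id: str) -> OperationHandle:
        handle = OperationHandle(op_type)
        self._operations[handle] = Operation(handle, session_id)
        return handle

    def get_operation(self, handle: OperationHandle) -> Operation:
        return self._operations[handle]
```

Two operations of the same type still receive distinct handles, because the identifier, not the type, is what distinguishes them.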
API Design
We will define a set of core APIs to manage the core domain objects described in the previous section. These APIs may be incomplete and may change as Kyuubi evolves. Each API is described with the same template, which is as follows:
URL template
URL
mapping: the corresponding interface in HiveServer2
desc: describe the purpose of this API
method: the HTTP method, one of GET, POST, PUT, DELETE
params: request parameters
param key1: value1
param key2: value2
param keyN: valueN
returns: describe the return information
Session API
/${version}/sessions
mapping: none
desc: list all the sessions hosted in SessionManager
method: GET
params: none
returns: an overview list of Sessions
/${version}/sessions
mapping: ICLIService#openSession
desc: open (create) a Session
method: POST
params: required params to create a Session
returns: an instance of SessionHandle
/${version}/sessions/${identifier}
mapping: ICLIService#closeSession
desc: close a Session
method: DELETE
params: none
returns: status code (success or error)
/${version}/sessions/${identifier}
desc: get a Session by its SessionHandle identifier
method: GET
params: identifier, the identifier of a SessionHandle
returns: an instance overview of Session
/${version}/sessions/infotype
mapping: none
desc: get all info types supported by a Kyuubi session
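The list, open, get, and close endpoints above can be sketched as a small in-memory dispatcher. This is a Python illustration under stated assumptions, not the proposed implementation: the version segment `v1`, the HTTP status codes, the JSON-like payload shapes, and the `handle` function name are all hypothetical, and the `infotype` endpoint is deliberately left out.

```python
import re
import uuid
from typing import Dict, Optional, Tuple

# In-memory stand-in for the sessions held by SessionManager.
SESSIONS: Dict[str, dict] = {}


def handle(method: str, path: str, body: Optional[dict] = None) -> Tuple[int, object]:
    """Dispatch one request to the Session API and return (status code, payload)."""
    match = re.fullmatch(r"/v1/sessions(?:/([^/]+))?", path)
    if match is None:
        return 404, {"error": "unknown resource"}
    identifier = match.group(1)

    if identifier is None:  # collection resource: /v1/sessions
        if method == "GET":  # list all sessions
            return 200, list(SESSIONS.values())
        if method == "POST":  # open (create) a session
            identifier = str(uuid.uuid4())
            SESSIONS[identifier] = {"identifier": identifier, **(body or {})}
            return 201, {"identifier": identifier}
        return 405, {"error": "method not allowed"}

    if identifier not in SESSIONS:  # item resource: /v1/sessions/{identifier}
        return 404, {"error": "no such session"}
    if method == "GET":  # get one session
        return 200, SESSIONS[identifier]
    if method == "DELETE":  # close the session
        del SESSIONS[identifier]
        return 200, {"status": "success"}
    return 405, {"error": "method not allowed"}
```

The same URL serves different purposes depending on the HTTP method, which is why the template lists `method` separately from the URL.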
Further Session endpoints (URL only):
/${version}/sessions/${identifier}/info/${infotype}
/${version}/sessions/count
/${version}/sessions/statistic
/${version}/sessions/${identifier}/delegationtoken?owner=${owner}&renewer=${renewer}
/${version}/sessions/${identifier}/delegationtoken
/${version}/sessions/${identifier}/log?offset=${offset}&length=${length}
Operation API
Statement API
For convenience, we intend to promote Statement to a top-level resource so that heavy users can execute SQL statements easily. In principle, Statement should be a third-level resource under Session and Operation. However, executing SQL statements is a very high-frequency requirement, and keeping Statement as a third-level resource would cause some problems:
added complexity in interaction and operation;
reduced performance, because more round trips between client and server are required.
To this end, we decided to promote Statement to a first-level resource and to handle Session and Operation implicitly on the server side.
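The round-trip argument can be illustrated with a toy model that counts client requests. Everything here is an assumption made for the sake of the comparison: the class and method names are hypothetical, the returned identifiers and rows are placeholders, and the sketch assumes that a top-level Statement request returns its result directly, which the proposal does not specify.

```python
from typing import List


class SketchServer:
    """Counts client requests so the two API shapes can be compared."""

    def __init__(self) -> None:
        self.requests = 0

    # Server-internal steps; these cost no client round trip.
    def _open_session(self) -> str:
        return "session-1"

    def _start_operation(self, session_id: str, sql: str) -> str:
        return "operation-1"

    def _fetch(self, operation_id: str) -> List[str]:
        return ["row-1", "row-2"]

    # Nested resources: the client drives every step itself.
    def post_session(self) -> str:
        self.requests += 1
        return self._open_session()

    def post_operation(self, session_id: str, sql: str) -> str:
        self.requests += 1
        return self._start_operation(session_id, sql)

    def get_result(self, operation_id: str) -> List[str]:
        self.requests += 1
        return self._fetch(operation_id)

    # Top-level Statement: one request; Session and Operation are handled implicitly.
    def post_statement(self, sql: str) -> List[str]:
        self.requests += 1
        session_id = self._open_session()
        operation_id = self._start_operation(session_id, sql)
        return self._fetch(operation_id)
```

In this model the nested flow needs three requests to obtain the rows, while the top-level Statement needs one; the server does the same internal work in both cases.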
Implementation
In this section, we describe the implementation of the Kyuubi RESTful service.
Technical selection
RESTful API services depend on the basic capabilities of HTTP services and the representation and definition of resources.
For HTTP services, after research and discussion in the community, we decided to use Jetty [1] as the basic framework. For more discussion, see [2]. Jetty is a lightweight Servlet container that is mature and widely used in open-source projects to provide HTTP services.
For the representation and definition of resources, we use the annotations provided by Jakarta RESTful Web Services [3] to supply the metadata for mappings and resource descriptions. Its ServletContainer can be registered in Jetty's ServletContextHandler, so the two frameworks integrate seamlessly.
Design
Here, we regard the service providing RESTful API as a frontend service based on the HTTP protocol, which is analogous to the current FrontendService implementation based on the Thrift protocol.
First, a brief look at the current frontend and backend services in Kyuubi. There is currently one FrontendService, which is essentially a relay agent for TCLIService. It is used by both KyuubiServer and SparkSQLEngine, which run in two different processes. The two processes use different BackendService implementations: KyuubiServer uses KyuubiBackendService, and SparkSQLEngine uses SparkSQLBackendService. The relationship diagram is as follows:
To introduce a new frontend service based on the HTTP protocol while keeping the overall architecture and source code clear, we rename FrontendService to ThriftFrontendService and introduce a new RestFrontendService.
Based on the above design, the Servable that hosts the frontend and backend services needs an abstraction of the frontend service so that it can initialize different instances in different server processes. We therefore introduce AbstractFrontendService as the abstract base of ThriftFrontendService and RestFrontendService. AbstractFrontendService also inherits from CompositeService, so that multiple frontend services can be enabled simultaneously in a given server process. The approximate class diagram is as follows:
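The class relationships described above can also be sketched in code. The Python below mirrors only the inheritance stated in the text; the `Service` base class, the `Server` host, the `protocol` attribute, and the `add_service` / `start` methods are assumptions for illustration and do not reproduce Kyuubi's Scala classes.

```python
from typing import List


class Service:
    """Minimal service with a start-up flag."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.started = False

    def start(self) -> None:
        self.started = True


class CompositeService(Service):
    """A service that owns child services and starts them together."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.children: List[Service] = []

    def add_service(self, service: Service) -> None:
        self.children.append(service)

    def start(self) -> None:
        for child in self.children:
            child.start()
        super().start()


class AbstractFrontendService(CompositeService):
    """Common base of all frontends; subclasses name their protocol."""
    protocol = "unspecified"


class ThriftFrontendService(AbstractFrontendService):
    protocol = "thrift"


class RestFrontendService(AbstractFrontendService):
    protocol = "http"


class Server(CompositeService):
    """Stand-in for the host process, which may enable several frontends at once."""

    def __init__(self, frontends: List[AbstractFrontendService]) -> None:
        super().__init__("server")
        for frontend in frontends:
            self.add_service(frontend)
```

Because the host depends only on the abstract frontend type, each server process can choose which frontends to enable without changing the host.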
After the redesign, the class diagram loaded at runtime is as follows:
In the end, the model diagram of Kyuubi's interaction with end-users is as follows:
The above is a fine-tuning of Kyuubi's existing design together with the introduction of RestFrontendService. Next, we describe the detailed design of the RESTful frontend service.
For all resource requests, we introduce a context object called ApiRequestContext, which holds the ServletContext and the HttpServletRequest. Some pre-defined resources are listed below:
SessionListResource: handles all requests about multiple sessions;
OneSessionResource: handles all requests about one session with a given session handle;
OperationListResource: handles all requests about multiple operations;
OneOperationResource: handles all requests about one operation with a given operation handle;
ApiRootResource: works as the root resource handler for "api".
The UML class diagram is as follows:
In addition, we will define entity classes for requests and responses and use Jackson to marshal and unmarshal JSON.
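As an illustration of entity classes with JSON marshalling, here is a small Python sketch in which the standard `json` module and `dataclasses` stand in for Jackson. The entity names `SessionOpenRequest` and `SessionHandleResponse` and their fields are hypothetical; the proposal does not define them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SessionOpenRequest:
    """Hypothetical request entity for opening a session."""
    user: str
    ip_address: str


@dataclass
class SessionHandleResponse:
    """Hypothetical response entity carrying the new session's identifier."""
    identifier: str


def unmarshal_request(payload: str) -> SessionOpenRequest:
    # Keys in the JSON object must match the entity's field names.
    return SessionOpenRequest(**json.loads(payload))


def marshal_response(response: SessionHandleResponse) -> str:
    return json.dumps(asdict(response))
```

Keeping request and response shapes in dedicated entity classes means resource handlers work with typed objects, and the JSON layout is defined in one place.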
Test
We need to consider two levels of testing: HTTP request/response and business logic processing of resource requests.
For the first level, we plan to design an HttpSuite that includes some basic tests, for example:
visibility
jetty selects different ports under contention
jetty with HTTPS selects different ports under contention
jetty binds to port 0 correctly
jetty with HTTPS binds to port 0 correctly
verify web URL contains the scheme
verify web URL contains the port
More scenarios may be added.
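The port-related checks in the list above can be sketched without Jetty. The Python below uses plain sockets as a stand-in for the HTTP server; the helper names `bind_with_retry` and `web_url`, the retry count, and the "try the next port" strategy are assumptions for illustration, not the behaviour of the planned HttpSuite.

```python
import socket
from typing import Tuple


def bind_with_retry(host: str, port: int, max_retries: int = 16) -> Tuple[socket.socket, int]:
    """Bind a listening socket, moving to the next port when one is taken.

    Port 0 asks the operating system for any free port.
    """
    for offset in range(max_retries + 1):
        candidate = 0 if port == 0 else port + offset
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, candidate))
            sock.listen()
            # getsockname() reports the port actually assigned.
            return sock, sock.getsockname()[1]
        except (OSError, OverflowError):
            # Port in use, or candidate outside the valid port range.
            sock.close()
    raise OSError(f"no free port found starting at {port}")


def web_url(host: str, port: int, https: bool = False) -> str:
    """Build the URL a client would use, including scheme and port."""
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}"
```

A test can bind once with port 0, then request the same port again and check that a different port is chosen and that the resulting URL carries both the scheme and the port.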
For the business logic, we will rely on basic tests and cover the processing logic of these APIs as far as possible.