
Active-Active Replication #183

Open
apavlo opened this issue Jan 27, 2015 · 0 comments


apavlo commented Jan 27, 2015

The following is a description of how to add active-active replication to H-Store. Note that this will have to be done at the node level (instead of the partition level), because the internal mechanism for communicating between partitions is based on HStoreSites: the HStoreCoordinator maintains network connections between site instances, not partitions.

  1. The first step is to update the internal system catalog to include information about primary and secondary sites. I believe that we should be able to reuse partition ids (e.g., two partitions will use the same id if they are replicas) because this will require no changes to the underlying query-routing infrastructure. Likewise, I think that we can reuse the site ids. Modify src/catgen/spec.txt and add a new Site? field (i.e., a reference) to the Site object called primarySite. At runtime this will be a pointer to the Site object that is considered the master. We may also want to add a boolean field to Site called isPrimary as a quick way to determine whether a Site is the current primary.
  2. The next step is to modify the code that initializes the catalog with the host information so that we can define replicas. This is what gets invoked when you call ant hstore-prepare from the command-line. There is a utility class called FixCatalog that sets up the host information. It parses the command-line (or a hosts file) and creates a ClusterConfiguration object. We used to check that the user didn't use the same SiteId more than once, but now we need to allow it. The list of partition ids should also be the same on the replica. An alternative is to allow multiple hostnames (separated by commas) in the host portion of each entry and have that be the list of replicas.
  3. Now that the catalog is populated, we need to modify the runtime portion of the system so that each HStoreCoordinator connects to all of the sites when the system starts up. This is done in HStoreCoordinator.initConnections(). Right now we maintain a simple array, HStoreCoordinator.channels, that maps each SiteId to its corresponding HStoreService RPC handle. These handles are how we send ProtoBuf messages between sites. I think there needs to be a second data structure that maps each unique Site instance to its corresponding HStoreService RPC handle. We could then keep the regular HStoreCoordinator.channels array for sending information from one primary site to another primary site. In our first implementation we would disallow replica Sites from sending messages to any Site other than their primary. See this documentation on the catalog schema and syntax.
  4. The next thing is to modify the hstoreservice.proto schema to include a new RPC for forwarding a transaction request from the primary to the secondary. The message should be similar to TransactionRedirectRequest except that you will want to include the txnId and base partition that the primary assigned to the request (this is enough for now because we will assume that we are only doing single-partition txns). For now the payload will just be the serialized StoredProcedureInvocation that comes over the wire from the client. In the future we may want to send more of the metadata that the TransactionInitializer computed for the request when it first arrived at the HStoreSite.
  5. After you modify hstoreservice.proto, you will need to generate a new Java source file:
    $ ant protorpc.java
    This command will automatically build the ProtoBuf compiler and create a new Hstoreservice.java file. Now if you refresh the source code tree in Eclipse you will see that HStoreCoordinator.RemoteServiceHandler has some errors because it is missing implementations for the new RPC methods that you just defined. For now you can just add stubs; we'll come back to this step later.
  6. Now that we have a way to send requests from the primary to the replicas, we need to actually do this and add a callback mechanism to make sure that we get back all of the responses that we need from the replicas before we send the result back to the client. To keep things simple for now, we'll do this at exactly the moment that we know our txn is ready to execute at the primary. The main place where we start a new txn is PartitionExecutor.executeTransaction(). In there, you'll want to invoke a utility method that you will write called HStoreCoordinator.transactionReplicate(). This method should take in a LocalTransaction handle and an RpcCallback. The RpcCallback can be stored in the LocalTransaction handle and should extend BlockingRpcCallback. This is a special type of callback that waits to invoke its method until it gets a certain number of responses back. In our case we need to wait until we get all of the responses from our RPC requests to the replicas before we can send the result out.
  7. Next we have to implement the code at the secondary to receive the new RPC request from the primary and execute it. Go back to HStoreCoordinator.RemoteServiceHandler and fill in the code for the method that we stubbed out earlier. The code should be almost exactly the same as HStoreCoordinator.RemoteServiceHandler.transactionRedirect(), except that you don't want the request to go through the full initialization process because we already have a txnId + basePartition for it. The txn request should get queued at the proper partition using HStoreSite.transactionStart(). You will also want to make a fake callback like TransactionRedirectResponseCallback so that we can send the result back to the primary instead of to a non-existent client connection.
  8. The last step is to modify HStoreSite.responseSend() so that we wait on the BlockingRpcCallback created in step 6 above until we get back all of the responses from the replicas. Once we have them, it's safe to release the result back to the client. I'm kind of burnt out writing all this up, so we can have a discussion about how to actually do this part once you get here.
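As a sketch of the comma-separated replica syntax proposed in step 2: the class and entry format below are hypothetical illustrations, not existing H-Store code (the real parsing would live in FixCatalog / ClusterConfiguration).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the comma-separated replica syntax
// from step 2. The entry format "host1,host2:siteId:p0;p1" is an
// assumption for illustration; the hosts share a SiteId and partition ids.
public class ReplicaHostEntry {
    public final List<String> hostnames; // first entry is the primary
    public final int siteId;
    public final int[] partitionIds;     // identical on every replica

    public ReplicaHostEntry(List<String> hostnames, int siteId, int[] partitionIds) {
        this.hostnames = hostnames;
        this.siteId = siteId;
        this.partitionIds = partitionIds;
    }

    // Parses "host1,host2:siteId:p0;p1" where the comma-separated
    // hostnames are replicas that reuse the same site + partition ids.
    public static ReplicaHostEntry parse(String entry) {
        String[] parts = entry.split(":");
        List<String> hosts = Arrays.asList(parts[0].split(","));
        int siteId = Integer.parseInt(parts[1]);
        String[] pids = parts[2].split(";");
        int[] partitionIds = new int[pids.length];
        for (int i = 0; i < pids.length; i++) {
            partitionIds[i] = Integer.parseInt(pids[i]);
        }
        return new ReplicaHostEntry(hosts, siteId, partitionIds);
    }

    public static void main(String[] args) {
        ReplicaHostEntry e = ReplicaHostEntry.parse("node01,node02:0:0;1");
        System.out.println(e.hostnames);           // [node01, node02]
        System.out.println(e.siteId);              // 0
        System.out.println(e.partitionIds.length); // 2
    }
}
```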
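The two lookup structures from step 3 can be sketched as below. This is a minimal illustration, not H-Store code: `Channel` is a stand-in for the HStoreService RPC handle, and the class and method names are hypothetical (in H-Store the first structure is the existing HStoreCoordinator.channels array keyed by SiteId).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two routing structures proposed in step 3.
// "Channel" stands in for the HStoreService RPC handle type.
public class CoordinatorChannels<Channel> {
    // Primary-to-primary routing: one channel per SiteId
    // (the role of the existing HStoreCoordinator.channels array).
    private final Map<Integer, Channel> primaryChannels = new HashMap<>();

    // Replication routing: one channel per unique Site instance,
    // keyed here by a (siteId, replicaIndex) pair encoded as a string.
    private final Map<String, Channel> replicaChannels = new HashMap<>();

    public void addPrimary(int siteId, Channel c) {
        this.primaryChannels.put(siteId, c);
    }

    public void addReplica(int siteId, int replicaIndex, Channel c) {
        this.replicaChannels.put(siteId + "/" + replicaIndex, c);
    }

    public Channel getPrimary(int siteId) {
        return this.primaryChannels.get(siteId);
    }

    public Channel getReplica(int siteId, int replicaIndex) {
        return this.replicaChannels.get(siteId + "/" + replicaIndex);
    }
}
```

Keeping the primary-to-primary map separate makes it easy to enforce the first-cut restriction that a replica only ever talks to its own primary.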
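For step 4, the new message might look something like the following. The message and field names are illustrative guesses modeled on TransactionRedirectRequest, not the final hstoreservice.proto schema:

```protobuf
// Hypothetical sketch of the new RPC message from step 4.
message TransactionReplicateRequest {
    required int32 sender_site = 1;      // SiteId of the primary
    required int64 transaction_id = 2;   // txnId assigned by the primary
    required int32 base_partition = 3;   // base partition chosen by the primary
    required bytes work = 4;             // serialized StoredProcedureInvocation
}
```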