The following is a description of how to add active-active replication to H-Store. Note that this will have to be done at the node level (instead of at the partition level), because the internal mechanism for communicating between partitions is based on `HStoreSite`s: the `HStoreCoordinator` maintains network connections between site instances, not partitions.
The first step is to update the internal system catalog to include information about primary and secondary sites. I believe that we should be able to reuse partition ids (e.g., two partitions will use the same id if they are replicas of each other), because this requires no changes to the underlying query routing infrastructure. Likewise, I think that we can reuse the site ids. Modify `src/catgen/spec.txt` and add a new `Site?` field (i.e., a reference) to the `Site` object called `primarySite`. At runtime this will be a pointer to the `Site` object that is considered the master. We may also want to add a boolean field to `Site` called `isPrimary` as a quick way to determine whether the `Site` is the current primary.
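As a rough sketch, assuming `spec.txt` uses `begin ... end` blocks with `type name "docstring"` field lines (which the `Site?` reference notation above suggests), the change might look like this; the docstrings are mine, not from the actual file:

```
begin Site
    ... existing fields ...
    Site? primarySite "The Site that is the current primary for this replica group"
    bool  isPrimary   "Quick check for whether this Site is currently the primary"
end
```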
The next step is to modify the code that initializes the catalog with the host information so that we can define replicas. This is what gets invoked when you call `ant hstore-prepare` from the command line. There is a utility class called `FixCatalog` that sets up the host information: it parses the command line (or a hosts file) and creates a `ClusterConfiguration` object. We used to check that the user didn't use the same SiteId more than once, but now we need to allow it. The list of partition ids should also be the same on each replica. An alternative is to allow multiple hostnames (separated by commas) in the host portion of each entry and have that be the list of replicas.
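To illustrate the comma-separated alternative, here is a hypothetical sketch of parsing one host entry. It assumes an entry of the form `host[,host...]:siteId:partitionIds`, where the first hostname is the primary and any additional hostnames are its replicas; the class and field names are illustrative stand-ins, not the actual `ClusterConfiguration` API.

```java
import java.util.*;

// Hypothetical parser for a replica-aware hstore-prepare host entry of the
// assumed form "host[,host...]:siteId:partitionIds". The first host is the
// primary; the rest are its replicas. Names here are illustrative only.
public class ReplicaHostEntry {
    final String primaryHost;
    final List<String> replicaHosts;
    final int siteId;                // replicas reuse the same SiteId
    final String partitionIds;       // same partition id list on every replica

    ReplicaHostEntry(String entry) {
        String[] parts = entry.split(":");
        String[] hosts = parts[0].split(",");
        this.primaryHost  = hosts[0];
        this.replicaHosts = Arrays.asList(hosts).subList(1, hosts.length);
        this.siteId       = Integer.parseInt(parts[1]);
        this.partitionIds = parts[2];
    }
}
```

An entry like `node1,node2:0:0` would then define `node1` as the primary for site 0 and `node2` as its replica with the same site and partition ids.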
Now that the catalog is populated, we need to modify the runtime portion of the system so that each `HStoreCoordinator` connects to all of the sites when the system starts up. This is done in `HStoreCoordinator.initConnections()`. Right now we maintain a simple array, `HStoreCoordinator.channels`, that maps each SiteId to its corresponding `HStoreService` RPC handle. These handles are how we send ProtoBuf messages between sites. I think there needs to be a second data structure that maps each unique `Site` instance to its corresponding `HStoreService` RPC handle. We could then keep the regular `HStoreCoordinator.channels` array for sending information from one primary site to another. In our first implementation we would disallow replica Sites from sending messages to any Site other than their primary. See this documentation on the catalog schema and syntax.
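A minimal sketch of the two data structures and the first-cut messaging rule, using simplified stand-ins for the real H-Store `Site` and `HStoreService` classes (only the shapes and the rule are the point here):

```java
import java.util.*;

// Stand-in for the ProtoBuf RPC handle to one remote site.
interface HStoreService { }

// Simplified stand-in for the catalog Site object described above.
class Site {
    final int siteId;
    final boolean isPrimary;
    final Site primarySite;   // null for a primary; the primary for a replica

    Site(int siteId, Site primarySite) {
        this.siteId = siteId;         // replicas reuse their primary's SiteId
        this.primarySite = primarySite;
        this.isPrimary = (primarySite == null);
    }
}

class ReplicaChannels {
    // Existing scheme: indexed by SiteId, one handle per primary site.
    HStoreService[] channels;

    // New scheme: one handle per unique Site instance, replicas included.
    final Map<Site, HStoreService> siteChannels = new HashMap<>();

    // First implementation's rule: a primary may message any site, but a
    // replica may only message its own primary.
    static boolean allowedToSend(Site from, Site to) {
        return from.isPrimary || to == from.primarySite;
    }
}
```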
The next thing is to modify the `hstoreservice.proto` schema to include a new RPC for forwarding a transaction request from the primary to the secondary. The message should be similar to `TransactionRedirectRequest`, except that you will want to include the txnId and base partition that the primary assigned to the request (this is enough for now because we will assume that we are only doing single-partition txns). For now the payload will just be the serialized `StoredProcedureInvocation` that comes over the wire from the client. In the future we may want to send more of the metadata that the `TransactionInitializer` computed for the request when it first arrived at the `HStoreSite`.
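A hedged sketch of what the new RPC could look like in `hstoreservice.proto` (proto2 syntax, modeled on `TransactionRedirectRequest`; all message, field, and rpc names here are illustrative guesses, not the actual schema):

```proto
// Forward a txn from a primary to one of its replicas.
message TransactionReplicateRequest {
    required int32 sender_site    = 1; // SiteId of the forwarding primary
    required int64 transaction_id = 2; // txnId assigned by the primary
    required int32 base_partition = 3; // single-partition txns only, for now
    required bytes payload        = 4; // serialized StoredProcedureInvocation
}

message TransactionReplicateResponse {
    required int64 transaction_id = 1;
    required bytes output         = 2; // serialized result sent back to the primary
}

// Added inside the existing HStoreService definition:
//   rpc TransactionReplicate(TransactionReplicateRequest)
//       returns (TransactionReplicateResponse);
```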
After you modify `hstoreservice.proto`, you will need to generate a new Java source file:

```
$ ant protorpc.java
```
This command will automatically build the ProtoBuf compiler and create a new `Hstoreservice.java` file. Now if you refresh the source code tree in Eclipse, you will see that `HStoreCoordinator.RemoteServiceHandler` has errors because it is missing implementations for the new RPC methods that you just defined. For now you can just add stubs; we'll come back to this step later.
Now that we have a way to send requests from the primary to the replicas, we need to actually do this and add a callback mechanism to make sure that we get back all of the responses that we need from the replicas before we send the result back to the client. To keep things simple for now, we'll do this exactly at the moment we know our txn is ready to execute at the primary. The main place where we start a new txn is `PartitionExecutor.executeTransaction()`. In there, you'll want to invoke a utility method that you will write called `HStoreCoordinator.transactionReplicate()`. This method should take in a `LocalTransaction` handle and an `RpcCallback`. The `RpcCallback` can be stored in the `LocalTransaction` handle and should extend `BlockingRpcCallback`, a special type of callback that waits to invoke its callback method until it gets a certain number of responses back. In our case we need to wait until we get all of the responses from our RPC requests to the replicas before we can send the result out.
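The counting contract described above can be sketched as follows. This is not the real `BlockingRpcCallback` (which has more machinery); it only shows the behavior we rely on: one `run()` per replica response, with the unblock hook firing exactly once after the last one arrives.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the counting behavior we want from the callback:
// run() is invoked once per replica response, and unblockCallback() fires
// only after the expected number of responses has arrived.
abstract class CountingCallback<T> {
    private final AtomicInteger remaining;

    CountingCallback(int numReplicas) {
        this.remaining = new AtomicInteger(numReplicas);
    }

    public final void run(T response) {
        // Thread-safe countdown: only the final response triggers the unblock.
        if (this.remaining.decrementAndGet() == 0) {
            this.unblockCallback();
        }
    }

    // Invoked exactly once, after the last replica response arrives.
    protected abstract void unblockCallback();
}
```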
Next we have to implement the code at the secondary to receive the new RPC request from the primary and execute it. Go back to `HStoreCoordinator.RemoteServiceHandler` and fill in the code for the method that we stubbed out earlier. The code should be almost exactly the same as `HStoreCoordinator.RemoteServiceHandler.transactionRedirect()`, except that you don't want the request to go through the full initialization process, because we already have a txnId + basePartition for it. The txn request should get queued at the proper partition using `HStoreSite.transactionStart()`. You will also want to make a fake callback like `TransactionRedirectResponseCallback` so that we send the result back to the primary instead of to a non-existent client connection.
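The fake callback's job can be sketched like this, with simplified stand-ins for the real H-Store classes (there is no client connection at the replica, so the finished result is shipped back over the RPC channel to the primary instead):

```java
// Stand-in for the RPC channel back to the forwarding primary.
interface Channel {
    void send(byte[] payload);
}

// Hypothetical sketch of the replica-side response path: instead of writing
// the ClientResponse to a (non-existent) client connection, ship it back to
// the primary. In the real system run() would wrap the bytes in the
// replication response ProtoBuf message before sending.
class ReplicaResponseCallback {
    private final Channel primaryChannel;
    private final long txnId;   // kept so the primary can match the response

    ReplicaResponseCallback(Channel primaryChannel, long txnId) {
        this.primaryChannel = primaryChannel;
        this.txnId = txnId;
    }

    // Called when the txn finishes executing at the replica.
    void run(byte[] serializedClientResponse) {
        this.primaryChannel.send(serializedClientResponse);
    }
}
```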
The last step is to modify `HStoreSite.responseSend()` so that we wait on the `BlockingRpcCallback` created in step 6 above until we get back all of the responses from the replicas. Once we have them, it is safe to release the result to the client. I'm kind of burnt out writing all this up, so we can have a discussion about how to actually do this part once you get here.