You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, we use a doubly-linked list LRU for tracking the order of how tuples were accessed. When an existing tuple is accessed or modified, we move it from its current location in the LRU to the end of the list. Then when we evict data, we grab a bunch of tuples from the front the list and combine them together in a block.
All though this works well enough and was relatively easy to implement, there are a couple of problems with this approach:
We have to store two 4-byte offsets in each tuple's header (one for the next/prev tuple in the list).
We do not have a global view of the tuple's access patterns.
We have to sample which tuples we will update their position in the LRU at runtime because doing this for every tuple in every txn is expensive.
Per John Ousterhout's recommendation, we will switch to a timestamp-based sampling approach.
Here are the detailed instructions:
We are going to want to keep the original linked-list implementation to make it easier to run experiments. Create a new boolean HStoreConf parameter called site.anticache_timestamps with the default set to false. Then update the EE's build script (build.py) to check for this parameter; if it's set to true, then you can set a new pre-processor directive called ANTICACHE_TIMESTAMPS. You will need to update build.xml to pass site.anticache_timestamps to build.py, and then update buildtools.py to take in the parameter from the commandline and update the BuildContext object. These scripts are used to construct the EE's Makefile.
Next, go into the EE's TableTuple code and modify the header so that it can include a 32-bit timestamp. You will need to modify all parts of the code that pad out the header for the LRU (search for 'ANTICACHE' in the code). Use your new pre-processor directive to set the header to the correct size if it's enabled. You will also need to add new methods to set/get the timestamp in the header that are only enabled if your pre-processor is set to true. You'll want to also disable the LRU chain ones as well (e.g., getNextTupleInChain()).
Find the parts of the code in AntiCacheEvictionManager that invoke TableTuple::setNextTupleInChain(). This is where you are going to want to modify the code to invoke your new TableTuple methods instead of LRU chain. Use the RDTSC cycle counter for the timestamp. Again, you will need to break out the code that uses the the old LRU chain stuff versus the new timestamp stuff.
When it's time to evict, pick a bunch of tuples at random, say 200. Then choose the, say, 50-100 tuples with the oldest timestamps and evict them.
If there are certain tables you don't want to evict, you can either sample in a way that doesn't choose candidates from those tables, or you can just oversample a bit more and ignore tuples that have particular attributes, such as coming from tables you don't want to evict.
This will allow us to also to remove the command-line evictable option (note that this not a HStoreConf parameter). We will want to run experiments that show both the performance and storage overhead difference between the two.
The text was updated successfully, but these errors were encountered:
Currently, we use a doubly-linked list LRU for tracking the order of how tuples were accessed. When an existing tuple is accessed or modified, we move it from its current location in the LRU to the end of the list. Then when we evict data, we grab a bunch of tuples from the front the list and combine them together in a block.
All though this works well enough and was relatively easy to implement, there are a couple of problems with this approach:
Per John Ousterhout's recommendation, we will switch to a timestamp-based sampling approach.
Here are the detailed instructions:
site.anticache_timestamps
with the default set to false. Then update the EE's build script (build.py) to check for this parameter; if it's set to true, then you can set a new pre-processor directive calledANTICACHE_TIMESTAMPS
. You will need to update build.xml to passsite.anticache_timestamps
to build.py, and then update buildtools.py to take in the parameter from the commandline and update theBuildContext
object. These scripts are used to construct the EE's Makefile.getNextTupleInChain()
).TableTuple::setNextTupleInChain()
. This is where you are going to want to modify the code to invoke your new TableTuple methods instead of LRU chain. Use the RDTSC cycle counter for the timestamp. Again, you will need to break out the code that uses the the old LRU chain stuff versus the new timestamp stuff.This will allow us to also to remove the command-line
evictable
option (note that this not a HStoreConf parameter). We will want to run experiments that show both the performance and storage overhead difference between the two.The text was updated successfully, but these errors were encountered: