Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing the Anti-Cache LRU #155

Open
apavlo opened this issue Jun 16, 2014 · 0 comments
Open

Removing the Anti-Cache LRU #155

apavlo opened this issue Jun 16, 2014 · 0 comments

Comments

@apavlo
Copy link
Owner

apavlo commented Jun 16, 2014

Currently, we use a doubly-linked list LRU for tracking the order of how tuples were accessed. When an existing tuple is accessed or modified, we move it from its current location in the LRU to the end of the list. Then when we evict data, we grab a bunch of tuples from the front the list and combine them together in a block.

All though this works well enough and was relatively easy to implement, there are a couple of problems with this approach:

  1. We have to store two 4-byte offsets in each tuple's header (one for the next/prev tuple in the list).
  2. We do not have a global view of the tuple's access patterns.
  3. We have to sample which tuples we will update their position in the LRU at runtime because doing this for every tuple in every txn is expensive.

Per John Ousterhout's recommendation, we will switch to a timestamp-based sampling approach.

Here are the detailed instructions:

  1. We are going to want to keep the original linked-list implementation to make it easier to run experiments. Create a new boolean HStoreConf parameter called site.anticache_timestamps with the default set to false. Then update the EE's build script (build.py) to check for this parameter; if it's set to true, then you can set a new pre-processor directive called ANTICACHE_TIMESTAMPS. You will need to update build.xml to pass site.anticache_timestamps to build.py, and then update buildtools.py to take in the parameter from the commandline and update the BuildContext object. These scripts are used to construct the EE's Makefile.
  2. Next, go into the EE's TableTuple code and modify the header so that it can include a 32-bit timestamp. You will need to modify all parts of the code that pad out the header for the LRU (search for 'ANTICACHE' in the code). Use your new pre-processor directive to set the header to the correct size if it's enabled. You will also need to add new methods to set/get the timestamp in the header that are only enabled if your pre-processor is set to true. You'll want to also disable the LRU chain ones as well (e.g., getNextTupleInChain()).
  3. Find the parts of the code in AntiCacheEvictionManager that invoke TableTuple::setNextTupleInChain(). This is where you are going to want to modify the code to invoke your new TableTuple methods instead of LRU chain. Use the RDTSC cycle counter for the timestamp. Again, you will need to break out the code that uses the the old LRU chain stuff versus the new timestamp stuff.
  4. When it's time to evict, pick a bunch of tuples at random, say 200. Then choose the, say, 50-100 tuples with the oldest timestamps and evict them.
  5. If there are certain tables you don't want to evict, you can either sample in a way that doesn't choose candidates from those tables, or you can just oversample a bit more and ignore tuples that have particular attributes, such as coming from tables you don't want to evict.

This will allow us to also to remove the command-line evictable option (note that this not a HStoreConf parameter). We will want to run experiments that show both the performance and storage overhead difference between the two.

@apavlo apavlo changed the title Removing the LRU Removing the Anti-Cache LRU Jun 16, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant