In my previous blog post I discussed how we converted from a file based caching solution to one based on Couchbase. In this blog post I am going discuss the system we built to group cache entries together into logical groups, and the mechanism used to quickly invalidate all cache entries for a particular group using sentinels rather than lists.

First Version: Lists of Cached Items

The original file based solution simply managed groups of cache values as part of a cache block, and because the groups were structured as directories on disk, it was easy enough to invalidate all cache entries for a group simply by recursing over the directory and deleting all the cache files. Hence the first version of our Couchbase cache store class was implemented very similarly. When an item was added to the cache, it was also added to a list of all cached items store in a meta data object for the cache group. Then when a group of cached items needed to be invalidated, we would read the list of items from the cache and then iterate over it deleting them from the Couchbase cache.

The initial implementation of the Read and Write functions looked like this:

/// <summary>
/// Adds a value to a dictionary of type <TKey, TValue> with the given key and value using a
/// Check-And-Set operation and retries. Note that if the item exists in the list, the value
/// is updated otherwise a new entry is added.
/// </summary>
/// <typeparam name="TKey">Type of the key for the items in the list</typeparam>
/// <typeparam name="TValue">Type of the value for the items in the list</typeparam>
/// <param name="client">Couchbase client</param>
/// <param name="key">Key used to identify the hash table</param>
/// <param name="entryKey">Key for the entry to be stored in the hash table</param>
/// <param name="entryValue">Value for the entry to be stored in the hash table</param>
private void AddToHashWithCas<TKey, TValue>(
    ICouchbaseClient client,
    string key,
    TKey entryKey,
    TValue entryValue)
{
    // Now keep track of all the items in this cache group
    CasResult<bool> casResult;
    do {
        // Try to read the existing list (or create one if it does not exist)
        var result = client.GetWithCas<Dictionary<TKey, TValue>>(key);
        var items = result.Result ?? new Dictionary<TKey, TValue>();

        // See if there is an existing entry
        if (items.ContainsKey(entryKey)) {
            // Update the value if it exists already
            items[entryKey] = entryValue;
        } else {
            // Add a new entry if it does not exist
            items.Add(entryKey, entryValue);
        }

        // Attempt to save the items using a Check-And-Set operation
        casResult = client.Cas(StoreMode.Set, key, items, result.Cas);
    } while (casResult.Result == false);
}

/// <summary>
/// Writes out a serialized object to the cache
/// </summary>
/// <param name="o">Object to write to the cache</param>
/// <param name="group">Cache group that this item belongs to</param>
/// <param name="key">Key name for this cache item</param>
/// <param name="expiresAt">Optional date/time stamp that the cached item will expire at</param>
public void Write(
    object o,
    string group,
    string key,
    DateTime? expiresAt)
{
    // Get access to the cache
    var client = GT.CBCache;

    // Determine the full key name for this cache item
    var fullKey = group + '/' + key;
    var lockKey = fullKey + ".lock";

    // Store the item in the cache
    if (expiresAt == null) {
        client.Store(StoreMode.Set, fullKey, o);
    } else {
        client.Store(StoreMode.Set, fullKey, o, (DateTime)expiresAt);
    }

    // Now keep track of all the items in this cache group
    AddToHashWithCas<string, DateTime?>(client, group, fullKey, expiresAt);

    // Clear the lock
    client.Store(StoreMode.Set, lockKey, false);
}

/// <summary>
/// Reads a serialized object from the cache
/// </summary>
/// <param name="o">Place to return the object read from the cache</param>
/// <param name="group">Cache group that this item belongs to</param>
/// <param name="key">Key name for this cache item</param>
/// <returns>True if the item was read from the cache, false if not</returns>
public bool Read(
    out object o,
    string group,
    string key)
{
    // Get access to the cache
    var client = GT.CBCache;

    // Determine the keys we need. The key of the item we key off the primary group
    var fullKey = group + '/' + key;
    var lockKey = fullKey + ".lock";

    // Loop around trying to read the cache until either it is valid, or we get the write lock for the cache. This way
    // we guarantee that we will only ever have one writer actually generating the cache data at a time.
    CasResult<bool> casResult;
    CasResult<bool> locked = new CasResult<bool>();
    do {
        // Loop until the lock is clear in case another instance is writing the cache. We time out after 10 seconds
        for (int i = 0; i < TimeoutLoops; i++) {
            // Check if someone has the lock
            locked = client.GetWithCas<bool>(lockKey);
            if (!locked.Result) {
                // Not locked for writing, so we are good to go
                break;
            }

            // Sleep for 100 milliseconds to allow the lock to get cleared
            System.Threading.Thread.Sleep(TimeoutSleep);
        }

        // Read the item from the cache and return it if found
        if ((o = client.Get(fullKey)) != null) {
            return true;
        }

        // Check if we got the lock. If not then someone has the lock and we timed out, so just clear the lock
        // and move on. This should not happen, but if it does it is OK.
        if (locked.Result) {
            client.Store(StoreMode.Set, lockKey, false);
            return false;
        }

        // Attempt to set the lock using a Check-And-Set operation
        casResult = client.Cas(StoreMode.Set, lockKey, true, locked.Cas);
    } while (casResult.Result == false);

    // Return false indicating that the cache is invalid and we have the write lock
    return false;
}

/// <summary>
/// Resets all the cache entries for this specific cache group
/// </summary>
/// <param name="groupKey">Cache group to clear</param>
public void ClearGroup(
    ICouchbaseClient client,
    string groupKey)
{
    Dictionary<string, DateTime?> items;
    CasResult<bool> casResult;
    do {
        // Try to read the existing list (or create one if it does not exist)
        var result = client.GetWithCas<Dictionary<string, DateTime?>>(groupKey);
        items = result.Result ?? new Dictionary<string, DateTime?>();

        // Clear the list using a Check-And-Set operation
        casResult = client.Cas(StoreMode.Set, groupKey, new Dictionary<string, DateTime?>(), result.Cas);
    } while (casResult.Result == false);

    // Now loop through and kill off all the cache entries referenced in the list. Although
    // it is possible someone might have added a new cache entry to the list while this is
    // being processed, as long as we clear each one each cache entry will end up being
    // correctly refreshed
    foreach (var item in items) {
        client.Remove(item.Key);
    }
}

Using these functions is pretty simple. Code that wants to cache something will first call the Read() function and attempt to read an existing cache block from the cache and if that fails, it will then generate the cached block data, and call the write function:

string group = "myGroup";
string key = "myItem";
object o;
if (!Read(out o, group, key)) {
    o = "Something to cache";
    Write(o, group, key, DateTime.Now.AddMinutes(5));
}

// Do something with object o

If the Read() function finds a valid cache entry, it simply returns the result. If the Read() function fails, it will grab the lock for the cached item using a Check And Set operation in Couchbase, and then return false. The caller then generates the content to be cached, and calls the Write() function. This function then writes the cached data to the cache, adds a reference to the cached item to the list of all cache items for this group and then clears the lock. This ensures that no other code will be generating the same cached item at the same time, so if an item expires any other threads needing that same cached item will end up blocking until the locks is cleared, at which time they will pick up the newly generated cache item.

Although it may seem inefficient to make all the other threads block and wait for the first thread to generate the cached item, in practice this turns out to be more efficient than letting all the threads attempt to generate the cached item at the same time. If for instance the cached item is generated using expensive calls to a SQL database, and suddenly 100 threads all need the same cached item, the SQL server is going to be able to generate the content for 1 thread making a set of expensive SQL calls much faster than having 100 threads all make the same SQL calls in parallel.

Also it is worth pointing out that in this implementation we use a hash table to store the list of items in a cache block rather than a list, primarily because our caches end up with thousands of items per group and using a hash table makes maintaining the list much more efficient than using a link list (which would have to do a linear search to see if the item was already in the list).

Although the code above works fine, the process of invalidating cache entries is still time consuming because the ClearGroup() function has to iterate through all the items in the group and remove them from the cache store, which can take some time with thousands of items in the group. Also the cache items in the group may be already getting generated while the clear operation is in progress, since we don’t block them from being generate during clears.

On top of that we can only group cache entries together by placing them in a single group. It would be nice if there was a way to be able to make a cache entry belong to multiple groups, giving you more control over when an item is invalidated in the cache. And it would also be nice to be able to improve the performance of the clear operation. Thankfully all of this can be done using the concept of Sentinels!

Second Version: Using Sentinels to Avoid Lists

So how does a sentinel work? The basic concept is that rather than storing references to all the cache items in a list that represent a group of cache items, we instead store a list of prior sentinel values as meta data long with the cached item. In essence we are flipping things upside down, and don’t store what items are in a group, but rather what groups an item belongs to. When we need to read a cached item, we read a list of prior sentinel values stored with the item, and compare it to all the current sentinel values. If any of the values do not match, we consider the item invalidated and we ignore it even if it still exists in the cache.

What is great about using this is that the overhead of managing lists of items in a group is gone, because we no longer track what items belong to a group. Instead we have a much smaller overhead to keep track of the groups that an item might belong to. But more importantly, we can very quickly invalidate all items in a group simply by doing a fast, atomic Couchbase Increment operation on a sentinel. This will instantly invalidate any items that belong to that group the next time a read attempt is made for any items in that group.

So our final implementation of the Read and Write functions looks like this:

/// <summary>
/// Writes out a serialized object to the cache
/// </summary>
/// <param name="o">Object to write to the cache</param>
/// <param name="group">Cache group that this item belongs to</param>
/// <param name="key">Key name for this cache item</param>
/// <param name="expiresAt">Optional date/time stamp that the cached item will expire at</param>
public void Write(
    object o,
    string group,
    string key,
    DateTime? expiresAt)
{
    // Get access to the cache
    var client = GT.CBCache;

    // Determine the keys we need
    var fullKey = group + '/' + key;
    var lockKey = fullKey + ".lock";

    // Store the item in the cache
    if (expiresAt == null) {
        client.Store(StoreMode.Set, fullKey, o);
    } else {
        client.Store(StoreMode.Set, fullKey, o, (DateTime)expiresAt);
    }

    // Clear the lock to let other processes into the cached item
    client.Store(StoreMode.Set, lockKey, false);
}

/// <summary>
/// Reads a serialized object from the cache
/// </summary>
/// <param name="o">Place to return the object read from the cache</param>
/// <param name="group">Cache group that this item belongs to</param>
/// <param name="key">Key name for this cache item</param>
/// <param name="dependencies">List of dependent sentinels this item is dependent on</param>
/// <returns>True if the item was read from the cache, false if not</returns>
public bool Read(
    out object o,
    string group,
    string key,
    List<string> dependencies = null)
{
    // Get access to the cache
    var client = GT.CBCache;

    // Determine the keys we need. The key of the item we key off the primary group
    var fullKey = group + '/' + key;
    var metaKey = fullKey + ".meta";
    var lockKey = fullKey + ".lock";

    // Set the default return values
    o = null;

    // Loop around trying to read the cache until either it is valid, or we get the write lock for the cache. This way
    // we guarantee that we will only ever have one writer actually generating the cache data at a time.
    CasResult<bool> casResult;
    CasResult<bool> locked = new CasResult<bool>();
    List<string> priorSentinels, sentinels;
    do {
        // Loop until the lock is clear in case another instance is writing the cache. We time out after 10 seconds
        for (int i = 0; i < TimeoutLoops; i++) {
            // Check if someone has the lock
            locked = client.GetWithCas<bool>(lockKey);
            if (!locked.Result) {
                // Not locked for writing, so we are good to go
                break;
            }

            // First try to get the cached item. If it is still present then we can simply serve up stale
            // data while one thread is writing the new cache data is being generated
            if ((o = client.Get(fullKey)) != null) {
                return true;
            }

            // Sleep for 100 milliseconds to allow the lock to get cleared
            System.Threading.Thread.Sleep(TimeoutSleep);
        }

        // Read the items we need from the cache with a multi-get
        var keys = new List<string> { group, metaKey, fullKey };
        if (dependencies != null) {
            // Add in any dependencies that this item depends on if required
            keys.AddRange(dependencies);
        }
        var results = client.Get(keys);
        priorSentinels = results.ContainsKey(metaKey) ? (List<string>)results[metaKey] : new List<string> { "0" };
        sentinels = new List<string> { results.ContainsKey(group) ? (string)results[group] : "0" };
        if (dependencies != null) {
            // If we have dependencies, make sure we get the sentinel value or set it to the default if not present
            foreach (var dependency in dependencies) {
                sentinels.Add(results.ContainsKey(dependency) ? (string)results[dependency] : "0");
            }
        }

        // Check if any of the sentinels indicate this item is invalidated. If any sentinels don't match the values
        // last time the cache was written then this item is invalid
        if (priorSentinels.SequenceEqual(sentinels)) {
            // Check if we got the cached item
            if (results.ContainsKey(fullKey) && ((o = results[fullKey]) != null)) {
                return true;
            }
        }

        // Check if we got the lock. If not then someone has the lock and we timed out just continue on. The write
        // code will eventually clear the lock and let other threads in. But we log this so we know it happened.
        if (locked.Result) {
            // NOTE: Add code to log this error to an error log in here...
            break;
        }

        // Attempt to set the lock using a Check-And-Set operation
        casResult = client.Cas(StoreMode.Set, lockKey, true, locked.Cas);
    } while (casResult.Result == false);

    // Store the sentinels for this item before it gets written since we know the cached item will now
    // be valid after the write completes. But we cannot write the value in the write code, because we
    // have to make sure we write the sentinel values we read above in case any of them have now been changed
    // (in which case the next read for the cache would be invalid also). So it is better to write them here
    // than to expect the caller to pass them to the write function.
    client.Store(StoreMode.Set, metaKey, sentinels);

    // Return false indicating that the cache is invalid and we have the write lock
    return false;
}

/// <summary>
/// Resets all the cache entries for this specific cache group
/// </summary>
/// <param name="sentinel">Cache group to clear</param>
public void InvalidateCache(
    string sentinel)
{
    // Increment the sentinel for this group to invalidate old entries. We start with a default value of 1
    // the first time a value is written since by default when the value is not present we assume it is 0
    GT.CBCache.Increment(sentinel, 1, 1);
}

As you can see from the code above, the Write() function is now much simpler as it no longer has to update a list of items in the group when it writes an item to the cache. The read function on the other hand is more complex, as it is passed in an optional list of extra sentinel values that a cache item may depend upon. Once the read function has obtained the cache lock for the item, it then uses a fast Mult-Get operation in Couchbase to read all the values it needs, including the cached item and the value of all sentinel keys it depends upon. If the sentinels don’t exist, we give it a default value of “0″ which will happen the first time that a sentinel is used and it has never been invalidated. Once the read operation has a list of current sentinel values and prior sentinel values we can compare if the two lists are identical and if they are, return the actual cache item if it was present.

Also note that the Write() function never writes the current sentinel values to the meta data item, but rather the Read() function does that right before it exits and returns false. We know that whoever is calling the Read() function will immediately call the Write() function if Read() returns false, and we also know that the Read() function has the write lock for the cache item when it exits also.

Using the new code is pretty much the same as before, but we can also pass in an optional list of other sentinels we might want to use to invalidate that particular cache entry:

string group = "myGroup";
string key = "myItem";
object o;
if (!Read(out o, group, key, new List<string> { "secondGroup" })) {
    o = "Something to cache";
    Write(o, group, key, DateTime.Now.AddMinutes(5));
}

// Do something with object o

Finally when we decide we wish to invalidate a group of cache items using a particular sentinel, we use the InvalidateCache() function above which just increments the sentinel value by 1. Note that we always start incrementing with a value of 1 in this function since we assume the absence of a sentinel value is equivalent to a value of 0 in the read code. So the first time we write a sentinel value, it must be 1 to ensure it actually caused any items covered by the sentinel to be invalidated.

That’s it! Have fun using Couchbase and I hope this blog will help you build a better caching solution for your own software.

Leave a Reply