Redis to the Rescue

Bill Inkoom
Published in Hubtel Engineering
4 min read · Jun 28, 2020

Once upon a time, there existed data that lived peacefully in a PostgreSQL database. This data belonged to over 1 million unique users who would access it via mobile or web. This was mostly OK, with response times between 1 and 5 seconds. The database contained a trail of user events: users request data with an identifier, and the service queries the database to retrieve the information associated with that identifier.

All seemed fine, but what would happen if a lot of users showed up at the same time requesting data? What would happen if the number of events per user doubled? What would happen if the number of unique users also doubled, on top of all of the above?

A quick search pointed me to how Facebook and Twitter had used key-value stores to solve some of these issues. The two names that came up were Redis and Memcached. A comparison showed that Redis would be the better tool for the problem at hand, mainly because it is a data-structure store whereas Memcached isn't.

Welcome to the world of Redis

Redis (Remote Dictionary Server) is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, geospatial indexes with radius queries and streams.

A few stats

  1. The max length of a list is 2³² − 1 elements (4,294,967,295; more than 4 billion elements per list).
  2. The max number of members in a set is 2³² − 1 (4,294,967,295; more than 4 billion members per set).

Redis also has powerful commands like expire, which automatically deletes a key after n seconds.

It also has sorted sets (O(log(N)) to add and O(log(N)+M) to remove), which guarantee the uniqueness of members and preserve an order by a score you provide.
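To make these two primitives concrete, here is a minimal sketch using the Predis client (the key name and connection details are made up for illustration):

use Predis\Client;

// Placeholder connection details
$client = new Client(['host' => '127.0.0.1', 'port' => 6379]);

// Members are unique; the score you provide determines the order
$client->zadd('demo:scores', ['alice' => 10, 'bob' => 5]);
$client->zadd('demo:scores', ['alice' => 12]); // re-adding a member only updates its score

$client->zrange('demo:scores', 0, -1); // returns ['bob', 'alice'], ordered by ascending score

// Auto-delete the whole key after 60 seconds
$client->expire('demo:scores', 60);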

What does this mean for our service?

From here on, we shall refer to our service as the stream service. With just these few commands, we can build a self-resizing cache for the stream service. The language used here is PHP, but you can ignore that, since it's the commands that matter; any Redis library in your language of choice should have these commands.

Case 1: Nothing in cache

The first time a user makes a request to our stream service, we check if they have data in the cache by using

$key = "stream:" . $streamId;
$cardinality = $this->client->zcard($key);

zCard : Returns the sorted set cardinality (number of elements) of the sorted set stored at key. Time complexity: O(1)

If the cardinality is 0, it means we don't have any data in the cache, so we load the last 50 streams from our database and store them as a sorted set at the key provided.

$this->client->zadd($key, [
    json_encode($streamItem1) => $streamItem1->id,
    json_encode($streamItem2) => $streamItem2->id,
    // ...
    json_encode($streamItemN) => $streamItemN->id,
]);

zAdd : Adds all the specified members with the specified scores to the sorted set stored at key. Time complexity: O(log(N)) for each item added, where N is the number of elements in the sorted set.

Upon successful addition we set an expiry on this key using

$this->client->expire($key, $time);
// currently 3 days (259,200 seconds), or any number of days we want

expire : Set a timeout on key. After the timeout has expired, the key will automatically be deleted. A key with an associated timeout is often said to be volatile in Redis terminology. Time complexity: O(1)

This means that if the user does not make a request within 3 days (or any n days we specify), their key is deleted, the cardinality returns 0, and this code triggers again when they finally reappear.
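Putting Case 1 together, here is a minimal sketch of the whole cache-miss path, assuming a Predis-style $this->client; loadLastStreamsFromDb() is a hypothetical helper standing in for the database query:

$key = 'stream:' . $streamId;

if ($this->client->zcard($key) === 0) {
    // Cache miss: load the last 50 stream items from the database
    $streamItems = $this->loadLastStreamsFromDb($streamId, 50);

    // Build the member => score dictionary for zAdd
    $members = [];
    foreach ($streamItems as $item) {
        $members[json_encode($item)] = $item->id;
    }

    if ($members) {
        $this->client->zadd($key, $members);
        // Expire idle keys after 3 days so the cache resizes itself
        $this->client->expire($key, 259200);
    }
}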

Case 2: !Nothing (Something) in cache

However, if the cardinality is greater than 0, it means there is data in the cache, so we fetch from there instead.

$startFrom = max($page - 1, 0) * 5;
// zRevRange's start and stop indexes are both inclusive, so +4 returns exactly 5 items
$streamsInKVS = $this->client->zrevrange($key, $startFrom, $startFrom + 4);

zRevRange : Returns the specified range of elements in the sorted set stored at key, with the elements ordered from the highest to the lowest score. Time complexity: O(log(N)+M) with N being the number of elements in the sorted set and M the number of elements returned.

So we fetch 5 items from the cache per request. This also supports pagination: increasing the page number gives you the next slice of results. All we need is a little math to traverse equally spaced partitions, as sketched below.
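For instance, with a page size of 5, page 1 maps to indexes 0..4, page 2 to 5..9, and so on. A sketch of the paging arithmetic, generalising the hard-coded 5 into a $pageSize variable:

$pageSize  = 5;
$startFrom = max($page - 1, 0) * $pageSize;
$stopAt    = $startFrom + $pageSize - 1; // zRevRange's stop index is inclusive

$streamsInKVS = $this->client->zrevrange($key, $startFrom, $stopAt);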

How do we keep cache fresh!?

Simple: anytime there is a new record, we insert it into the stream database and into the cache too.

// Remove any existing member carrying this score (a previous version of the item)
$this->client->zremrangebyscore($key, $score, $score);
// Then add the fresh copy as a new member with the same score
$this->client->zadd($key, [$data => $score]);

Since a sorted set stores unique members, a modified record would be treated as a brand-new one. The write could, however, be an update to an existing stream item, so we remove the old version first and then add the new one.

zRemRangeByScore : Removes all elements in the sorted set stored at key with a score between min and max (inclusive). Time complexity: O(log(N)+M) with N being the number of elements in the sorted set and M the number of elements removed by the operation.
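The whole write-through path can be wrapped up as follows; upsertStreamItem() is a hypothetical helper name, not part of the original service:

// Hypothetical wrapper: keeps the cache in sync on every insert or update
private function upsertStreamItem(string $streamId, object $streamItem): void
{
    $key   = 'stream:' . $streamId;
    $score = $streamItem->id;

    // Drop any previous version of this item (same score) ...
    $this->client->zremrangebyscore($key, $score, $score);

    // ... then add the fresh copy
    $this->client->zadd($key, [json_encode($streamItem) => $score]);
}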

After these enhancements, the performance benefits recorded are incredible: we currently average 150 ms per request, whereas previously we hovered between 1 and 5 seconds.

In Summary

This means every user has their own 'mini-shard' of the stream, where access is not affected by all the other streams, as it was in our PostgreSQL database. The cache is self-resizing because at any point in time we are only storing keys for users who consistently request data. Keys of users who never return are automatically deleted, so at any given time the key-set is a reflection of our active users.

Thanks for making it here, feedback would be awesome!


Bill Inkoom, Hubtel Engineering
Software Engineer with a great love for Mathematics, especially Abstract Mathematics.