Redis Indices 
127.0.0.1:6379> CREATE INDEX _email ON user:*->email 
@itamarhaber / #RedisTLV / 22/9/2014
A Little About Myself 
A Redis Geek and Chief Developers Advocate 
at .com 
I write at http://redislabs.com/blog and edit the 
Redis Watch newsletter at 
http://redislabs.com/redis-watch-archive
Motivation 
● Redis is a Key-Value datastore -> fetching 
by (primary) key is always fast 
● Searching for keys is expensive - SCAN (or, 
god forbid, the "evil" KEYS command) 
● Searching for values in keys requires a full 
(hash) table scan & sending the data to the 
client for processing
https://twitter.com/antirez/status/507082534513963009
antirez is Right 
● Redis is a "database SDK" 
● Indices imply some kind of schema (and 
there's none in Redis) 
● Redis wasn't made for indexing 
● ... 
But despite the Creator's humble opinion, 
sometimes you still need a fast way to search :)
So What is an Index? 
"A database index is a data 
structure that improves the speed 
of data retrieval operations" 
Wikipedia, 2014 
Space-Time Tradeoff
What Can be Indexed? 
Data Index 
Key -> Value Value -> Key 
• Values can be numbers or strings 
• Can be derived from "opaque" values: 
JSONs, data structures (e.g. Hash), 
functions, …
Index Operations Checklist 
1. Create index from existing data 
2. Update the index on 
a. Addition of new values 
b. Updates of existing values 
c. Deletion of keys (and also RENAME/MIGRATE…) 
3. Drop the index 
4. If needed do index housekeeping 
5. Access keys using the index
A Simple Example: Reverse Lookup 
Assume the following database, where every 
user has a single unique email address: 
HMSET user:1 id "1" email "dfucbitz@terah.net" 
How would you go about efficiently fetching the 
user's ID given an email address?
Reverse Lookup (Pseudo) Recipe 
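def idxEmailAdd(email, id): # 2.a 
    # SETNX only writes when the key is missing, so a duplicate email is detected 
    if not r.setnx("_email:" + email, id): 
        raise Exception("INDEX_EXISTS") 
def idxEmailCreate(): # 1 
    # scan_iter walks the keyspace incrementally (SCAN under the hood) 
    for u in r.scan_iter("user:*"): 
        id, email = r.hmget(u, "id", "email") 
        idxEmailAdd(email, id)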
Reverse Lookup Recipe, more admin 
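def idxEmailDel(email): # 2.c 
    r.delete("_email:" + email) 
def idxEmailUpdate(old, new): # 2.b 
    id = r.get("_email:" + old) # the ID the old entry points at 
    idxEmailAdd(new, id) # add first: a duplicate raises before anything is lost 
    idxEmailDel(old) 
def idxEmailDrop(): ... # similar to Create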
Reverse Lookup Recipe, integration 
def addUser(json): 
    ... 
    idxEmailAdd(email, id) 
    ... 
def updateUser(json): ...
Reverse Lookup Recipe, usage 
def getUser(id): 
    return r.hgetall("user:" + id) 
TA-DA! 
def getUserByEmail(email): # 5 
    return getUser(r.get("_email:" + email))
Reverse Lookup Recipe, Analysis 
● Asymptotic computational complexity: 
o Creating the index: O(N), N is no. of values 
o Adding a new value to the index: O(1) 
o Deleting a value from the index: O(1) 
o Updating a value: O(1) + O(1) = O(1) 
o Deleting the index: O(N), N is no. of values 
● What about memory? Every key in Redis 
takes up some extra space...
Hash Index 
_email = { "dfucbitz@terah.net": 1, 
"foo@bar.baz": 2 ... } 
● Small lookups (e.g. countries) → single key 
● Big lookups → partitioned to "buckets" (e.g. 
by email address hash value) 
More info: http://redis.io/topics/memory-optimization
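A minimal sketch of the "buckets" idea, assuming `r` is a redis-py client. The bucket count, the CRC32 hash and the function names are illustrative choices, not something the slides prescribe:

```python
import zlib

NUM_BUCKETS = 1024  # assumed value; tune it so that each bucket Hash stays small


def bucket_key(email, num_buckets=NUM_BUCKETS):
    """Name of the Hash that holds the index entry for this email."""
    # crc32 is stable across processes, unlike Python's built-in hash()
    return "_email:%d" % (zlib.crc32(email.encode("utf-8")) % num_buckets)


def idx_email_add(r, email, user_id):
    # HSETNX refuses to overwrite, which preserves the 1:1 assumption
    if not r.hsetnx(bucket_key(email), email, user_id):
        raise ValueError("INDEX_EXISTS")


def idx_email_get(r, email):
    # Returns None when the email is not indexed
    return r.hget(bucket_key(email), email)
```

Every lookup is still O(1): one hash computation on the client plus one HGET.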
Always Remember 
That You Are Absolutely 
Unique 
(Just Like Everyone Else)
Uniqueness 
The lookup recipe makes the assumption that 
every user has a single email address and that 
it's unique (i.e. 1:1 relationship). 
What happens if several keys (users) have the 
same indexed value (email)?
Non-Uniqueness with Lists 
Use lists instead of using Redis' strings/hashes. 
To add: 
r.lpush("_email:" + email, id) # 2.a 
Simple. What about accessing the list for writes 
or reads? Naturally, getting all of the list's 
members is O(N) but...
What?!? WTF do you mean O(N)?!? 
Because a Redis List is essentially a linked list, 
traversing it requires up to N operations 
(LINDEX, LRANGE…). That 
means that updates & deletes 
are O(N) 
Conclusion: suitable when N (i.e. number of 
duplicate index entries) is smallish (e.g. < 10)
OT: A Tip for Traversing Lists 
Lists don't have LSCAN, but with 
RPOPLPUSH you can easily do a 
circular list pattern and go over all 
the members in O(N) w/o copying 
the entire list. 
More at: http://redis.io/commands/rpoplpush
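The circular pattern can be sketched like this, assuming `r` is a redis-py client. The helper name `visit_all` is made up for illustration, and the loop as a whole is not atomic, so concurrent writers can still change the list mid-way:

```python
def visit_all(r, key, visit):
    """Call visit(member) once per list element by rotating the list in place."""
    # Snapshot the length first; the rotation itself never changes it.
    for _ in range(r.llen(key)):
        # Same source and destination: the tail element moves to the head.
        member = r.rpoplpush(key, key)
        if member is None:  # the list was emptied by another client meanwhile
            break
        visit(member)
```

After a full pass the list is back in its original order, and only one element is in flight at any time.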
Back to Non-Uniqueness - Hashes 
Use Hashes to store multiple index values: 
r.hset("_email:" + email, id, "") # 2.a, the field's value is unused 
Great - still O(1). How about deleting? 
r.hdel("_email:" + email, id) # 2.c 
Another O(1). 
Non-Uniqueness, Sets Variant 
r.sadd("_email:" + email, id) # 2.a 
Great - still O(1). How about deleting? 
r.srem("_email:" + email, id) # 2.c 
Another O(1).
List vs. Hash vs. Set for NUIVs* 
* Non-Unique Index Value 
● Memory: List ~= Set ~= Hash (N < 100) 
● Performance: List < Set, Hash 
● Unlike a List's elements, Set members and 
Hash fields are: 
o Unique - meaning you can't index the same key 
more than once (makes sense). 
o Unordered - a non-issue for this type of index. 
o Are SCANable 
● Forget Lists, use Sets or Hashes.
Forget Hashes, Sets are Better 
Because of the Set operations: 
SUNION, SDIFF, SINTER 
Endless possibilities, including 
matchmaking: 
SINTER _interest:devops _hair:blond _gender:...
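A sketch of a Set-per-value index with an intersection query, assuming `r` is a redis-py client. The helper names and the `_<attribute>:<value>` key layout follow the SINTER example above but are otherwise illustrative:

```python
def idx_add(r, attr, value, user_id):
    # One Set per (attribute, value) pair, e.g. _interest:devops
    r.sadd("_%s:%s" % (attr, value), user_id)


def idx_remove(r, attr, value, user_id):
    r.srem("_%s:%s" % (attr, value), user_id)


def find_users(r, **criteria):
    """IDs of the users that match every attr=value pair, via a single SINTER."""
    keys = ["_%s:%s" % (attr, value) for attr, value in criteria.items()]
    if not keys:
        return set()  # SINTER needs at least one key
    return r.sinter(keys)
```

For example, `find_users(r, interest="devops", hair="blond")` intersects `_interest:devops` with `_hair:blond` on the server, so only the matching IDs travel to the client.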
[This Slide has No Title] 
NULL means no value and Redis is all about 
values. 
When needed, arbitrarily decide on a value for 
NULLs (e.g. "<null>") and handle it 
appropriately in code.
Index Cardinality (~= unique values) 
● High cardinality/no duplicates -> use a Hash 
● Some duplicates -> use Hash and "pointers" 
to Sets 
_email = { "dfucbitz@terah.net": 1, 
"foo@bar.baz": "*" ...} 
_email:foo@bar.baz = { 2, 3 } 
● Low cardinality is, however, another story...
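One possible reading of the Hash-with-"pointers" layout, assuming `r` is a redis-py client created with `decode_responses=True` so that replies are strings. The function names are hypothetical, and the read-then-write sequence is not atomic, so a real implementation would need a transaction or a script around it:

```python
POINTER = "*"  # marker from the slide: "the IDs live in a Set instead"


def idx_email_add(r, email, user_id):
    current = r.hget("_email", email)
    if current is None:
        r.hset("_email", email, user_id)             # first owner: store inline
    elif current == POINTER:
        r.sadd("_email:" + email, user_id)           # already spilled to a Set
    elif current != user_id:
        r.sadd("_email:" + email, current, user_id)  # first duplicate: spill
        r.hset("_email", email, POINTER)


def idx_email_lookup(r, email):
    """Set of user IDs indexed under this email (empty if there are none)."""
    current = r.hget("_email", email)
    if current is None:
        return set()
    if current == POINTER:
        return r.smembers("_email:" + email)
    return {current}
```

The common case (a unique value) costs a single HGET; only duplicated values pay for the extra Set and the second round trip.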
Low Cardinality 
When an indexed attribute has a small number 
of possible values (e.g. Boolean, gender...): 
● If distribution of values is 50:50, consider not 
indexing it at all 
● If distribution is heavily unbalanced (5:95), 
index only the smaller subsets, full scan rest 
● Use a bitmap index if possible
Bitmap Index 
Assumption: key names are ordered 
How: a Bitset where a bit's position maps to a 
key and the bit's value is the indexed value: 
first bit -> dfucbitz is online 
_isLoggedIn = /100…/ 
second bit -> foo isn't logged in
Bitmap Index, cont. 
More than 2 values? Use n Bitsets, where n is 
the number of possible indexed values, e.g.: 
_isFromTerah = /100.../ 
_isFromEarth = /010.../ 
Bonus: BITOP AND / OR / XOR / NOT 
BITOP NOT _ET _isFromEarth 
BITOP AND onlineET _isLoggedIn _ET
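A sketch of the bitmap index, assuming `r` is a redis-py client and that user IDs are consecutive integers starting at 1, so that user N maps to bit N-1 ("first bit" is user 1). The helper names are illustrative:

```python
def set_flag(r, bitmap, user_id, value):
    # The numeric user ID doubles as the bit position (user 1 -> offset 0)
    r.setbit(bitmap, user_id - 1, 1 if value else 0)


def has_flag(r, bitmap, user_id):
    return r.getbit(bitmap, user_id - 1) == 1


def count_online_from(r, dest, planet_bitmap):
    """Store _isLoggedIn AND planet_bitmap in dest and return the number of set bits."""
    r.bitop("AND", dest, "_isLoggedIn", planet_bitmap)
    return r.bitcount(dest)
```

One caveat when using NOT: BITOP works on whole bytes, so the inverted bitmap also has bits set in the padding positions past the last real user. ANDing it with a bitmap that only covers real users, as in the onlineET example, masks those out.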
Interlude: Redis Indices Save Space 
Consider the following: in a relational database 
you need "x2" space: for the indexed data 
(stored in a table) and for the index itself. 
With most Redis indices, you don't have to 
store the indexed data -> space saved :)
Numerical Ranges with Sorted Sets 
Numerical values, including timestamps 
(epoch), are trivially indexed with a Sorted Set: 
ZADD _yearOfBirth 1972 "1" 1961 "2"... 
ZADD _lastLogin 1411245569 "1" 
Use ZRANGEBYSCORE and 
ZREVRANGEBYSCORE for range queries
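A sketch of the year-of-birth index with redis-py, assuming `r` is a client from redis-py 3.0 or later (where `zadd` takes a mapping of member to score). The function names are illustrative:

```python
def idx_birth_year_set(r, user_id, year):
    # ZADD overwrites the score of an existing member, so this covers updates too
    r.zadd("_yearOfBirth", {user_id: year})


def idx_birth_year_del(r, user_id):
    r.zrem("_yearOfBirth", user_id)


def born_between(r, first_year, last_year):
    """User IDs born in [first_year, last_year], lowest year first."""
    return r.zrangebyscore("_yearOfBirth", first_year, last_year)
```

Adds, updates and deletes are O(log N) each; a range query is O(log N + M) for M returned members.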
Ordered "Composite" Numerical Indices 
Use Sorted Sets scores that are constructed by 
the sort (range) order. Store two values in one 
score using the integer and fractional parts: 
user:1 = { "id": "1", "weightKg": "82", 
"heightCm": "218", ... } 
score = weightKg + ( heightCm / 1000 )
"Composite" Numerical Indices, cont. 
For more "complex" sorts (up to 53 bits of 
precision), you can construct the score like so: 
user:1 = { "id": "1", "weightKg": "82", 
"heightCm": "218", "IQ": "100", ... } 
score = weightKg * 1000000 + 
heightCm * 1000 + IQ 
Adapted from: 
http://www.dr-josiah.com/2013/10/multi-column-sql-like-sorting-in-redis.html
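The score arithmetic can be sketched in plain Python. The three-digit limit per field and the helper names are assumptions made for this example; the point is that the packed integer stays far below 2^53, so a Sorted Set's double-precision score stores it exactly:

```python
MAX_EXACT = 2 ** 53  # doubles represent every integer below this exactly


def pack_score(weight_kg, height_cm, iq):
    """Combine three fields (each assumed to be an integer in 0..999) into one score."""
    for field in (weight_kg, height_cm, iq):
        if not 0 <= field <= 999:
            raise ValueError("each field must fit in three decimal digits")
    # Most significant field first: that is the primary sort column
    return weight_kg * 1000000 + height_cm * 1000 + iq


def unpack_score(score):
    rest, iq = divmod(int(score), 1000)
    weight_kg, height_cm = divmod(rest, 1000)
    return weight_kg, height_cm, iq


def weight_range(weight_kg):
    """(min, max) scores that cover every user with exactly this weight."""
    return pack_score(weight_kg, 0, 0), pack_score(weight_kg, 999, 999)
```

For user:1 above, `pack_score(82, 218, 100)` gives 82218100, and the pair from `weight_range(82)` can be passed straight to ZRANGEBYSCORE to fetch everyone who weighs 82 kg, sorted by height and then IQ.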
Full Text Search (Almost) (v2.8.9+) 
ZRANGEBYLEX on Sorted Set members that 
have the same score is handy for suffix 
wildcard searches, e.g. dfuc*, a-la 
autocomplete: http://autocomplete.redis.io/ 
Tip: by storing the reversed string (gnirts) you 
can also do prefix wildcard searches, e.g. 
*terah.net, just as easily.
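A sketch of both searches, assuming `r` is a redis-py client created with `decode_responses=True`, that every member was added with the same score, and that a second Sorted Set (here called `reversed_key`, a hypothetical name) holds each string stored back to front:

```python
def prefix_bounds(prefix):
    """(min, max) arguments for ZRANGEBYLEX that match members starting with prefix."""
    raw = prefix.encode("utf-8")
    # "[" marks an inclusive bound; byte 0xff never occurs in valid UTF-8,
    # so prefix + 0xff sorts after every member that begins with prefix
    return b"[" + raw, b"[" + raw + b"\xff"


def starts_with(r, key, prefix):
    lo, hi = prefix_bounds(prefix)
    return r.zrangebylex(key, lo, hi)


def ends_with(r, reversed_key, suffix):
    # Search the reversed copies by reversed suffix, then flip the hits back
    lo, hi = prefix_bounds(suffix[::-1])
    return [member[::-1] for member in r.zrangebylex(reversed_key, lo, hi)]
```

`starts_with(r, "_email", "dfuc")` corresponds to the dfuc* search, and `ends_with(r, "_email_rev", "terah.net")` to *terah.net.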
Another Nice Thing With Sorted Sets 
By combining the use of two of these, it is 
possible to map ranges to keys (or just data). 
For example, what is 5? 
ZADD min 1 "low" 4 "medium" 7 "high" 
ZADD max 3 "low" 6 "medium" 9 "high" 
ZREVRANGEBYSCORE min 5 -inf LIMIT 0 1 
ZRANGEBYSCORE max 5 +inf LIMIT 0 1
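The two queries can be combined into one lookup, sketched here with redis-py. It assumes `r` is a client, that the `min` and `max` Sorted Sets are populated as above, and that the ranges do not overlap; `classify` is a made-up name:

```python
def classify(r, value):
    """Name of the range that contains value, or None if it falls in a gap."""
    # Range with the greatest lower bound that is <= value
    below = r.zrevrangebyscore("min", value, "-inf", start=0, num=1)
    # Range with the smallest upper bound that is >= value
    above = r.zrangebyscore("max", value, "+inf", start=0, num=1)
    # The value is inside a range only when both queries name the same one
    if below and above and below[0] == above[0]:
        return below[0]
    return None
```

For 5 both queries answer "medium". For 3.5 the first answers "low" and the second "medium", which exposes the gap between the two ranges.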
Binary Trees 
Everybody knows that 
binary trees are really useful 
for searching and other stuff. 
You can store a binary tree 
as an array in a Sorted Set: 
(Happy 80th Birthday!)
Why stop at binary trees? BTrees! 
@thinkingfish from Twitter explained that they 
took the BSD implementation of BTrees and 
welded it into Redis (open source rulez!). This 
allows them to do efficient (speed-wise, not 
memory) key and range lookups. 
http://highscalability.com/blog/2014/9/8/how-twitter-uses-redis-to-scale-105tb-ram-39mm-qps-10000-ins.html
Index Atomicity & Consistency 
In a relational database the index is (hopefully) 
always in sync with the data. 
You can strive for that in Redis, but: 
• Your code will be much more complex 
• Performance will suffer 
• There will be bugs/edge cases/extreme 
uses…
The Opposite of Atomicity & Consistency 
At the other extreme, you could consider 
implementing indexing with a: 
• Periodic process (lazy indexing) 
• Producer/Consumer pattern (i.e. queue) 
• Keyspace notifications 
You won't have any guarantees, but you'll be 
offloading the index creation from the app.
Indices, Lua & Clustering 
Server-side scripting is an obvious 
consideration for implementing a lot (if 
not all) of the indexing logic. But ... 
… in a cluster setup, a script runs on 
a single shard and can only access the 
keys there -> no guarantee that a key 
and an index are on the same shard.
Don't Think – Copy-Paste! 
For even more "inspiration" you can review the 
source code of popular ORM libraries for 
Redis, for example: 
• https://github.com/josiahcarlson/rom 
• https://github.com/yohanboniface/redis-limpyd
Redis Indices (#RedisTLV)