For the past few years I’ve been trying to build an operating system that can do everything you expect an operating system to do while also encoding its entire state as a single URL. That way, we get a permanent link to every O/S state, achieving truly comprehensive deep linking.
For a long time, I struggled with finding a solid approach to achieving this goal. A user-friendly operating system involves many independent, dependent and otherwise tightly-coupled components — especially when you want it to be a platform that supports third-party app development. I tried a variety of different methods to encode/decode data from a string without redundancy.
Some of my attempts involved query parameters, some of them base64-encoded JSON data, and some of them custom domain-specific languages. They all had the same problem — they failed to capitalize on the actual space available in URLs.
An incredible combinatorial explosion appears when you try to compute the number of unique URLs that can exist. Query params, JSON objects, domain-specific languages and virtually all other syntax only take advantage of a minuscule fraction of that space.
How is it Bigger than the Universe?
I'm being a bit facetious, but according to some very quick research, the estimated number of particles in the observable universe is . That’s a 97-digit number. Even though that’s an unfathomably large number (imagine having to count to it), its not that long. Have a look:
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
It barely fills two lines of text on my monitor. That's a lot shorter than this article.
If you could assign a unique natural number (or hash) to every particle in the known universe you’d never need more than 97 digits to do it.
If we change the base from to , is less than . That means you never need more than 54 characters to assign a unique base-64 hash to every particle in the observable universe.
That’s an incredibly short string of text for such a big feat. Here’s a sample of such a base-64 string I made by mashing my keyboard 54 times:
09m84COSM84wf897hos_t83740FW3M-8tdocs8374tyvsc9nc3b1nt
In theory, this could be the hash of a random particle in the universe. Hopefully, it’s one close by.
This kind of assignment, where each element in a set has an integer value and there are no gaps or collisions, is called a minimal perfect hash function (MPHF).
Of course, URLs are more than simple numerals. For my project, they will require a domain name and unless we encode numerical data into the domain name, that added length can't be used to store much data.
Yet, even if we add an 8-letter protocol and the longest allowable domain name (255 characters)…
https://heres.a.huge.domain.name.as.long.as.it.can.possibly.be.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.example.com/09m84COSM84wf897hos_t83740FW3M-8tdocs8374tyvsc9nc3b1nt
… we now have a valid URL capable of disambiguating any particle in the known universe and it only uses characters. For comparison, here's a URL I got by searching Google for pictures of cats, sporting 376 characters:
https://www.google.com/search?sca_esv=c99a67a28e4a9644&q=cats&udm=2&fbs=AIIjpHxU7SXXniUZfeShr2fp4giZ1Y6MJ25_tmWITc7uy4KIeioyp3OhN11EY0n5qfq-zENwnGygERInUV_0g0XKeHGJtesk9Adz8V_3dCFxRHd-4rVc28Hvas3fJjxYa4l0bNk99rKkfyreU5lHIQnHuafulMAL-Uqa-w7l2q-jrp2K_DA_pRuYwavY1buYjWGJpBAMTtrbI6TDhEB-GyDqzxOrfIfwCQ&sa=X&ved=2ahUKEwiQhOe9hd6NAxVRRjABHUCvJ6oQtKgLegQIGRAB&biw=1223&bih=754&dpr=2
TL; DR:
Based on current estimates, there are far more unique URLs than there are particles in the observable universe.
This had me thinking: how can a web app truly capitalize on all of that storage space?
How many possible URLs are there?
Back when Internet Explorer was popular, URLs were allowed to have thousands of characters. Today, they can have millions.
To determine the actual cardinality of the set of all URLs, we need to account for a lot of factors. For example, are consecutive slashes allowed in the pathname or will they be normalized away? Do we limit the URL length to two thousand characters like Internet Explorer or two million like Chrome?
Lets go factor-by-factor and, at each step, we’ll assume the worst possible case.
Worst-Case Length
First, we start with the length. Let’s use a hard limit of 2000 characters. Since Chrome supports 2MB, we just removed 99.9% of the URLs from our set. Even so, there is still a lot of nuance to computing the cardinality of that remaining 0.1%.
No Protocol Mutability
For the protocol, forget about any cardinality imparted by that. I only want to think about secure browsing contexts (“https://”). We subtract those 8 characters, leaving us with 1992 characters.
Worst Possible Host
What about the host name? Well, what’s the worst case we could possibly run in to?
Many projects and apps are meant to exist on only one origin, so lets not even allow any mutability in the host.
The maximum length of a host name is 255-characters. So let’s bring back that behemoth of a name that I showed earlier:
https://heres.a.huge.domain.name.as.long.as.it.can.possibly.be.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.example.com
Just to be clear, we removed every URL from our set that doesn’t have this domain name which, as you know, was pretty much all of them. Yet, we’re still left with 1737 characters we can store data in.
No Optional Features
There’s still a lot of unpredictability in the size of our set, however. For example, the user credentials, port, query parameters and hash/fragment all add a ton of cardinality to our set but they also add an incredible amount of complexity to the equation.
Luckily, they’re all optional. We’ll just remove all of the URLs that have any of these features (which, if you didn’t know, was almost all of the URLs we had left).
No Variable-Length Pathnames
At this point, we’re really just counting the set of all valid pathnames up to 1737 characters long. All pathnames require at least the initial slash. In other words, that first pathname character is useless.
Furthermore, how do we account for the variable-length of pathnames? We’d have to add the number of 2-character pathnames to the number of 3-character pathnames to the number of 4-character pathnames and so on.
To compute that, we require a geometric series. That’s too rich for our worst-case set, so lets just remove every URL whose pathname is not exactly 1737 characters long. Again, we just removed almost every URL from our already stripped-down set.
No Special Characters
URL pathnames support 76+ characters (once you include the percent symbol, comma, period, exclamation point, etc.), suggesting it might be possible to approach base 76.
Sadly, though, many of these special symbol combinations will get stripped away/normalized down to others during URL resolution. Others will cause an error if used incorrectly. It isn’t trivial to compute the cardinality of that.
The solution? You guessed it. We’ll remove most of the URLs from our set! Let’s only consider URLs that use the following 64 character alphabet to spell out their pathname segments:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_
But we can’t just string together 1737 of these bad boys and call it a day…
No Variable-Length Segments
A valid URL pathname can be broken up into segments, each separated by a forward slash. Some systems throw an error if even one of those segments is longer than 250 characters. It wouldn’t be terribly hard to account for these arrangements but it would mean writing out a geometric series.
So, lets only count URLs that have this exact arrangement:
Seven fixed-length segments: six 250-character segments followed by one 230 character segment.
This requires seven slashes to split it all up; 1737 minus 7 leaves us with an even 1730 segment characters.
Not to sound like a broken record, but almost none of the URLs in our set had this exact arrangement. We just removed almost everything we had left, yet again.
Just for fun, here is a “random” sample from our set:
https://heres.a.huge.domain.name.as.long.as.it.can.possibly.be.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.abcdefghijklmnopqrstuvwxyz.example.com/ld9370LC9uzp83Mklv90dujd9v7774n0V7p7em5078B7S790dkb094977au9D07B098yAV76ouyf79J7uft7d5f77g90jki98765dc7hkji7865d7dgy77yuhjkIk8Kgf567hj9ny7GVNjn7677V9jbvr567bj099efbn49efg776ft5x4rxi7T8Yoih0n7hv6r4d657d8g9u8b779ub8yf7t7d75d674s77743s2a1a7772zqzdtcuhbi/007opkolpp7m777ko7n777ou77h8yvtcr7tds5es5241awzsx7dfchvj7jkhboi78g8765c6e4x421s576y57t6fv8iub9ub87ytvc64dc768fv897b9ub9yv7tc7tv8ybhvutF64d52as2zfxcghvbhininIonoi0ih9ubgiI0P1MS07vmllp0097f74d2wa1asswextyvunimlljo00987643212wzqz1a12aswzz23sxerc34dfcrvr/5fvtby65g67buh7nu98m09k0,00k9jubyuhubvtctvubuvtdcrd4dxw23314152375786e6d976f70867f97743d8742a13s7278rf97vtiyfcdtd7xtrwSte7wrxztewzyrTUCtivuyvbOUyv77utrs462s67546rv9vbyv8rtc6X4s13A2ezwrexrcYdexwa2WESDfgtyu789iujhnmko9iujhbVFdre321qasxcfvghuiuytr432was/zxcvfghyui9o876543wsaxcdfghyuI9Oijnjiuygtftdrezwertrytfugyhiu8978gtf7r6d53s64512zwxetCRYtfughioYG87f564dsetrcyvubhij9i8u7t6f5r7743217aqwezsxdcrft7gyh77ujikJHgt5432177777qwertyhj7kJHgre321qaszxcvghjio987654321QWErfcv7bhuyTRfdfgtyui987654321qwerfv77798/bnhjHGv7fghjio09oiujhn7b7VCxsaq12we7rty7u8Ikj7hgFD777seRTyu876YT7r4ewsDFghJiuy6543721Qas7zxcVBnjio9876574321qasDFGhbnJUytrfdsERtyuJHgbvfdeRthBVGCFdsw34543212qaszxcVbghjkio0987654321qasZXcvbnhJKo0987654ERD7sw21qas7ZXcfghyuI9O0okmjki87uyhgbvgt54edsxzsa/Q12qwasXCfrt56tygBnhjio909OIKjmnko09iujhY76tfgcder321q7aszxCvgbhjui7uY6T5r4e321234567879i8UytrfdsXcvbghjIuyTRewqA7SzxcvbgHJik7oKjmjI7UyhgfrEwszx7sAWaszxcdftYUijkmjNHBGfrtYuiuYTredFCghyui89876543eDfgyuytr43edfgyuhbghJUI987654321qaszxCVghui9o876543wErt/fGHYBUtcu63s312asydu6IYUTYVbokuvhtExt42at4wrexcutyvbo8ygoiyrc53yr3254Sdu6rfvoyboiybiutyvyuCtycurc6u5c865777ctycuyut7ityd7i6757DE7U7674s5u74s54sd7xry7txc7jyrfxd7j7yrDujdI7775edu77564st427as13szr7ezUT7rxiYGCkyuC77FJut7ykm79OM0p7978t
I mashed my keyboard as fast as I could and didn’t bother adding any underscores or dashes, and it still took a surprisingly long time to type this out.
Final Cardinality of our Set
Our set is now a fraction of a fraction of a fraction of a fraction of a fraction of a fraction of a fraction of its former self. Every URL in our set is a highly unlikely 264-character origin followed by a just-as-unlikely arrangement of 1730 base-64 digits. Ultimately, what we’ve got here isn’t even a drop in the bucket.
But the good news is we finally have a set that can be represented by a trivial cardinality expression! Here it is:
Remember how we said there are less than particles in the observable universe? Contrast that with . The ratio is basically . But what does our final number look like?
First, recall this number:
1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
You can hash every particle in the universe with this.
Now, take a look at the number of URLs in our “tiny” worst-case set:
49130930808458578125726405944552267433522803882142885516112811523788026785247859392846571767356697524778325967483200069078779474567637462287039820734625250499484361851684556510293931285640090262847320625396236411697500945066199598105166159939755919883442591852056298637720529199720593614665368213198995251597105758524900013562528183914971627692406189778877642149164166096666344310948740383560007413387592604328612531345115046683029331104462385525968171774813136492661946688877582952419371606080350664628677161116012681284900630454342749044599225617460538414933002793524748242310551996204900729123914340847332998493284864702610102439307491216408396209310348370320754399598907600619635495790019370265074821420291117076832510034388136166750175664483098298227108288715448223965251691403281034083780954570303916571362770718114444005072106144764849088939314984894581474809543833192044410395714190504825893726938138808786090562830615071943734645166593502644409746065788403640661934980709315726553694898120393882753330274852067296608154605846048350579895683801542722129840564975973695792574841807503188443209805038542236087627591190050313931573054215490244259112910601708234964741899698246055425592081845013246361046328159137214118378184713651023860908738930831971118247941416724345631434014423798305497398506349015730376483008110926005644593572237269397514131887446582215094604019679547368523447625105463951586910365216023840045751551948134564505146576477286926376308492310938786531574335304262217571479998838543109756673336971087025554889723823983924836021091907239786400390676746118864056643193205208982882660920507349524831913201182159186699144808760747505873640789216340957248824164601830737224297040770920458197407697184648022371257191918470172361380986595622828733829413281500185801880456613038104242643213822575569128878763680705643247394470467893917356297826947055879379566532902944043402022244021075932899192372571062439168957178821897773439784799153514401590031281325655093535268608362216209580429528935699708423114866509895772049714403629267974279109387816328548424345736717256948117015825744199080052425083293799967061771719865084133230904096721868612800747277569863102967441274245770550134896869196181886526475029674372524872538258468524762201586321646559863923236843242370292251062221529264770493701602901661815354058944472808349405810473671013491176811312485003537600785430597702027500137187545757793793903699970939444681886308707596870841054350202539644315500656737966010773075989351158870179723002367751683561878721860274497306669607424568652625472586573342936967070726864546132506261184714500488245623426076538010987701708241991726967647765104168030353744906806687895709021815045224257436600078097509507924968458836098125864106257805097776865485964770796555656543643860800587923743280926493394369557925897760397469072974966876141221188598603450443972539793912611469980053930814065772811818839300960781935055752799228266748050545969251447362299129060916916664278672204155214991658109636131883622892313180240750974195280013175415626930825404039290811759144500594703907940131918638449619062937500747268988899138994176
🤯
Imagine what you could hash with this range!
In the case of my project, the “things” I need to hash are operating system states.
Do I really need this many application states to make an operating system?
My guess was no, especially if I embrace a minimalist approach to UI interaction design.
However, to take full advantage of all of that storage space, I would need to create a MPHF that could integrate naturally into today’s web app technology stack.
MVC + MPHF
In addition to an MPHF, my project needed a reactive full-stack MVC framework. An application state needed to be recoverable from any hash, and vice-versa, so both the hash function and its inverse function had to actually exist and be computable (not just theoretically). Furthermore, it had to be fast - fast enough for real-time operating system interaction (at least 60 frames per second).
All of my math and programming knowledge is self-taught. Unfortunately, that means I sometimes come upon information in a less-than-ideal order. I filled several fat sketchbooks with graphs and notes and ended up creating a piecewise-defined MPHF.
Yet, I only learned the term “perfect hash function” after I finished the algorithm. Had I studied perfect hash functions at any point before this project, it would have shaved months off of the process. But maybe I wouldn’t have arrived at the same breakthrough.
The breakthrough came when I realized that the structure I was building to support the bijection had a surprising amount in common with MVC architecture. In fact, the two ended up laying right on top of each other and being the same thing.
Every MVC controller object is piece of the whole piecewise-defined function. Whether you assign a hash (which acts as the data model) to a controller or manipulate the controller (for example, in response to user interaction), the application state and the model are always be kept in sync.
Each controller is a reusable component. It has it’s own hash range from 0 to k-1, where k is the cardinality of the component’s model.
These components assemble together like LEGO® blocks, and no matter what assembly you make with them, the result is still a minimal perfect hash function over the assembly’s state domain.
Leveraging the JavaScript prototype chain, I was able to add object-oriented concepts like inheritance into the equation.
The algorithm is fast (60hz most of the time - but talk to me later when I put a bit more stress on it) and it powers my operating system’s reactive framework.
It takes a bit of doing to express an enjoyable user experience using just numbers, but once you have those expressions it is easy to turn them into software components.
What I’ve made so far are a few basic apps which remember where you scroll in each section and include some basic user preferences like light/dark/device mode, theme options, and a menu that remembers where it is in its open/close tween. All of this comes together in a configuration space that demands less than 5 characters - significantly less than a base64-encoded JSON object. Not to mention, my domain names are never 255 characters long.
I’m currently creating the quantitative expressions of movable windows, the ability to have arbitrarily many of them open, and a full read-write file system that behaves in the familiar way. I expect these to start adding length to my hashes.
Conclusion
Hopefully, this shows you just how powerful a minimal perfect hash function is when employed for the purposes of data compression.
The next question I need to answer is whether the URL space is bigger than the multiverse? Check out my previous post, which takes the conversation to the next level.

A Space-Age Metaphor for Software Development
Eric Augustinowicz ・ Jun 23
I look forward to writing more about my project soon!
Thanks for reading!
Top comments (0)