Instead of doormen, Stockholm apartment buildings have a little keypad at the front door, into which you key a 4-digit code to gain access to the lobby. The code is known to residents, and is liberally shared with acquaintances, workmen, postmen, and no doubt sold to an underworld of snail-mail spammers and worseI’d love to see a database of Stockholm door codes, though, if only to gain a little humint on what are the most popular 4-digit combinations..
The keypad is unusual in that there is no enter key. Instead, it remembers the last 4 digits you have pressed, and compares those to the code which unlocks the door. In other words, if you key in the sequence 123456, you’ll have tried codes 1234, 2345 and 3456. If the correct code is 2345, you’ll have opened the door after keying in the first 5 digits in the above sequence.
This is different from how a bank machine keypad operates, where you need to key in four digits (and then enter) for every PIN code number you try. If you were to forget your code, you’d have to go through a maximum of 40,000 key punches (not counting the mathematically boring enter key): 10,000 combinations of 4 keys each (0000, 0001 … 9998, 9999).
On a door-code keypad, however, it’s clear that you can cycle through every combination in far fewer than 40,000 key punches: If I key in just the sequence 1234567890, for example, I have already tried 7 different 4-digit combinations with just 10 key punches. On a bank machine keypad, it would have taken me 28 key punches to try 7 different 4-digit combinations.
This is interesting information for all those of us who think it within the bounds of possibility they might one day find themselves standing somewhat stupefied in the snow outside their Stockholm apartment at 3am on a Sunday in January, with the mercury all shrivelled up, not a soul in sight and not a code in mind. Deciding beforehand on the right contingency strategy for such a situation could shave hours off the frigid poking at that dumb but durable little keypad which would then have to follow.
A rundown of the options, then:
Going through the combinations bank-machine style. Pros: Counting from 1 to 10,000 is easy; you’re guaranteed to hit your code eventually Cons: 40,000 key punches at around 2 per second will take just over 5 1/2 hours. With luck, your code is 0012 — but nobody is ever lucky at 3am on Sundays.
Going at it randomly. Pros: No thinking required whatsoever. Cons: Looks pathetic; generating truly random patterns is nearnigh impossible for humans, especially drunk ones; you might actually never type in the correct code.
Calculating beforehand the shortest possible sequence that cycles through all possible codes, writing it down and keeping it in your parka in case of just this kind of emergency. Pros: Should get you into any warm lobby in no time, relatively speaking. Cons: The “beforehand” part of the first sentence in this paragraph.
The rest of this post concerns itself with exploring the possibility of making this third option a viable one for Stockholmers. To that end, I have been wondering: (1) What is the minimum amount of door-code keypad punches needed to try every possible combination? (2) Is it possible to run through every possible combination without being forced to repeat any combination? (3) Is there a unique such solution? (4) If not, how many solutions are there? And finally, (5) How likely am I to spontaneously generate the shortest possible sequence if I pursue the second, random strategy described above?
If the answer to (2) is yes, it’s easy to see that the answer to (1) is 10,003. That’s because after three punches, every subsequent key I press adds a new 4-digit combination to the sequence, and there are 10,000 combinations to go through. In this best-case scenario, at 2 presses per second, we’d go through every combination in under 84 minutes.
As I started looking into this, it felt like the answer to (2) should be yes, but I lacked the mathematical tools to back up my hunch. So I decided to try to simplify the problem and then apply brute force. Instead of looking for the shortest sequence that contains all possible 4-digit combinations without repetitions, I decided to try to find one that contains all 2-digit combinations without repetitions — the shortest sequence that would unlock a door-code keypad with a 2-digit code.
I printed out a 10×10 grid with all 100 combinations (00, 01 … 98, 99) and headed for a café in Kungsholmen. After a café latte’s worth of trying to connect the numbers, I’d found a sequence 101 digits long that contained all the combinations exactly once — showing that (2) is true, at least for 2 digits:
00112131415161718191022324252627282920334353637383930445464748494055657585950667686960778797088980990
I can’t help it, but I happen to feel that that’s a pretty cool sequence — it feels so dense, so efficient. Just from inspection, some other things become clear:
There are many distinct such shortest possible sequences. For example, you can make some others just by moving the 00 at the start of the sequence to wherever else there is a 0, just by moving the first 0 next to any other 0. This means the answer to question (3) is No, there is no unique solution; not for two digits, and by extension, not for four digits, because here too we’d be able to insert the specific combination 0000 anywhere we find 000 in the sequence. The same holds for shortest sequences containing all combinations of any length.
The last digit and the first digit are repeated. That’s because every digit except the first and last does “double duty” in belonging to a combination, and since there is an equal total number of digits in the combinations 00, 01 … 98,99 (twenty 0s, twenty 1s, twenty 2s etc… for a total of 200 digits), the first and last digit have to be the same. This is the case for any shortest sequence containing all 2-digit combinations.
Since the last digit and the first digit are always the same, you can fuse them into one digit that does “double duty”, in the process turning this sequence into a great big circle of 100 digits. This is even more elegant; you can start anywhere you like in the sequence and loop around it, and you’ll have gone through every combination when you’ve made a complete loop. A better way to denote our sequence, then, would be
…0011213141516171819102232425262728292033435363738393044546474849405565758595066768696077879708898099…
You can go in either direction around this loop. That’s because every combination has its own unique mirror image: 91 maps to 19, 34 maps to 43, etc… And palindromes map to themselves.
It felt like these observations should apply by extension to shortest sequences for combinations of any length, but I still could not demonstrate to my satisfaction that the shortest circular sequence containing all combinations of x digits had to be of length 10x — in the case of our door-code keypad problem, a loop of 10,000 digits. Also, I was nowhere near being able to count the number of such shortest sequences.
I thought I’d simplify the problem further, then — by reducing the base from 10 to 2. We can do that, because we’ve really just been using digits as stand-ins for distinct objects, and combinations as ordered collections of these objects. For that matter, you could think of our 4-digit codes as being equivalent to 4-letter words, made from an alphabet comprised of 10 letters. In base 2, there are only two letters, if you will: I and O.
In base 2, then, what do the shortest circular sequences containing all 2-letter words“codes” / “combinations” / “words” / “cyphers” / “strings” / “stretches” / “ordered collections”… — it’s all the exact same thing. look like?
…0011…
That’s it, actually (1001, 1100 and 0110 are all shifted versions of the same circular sequence)For completeness’s sake, the shortest circular sequence containing all 1-letter words in base 2 is …01…, and it’s unique, obviously.. Notice that the length is 4, or 22. How about the shortest sequences containing all eight 3-letter words, in base 2?
…00010111… = …00011101…
There are two, though they are each other’s mirror image; proceeding clockwise on the first is equivalent to proceeding counterclockwise on the second. Length: 23 = 8.
Shortest sequences containing all 16 4-letter words, base 2?
…0000100110101111… = …0000111101011001…
…0000100111101011… = …0000110101111001…
…0000101001101111… = …0000111101100101…
…0000101001111011… = …0000110111100101…
…0000101100111101… = …0000101111001101…
…0000101101001111… = …0000111100101101…
…0000101111010011… = …0000110010111101…
…0000110100101111… = …0000111101001011…
That’s 16 sequences of 16 letters each, including mirror images.
To find these sequences, I had to resort to drawing crude graphs, with all the possible words connected by arrows indicating which word could lead to which as we create a sequence (think “continue punching away at the keypad”). For example, both 1010 and 0010 can be followed by 0100 or 0101, in the exact same way that on the door-code keypad, the codes 1001, 2001, … 9001 and 0001 can be followed by 0011, 0012, … 0019 or 0010, depending on which key you choose to punch next.
The trick is to visit each word only once by following the arrows. Here, for example, is a graph with all possible 2-letter words in base 2 as nodes:
This is a much prettier version, made later, after I got some help online. Notice how there is only one way of cycling through all the nodes once.
Here is a graph with all possible 3-letter words in base 2 as nodes, again made much prettier after the fact:
In all these graphs, I’ve also labelled the arrows, because they represent a unique relationship between two words. For example, the arrow from node 011 to 110 is labelled 0110, because that is the 4-letter word created when these 3-letter words follow one another in our sequence.
This hints at a very profound relationship between these graphs: Every arrow on the smaller graph corresponds to a node on the larger graph. Therefore, trying to find a path that cycles through every arrow on the smaller graph is the same as finding a path that cycles through every node on the larger graph. And visually, at least, it is a lot easier to find all the ways to cycle through every arrow on the smaller graph than it is to find all the ways to cycle through every node on the larger graph. Hence I never needed to draw a graph with all 4-letter words as nodes in order to find the 16 shortest circular sequences containing all 4-letter words listed earlier — instead, I just found all the ways to cycle through all the arrows on the larger graph above.
This is as far as I got on my own. It began to feel like I was trying to reinvent the wheel, badly, so I decided to seek professional help. After rummaging about on Mathworld for a while, I hit paydirt: It turns out a loop sequence of letters or digits is called a necklace. The shortest possible such sequence containing all possible words of a certain length is called a de Bruijn sequence. The graph associated with a de Bruijn sequence is called a de Bruijn graph. The path along such a graph that uses every arrow (edge) exactly once is called a Eulerian circuit (and all graphs that allow such a circuit are called Eulerian graphs, making de Bruijn graphs a subset of Eulerian graphs). The path that uses every node (vertex) exactly once is called a Hamiltonian circuit (and you can guess what a Hamiltonian graph is).
It turns out that Leonard Euler provided us with the crucial link that shows why the answer to question (2), “Is it possible to run through every possible combination without being forced to repeat any combination?” is always yes, regardless of alphabet or word size. According to the Mathworld page on Eulerian graphs,
“Euler showed (without proofI don’t know if someone has supplied a proof of this, in the meantime. Something to look up.) that a connected graph is Eulerian if and only if it has no graph vertices of odd degree. […] A directed graph is Eulerian if and only if every graph vertex has equal indegree and outdegree.”
In other words, if on a graph connected by arrows every node has as many arrows coming in as arrows going out, you will always be able to make at least one Eulerian circuit — you will always be able to find a way to cycle through all the arrows without ever getting stuck at a node. This makes sense, intuitively: Take the larger graph above, the de Bruijn graph for base 2, order 3 (which means the nodes represent 3-letter words): every node has an equal number of arrows going out, because the alphabet (base) from which to choose the next letter as you proceed along your circuit is the same at every node. The same goes for arrows coming in, which represent the letter being dropped as we progress along our circuit — it too has to be a member of the same alphabet.
But what about generating a bona fide de Bruijn sequence for our door-code keypad? Well, Mathworld is linked to the math program Mathematica (the brainchild of Stephen Wolfram), and the page on de Bruijn sequences mentions that Mathematica has an algorithm for generating such sequences for any given alphabet and word length. Although Mathematica is perhaps one of the most impressive applications ever made, it also costs 25,000kr., and hence I’ve never managed to justify buying it. Fortunately, there is a 15-day free trial, so I hurriedly downloaded it, and got to work. Here is a trial run, a de Bruijn sequence for 3-digit combinations in base 10:
…9798787770760750740730720710980970960950940930920910108908708608508408308208889998988081009909008007006005004003002000190180170160150140130120119118117116115114113112912812712612512412312213913813713613513413313214914814714614514414314215915815715615515415315216916816716616516416316217917817717617517417317218918818718618518418318219919819719619519419319212111029028027026025024023022922822722622522422392382372362352342332492482472462452442432592582572562552542532692682672662652642632792782772762752742732892882872862852842832992982972962952942932322202103903803703603503403393383373363353349348347346345344359358357356355354369368367366365364379378377376375374389388387386385384399398397396395394343330320310490480470460450449448447446445945845745645546946846746646547947847747647548948848748648549949849749649545444043042041059058057056055955855755695685675665795785775765895885875865995985975965655505405305205106906806706696686679678677689688687699698697676660650640630620610790780779778978879…
Again, I love the density of these sequences. When it came to generating a de Bruijn sequence for 4 digits in base 10, though, Mathematica took its time — 2 hours and 45 minutes, to be exact. By then I had grown restless and had been googling “de Bruijn sequence”, only to find a most surprising page among the results: A Swedish blogger! Hakan Kjellerstrand has a page up with a much faster algorithm for generating de Bruijn sequences. He even singles out the specific solution for Stockholm door-code keypads. So click on that link, print out the result and keep it handy whenever you have a door-code keypad blocking your way.
But two of the five original questions remained unanswered: First, how many such sequences are there? Hakan and Mathematica both provide one sample de Bruijn sequence, not a counting function for de Bruijn sequences for a given alphabet and word length.
It turns out the answer wasn’t even known for sure until 2002, when Vladimir Rosenfeld at the University of Haifa published a proof for the general formula in a paper titled “Enumerating de Bruijn Sequences.” It’s not online, as far as I can tell, but he does talk about his results in another paper of his, Enumerating Kautz Sequences (original link).
According to Rosenfeld, in 1946 Dutch mathematician N.G. De Bruijn proved a formula for counting the number of shortest circular sequences containing all q-letter words using a 2-letter alphabet (base 2). Here it is:
total number = 22(q-1)–q
Indeed, for q = 3 the result is 2 and for q = 4 the result is 16, as shown earlier. This made him famous, and led to this entire class of sequences being named after him, though it doesn’t seem like he proved a generalized formula for any given alphabet size (base). In 2002, Rosenfeld did just that, apparently. For an alphabet size s and word length size q, the number of circular de Bruijn sequences of length sqis! means factorial. 4! = 4x3x2x1:
(s!)s(q-1)s–q
Making s=10 and q=4 gives us 10!1000/10000, or 5.79×106555Update 2004-10-09: Well, you can’t trust the internet for anything, can you? The paper by Rosenfeld misprints the formula for the number of circular de Bruijn sequences (I’ve corrected the main text now). The relationship between the number of circular de Bruijn sequences and the number of linear de Bruijn sequences didn’t look right when I read the paper — I thought the difference between these two formulas should be a factor of s^q, because every circular de Bruijn sequence seemed to me to contain that many starting points for linear de Bruijn sequences. But hey, I didn’t prove the results, and my blog doesn’t have “expert anonymous reviewers,” so what would I know? Today, though, I found a different formula for the number of circular de Bruijn sequences in Stephen Wolfram’s New Kind of Science and this one confirms my hunch. And so far, Wolfram has never failed me. (The formula for the linear de Bruijn sequence remains the same, though.)
Update 2004-10-21:Vladimir sent me his original paper, Enumerating de Bruijn sequences [PDF] and indeed there the formula is correct. And an interesting read., sequences. If we want to tally up the more real-world situation, where you need to key in 10,003 digits into the door-code keypad, then the formula is slightly modified:
(s!)s(q-1)
which gives 10!1000 or 5.79×106559 linear de Bruijn sequences. That’s our answer for question (4).
Which leaves us only question (5): “How likely am I to spontaneously generate the shortest possible sequence if I pursue the second, random strategy described above?” To answer this, all we need is to divide the total number of linear de Bruijn sequences (which we just counted) into the total number of sequences that are 10,003 digits long (which is just 1010,003). The answer: 1 in 5.79×103444 times. Given that the number of stars in the universe is about 7×1022, it’s a safe bet to say that every pig will need to fly before anyone ever achieves this feat.
Postscript: De Bruijn sequences are actually useful, and crop up in several seemingly unrelated fields. They provide the mathematical underpinning for DNA manipulation tools — DNA being nothing but sequences of words composed of the letters G, A, T and C. Rosenfeld’s paper, too, aimed to provide mathematical tools for understanding “minimal generating sequences” in DNA, that is, “the sequence of minimal length that produces all possible amino acids.” Also, de Bruijn sequences seem to get mentioned a lot in connection with cryptography, especially by Stephen Wolfram in the footnotes to A New Kind of Science, though how or why is something I think I need to look into next.