Political blogs are from Mars, community blogs are from Venus

Annica Tiger has written two interesting posts (with good comment streams) on women bloggers in Sweden, asking in particular why so few are represented on Bloggforum‘s panelsOm du tänker komma till Bloggforum Stockholm 2004, glöm inte att registrera dig. Och förlåt, men jag hinner inte än om att skriva om viktiga saker på svenska. Nja, jag skulle kunna göra det, men du skulle inte vilja läsa det. Ger mig ett år till.. If roughly 50% of bloggers are women, why are only two out of 15 participants women? Shouldn’t a representative sample of Swedish bloggers have a roughly equal number of men and women?

It should. But Bloggforum is not a representative sample of Swedish bloggers. To explain how this came about, maybe it’s best to ask “Why have Bloggforum at all?” The forum (I think) shouldn’t try to replicate in the flesh what blogging does best digitally — and blogging can adeptly cover a great many issues, for example the very one we’re discussing now. The whole blogging medium is geared towards conversation, so why “blog unplugged” in a forum setting?

Because blogs are somewhat of a closed system in a society that is not yet fully aware of them. Because the conversation about blogging should include those who don’t blog. Because many professions are poised to be affected by the rise of personal publishing, and professionals who are already blogging are the best positioned to help with the transition.

Bloggforum participants, then, reflect the rise of “pro-blogging” in Sweden. In the US, pro blogs include such notables as Gawker, Wonkette, Gothamist, Talking Points Memo, Andrew Sullivan, Instapundit, Daily Kos, Crooked Timber, The Volokh Conspiracy, Matthew Yglesias, Juan Cole and James Lileks. One thing they have in common is that they are read by non-bloggers much more widely than other blogs. The other thing they have in common is that they are predominantly authored by men.

Well, at least many genres of pro blog are. Political opinion bloggers are almost to a man, er, male, with the notable exceptions of Virginia Postrel and Ana Marie Cox on either extreme of the seriousness spectrum. (In contrast, the sexes are more balanced on US newspaper opinion pages). Blogger-professionals, like lawyers and journalists, also tend to be male (while again, the sexes are more balanced in the profession at large). Satire and personality-cult blogs, however, seem to be a female bastion (Wonkette again, Eurotrash, Maccers, Belle de Jour), while community blogs like Gothamist are pretty evenly split.

The Swedish blogosphere has now entered its pro-blogging phase, but not uniformly on all fronts. It is the political and media blogs which are leading the charge, and — as in the US — these are predominantly authored by men. It is this kind of blogging that the current Bloggforum focuses on, not because it is inherently more interesting than the more personal strain of blogging (and certainly not because it is dominated by men), but because it is, right now, more relevant to the debate about whether blogging can change the political and media landscape in Sweden. These are the questions most likely to perk the ears of mainstream media, and hence most likely to raise the profile of blogging, which leads to more readers for all.

In the meantime, I can’t wait for a Stockholm city events blog, or one that dredges the gossip rags and solicits celeb sightings from readers. Or how about a Stockholm restaurant review blog, by an anonymous foodie with an appetite, an expense account and a snarky palate? A Swedish culture blog? — someone should release into the wild interns with attitude to sniff out the good from the bad from the ugly among Stockholm’s gallery and concert offerings. There is already one pioneer, of course: Anna’s still unique fashion/shopping blog. Whoever authors these future blogs — men or women — should be on future Bloggfora.

For what it’s worth, I have a few theories as to why political and media blogs in particular are predominantly male, even while both sexes populate the field:

1) I am biased, and I don’t know it, so I just think there are more men than women authoring these kinds of blogs.

2) Political blogging is by nature an aggressive, competitive sport, prone to combative stances, and men tend to like this environment more than women.

3) Media blogging is by nature all about professional self-promotion, and men are shameless.

4) Women, being mature, don’t depend on ego-affirming site statistics for a sense of self-worth.

I’d love to hear why I am completely off the mark in this post. I’m not all that sure about what I’ve just written.

Blogging eye for the Swedish political guy

Politiskt.nu should be the center of political debate in the Swedish blogosphere. It is not. The site is all dolled up with fancy technical solutions, ready for political sparks to fly, but instead is proof that if you build it, they won’t come unless you do it right. Politiskt.nu is in dire need of a makeover. I’ll oblige:

1. Post! This group blog of six saw three weeks go by between its last two posts. Not surprisingly, the last comment is from Oct 7In fact, because posts published more than two weeks ago have their comments closed automatically, it was impossible to comment at all for a week on politiskt.nu, until tonight, when the first post in three weeks was published.. You have to show some enthusiasm before others will. It’s just like dating — you have to like yourself before others will like you.

2. Link generously! Every post on politiskt.nu is just a wall of text, even when it refers to other articles. Blogging is not the same as publishing newspaper articles in reverse chronological order. It takes a different mindset. Your post has to mesh with the web. You should use links instinctively as shorthand for explanations, attributions, sources, proofs, references, punchlines of jokes… They are obligatory.

3. Get personal. Is there a relationship between the bloggers on Politiskt.nu? Do they comment on each other’s posts? Do their posts acknowledge each other’s presence? Do they even read each other? I couldn’t tell. It’s as if there are six unconnected blogs here. We want a show — some butting of heads, parries, retreats, comebacks, good points, nice catches, notes of grace in victory… There has to be a logic behind having these bloggers under one roof.

4. Catch the blogging bug. Were the contributors rearing to blog when they were recruited, or did blogging have to be explained to them? Were they ardent readers of and commenters on other blogs before they themselves started? Do they have their finger on the pulse of what’s exciting Swedish bloggers right now? Do they get it? It has to be a grassroots effort — a group blog’s stable of talent should not have to be cajoled into posting.

5. Ditch registration. If you are going to have commenting, don’t turn it into a privilege. Stop forcing people to register — you’ll lose 90% of your commenters. Nobody cares enough about your blog to remember yet another password. Even Movable Type’s new system, which lets you register one identity valid for a slew of blogs, is not really catching on — and that’s because commenters are in a buyer’s market; there are plenty of blogs vying for their input, and if you put up barriers, these other blogs are but a click away. If you build dams, the critical mass will stay on the other side.

if you’re concerned about spam, there are some good technical solutions out there. Ever since I added an extra question to my blog commenting form which humans find easy to answer but machines not, I get around 2 manually submitted spam comments a month — down from around 100 automated ones a dayIt now works withMT 3.11‘s new templates too, thanks to Strang’s efforts.. It’s an easy solution to implement as well, one which any content management system worth its salt should manage. Finally, if you have a blog with comments, you have to resign yourself to regular weeding — you can’t legislate away abuse.

6. Focus on the essentials. Simpify your site so it becomes more amenable to the daily quick fix visit. Nobody uses the calendar to navigate blogs, so ditch itThe one thing a calendar does do spectacularly well is show a dearth of posts. And how useful is one of these on the first of the month?. Instead, give more space to your most recent comments. Don’t put your blogroll in a drop-down menu. Don’t put your bloggers’ names in a drop-down menu — these names are your main draw, and should be visible as soon as you visit the site, without having to mouse over pictures or click on menus. RSS is good, but just put up the link — it’s not your job to explain it. In fact, nothing in your horizontal menu bar is essential, on the grounds that it is better to do it than to talk about it. Give and accept trackbacks. And, pardon me, but I HATE not being able to see the URL of where I am in the browser’s address bar. That is so 1998.

Of all these, tips 5 and 6 are the easiest to do. Tip number 4 is the hardest, but if you get that right, numbers 3,2 and 1 will follow effortlessly.

Pardon the tone, it was for effect. Politiskt.nu is a good enough idea to attempt salvaging.

I also do requests

Commenting on my earlier post on Stockholm door code sequences, a reader writes (well, OK, it’s Geoff):

Can you do the same thing with the rings and the sequences and the words=numbers=objects in a list, and tell me just how long it would take for the proverbial monkey to type up a copy of Hamlet by randomly banging away at a typewriter?

Seriously.

Geoff, you’re right, that monkey is indeed proverbial. Wikipedia has a very interesting article about him and the various guises in which he and his infinite siblings have been bashing away at typewriters for nearly 100 years now, ever since he was evoked by French mathematician Émile Borel to illustrate an otherwise not-too accessible law of mathematics.

I think the reason our typing monkey has entered popular consciousness to the extent he has is that he straddles two of our fascinations: Our obsessive habit as a species to seek out patterns in nature; and infinity, around which we just never manage to wrap our minds, try as we might.

The reason for the first of these two fascinations, I think, is that we humans are really just finely honed cause-and-effect detectors, hoping to use this skill to avoid harm long enough to procreate. When our detectors misfire — when we generate false positives — we notice coincidences. How we deal with coincidences depends on our ability to intuit the odds of unlikely juxtapositions occurring randomly (and they do occur). Most of us are terrible at such estimations, so we end up turning coincidences into meaningful events, letting them fuel our superstitious beliefs.

We’re suddenly in the middle of a digression here, I know, but there is an interesting corollary example of this: People who buy lottery tickets of the PowerBall variety avoid choosing a sequence like 1,2,3,4,5 and 6; it just doesn’t look random enough to win. In fact, if that were the only option available, I suspect many habitual players would not be willing to pay for it, even though the chance of that sequence winning is exactly the same as their preferred “random” sequence, of which “type” there are far more.

Because the number of sequences in which the average lottery player can detect a pattern is far lower than the total number of possible sequences, as a group “sequences in which I see a pattern” wins less often than “sequences in which I can’t see a pattern.” This much is true. The mistake comes in thinking that membership of the larger group increases a specific sequence’s chances of being selected at random.

An exploration of this fallacy propels Inflexible Logic by Russell Maloney, a wonderful short story from The New Yorker circa 1940. In it, the protagonist decides to test empirically whether six chimpanzees eventually do end up writing all the works in the British Museum — with remarkable results.

And one of my favorite writers, Jorge Luis Borges, wrote The Library of BabelBoth these short stories are worth copying and pasting into a word processing document, printing out and reading, if you have a spare half hour., a short story which explores the futility of deriving meaning from patterns found in sequences if all possible sequences exist. Who else but Borges would think to use that as a plot device!?

This brings us back to the problem at hand. Although our typing monkey has had much coverage on the web, I have not found an actual calculation of the probability he would type out a copy of Hamlet in any given sitting. So, Geoff, I will oblige you:

The calculation is very simple. Take this copy of Hamlet. It contains 32,197 words made from 194,270 characters. The “alphabet” of possible characters includes both lowercase and uppercase letters, punctuation marks and spaces — let’s say 64 characters in total. The chance that a monkey randomly types out Hamlet in a given sitting, then, is one in 64194,270. According to Mathematica, that equals one in 3.833 x 10350,886 — a staggeringly small chance.

Another way to conduct this experiment would be to find and then line up 194,270 monkeys and put a typewriter in front of each of them. We let all of them hit a single key each at a time, and string together the result. If we manage to train them to type one character per second, we get a potential Hamlet text every second. There have been 441,504,000,000,000,000, or 4.415 x 1017 seconds since the Big Bang, approximately, so if our monkeys had started typing soon after the birth of the universe, the probability that they’d have something for us by now is 1-(1-1/(3.833 x 10350,886))4.415 x 1017Unfortunately, even Mathematica gets an overflow error trying to calculate that. Methodology: First you calculate the probability of a copy of Hamlet not being typed at a given sitting (1-1/(3.833 x 10350,886)), then you raise that to the power of the number of sittings, in our case the number of seconds since the Big Bang, 4.415 x 1017. This gives us the probability of Hamlet not having been written after all these seconds; to find the probability that it has, just subtract that number from 1..

That’s a vanishingly small chance. According to our French mathematician Borel, who actually thought a great deal about this, the class of events with probabilities of less than one in 1050 of occurring are negligible on a cosmic scale. The probability our monkeys will type Hamlet is certainly in that class. However, Borel also came up with a class of events with probabilities that are negligible on a “supercosmic” scale — probabilities of less than one in 101010, or 1010,000,000,000— something exceedingly unlikely to happen even if given an inordinate number of universes to play with. We’d definitely have a text of Hamlet before long on this scale, according to our calculations.

But Borel gave an example of an event with a negligible chance of occurring even at the supercosmic scale: the chance that a container holding a mixture of a fair number of oxygen atoms and nitrogen atoms would spontaneously have all the oxygen atoms jump to one side and the nitrogen atoms to the other side, thus organizing itself, decreasing the system’s entropy and breaking the Second Law of Thermodynamics.

We can therefore state with confidence that monkeys will type Hamlet long before the Second Law of Thermodynamics breaks.

The de Bruijn Code

Instead of doormen, Stockholm apartment buildings have a little keypad at the front door, into which you key a 4-digit code to gain access to the lobby. The code is known to residents, and is liberally shared with acquaintances, workmen, postmen, and no doubt sold to an underworld of snail-mail spammers and worseI’d love to see a database of Stockholm door codes, though, if only to gain a little humint on what are the most popular 4-digit combinations..

The keypad is unusual in that there is no enter key. Instead, it remembers the last 4 digits you have pressed, and compares those to the code which unlocks the door. In other words, if you key in the sequence 123456, you’ll have tried codes 1234, 2345 and 3456. If the correct code is 2345, you’ll have opened the door after keying in the first 5 digits in the above sequence.

This is different from how a bank machine keypad operates, where you need to key in four digits (and then enter) for every PIN code number you try. If you were to forget your code, you’d have to go through a maximum of 40,000 key punches (not counting the mathematically boring enter key): 10,000 combinations of 4 keys each (0000, 0001 … 9998, 9999).

On a door-code keypad, however, it’s clear that you can cycle through every combination in far fewer than 40,000 key punches: If I key in just the sequence 1234567890, for example, I have already tried 7 different 4-digit combinations with just 10 key punches. On a bank machine keypad, it would have taken me 28 key punches to try 7 different 4-digit combinations.

This is interesting information for all those of us who think it within the bounds of possibility they might one day find themselves standing somewhat stupefied in the snow outside their Stockholm apartment at 3am on a Sunday in January, with the mercury all shrivelled up, not a soul in sight and not a code in mind. Deciding beforehand on the right contingency strategy for such a situation could shave hours off the frigid poking at that dumb but durable little keypad which would then have to follow.

A rundown of the options, then:

  • Going through the combinations bank-machine style. Pros: Counting from 1 to 10,000 is easy; you’re guaranteed to hit your code eventually Cons: 40,000 key punches at around 2 per second will take just over 5 1/2 hours. With luck, your code is 0012 — but nobody is ever lucky at 3am on Sundays.
  • Going at it randomly. Pros: No thinking required whatsoever. Cons: Looks pathetic; generating truly random patterns is nearnigh impossible for humans, especially drunk ones; you might actually never type in the correct code.
  • Calculating beforehand the shortest possible sequence that cycles through all possible codes, writing it down and keeping it in your parka in case of just this kind of emergency. Pros: Should get you into any warm lobby in no time, relatively speaking. Cons: The “beforehand” part of the first sentence in this paragraph.
  • The rest of this post concerns itself with exploring the possibility of making this third option a viable one for Stockholmers. To that end, I have been wondering: (1) What is the minimum amount of door-code keypad punches needed to try every possible combination? (2) Is it possible to run through every possible combination without being forced to repeat any combination? (3) Is there a unique such solution? (4) If not, how many solutions are there? And finally, (5) How likely am I to spontaneously generate the shortest possible sequence if I pursue the second, random strategy described above?

    If the answer to (2) is yes, it’s easy to see that the answer to (1) is 10,003. That’s because after three punches, every subsequent key I press adds a new 4-digit combination to the sequence, and there are 10,000 combinations to go through. In this best-case scenario, at 2 presses per second, we’d go through every combination in under 84 minutes.

    As I started looking into this, it felt like the answer to (2) should be yes, but I lacked the mathematical tools to back up my hunch. So I decided to try to simplify the problem and then apply brute force. Instead of looking for the shortest sequence that contains all possible 4-digit combinations without repetitions, I decided to try to find one that contains all 2-digit combinations without repetitions — the shortest sequence that would unlock a door-code keypad with a 2-digit code.

    I printed out a 10×10 grid with all 100 combinations (00, 01 … 98, 99) and headed for a café in Kungsholmen. After a café latte’s worth of trying to connect the numbers, I’d found a sequence 101 digits long that contained all the combinations exactly once — showing that (2) is true, at least for 2 digits:

    00112131415161718191022324252627282920334353637383930445464748494055657585950667686960778797088980990

    I can’t help it, but I happen to feel that that’s a pretty cool sequence — it feels so dense, so efficient. Just from inspection, some other things become clear:

  • There are many distinct such shortest possible sequences. For example, you can make some others just by moving the 00 at the start of the sequence to wherever else there is a 0, just by moving the first 0 next to any other 0. This means the answer to question (3) is No, there is no unique solution; not for two digits, and by extension, not for four digits, because here too we’d be able to insert the specific combination 0000 anywhere we find 000 in the sequence. The same holds for shortest sequences containing all combinations of any length.
  • The last digit and the first digit are repeated. That’s because every digit except the first and last does “double duty” in belonging to a combination, and since there is an equal total number of digits in the combinations 00, 01 … 98,99 (twenty 0s, twenty 1s, twenty 2s etc… for a total of 200 digits), the first and last digit have to be the same. This is the case for any shortest sequence containing all 2-digit combinations.
  • Since the last digit and the first digit are always the same, you can fuse them into one digit that does “double duty”, in the process turning this sequence into a great big circle of 100 digits. This is even more elegant; you can start anywhere you like in the sequence and loop around it, and you’ll have gone through every combination when you’ve made a complete loop. A better way to denote our sequence, then, would be
  • …0011213141516171819102232425262728292033435363738393044546474849405565758595066768696077879708898099…

  • You can go in either direction around this loop. That’s because every combination has its own unique mirror image: 91 maps to 19, 34 maps to 43, etc… And palindromes map to themselves.
  • It felt like these observations should apply by extension to shortest sequences for combinations of any length, but I still could not demonstrate to my satisfaction that the shortest circular sequence containing all combinations of x digits had to be of length 10x — in the case of our door-code keypad problem, a loop of 10,000 digits. Also, I was nowhere near being able to count the number of such shortest sequences.

    I thought I’d simplify the problem further, then — by reducing the base from 10 to 2. We can do that, because we’ve really just been using digits as stand-ins for distinct objects, and combinations as ordered collections of these objects. For that matter, you could think of our 4-digit codes as being equivalent to 4-letter words, made from an alphabet comprised of 10 letters. In base 2, there are only two letters, if you will: I and O.

    In base 2, then, what do the shortest circular sequences containing all 2-letter words“codes” / “combinations” / “words” / “cyphers” / “strings” / “stretches” / “ordered collections”… — it’s all the exact same thing. look like?

    …0011…

    That’s it, actually (1001, 1100 and 0110 are all shifted versions of the same circular sequence)For completeness’s sake, the shortest circular sequence containing all 1-letter words in base 2 is …01…, and it’s unique, obviously.. Notice that the length is 4, or 22. How about the shortest sequences containing all eight 3-letter words, in base 2?

    …00010111… = …00011101…

    There are two, though they are each other’s mirror image; proceeding clockwise on the first is equivalent to proceeding counterclockwise on the second. Length: 23 = 8.

    Shortest sequences containing all 16 4-letter words, base 2?

    …0000100110101111… = …0000111101011001…

    …0000100111101011… = …0000110101111001…

    …0000101001101111… = …0000111101100101…

    …0000101001111011… = …0000110111100101…

    …0000101100111101… = …0000101111001101…

    …0000101101001111… = …0000111100101101…

    …0000101111010011… = …0000110010111101…

    …0000110100101111… = …0000111101001011…

    That’s 16 sequences of 16 letters each, including mirror images.

    To find these sequences, I had to resort to drawing crude graphs, with all the possible words connected by arrows indicating which word could lead to which as we create a sequence (think “continue punching away at the keypad”). For example, both 1010 and 0010 can be followed by 0100 or 0101, in the exact same way that on the door-code keypad, the codes 1001, 2001, … 9001 and 0001 can be followed by 0011, 0012, … 0019 or 0010, depending on which key you choose to punch next.

    The trick is to visit each word only once by following the arrows. Here, for example, is a graph with all possible 2-letter words in base 2 as nodes:

    dBG[2,2].gif

    This is a much prettier version, made later, after I got some help online. Notice how there is only one way of cycling through all the nodes once.

    Here is a graph with all possible 3-letter words in base 2 as nodes, again made much prettier after the fact:

    dBG[2,3].gif

    In all these graphs, I’ve also labelled the arrows, because they represent a unique relationship between two words. For example, the arrow from node 011 to 110 is labelled 0110, because that is the 4-letter word created when these 3-letter words follow one another in our sequence.

    This hints at a very profound relationship between these graphs: Every arrow on the smaller graph corresponds to a node on the larger graph. Therefore, trying to find a path that cycles through every arrow on the smaller graph is the same as finding a path that cycles through every node on the larger graph. And visually, at least, it is a lot easier to find all the ways to cycle through every arrow on the smaller graph than it is to find all the ways to cycle through every node on the larger graph. Hence I never needed to draw a graph with all 4-letter words as nodes in order to find the 16 shortest circular sequences containing all 4-letter words listed earlier — instead, I just found all the ways to cycle through all the arrows on the larger graph above.

    This is as far as I got on my own. It began to feel like I was trying to reinvent the wheel, badly, so I decided to seek professional help. After rummaging about on Mathworld for a while, I hit paydirt: It turns out a loop sequence of letters or digits is called a necklace. The shortest possible such sequence containing all possible words of a certain length is called a de Bruijn sequence. The graph associated with a de Bruijn sequence is called a de Bruijn graph. The path along such a graph that uses every arrow (edge) exactly once is called a Eulerian circuit (and all graphs that allow such a circuit are called Eulerian graphs, making de Bruijn graphs a subset of Eulerian graphs). The path that uses every node (vertex) exactly once is called a Hamiltonian circuit (and you can guess what a Hamiltonian graph is).

    It turns out that Leonard Euler provided us with the crucial link that shows why the answer to question (2), “Is it possible to run through every possible combination without being forced to repeat any combination?” is always yes, regardless of alphabet or word size. According to the Mathworld page on Eulerian graphs,

    “Euler showed (without proofI don’t know if someone has supplied a proof of this, in the meantime. Something to look up.) that a connected graph is Eulerian if and only if it has no graph vertices of odd degree. […] A directed graph is Eulerian if and only if every graph vertex has equal indegree and outdegree.”

    In other words, if on a graph connected by arrows every node has as many arrows coming in as arrows going out, you will always be able to make at least one Eulerian circuit — you will always be able to find a way to cycle through all the arrows without ever getting stuck at a node. This makes sense, intuitively: Take the larger graph above, the de Bruijn graph for base 2, order 3 (which means the nodes represent 3-letter words): every node has an equal number of arrows going out, because the alphabet (base) from which to choose the next letter as you proceed along your circuit is the same at every node. The same goes for arrows coming in, which represent the letter being dropped as we progress along our circuit — it too has to be a member of the same alphabet.

    But what about generating a bona fide de Bruijn sequence for our door-code keypad? Well, Mathworld is linked to the math program Mathematica (the brainchild of Stephen Wolfram), and the page on de Bruijn sequences mentions that Mathematica has an algorithm for generating such sequences for any given alphabet and word length. Although Mathematica is perhaps one of the most impressive applications ever made, it also costs 25,000kr., and hence I’ve never managed to justify buying it. Fortunately, there is a 15-day free trial, so I hurriedly downloaded it, and got to work. Here is a trial run, a de Bruijn sequence for 3-digit combinations in base 10:

    …9798787770760750740730720710980970960950940930920910108908708608508408308208889998988081009909008007006005004003002000190180170160150140130120119118117116115114113112912812712612512412312213913813713613513413313214914814714614514414314215915815715615515415315216916816716616516416316217917817717617517417317218918818718618518418318219919819719619519419319212111029028027026025024023022922822722622522422392382372362352342332492482472462452442432592582572562552542532692682672662652642632792782772762752742732892882872862852842832992982972962952942932322202103903803703603503403393383373363353349348347346345344359358357356355354369368367366365364379378377376375374389388387386385384399398397396395394343330320310490480470460450449448447446445945845745645546946846746646547947847747647548948848748648549949849749649545444043042041059058057056055955855755695685675665795785775765895885875865995985975965655505405305205106906806706696686679678677689688687699698697676660650640630620610790780779778978879…

    Again, I love the density of these sequences. When it came to generating a de Bruijn sequence for 4 digits in base 10, though, Mathematica took its time — 2 hours and 45 minutes, to be exact. By then I had grown restless and had been googling “de Bruijn sequence”, only to find a most surprising page among the results: A Swedish blogger! Hakan Kjellerstrand has a page up with a much faster algorithm for generating de Bruijn sequences. He even singles out the specific solution for Stockholm door-code keypads. So click on that link, print out the result and keep it handy whenever you have a door-code keypad blocking your way.

    But two of the five original questions remained unanswered: First, how many such sequences are there? Hakan and Mathematica both provide one sample de Bruijn sequence, not a counting function for de Bruijn sequences for a given alphabet and word length.

    It turns out the answer wasn’t even known for sure until 2002, when Vladimir Rosenfeld at the University of Haifa published a proof for the general formula in a paper titled “Enumerating de Bruijn Sequences.” It’s not online, as far as I can tell, but he does talk about his results in another paper of his, Enumerating Kautz Sequences (original link).

    According to Rosenfeld, in 1946 Dutch mathematician N.G. De Bruijn proved a formula for counting the number of shortest circular sequences containing all q-letter words using a 2-letter alphabet (base 2). Here it is:

    total number = 22(q-1)q

    Indeed, for q = 3 the result is 2 and for q = 4 the result is 16, as shown earlier. This made him famous, and led to this entire class of sequences being named after him, though it doesn’t seem like he proved a generalized formula for any given alphabet size (base). In 2002, Rosenfeld did just that, apparently. For an alphabet size s and word length size q, the number of circular de Bruijn sequences of length sqis! means factorial. 4! = 4x3x2x1:

    (s!)s(q-1)sq

    Making s=10 and q=4 gives us 10!1000/10000, or 5.79×106555Update 2004-10-09: Well, you can’t trust the internet for anything, can you? The paper by Rosenfeld misprints the formula for the number of circular de Bruijn sequences (I’ve corrected the main text now). The relationship between the number of circular de Bruijn sequences and the number of linear de Bruijn sequences didn’t look right when I read the paper — I thought the difference between these two formulas should be a factor of s^q, because every circular de Bruijn sequence seemed to me to contain that many starting points for linear de Bruijn sequences. But hey, I didn’t prove the results, and my blog doesn’t have “expert anonymous reviewers,” so what would I know? Today, though, I found a different formula for the number of circular de Bruijn sequences in Stephen Wolfram’s New Kind of Science and this one confirms my hunch. And so far, Wolfram has never failed me. (The formula for the linear de Bruijn sequence remains the same, though.)
     
    Update 2004-10-21:Vladimir sent me his original paper, Enumerating de Bruijn sequences [PDF] and indeed there the formula is correct. And an interesting read.
    , sequences. If we want to tally up the more real-world situation, where you need to key in 10,003 digits into the door-code keypad, then the formula is slightly modified:

    (s!)s(q-1)

    which gives 10!1000 or 5.79×106559 linear de Bruijn sequences. That’s our answer for question (4).

    Which leaves us only question (5): “How likely am I to spontaneously generate the shortest possible sequence if I pursue the second, random strategy described above?” To answer this, all we need is to divide the total number of linear de Bruijn sequences (which we just counted) into the total number of sequences that are 10,003 digits long (which is just 1010,003). The answer: 1 in 5.79×103444 times. Given that the number of stars in the universe is about 7×1022, it’s a safe bet to say that every pig will need to fly before anyone ever achieves this feat.

    Postscript: De Bruijn sequences are actually useful, and crop up in several seemingly unrelated fields. They provide the mathematical underpinning for DNA manipulation tools — DNA being nothing but sequences of words composed of the letters G, A, T and C. Rosenfeld’s paper, too, aimed to provide mathematical tools for understanding “minimal generating sequences” in DNA, that is, “the sequence of minimal length that produces all possible amino acids.” Also, de Bruijn sequences seem to get mentioned a lot in connection with cryptography, especially by Stephen Wolfram in the footnotes to A New Kind of Science, though how or why is something I think I need to look into next.

    NetNewsWire 2 vs. Shrook 2

    For the last few days I’ve been playing with the public beta of NetNewsWire 2, the successor to the first popular newsfeed reader for the Mac and new competitor to Shrook 2, the current favorite in the fieldA Newsfeed reader lets you collect, manage and display newsfeeds, which are stripped-down versions of the most recent stories published by websites such as newspapers, wire services and blogs. A site’s newsfeeds are updated whenever the site is updated; they come in standardized formats (such as RSS), and are fetched regularly by the newsfeed reader, making it possible to keep tabs on hundreds of websites automatically, instead of having to visit these sites individually to see what’s new..

    I have taken an instant liking to NNW2. While I was impressed by the features of Shrook 2 when that application came out, I found it to have too many quirks for me to use it as the predominant way of being apprised of what’s new on my favorite sites. Instead, I returned to hunting and pecking at websites with my conventional browser, these days Camino.

    NNW2 has quirks too, and it is still beta, but the frustrations I have with this application are of a different kind. NNW2 is so good that I can now see how I will surf the web in a year or two; it’s just that it’s not quite there yet.

    How will I surf the web in a year or two? The application I will use will be a browser, with tabs, history, bookmarks, pop-up blocking and what-have-you, but also with souped-up feed management and display tools much like what NNW2 and Shrook 2 offer today.

    Both these newsfeed readers now let you toggle between the stripped down version of a story and the much prettier browser version, rendered inside the application. But whereas Shrook 2 does this as a feature, it is much more central to NNW2, which has a back button, an address bar, tabs even! It feels so natural to begin using NNW2 as a starting point for one’s net adventures that it comes as a shock when you instinctively look for but can’t find a google search bar on the top right, or a browse history, or a home button, or lack the ability to access your non-feed shortcuts.

    But I now know what I want. And if you read NNW2’s feature request/bug report page, you can see that other beta users are also screaming for a full-fledged in-line browser. The future lies not with newsfeed readers that also render websites but with browsers that also manage and display newsfeedsApple’s next iteration of Safari, shown in the preview of OS X 10.4, lets you collect and display newsfeeds, though it is far from clear how impressive the newsfeed management system is..

    What’s more, my future browser will have a system for organizing my feeds that is a hybrid of how Shrook 2 and NNW2 do it today. Simply put, Shrook 2 uses the iTunes library/playlist metaphor, NNW2 uses the email folder metaphor. Both have drawbacks — Like iTunes, Shrook 2’s system is not hierarchical; like Mail or Microsoft Entourage, NNW2’s system does not allow aliases of the same item in different foldersSure, in NNW2 you can subscribe to the same feed as many times as you want and put these shortcuts in different folders, but that is not efficient on many levels..

    I want hierarchical folders like NNW2 has. If I highlight my top-level folder called “Sweden”, I get to see all new stories contained by all the feeds in all the subfolders (NNW2 calls these folders “groups.”) If I expand that folder and highlight a subfolder, say, “Politics/media,” I get to see new stories from just the feeds therein. In Shrook 2, on the other hand, you can’t expand folder/playlists in the left-most column — a whole extra column is required to see what’s in them, leading to a four-column layout that I find unwieldy. And, unlike with folders, playlists (Shrook calls them “channels”) can’t contain subplaylists. I have no idea why not, actually.

    But I also want to be able to scatter multiple instances of a newsfeed across folder/channels, like Shrook 2 allows. For example, I’d like www.kimthew.com to show up both in my “Friends” folder/channel and “New York” folder/channel. However, should I decide I want to banish my friend forever, it should take but a single swipe from a master library. NNW2 doesn’t let me do this.

    And then there are features I want that neither application has yet: For example: hierarchical smart foldersSmart folders are folders that are dynamically populated by news items that meet specific criteria, such as, say, those containing the text fragments “swed” or “svensk” or “sverige”.. Current implementations of these smart folders are a bit crude: You have to choose between matching any criteria or all criteria. You can’t currently make a smart folder that displays items containing either the text “SvD” or “DN” or “Expressen” or “Aftonbladet” but which must also contain the text “Stefan Geens”, for example. In other words, you can’t mix and match the logical OR and AND operators. One solution would be to allow hierarchical smart folders — A top level smart folder would look for the mention of Swedish daily newspapers, a sublevel smart folder would look for my name among those results.

    Many of the features in NNW2 were first introduced by Shrook 2. But a couple of them are unique to NNW2, to my knowledge:

    NNW2 can compare the current newsfeed’s contents to the previous one. The results can be highly amusing:

    gizmodo.gif

    NNW2 let’s you save a search in Feedster, Daypop or Blogdigger — they’re like Googles for newsfeeds — as a virtual newsfeed. The most recent news items that include your search term are in it. If you want to know if anyone at all is blogging “Belgium”, even if you don’t subscribe to their newsfeed or have never heard of them, this is the way to do it.

    Other clever ideas include connecting scripts to a virtual newsfeed, so you can go scrape websites that have no newsfeeds themselves; a customizable toolbar that provides excellent access to quick toggling of viewing styles; and a tool for finding “dinosaur” newsfeeds, which haven’t been updated in a while. There is also something that is supposed to share your newsfeed collection with others, but it doesn’t work yet in the beta.

    Update 2004-09-28: Ranchero have now announced they too will support syncing and bandwidth use managment in NNW2.Shrook 2, meanwhile, still has a much better solution in place for managing bandwidth issues: Their distributed checking feature is truly clever, whenever it works. They also let you access your newsfeed collection via any browser, so you can keep up when you are travelling.

    A list of other Mac newsreaders can be found here.So if you have already shelled out $25 for Shrook 2, your investment is safe, but if you have yet to adopt a newsreader as your own, take a very good look at same-priced NetNewsWire 2 when it comes out of beta.

    Skype

    As a consequence of the need to tend to several work-related projects, the presence of a looming urge to redesign this blog, a spurt of blogging over at MemeFirst and a rather heavy post on Leopold II last week to come down from, you will now be served a few posts blissfully free of any import or presumption: A roundup of recent Mac software I’ve used and can recommend. Today:

    A voice chat application for both PC and Mac that lets you talk to other users for free but also make cheap calls to real phones.Skype: This second beta for the Mac, out since a week ago, now works seamlessly. I actually only use one of Skype’s features, SkypeOut, but what a feature it is: 2-eurocent a minute “voice chat” to real phones in the US, Europe and Australia from anywhere I can plug my PowerBook to the internet — and with the same sound quality as a normal phone call.

    This changes everything. Consider my setup: Because I’ve been moving frequently these past few years, I haven’t bothered getting landlines, relying instead on my mobile phone to make calls, even if calling out is expensive, at 1 euro per minute to the US. Today the 10 euro credit I bought on Skype will let me talk 500 minutes, as opposed to the 10 minutes I would have gotten via Telia on my mobile. That’s also far cheaper than using landlines or calling cards.

    Friends in New York who are all avid Vonage users have asked me why they should use Skype instead. They shouldn’t. Unlike Vonage, Skype can’t receive calls from real phones. But Vonage, inexplicably for the business they’re in, only takes credit cards with US addresses (trust me, I tried); Skype takes all comers. And since I already have a mobile phone on which I can receive calls for free, I don’t miss Skype’s inability to do so.

    If you live in Europe, have broadband and still initiate long distance phone calls using landlines, you are being fleeced. Help kickstart the voice-over-internet-protocol (VoIP) revolution and get Skype. (Did I mention it’s made in Sweden?)

    Perfect Day 2 (July 23, 2004)

    Perfect Day 1 was May 5, 2002.If somebody were to ask me to describe a perfect summer vacation day, it might go something like this: It would be sunny in Ireland, where I’d be, and I would spend the morning in the garden making progress listening to Henning Mankell’s Innan Frosten, in Swedish, on the iPod, while following along in the book, much as I promised Erik I would.

    After lunch, I might take the bus into Dublin to pick up my copy of Prime Obsession, which would have been ordered at Hodges Figgis the week before. Then, I’d wade through swarms of Spanish English-language students just released from the day’s classes — all practicing their immaculate Spanish on each other — and make my way to the Irish Film Institute, which would be starting a Richard Linklater retrospective that day with a showing of Before Sunset, the sequel to Before Sunrise, a movie I saw twice in so many days in 1995 — once secretly, in order to avoid the barbs of so-called friends that I was showing the symptoms of terminal hopeless romanticism.

    Before Sunset is 80 minutes of real-time conversation between Jesse and Celine as they meet in Paris, not so accidentally, nine years after a first encounter in Vienna in Before Sunrise. A Linklater retrospective would show both films, of course, but I’d resist the tempation to see them chronologically, preferring instead to try to resurrect memories from 1995, much as the protagonists do in Before Sunset. And the movie would stir the heart: perversely, in the intervening years, Celine has gotten herself an MA in international relations, while Jesse has become a novelist — where I am & where I’d like to be. And they’ve both lived in New York. No wonder their duet is even more compelling to me the second time round, if that’s possible.

    After dinner, there’d be a public lecture by Roger Penrose — around the corner and down the street — capping a week-long international conference on quantum gravity. Penrose would title his talk Fashion, Faith, and Fantasy in Modern Physical Theories, and he would illustrate it with transparencies of mermaids.

    Penrose would point out that the famous (hypothetical) Shrödinger’s cat experiment still leads to a paradox in quantum mechanics: Whenever the cat is both alive and dead, the result of a superposition of possible states for a particle that decides the fate of the cat, there can also be an observer that is both happy and sad, until we observe him (or her). But we, in turn, might also both happy and sad, simultaneously, until observed by an outsider. The boundary for the macroscopic effects of quantum behavior seems to be arbitrary, in other words — or, at least, not well defined. This would be problematic, and hence quantum mechanics, while immensely useful, still requires a measure of faith. Penrose goes on to show a transparency describing an experiment, being built, that should probe this boundary.

    cat.jpgpenrosetalk.jpg

    And then I’d blog it. And that is exactly how I spent July 23, 2004.

    Hawking in Dublin: Link dump

    Stephen Hawking gave his talk yesterday, the media came and went, and now the interpretations are beginning to trickle onto the web. The transcript — apparently the actual text Hawking’s computerized voice read, minus the visuals — is here. The New York Times has by far the best reporting I’ve read. Physics Forum has a thread (scroll to the end), and several blogging physicists chime in here, here, here and here.

    The upshot, by consensus, is that Hawking’s proposal is a way to resolve in the semi-classical regime a paradox that is no longer a problem in the quantum regime, where the real search is now on for a mechanism whereby black holes release the information they contain back to the rest of the universe as they fritter away — with several viable options in the running.

    In other words, most physicists in the field have long accepted that black holes have entropy and that this information is released back to us — not forwarded into other universes — as black holes die out. Besides denying science fiction writers convenient plot devices, what Hawking is doing is reconciling the advances quantum approaches have given us with the more “traditional” way of describing black holes.

    His method is to postulate that, when observing a black hole from far away (in other words, when we look at the complete picture), it’s never possible to be sure what kind of black hole we’re looking at, or if it is a black hole at all. It’s a situation analogous to the famed double-slit experiment — easily the bizarrest thing I’ve ever experienced first-hand — where photons, having gone through either the left slit or the right slit, and detected individually, still aggregate to a classical interference pattern. So it goes with black holes, according to Hawking: They look like black holes from a classical perspective, but we can’t be sure, because of quantum uncertainty, what kind they are precisely — sometimes, they are pseudo black holes of the variety that do allow information leakage. When we collect all these possible black holes into the composite classical black hole that we see from afar, enough of them allow for information leakage to resolve the paradox. Or so says Hawking, in my limited understanding via other observers, most of whom seem slightly skeptical of the mathematical gymnastics in his talk.

    In a way, this is a catch-up maneuver for Hawking, and one more vote in favor of the inviolability of the basic laws of physics. The search for a theory of quantum gravity just got a little easier.

    Conference event horizon

    Dublin (where I happen to be on vacation) promises to be the scene of some pretty revolutionary physics next week, because Stephen Hawking has asked at the last minute to speak at the 17th International Conference on General Relativity and Gravitation being held here, claiming to have solved the black hole information paradox, one of main puzzles facing those toiling on quantum gravity.

    By far the best context I’ve seen for what Hawking might say can be found on pages 93-94You need to register with Amazon.com, but when you do you can read those pages. (If you need to, do a search for “unruh”). of Lee Smolin’s most excellent (as in you really should read this book instead of other popularizations, including the rather mediocre stuff Hawking himself has written) book, Three Roads to Quantum Gravity. Those few paragraphs set the stage for the question Hawking purports to have answered: If black holes have entropy, how do they store this information? According to classical physics, this information is destroyed, which would break the second law of thermodynamics.

    One possible solution was offered by Samir Mathur a few months ago, using string theory. What will Hawking sayUpdate 17/07: More hints here.? His conference abstract is suitably baffling. Will he flesh out Mathur’s approach, or come up with a completely new model — perhaps an equivalent solution using another theoretical base? Or will he be wrong? (It’s happened before.)