Friday, October 28, 2005

Darwin's and Einstein's (e)mail correspondence rates, or a rumination on power laws.

Over the last few days, there has been much discussion of a paper in Nature by Oliveira and Barabási on the correspondence patterns of Darwin and Einstein. One of the main conclusions of the paper was that at some point, their response patterns started following a power-law distribution, with coefficients such that a finite fraction of mail went unanswered.

Soon after this, there was a response suggesting that the analysis was flawed, and in fact the data followed a log-normal pattern, rather than a power-law. Cosma Shalizi and Aaron Clauset weighed in on this as well, on the side of the "log-normal folks".

As Geomblog denizens might know, I mentioned a while ago a fascinating article by Michael Mitzenmacher on the difference (and similarity) between log-normal and power-law distributions, and the natural processes that generate them. I asked Michael if he'd like to weigh in on this controversy, and what follows below is his note. He comments not only on the technical issues involved, but also on the whole issue of PR in science and the contributions that Barabási has made to the field, and his perspective is very interesting indeed.

For those of you who might wonder why computer scientists (and theoretical computer scientists at that) should concern themselves about such matters, I'll refer you to the reading list for Jon Kleinberg's course on Information Networks, where you'll see how many aspects of link structure analysis on the web (which in turn is used for search engines like Google) relate to understanding power law distributions.

And now for the article...
Suresh asked me to comment on a current controversy: a paper by Barabasi et al. claims that many human interactions, including the example of Darwin's and Einstein's letter response frequency, are governed by a bursty process that leads to power law tails. A rebuttal by Stouffer et al. claims that the distribution is really lognormal.

I believe Suresh asked me to comment because (here comes the blatant self-promoting plug) I had written a survey on lognormal and power law distributions (and a slightly more fun, less edited earlier version). I'm embarrassed to say I had not heard of the controversy, but it looks like a lot of fun for those of us who enjoy this sort of thing. Rather than focus on the specific technical details, let me make two high-level comments.

First, as you'll know in more detail if you read the survey, this controversy does not surprise me. Lognormal and power law distributions are so closely related that if somebody wrote a paper claiming some distribution is a power law, you can count on somebody else writing a follow-up paper claiming it is actually lognormal (and vice versa). The only thing that is surprising is how fast the turnaround time is these days. The Web is truly amazing that way.
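To make "closely related" concrete, here is a minimal sketch (not taken from either paper) of why the two are easy to confuse. On a log-log plot a power law is a straight line, while the lognormal density has slope -1 - (ln x - mu) / sigma^2, which changes very slowly when sigma is large. The function names and the values of mu and sigma below are illustrative assumptions, not fitted to the Darwin or Einstein data.

```python
import math

def lognormal_log_density(x, mu, sigma):
    """Natural log of the lognormal density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x) - math.log(sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z

def loglog_slope(x, mu, sigma):
    """Exact slope d(ln f)/d(ln x) of the lognormal density on a log-log plot."""
    return -1.0 - (math.log(x) - mu) / sigma ** 2

# Illustrative parameters only (assumed, not estimated from any data set).
mu, sigma = 0.0, 5.0
for x in (1, 10, 100, 1000):
    # A pure power law would print the same slope on every line.
    print(x, round(loglog_slope(x, mu, sigma), 3))
```

With a large sigma the slope drifts only slightly over three orders of magnitude, so over a limited range of data the curve can pass for a straight line, that is, for a power law.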

A more important second point, however, relates to how good Barabasi is at generating PR, and the recent panel discussion at FOCS about popularizing computer science. I think the reason this controversy erupted so quickly is that Barabasi sent a summary note on the work to Nature. It has been argued that we in TCS need a prophet, and Barabasi is indeed a modern prophet in the area of networks and power laws. Say what you like about him (and people say some bad things -- the Stouffer rebuttal does not use the words "technically incompetent", but I think the implication is in there; I and many others have had issues with Barabasi's grandiose claims and insufficient citation of past work), but he gets his message out there, and people actually know about it. His book Linked was very popular and had a big impact. While I think all his talk about networks is good for computer science in general, I'd rather see a computer scientist out in front of the world enlightening the masses as to why all this stuff is important.

For those who care about the technical issue, I will express some mild opinions. While the rebuttal suggests the data is a better fit for the lognormal distribution, I am not a big believer in the fit-the-data approach to distinguishing these distributions. The Barabasi paper actually suggested a model, which is nice, although the problem of how to verify such a model is a challenge, as I talk about in this MSRI video. This seems to be the real problem. Trust me, anyone can come up with a power law model. The challenge is figuring out how to show your model is actually right.

Michael Mitzenmacher