## Tuesday, March 31, 2009

tick tock. ESA 2009 deadline is Apr 12.

## Monday, March 30, 2009

### MASSIVE !! DATA !! ALGORITHMS !!

Sorry. Just went all Bill G on y'all.

Anyway, this year there's a workshop on massive data algorithmics happening just after SoCG. It's structured like the Fall Workshop (i.e things submitted here can be sent elsewhere - there's no formal proceedings). The Apr 3 deadline is looming, so get those streams, disks, and I/Os ready and send them to Aarhus !

## Friday, March 27, 2009

### The FOCS submission experiment

Via Muthu, an explanation by Umesh Vazirani of the rationale for the new 'submmary after the deadline' experiment at TOCS. A key paragraph:
To understand the motivation for the abstract and better picture
its contents, think about how often the 20 minute presentation at
STOC/FOCS provides a better insight into the research than the
paper. While preparing the talk, the authors can step back and
try to explain something interesting about their work - either the
core of their proof, or a special case of their theorem, or the
new conceptual framework that they introduce. The one week
provide a chance to reflect, and additionally there would be an
incentive for the authors (just as in the conference presentations),
to simplify.
I think it's a great idea to try things like this. I hope it will work though. One of the reasons that papers often get written badly for STOC/FOCS is because there's an incentive to obfuscate and make things look rather technical. Whether the post-deadline calm will allow people to see beyond the chest-thumping will decide whether the 2-pages are useful.

## Wednesday, March 25, 2009

There's an excellent article by Emmanuel Viola in the latest SIGACT News ($$) on the problem of finding functions that have low correlation with low-degree polynomials over GF(2). p.s this is the kind of thing that one would merely twitter, but then I'd want it to somehow feed into my blog automatically. ## Monday, March 23, 2009 ### Adjacency matrix of a tree.. Is there a purely algebraic characterization of the adjacency matrix of a tree ? In other words, given an n X n boolean matrix, can I determine whether it's the adjacency matrix of a tree with purely algebraic conditions (rather than writing down the induced graph and checking). The reason I'd like this is because I want to talk about the space of such matrices, and I'd like to speak algebraically, rather than indirectly via a conversion to a graph. This seems like something that should be fairly well known: I've found some descriptions that involve all determinants of minors, and was hoping there was something more "compact". ## Sunday, March 22, 2009 ### (ab)use of wikipedia ? From IHE: Recently, a small journal entitled RNA Biology announced that it will now require all authors to also create Wikipedia pages about their discoveries. Specifically, the journal says: At least one stub article (essentially an extended abstract) for the paper should be added to either an author's userspace at Wikipedia (preferred route) or added directly to the main Wikipedia space (be sure to add literature references to avoid speedy deletion). This article will be reviewed alongside the manuscript and may require revision before acceptance. Upon acceptance the former articles can easily be exported to the main Wikipedia space. See below for guidelines on how to do this. Existing articles can be updated in accordance with the latest published results. I'm not a Wikipedia expert (hello 0xDE), but isn't this a violation of the Wikipedia policies on copyrighted material and (non)-publishing of original research ? Update: As 0xDE points out, Wikipedia is already on top of this. ## Saturday, March 21, 2009 ### Trolled from the arXiv... volume estimation... Just posted on math.PR, math.MG: A Polynomial Number of Random Points does not Determine the Volume of a Convex Body Ronen Eldan We show that there is no algorithm which, provided a polynomial number of random points uniformly distributed over a convex body in R^n, can approximate the volume of the body up to a constant factor with high probability. Specifically, you can't estimate the volume of a convex body merely by sampling. You need to generate new points that are "near" the old ones, and use a membership oracle as well. Neat... ### Blogroll alert, and some musings on community. Noam Nisan has a new blog on algorithmic game theory. This is great stuff. I must add that I'm finding Dick Lipton's blog fascinating reading, not only for the heavy research content, but the great historical perspective he brings to a number of well known problems. p.s (snark alert) It does seem unfortunate that computational geometry blogs (and data structures) don't appear to be recognized as part of the theoretical computer science blogosphere, but in a world where SODA is viewed as a conference not worth attending, I guess this is small potatoes. (end snark alert) p.p.s From an organizational perspective, it doesn't matter terribly whether geometry papers get published exclusively in SoCG/SODA, appear in STOC/FOCS, etc. Similarly, although Luca is concerned about crypto forking off from STOC/FOCS, I don't think there's a real problem with people naturally aggregating around a common topic area in their own conference. I'm also not too caught up with "name wars" despite my snark above. The problem is really along other dimensions: the tenure process (what is your field and who evaluates you/writes letters), and even more crucially, the funding process. For many years, geometry was funded from a separate sub program of CCF than the rest of theoryCS (at least theoryA). This was a good thing (more money) and a bad thing (CG was clubbed in with solid modelling, graphics, and symbolic methods). Now of course geometry has been folded back into the larger Algorithmic Foundations umbrella, and here's where perceptions start to matter. If AF gets defined (de facto) as STOC/FOCS stuff, geometry proposals are going to get short shrift (also because they tend to have more application-oriented material as well). This would be true for any other area that has forked off into its own community. I didn't submit to the theory program last December, and I don't know how panel deliberations are going, but I'm curious as to whether there's been any noticeable change ? ## Friday, March 20, 2009 ### Class Projects For some reason, class projects are all the rage in graduate classes here at the U. These are typically used in lieu of a final, and generally involve (for most classes) some kind of software artifact. Students seem to prefer projects to exams in fact. When I started designing my graduate classes, I had mixed feelings about class projects, but allowed students (especially in my comp. geometry class) to propose various ideas. Two years in, I still have mixed feelings about class projects. • It seems to me that for theory classes, a "pure theory" class project is difficult to execute. Most students here don't have a 'general theory' background, and so it would be well-nigh impossible for them to execute a class project of the form 'here's an open problem: go attack it'. • A cop-out option is a survey paper on an area. Now in principle survey papers can be an excellent way of distilling the knowledge in an area, organizing it, and then presenting it. But in practice, a survey ends up looking like a sequence of paper summaries, with little to no higher-level organizing • For some classes (geometry, specifically) a software artifact can make sense. In that regard, having CGAL around is handy because a lot of low-level grunt coding gets taken care of. But invariably the software artifact degenerates into a horse-race for a bunch of techniques, or "let me implement my research project that I'm working on somewhere else, and slap some O() notation on it to make it look like a CG project". The problem a class project (ideally) is trying to solve is providing an avenue to synthesize and apply the concepts being learned in class in some kind of holistic way. But I'm unconvinced that the class projects people are actually doing are effective in that regard. Incidentally, since some of my complaining has to do with topic choice, I should add that I've often given out topics for people to look at: no one likes my topics though :). I teach the grad intro algorithms class in the fall, and am tempted to do a class project where I require students to do some actual algorithmic modelling drawn from their home area (most students taking this class are not algorithms folk): which is to say, take a "real problem", and analyze it as an algorithms person would if given the problem by an applied researcher - formalize, prove hardness, come up with algorithms, repeat, .... If it weren't such a pain to design finals that aren't solvable via Google, I'd probably never use projects at all. ## Wednesday, March 11, 2009 ### Memory exercises and streaming. The latest guest column on Olivia Judson's "Wild Side" blog at the NYT is on increasing intelligence: Can we do exercises to increase it ? In the article, they describe a test that when performed repeatedly leads to measurable improvement in IQ: A common way to measure working memory is called the “n-back” task. Presented with a sequential series of items, the person taking the test has to report when the current item is identical to the item that was presented a certain number (n) of items ago in the series. For example, the test taker might see a sequence of letters like L K L R K H H N T T N X presented one at a time. If the test is an easy 1-back task, she should press a button when she sees the second H and the second T. For a 3-back task, the right answers are K and N, since they are identical to items three places before them in the list. Most people find the 3-back condition to be challenging. Of course, to folks who do streaming, this is easily recognized as a simple subcase of detecting a duplicate item in a stream (that problem allows n to vary over the entire stream), which in turn is a special case of the most frequent element estimation problem. These are hard problems for sublinear space stream algorithms. It's interesting that they're taxing for human working memory as well. ### Some SoCG 2010 news IPCO (Integer Programming and Combinatorial Optimization) is held in the summer, and can occasionally overlap with SoCG. I've been in touch with Friedrich Eisenbrand (organizer of IPCO 2010 in Lausanne) and it looks like the two conferences will be back-to-back and disjoint in case there are people who wish to attend both. Thanks to Friedrich for moving IPCO to earlier in June (9-11) to make this happen. In even more important news, the Zion Curtain is soon to come down in Utah. All this means for visitors is that you won't have to pretend to become a member of a 'private club' to drink at a bar in Utah any more. ## Tuesday, March 10, 2009 ### Barbara Liskov wins the Turing Award Via MM comes this news: ACM, the Association for Computing Machinery, has named Barbara Liskov of the Massachusetts Institute of Technology (MIT) the winner of the 2008 ACM A.M. Turing Award. The award cites Liskov for her foundational innovations to designing and building the pervasive computer system designs that power daily life. Her achievements in programming language design have made software more reliable and easier to maintain. They are now the basis of every important programming language since 1975, including Ada, C++, Java, and C#. The Turing Award, widely considered the "Nobel Prize in Computing," is named for the British mathematician Alan M. Turing. The award carries a 250,000 prize, with financial support provided by Intel Corporation and Google Inc. The first U.S. woman to be awarded a Ph.D. from a computer science department (in 1968 from Stanford University), Liskov revolutionized the programming field with groundbreaking research that underpins virtually every modern computer application for both consumers and businesses. Her contributions have led to fundamental changes in building the computer software programs that form the infrastructure of our information-based society. Her legacy has made software systems more accessible, reliable, and secure 24/7. ## Wednesday, March 04, 2009 ### DBR: moving forward I have no doubt that people are sick of this subject now, so I'll try not to rehash or re-argue points that many have brought up. It seems to me (as a supporter of DBR) that the objections to DBR can be categorized as • self-aware: "we are not biased. period. everything is good". • conservative: "DBR could cause other problems: why replace one flawed system with another" • logistic: "authors could slyly reveal info, how do we handle self-citations etc" • irritated: "why are you people rabble rousing: leave us alone" In this regard, I think people should really try to read Kathryn McKinley's essay on this topic (and the related links). There's much there for all: for us utopian DBR devotees, she points out that the most effective kind is a appears to be a staged unblinding approach, rather than a straight DB approach. For those who think that we can check our own biases, she provides references and evidence for why this goes against what we know about human psychology. For people concerned about logistical issues, she discusses many of the common problems (and also references other disciplines that have made their own attempts to solve this). A comment I read somewhere (union of Sorelle, Lance, Michael and myself) made what I thought was an excellent point: if people are really committed to trying out DBR, it might be good to experiment in a conference outside the big ones, so we can acquire some level of familiarity with the system (or realize that it's totally useless). As I had mentioned to Sorelle, this ultimately boils down to having a PC chair who wants to experiment in this way: Michael tried doing serious CoI at the STOC PC meeting and received no small amount of flak for doing so, but at least he tried, and it convinced him that it should happen even more. More on the dynamics of peer review (this at the level of funding panels) comes from a new book reviewed in IHE. The review reveals some of the key findings: nothing entirely surprising, but a clear indication that many "non-technical" factors go into a proposal getting funded (even things like who has a plane to catch and when). It's reading things like this that makes me impatient with people who claim that bias doesn't exist. ### NSF vs NIH stimulus disbursement From the Chronicle ($$):
In the absence of definitive guidance from either the U.S. Congress or the White House, the nation's two leading providers of federal science money to universities are apparently taking different approaches to what it means to help the U.S. economy.

The National Institutes of Health, which is getting $10.4-billion from the$787-billion economic-stimulus measure signed last week by President Obama, announced it will tweak its science-based distribution guidelines to ensure the largess some measure of geographic parity.

The National Science Foundation, which is getting $3-billion in stimulus money, has concluded that it will not. "We are sticking to NSF's mission," Arden L. Bement Jr., director of the National Science Foundation, told the agency's board on Tuesday. I don't have an opinion one way or another on this: just interesting to see the difference. As the article points out later, The NIH and the NSF are getting the two largest chunks of$21.5-billion in new federal money for research and development contained in the stimulus measure. But while the NSF is an independent federal agency, the NIH is part of the federal Department of Health and Human Services, which makes it more tightly bound to direction by Congress.

## Tuesday, March 03, 2009

### On Clustering and Density Estimation

A brief respite from SBR/DBR/DOOR review policies....

I'm running a clustering seminar, and although I thought at first that putting some order on the space of clustering methods was too daunting a task, I'm actually quite happy with the way the course organization has worked out. Obviously I can't cover everything, but what I've done is focus less on the slew of algorithms for a particular clustering formulation, and spend more time on the formulations themselves, comparing and contrasting to get a better sense of "which clustering technique should I be using", rather than "what's the best algorithm for k-median".

Various interesting observations have emerged along the way, some of which I might write about as time goes on. Right now, I want to point you to a post by my colleague Hal Daumé. We were talking about EM and its dual role as density estimation and clustering algorithm, and he has a very interesting observation about the the dangers of conflating the density estimation problem and the cluster-finding problem.

## Monday, March 02, 2009

### Graph Theory of monopoly

On TechCrunch, Eric Clemons, a professor at Wharton, does an analysis of whether Google could soon face antitrust charges. What's interesting is his analysis of when monopoly power may be viewed to have kicked in for electronic markets, which are apparently harder to analyse than "regular" markets.

His analysis boils down to a separator construction: analyzing two cases of market power that look similar but actually are different, he argues that the key difference is that in one case (airline reservations) the key monopoly-controlling entity separated the customers from the suppliers in a graph sense: all paths from customers to airlines went through the reservation systems. In a different example (bank ATMs), he showed that this was not the case, and in fact no actual monopoly developed, even though the entity had 100% market control.

He uses this to conclude that there is a potential (lots of caveats) antitrust case against Google, based partly on the fact that Google separates customers from companies wishing to ply their services, and controls the advertising mechanisms by which corporations talk to customers. I imagine that claiming that we ONLY search online is a stretch, and also it would be hard to argue that we are forced to use Google for this purpose - he addresses these points as well.

Overall an interesting argument: I was of course most interested in the graph-theoretic reasoning.

My student John Moeller has started a new blog 'On Topology':
My interest in topology comes from two directions: first, it was the reason that I got started in math (and subsequently the topic that broke me), and second, I found out over the last year that topology plays a prominent role in several research areas of computer science. These applications to CS are what really got me interested in the topic again. Algebraic (or combinatorial) topology is useful in studying the structure of data like meshes and graphs. Differential geometry is applicable when your data lives in another space, or on something called a manifold, and you want to do the same kinds of things that you’re used to doing in regular, old-fashioned space. Ultimately, both have to do with geometry, albeit in a very general way. That’s why these topics interest me; I want to know how to better get at the essence of data, and much of the time you can do that geometrically.

He actually enjoys thinking about Cartan-Hadamard manifolds and representation theory, but I don't hold it against him :).

## Sunday, March 01, 2009

### Double Blind Review, again...

Sorelle's excellent post and discussions got me thinking more about DBR, and then I read Michael's post, which captured an important part of what I wanted to distill out. So first, a summary of the summary, and then other important points.

There are a few serious reasons brought up against DBR. I should say up front that I don't view reasons of the form "We are unbiased/Theory papers can be reviewed with subjectivity/We are better than other fields" as serious, because there's ample empirical evidence for the existence of unconscious systematic biases in many fields (no, not in theory specifically, but we're are still human). And as Michael points out, there's sufficient evidence of actual bias (even if it's not considered malevolent)

The serious reasons against DBR are essentially three:
• DBR places an unacceptable restriction on the author's ability to disseminate the work
• There is no way a paper can get a fair review without the reviewers being able to google around to get a sense of the work being produced. Since such a process could easily reveal the names of the authors, this defeats the purpose of DBR, giving an illusion of fairness where none really exists
• You can't do subrefereeing if you don't know who the authors are.
Michael addressed the first in his post, and the second points more to weakness in the review system than DBR itself. There are mechanisms for dealing with the third in conferences with DBR.

But most importantly, I think the objections are missing the point. If the claim was, "SBR is unfair, and here's a perfect system to replace it", then these objections would be reasonable counters to that claim. But in all that I've read, the reason for DBR is:
To replace the systematic biases associated with SBR by a 'equal playing field of problems' when it comes to paper review.
In other words, the point is not to be perfect, but to be imperfect in a fair way, that doesn't unfairly work against group identity that has nothing to do with the quality of the research.

I really think that perspective is important to keep in mind when discussing this. I have yet to hear any argument for significant harm to authors in going from SBR to DBR. Inconvenience yes, but direct harm, no. And any harm is spread "across the board": I am as likely to suffer in this system as a new grad student or a famous researcher. But the benefits are disproportionate, and correct for structural biases, and that's the point.