Friday, January 28, 2011

Absentmindedness

And now for a break from SODA updates.

They say that professors are absentminded. I say that my mind is not absent: it's very very present - just somewhere else.

On that note, here are some levels of absentmindedness. I will refrain from any comment on how I came up with this list.

9. putting down your coffee while looking for keys and forgetting where you left it.
7. forgetting that you're wearing your glasses
6. looking for your phone while holding it
5. missing your bus stop because you're day dreaming
4. missing your bus because you're day dreaming at the bus stop
3. taking your bus because you forgot you had the car that day
2. having to be reminded by your spouse not to take the bus because you took the car that day

I used to remember #1, but I've forgotten it.

SODA Day 2.

I seem to be the only person blogging about SODA (even though Jeff, Glencora and Sorelle were around as well - come on people !)

SODA Day 2 was the official day for bidimensional dimensionality reduction in spaces of bounded doubling dimension (ok that almost made sense). Here are some highlights:
• I enjoyed Bidimensionality and EPTAS not because of the results themselves (which are interesting) but because the talk was great, and I learnt a lot about the basic framework of bidimensionality. The talk was low on technical details, but the main contribution of the paper is a stronger result connecting bidimensionality of a graph property to the existence of an (E)fficient PTAS for the problem. Earlier work needed a constant factor approximation algorithm to bootstrap the process, and they were able to relax that assumption.
• In the same session, Known Algorithms on Graphs on Bounded Treewidth are Probably Optimal was also cool. They showed that any progress on improving the exponential dependence on treewidth for a number of parametrized problems (for example from 2^tw to (2-eps)^tw) would break the Strong Exponential Time Hypothesis (i.e that SAT cannot be solved in (2-delta)^n time).
• A paper that drew a lot of attention (as seen by the huge influx of people into the room) was the work by Daskalakis and Papadimitriou (Christos gave a really great talk) on Continuous Local Search. In brief, they constructed a complexity class that combines the approximate fixpoints of PPAD and the local search of PLS and contains some natural problems from algorithmic game theory as well as numerical analysis. Indeed, gradient search can be viewed as a special example of elements of this class.
• There was a whole session on approximate geometry, and doubling dimension played a key role in constructing efficient spanners, and in getting below the JL limit. Sandwiched between these was a local JL embeddability result, the idea being that if you only care about preserving distances in your (k)-neighborhood, you can get dimensionality reduction in terms of k, rather than n. The proof itself uses a variant of the Nash embedding theorem.

Wednesday, January 26, 2011

SODA Day 1.

I'm slowly unloading my backlog of posts on SODA 2011. At this point, the purpose is less to be a live-stream of events, and more to be a reflection on things I found interesting. As always, there will be no attempt to be deep, insightful or comprehensive. If you think I've missed THE PAPER OF THE YEAR berate me in comments and then write a guest post :)

The Holiday Inn has two major problems. Firstly, the conference rooms were in the basement, which meant that 3G access was hard to come by. Secondly, the local wifi wasn't particularly strong, and didn't work too well in the rooms or in the higher-up hotel rooms. This unfortunately forced me to actually listen to talks rather than tweeting about them. Good for the conference, bad for the as many as TWO WHOLE people who couldn't attend the conference and were hanging on my every tweet waiting for updates.

•  T. S. Jayram and David Woodruff had an interesting paper on JL transforms beyond the constant error rate. One of the standard methods in JL proofs is to prove a constant error bound for distortions, and then use multiple parallel copies to reduce the error down to 1/n^2 so that the union bound can kick in. The question is: is this the optimal thing to do ? Might some more clever scheme avoid the need for parallel copies ? Their answer, it turns out, is no. They show lower bounds that match the best upper bounds, and in the process develop a new technique that gives lower bounds for communication complexity that depend on the error probability as well as the approximation guarantee.
• The next paper in the session also introduced a new tool for communication complexity lower bounds. Elad Verbin and Wei Yu proposed a generalization of Boolean Hidden Matching (which was used to separate quantum and classical communication complexity) and used it to show new streaming lower bounds for sorting by reversals, among other problems.
• A talk I didn't attend, but should have (DABSH), was by Flajolet, Pelletier and Soria on Buffon machines. You can think of the Buffon needle process as a way of generating $2/\pi$. So the question they answer in this paper is: what kinds of "simple processes" can generate other complicated functions like exponentials, trigonometric functions and the like.
• Continuing the 'analytic combinatorics' theme, Wojciech Szpankowski talked about his paper with Michael Drmota on discrete divide and conquer recurrences. The main results of the paper were quite neat: a master theorem like formulation to solve exactly recurrences that involve floors and ceilings, without resorting to domination arguments. The advantage is a more precise bound on the running time which also captures the herky-jerky behavior of such algorithms (because of uneven integer splits)
I didn't get as much out of Bruce Reed's talk as I would have liked, mostly because I made the mistake of sitting in the back and could only see half of each slide. The talk itself was rather technical, with less of the high level intuition that might be helpful to an outsider to this area like me. It is however a reasonable model for an invited talk.

If it's Sunday at SODA, it's NFL time. As usual, Kirk Pruhs wandered around wearing his Steelers shirt, and looking mighty pleased. David Johnson was alternately elated (Packers win !) and downcast (Jets lose !) and a number of us drifted towards the hotel bar by early evening to set up shop there in front of the big screen. For those of you sniffing disdainfully at my embrace of brutal American sports, I'll merely say that there are MANY football fans among the SODA community.

Postscript: I was feeling guilty about summarizing papers so briefly. I just found Oded Goldreich's page on papers he's interested in (via this cstheory question) and it appears to be a nice model with short comments on papers he likes.  I might try doing something like this either interspersed with other posts here, or on my web page, just to force me to read papers of interest.

Tuesday, January 25, 2011

ALENEX: Experiments with Johnson-Lindenstrauss

I'm three days behind on my postings, so I have the luxury of looking back and attempting a larger perspective.

As Dana Randall put it at the business meeting, this is the Johnson-Lindenstrauss SODA. And it seems apropos to start with our paper at ALENEX. This was work done by my student Qiushi Wang, who's applying for grad schools (Admit him ! or email me for more info!)

The Johnson-Lindenstraus Lemma is one of the most powerful tools in the theory of metric embeddings and dimensionality reduction. Simply put, it says that given any set of $n$ points in a Euclidean space, there exists a linear mapping into roughly $O(log n)$ dimensional Euclidean space that preserves all distances approximately.

There's a long series of proofs of this lemma: all of them yield essentially the same bound on the number of dimensions and the same dependence on the error term, and so the main efforts have focused on improving the running time of the mapping itself. If we're mapping from d dimensions to k, a linear mapping can take time $O(kd)$ per point, and this is the time to beat.

There are two strands of research along these lines: the first family of methods tries to speed things up by sparsifying the projection matrix to speed up the transformation. You can make the matrix quite sparse this way, but there's a limit on what you can do, because if the input vector being projected is itself quite sparse, then the resulting vector has mostly zeros in it, destroying its norm (and any hope of preserving distances)

The trick, which leads to the second strand of research, is to "precondition" the input. The idea is quite elegant: if you apply what is essentially a random rotation to the vector, it becomes dense w.h.p, where density intuitively means that no one coordinate is very large (we assume unit norm vectors w.l.o.g). Once you do this, the resulting projection matrix can be made quite sparse.

There's a catch though: you're now using two matrices instead of one, so you end up spending d^2 time on the first part, which dominates the original $kd$ time. The second trick you need then is a special random rotation that can be applied very quickly. Essentially, you need the walsh-hadamard transform. This is the core idea behind the 2006 paper by Ailon and Chazelle, and there's been much work since on improving the bounds and the preconditioner construction. A third line of work combines the two strands is to sparsify (by subsampling) a special code matrix that has a "preconditioning" effect.

But in all of this, no one has really looked at the actual behavior of these algorithms in practice. There are a number of reasons to do this: first of all $O(\log n)$ dimensions isn't so hot if the constant is large. Secondly the algorithm is randomized, which tends to give practitioners the heebie-jeebies. And finally, the dizzying array of algorithm options available is just plain confusing.

Our paper contains most of the details, so I'll spare you the long exposition, and summarize some of the surprising and not-so-surprising conclusions thus far:
• The constant in the dimension of the embedding is small. It's essentially 1 * log P/epsilon^2, where P is the number of "norm probes" you require (P = n^2 for distances and n for norms). This is good, because it means that there are no hidden large constants.
• The quality of all algorithms is basically the same, and is very consistent. In other words, the fact that JL is randomized (which often causes a lot of concern in practice), is not a problem for its use (unless you working in a distributed environment and need to share randomness - pointed out to me by TS Jayram).
• The distortion error itself is very nicely concentrated (normally) around 1. Unless you have highly clustered data, in which case the distortion distribution looks like a superposition of shifted Gaussians, one for each cluster center.
• Since all algorithms behave essentially the same on quality, speed is the main differentiator. Here, the 'best in class' depends  heavily on what you know about the data. For dense data, you can be pretty sparse (as predicted by some of the papers) and the embedding is fast. For sparse data, it turns out that at least in MATLAB, and for small dimensions, the dense method work better (a little ironic considering that much of recent work was designed to deal with the sparse case). This is because of MATLAB's heavy optimization for dense matrix multiplication.
• Of course, your dimensionality might be too high to store a dense matrix, or you might not even know what the data profile is like. In that case, preconditioning methods like the original Ailon/Chazelle method work great. and there are only small differences between the methods as d increases.
We're not even close to being done with our explorations: there are at least four or five new questions to explore based on feedback we got at SODA. But it's been an illuminating experience, and I've been heartened by all the interest the community has shown in this research, based on the feedback I got.

Monday, January 24, 2011

ALENEX/ANALCO II

Today, someone asked me to post something sensational just to stir up some controversy. It turns out that without realizing it, I already did it yesterday ! I was talking about the use of CPLEX to solve (very effectively) instances of 1-median over strings, and said this:
It's not the "sexiest" thing in the world to solve algorithms problems in practice by using large industrial strength packages. However, both CPLEX and  SAT solvers are examples of tools that can be used in practice to solve fairly intractable problems. It still takes a lot of engineering and skill to make the heuristics work well, but it's something that we should be using as a matter of course when designing heuristics before trying to invent an algorithm from scratch.
I should have known better than to bring down the fury of the entire field of OR on my head. Michael Trick, OR blogger extraordinaire, decided to round my variables for me: read what he had to say here.  As penance, I promise to download CPLEX and encode at least one problem on it in the next year :).

I've seen Bob Sedgewick give talks a few times now, and I'm always inspired by them. This latest one was titled 'Algorithms for the masses' and was hard to summarize: it was part exhortation to do more analytic combinatorics, part discussion of a new intro CS course he and Kevin Wayne have designed, and part emphasis on using the scientific method properly to design good models for algorithm behavior and data characeteristics.

The principle at the heart of this was a fitting one for this joint talk: we should do more scientific analysis of our algorithms to figure out exactly how our algorithms behave in practice, rather than relying on O() notation as a predictive and comparative tool (both of which it isn't). This goes back to Dick Lipton's coinage of 'galactic' algorithms: Bob made the assertion (not wrong in my view) that most algorithms at STOC and FOCS are 'galactic' and much of the work at SODA is too.

While I agree that it's high time we stopped using O() notation as a cudgel, I think it's harder than one might think. Engineers can model the real world in various ways, and when they want to test their models, they can - well - run it on the real world. Even if come up with a plausible model of how my algorithm works, and what the various cost functions are, I still need to hope that the data doesn't have weird characteristics that make all the results go wonky. Probably the way to see this is that even in "the real world", if we dont know how a particular genetic mechanism works, it's as good (or bad) as not having an accurate model of data that we're testing.

The second invited talk, by James Demmel, was a little harder for me to follow, because it was a much more technical talk about the challenges of designing linear algebra routines for future architectures. He described a machine the DoE is proposing to build, and it's likely to have 1 billion cores ! But even with that many cores, the main bottleneck is going to be communication, and the goal going forward is to design algorithms that parallelize well with minimal communication.

Or as he ended his talk:
Don't communic...

Sunday, January 23, 2011

ALENEX/ANALCO

A few quick hits from ALENEX, or SODA day 0:

• Moraru and Anderson used Bloom filters in a nifty way to implement exact pattern matching where you have a large set of patterns and an even larger text. The idea was to do a first pass over the text after storing all the patterns in a Bloom filter. Every subsequence of matching text was stored in a second Bloom filter, and in a second pass, all the patterns were run over this Bloom filter to take care of false positives. A final "exact" pass did the trick (at this point both sets are small enough to be reasonable). They have a companion paper at NSDI (which is a pretty good networking conference) on using this for malware detection, and that's a good example of pairing nice algorithms engineering with some interesting applications.
• Chimani, Woste, and BÃ¶cker were looking at the 1-median problem on a hamming space, and showed that the simple integer programming formulation actually does great in practice, when you throw CPLEX at it. This was surprising to me on two levels: firstly, that CPLEX is actually free for academic use (who knew!) and that such a simple approach is so effective.

It's not the "sexiest" thing in the world to solve algorithms problems in practice by using large industrial strength packages. However, both CPLEX and  SAT solvers are examples of tools that can be used in practice to solve fairly intractable problems. It still takes a lot of engineering and skill to make the heuristics work well, but it's something that we should be using as a matter of course when designing heuristics before trying to invent an algorithm from scratch.
• Stanton and Pinar had some experimental results (and some theory) on sampling from the space of graphs that have a prescribed joint degree distribution. While degree sequences are all the rage when trying to model various "naturally occuring" graphs like router graphs or social network graphs, there's a body of work that notes that graphs with the same degree distribution can have very different properties, and that in fact statistics on the number of edges connecting nodes of certain degrees (i.e higher-order statistics on degrees) are even more relevant.They propose a simple Markov chain that allows them to sample from the space of all graphs having a prescribed joint degree distribution, and while they don't yet appear to have theoretical results on the convergence of this chain, it converges quicly in practice.
Other notes: I'll be using the hashtag #soda2011 on twitter during the day. If you're tweeting from SODA (an don't want to let the NIPS tweeters show us up!), do use this hashtag as well.

Wednesday, January 12, 2011

The 5+5 Commandments of a Ph.D.

This article is written by the holy blogging trinity of the School of Computing at the University of Utah: Matt Might, John Regehr, and myself. If you don't read their blogs, you should, because you'll learn how to program robots using an iphone interface in subzero weather.

There have been a lot of Ph.D.-bashing articles lately. There have been some spirited defenses of a Ph.D. too. Most of these articles make good observations, but they're often about the larger Ph.D. ecosystem and therefore fail to provide actionable advice to (potential) Ph.D. students.

We observe that most failures of the Ph.D. system -- including both failure to get the degree and failure to see a good return on time and money invested in obtaining the degree -- boil down to a small set of root causes. These causes are on both sides of the implicit contract between advisor and advisee. Here's our pragmatic view of the conditions that need to be met for a Ph.D. to make sense. (Please keep in mind that we're all computer science professors, though we've made an effort to avoid field-specificity.)

1. Advise the student: help find a thesis topic, teach how to do research, write papers, give talks, etc.

2. Provide protection from and information about funding concerns (to the level of expectations of the field, which vary widely).

4. Provide early and clear guidance about the time frames and conditions for graduation.

5. Introduce the student to the academic community, through conference talks, invited talks, letters of recommendation, etc.

The student shall...

1. As early as possible, do due diligence in researching career prospects. It's not hard to get people to talk about this and there's also plenty of written advice, in books and on the web. Carefully filter what you read since the situations may be very different between engineering fields, science fields, and the humanities. There
may also be significant differences between sub-fields such as theoretical computer science vs. operating systems. A new student should glance at job postings and NSF statistics to determine the ratio of new Ph.D.s to open tenure-track slots.

2. As early as possible, determine if the actual career prospects are a reasonable match for their needs/expectations. Until the student makes her expectations clear, the advisor has no clue if she simply
must have an academic job or whether he'll be perfectly happy getting a Ph.D. and then going to law school or being a stay-at-home parent.

3. Not be deluded or blinded by catchphrases like "life of the mind." Indeed, this life does exist, but probably only during the ABD portion of a Ph.D. A professor would be extremely lucky to live the life of the mind 15 hours a week, leaving 60 hours of advising, teaching, reviewing, writing grant proposals, traveling, and sitting in meetings.

4. Be a good investment in terms of time and money. In other words, work hard. Students who periodically disappear for long bouts of skiing, soul searching, or contract work tend to be put on the back burner by their advisor, making it much more difficult to get re-engaged later on. An easy litmus test: if acting a certain way
would get you fired from a real job, then it's probably a bad idea to try that in a Ph.D. program too.

5. Jump through the administrative hoops appropriately. The hurdles are important and generally not too burdensome: take some classes, do a qualifying exam, write a proposal, and so on. These are easy to
ignore until they become a problem. Your advisor is not likely to remind you, or even remember that you need to do them.

Since nothing is obvious on the internet, a disclaimer: These edicts might come across as cold and overly pragmatic, and might suggest that we are ignoring the joy of discovery, the thrill of learning and the excitement of doing cutting-edge research that goes along with doing a Ph.D. Far from it: we've chosen this life because we experience all of this and enjoy it. But the easiest way to crash and burn in what is a long, multi-year haul is to forget about the brass tacks and float in the clouds.

Tuesday, January 11, 2011

Are open tech report sites taking off in CS ?

For a while now, the math and physics have amused themselves by wondering why the CS community is slow to adopt the arxiv. In the past year or so, I've noticed an uptick in postings on the arxiv (especially around conference deadlines).

Prompted by David Eppstein's review of 2010 in cs.DS, I decided to get some stats on publication counts at the arxiv and ECCC for the past four years. My method:
1. go to arxiv.org/list/FIELD/YY (thanks, David)
2. Read off the total number of papers listed
For the ECCC, papers are numbered by YEAR-COUNT, so looking at the last paper published each year sufficed to get the count.

I did this for cs.{CC, DS, CG, LG} (LG is machine learning/learning theory)

Caveat: I ignored cross submissions, so there's some overcounting. I'm hoping that at least to determine trends this is not a major issue.

Here are the results:

Overall, it's clear that arxiv submissions in theory CS are climbing (and rapidly in the case of cs.DS), which I'm quite pleased to see. The growth rates themselves seem quite steady, so it's not clear to me whether the fraction of papers going on the arxiv is itself increasing (there's good evidence that the total number of papers people are writing in general is increasing).

Micro-polymath

As usual, I've been discussing what topic to cover for this semester's research seminar with my students. We usually cover some advanced topic, or a topic that people should know that no one teaches in our department. Students give presentations, I try to foster discussion (and grumble about the presentation style), and hopefully people learn something.

This semester we have decided to try something different. With my students and my postdoc Jeff Phillips (HIRE HIM! He's GREAT ! and needs a job !), the plan is to try a polymath-style enterprise. Specifically, we made up a list of problems that satisfy the following criteria:
• The problem is interesting enough in core theory-land that a solution almost guarantees a paper without having to worry about motivation, marketing, etc etc.
• The problem has been around and is reasonably difficult, so it's not likely to yield a solution immediately
• There's some new idea/paper/line of attack that has emerged (either because I thought of it, or someone else in our group did) that might be fruitful
This last point is very handy to whittle down the set of problems, because there's no shortage of open problems but very few of them might be amenable to attack at this point without new ideas.

We then went over each problem and voted, picking a winner. Luckily the winning problem was a consensus winner, so everyone is hopefully motivated to work on it.

Of course you're waiting for the punchline: which problem did we pick ? Alas, I'm not going to give that out yet. Not because of paranoia on my part, but because I'd like the students to have a certain amount of mind-space to maneuver in without having to worry about the competition. I anticipate complaints over this :).

What I ideally hope to report on a few months from now is an actual solution to the problem. Failing that I'll at least report on the progress made, and how this 'micropolymath' concept worked out.

Sunday, January 09, 2011

Awards from the joint math meetings, and other notes..

It's the new year !! apparently, being on twitter makes blogging frequency drop, because the throwaway information one might blog about just gets tweeted. This is not a bad thing in general.

I've been away in India for much of the winter break, after dealing with the NSF proposal deadlines. On a side note, for anyone complaining about the SODA deadline close to the July 4 weekend, the NSF Dec 17 deadline is MUCH worse. I've submitted proposals now from a cruise ship in Hawaii and at 2:30 in the morning from my parent's place in Bangalore. argghhh

The Joint Math meetings are wrapping up in New Orleans, and award news is beginning to trickle out. Muthu mentions the prizes for Assaf Naor and Ingrid Daubechies. Much to my surprise and great pleasure, the wonderful and entertaining overhang papers by Paterson, Peres, Thorup, Winkler, and Zwick were given the David Robbins Prize for
a paper with the following characteristics: it shall report on novel research in algebra, combinatorics or discrete mathematics and shall have a signifi cant experimental component; and it shall be on a topic which is broadly accessible and shall provide a simple statement of the problem and clear exposition of the work
The citation for their work is:
The Mathematical Association of America proudly awards the 2011 David P. Robbins Prize to Mike Paterson, Yuval Peres, Mikkel Thorup, Peter Winkler, and Uri Zwick for their innovative work on two papers: “Overhang,” American Mathematical Monthly 116, January 2009;
“Maximum Overhang,” American Mathematical Monthly 116, December 2009.
The two papers together solve, to within a constant factor, the classic problem of stacking blocks on a table to achieve the maximum possible overhang, i.e., reaching out the furthest horizontal distance from the edge of the table. The January paper was written by Paterson and Zwick, and the December paper was written by all five people named above. The January paper proves the surprising result that n blocks can be (cunningly) stacked using suitable counterbalancing to achieve an overhang proportional to $n^{1/3}$. (Many people have assumed that the overhang of about log n, given by the standard calculus exercise, is optimal.)
The December paper gave a complementary argument showing that an overhang proportional to n(1/3) is, in fact, the largest possible for any balanced stack.
The papers describe an impressive result in discrete mathematics; the problem is easily understood and the arguments, despite their depth, are easily accessible to any motivated undergraduate.
In other news, cstheory is humming along after the winter break. We're closing on in 3000 users. I'm always hoping for more on-target questions, especially in areas like approximations, geometry and AGT: complexity theory seems over-represented (which isn't a problem, but diversity is great!). We're hoping to develop some formal links with SIGACT fairly soon: stay tuned.