Monday, June 29, 2009

SODA 2010 submission server

Remember: abstracts are due today at 5pm ET.

Sunday, June 28, 2009

Clustering: The "I don't like you" view

(this is part of an occasional series of essays on clustering: for all posts in this topic, click here)

Any clustering problem starts with a data set. And you can basically tell what algorithm you're going to need by asking, 'what structure do I have on the data ?'. The bare minimum you need is some way of taking two items and returning a number that tells you how similar they are, or more usefully, how far apart they are.

Of course, if I know that A and B are a certain distance apart, and B and C are a certain distance apart, I might like to infer something about the distance between A and C. This is what the triangle inequality gives you, and so the most elementary form of clustering takes as input a set of items and a metric defined on them (humans actually DON'T think about distances in this manner, as many psychological studies have shown: more on that later on).

I find it convenient to view clustering algorithms from a pop-psychology self-help perspective, and so I'll deem algorithms that operate with a distance metric the 'I don't like you' methods. The general goal with such methods is to group the items into "clusters" where there isn't too much hatred :).

There are many ways to quantify this "lack of hatred". The one most natural to theoreticians is the 'minimize the worst-case angst' viewpoint: in other words, find k clusters such that over all clusters, the maximum pairwise distance between points in a cluster is minimized. This is the well-known k-center problem.

If you don't like the lack of robustness (one point can mess up the cost), you could instead sum up all the pairwise distances to assign the cost for a cluster. This gives a variant of what's normally considered the k-median problem.

A third way of measuring cluster cost is to sum the squares of distances, rather than the distances themselves. This gives you a cost function that looks a lot like what k-means attempts to solve.

It's often useful to define a representative of a cluster: in the k-center view of the world, the center of a cluster is a point whose maximum distance to all points in the cluster is minimized. Note that by the triangle inequality, this is within a factor of two of the k-center cluster cost, so it's often convenient to use this as the definition of the cluster cost (the cluster radius, instead of diameter, if you will). In the sum-of-all-costs view, the center correspondingly can be defined as the point whose sum of distances to all other points is minimized. Similarly for the case of sum-of-square-costs.
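To make these cost measures concrete, here's a minimal Python sketch of the center-based (discrete) versions from the previous paragraph; the function names and the generic distance callback d(p, q) are my own scaffolding, not standard terminology:

```python
def cluster_cost(C, d, agg, power=1):
    # Pick the in-cluster point minimizing the aggregated distance to the rest,
    # then report that aggregate: agg=max gives the radius, agg=sum the median-style cost.
    dd = lambda a, b: d(a, b) ** power
    center = min(C, key=lambda c: agg(dd(c, p) for p in C))
    return agg(dd(center, p) for p in C)

def kcenter_cost(clusters, d):        # worst radius over all clusters
    return max(cluster_cost(C, d, max) for C in clusters)

def kmedian_cost(clusters, d):        # total distance to cluster centers
    return sum(cluster_cost(C, d, sum) for C in clusters)

def kmeans_style_cost(clusters, d):   # total squared distance to cluster centers
    return sum(cluster_cost(C, d, sum, power=2) for C in clusters)
```

(In Euclidean space the center needn't be an input point, as the notes further down discuss, but for a bare metric this is the natural choice.)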

Remarks on complexity:
There is an elegant 2-approximation for the k-center problem by Gonzalez. The algorithm itself is the essence of simplicity: pick any point in the space, take its furthest neighbor, take the point furthest from these two, and so on. The first k points picked in this manner yield the desired cluster centers, and a simple triangle inequality argument yields the 2-approximation.
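A minimal sketch of that farthest-point heuristic (my own code and naming, assuming nothing more than a list of points and a distance callback):

```python
import random

def gonzalez(points, k, d):
    """Farthest-point clustering: the 2-approximation for k-center described above."""
    centers = [random.choice(points)]
    # nearest[i] = distance from points[i] to its closest chosen center so far
    nearest = [d(p, centers[0]) for p in points]
    while len(centers) < k:
        i = max(range(len(points)), key=lambda j: nearest[j])   # current farthest point
        centers.append(points[i])
        nearest = [min(nearest[j], d(points[j], points[i])) for j in range(len(points))]
    return centers
```

Assigning each point to its nearest returned center then gives a clustering whose radius is at most twice the optimal k-center cost.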

The k-median problem is a little harder to solve. The best known bound in a general metric space is a 3+eps approximation, and it's known that you can't get a PTAS. The approximation algorithm itself is not too hard: it's a local search heuristic that gets closer and closer to 3 as you permit larger and larger sets of centers to be swapped out.
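For concreteness, here's a sketch of the single-swap version of that local search for the discrete problem (centers restricted to input points). The code and names are mine; single swaps are known to give a weaker constant than 3, and the (1 - eps) improvement threshold is what keeps the number of iterations polynomial:

```python
def local_search_kmedian(points, k, d, eps=0.01):
    """Single-swap local search for discrete k-median: keep swapping one center
    for one non-center as long as the cost drops by a (1 - eps) factor."""
    def cost(centers):
        return sum(min(d(p, c) for c in centers) for p in points)

    centers = list(points[:k])            # any k starting centers will do
    current = cost(centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for c_in in points:
                if c_in in centers:
                    continue
                candidate = centers[:i] + [c_in] + centers[i + 1:]
                cand_cost = cost(candidate)
                if cand_cost < (1 - eps) * current:
                    centers, current, improved = candidate, cand_cost, True
                    break
            if improved:
                break
    return centers
```

The multi-swap analysis swaps several centers at a time instead of one, which is where the ratio approaching 3 comes from.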

Going to Euclidean space, or what's in a name ?
General metrics come either from some kind of similarity function between pairs of elements or from the induced shortest path metric on a graph (there are other metrics one can induce from a graph, and we'll talk about them later). But what if you have more information ? There's a general principle of clustering that's worth enunciating here: more structure on the data can lead you to more targeted algorithms that will probably work better, so resist the urge to be too general.

The most natural space to consider is a Euclidean space, and it is this space that explains the names 'median' and 'mean'. It's well known that on the line, the point minimizing the sum of distances to a set of points (the 1-median) is given by the actual median of the numbers. This explains why the problem of minimizing sum of distances is called the k-median problem. Similarly, the point that minimizes the sum of squared Euclidean distance is the centroid or the 'mean', giving rise to the k-means problem.
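A quick numerical sanity check of those two facts, using nothing but numpy and brute force over candidate centers (throwaway code, not from the post):

```python
import numpy as np

xs = np.random.rand(201)                 # points on the line
cands = np.linspace(0, 1, 2001)          # candidate locations for the 1-median / 1-mean

best_abs = cands[np.argmin([np.abs(xs - c).sum() for c in cands])]
best_sq = cands[np.argmin([((xs - c) ** 2).sum() for c in cands])]

print(best_abs, np.median(xs))   # approximately equal: the median minimizes sum |x - c|
print(best_sq, np.mean(xs))      # approximately equal: the mean minimizes sum (x - c)^2
```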

There's a long line of results on clustering under various measures in Euclidean space. In brief, for most of the above formulations, you can get a PTAS in Euclidean space, but you end up playing whack-a-mole with the parameters n, d, epsilon, and k (that is to say, something ends up exhibiting exponential dependence, and which parameter it is depends on what you care most about).

The Voronoi trick
But there's a general principle that governs much of the algorithm design for these problems. It's the so-called Voronoi trick, and it can be summarized by pointing out that you can always improve the clustering cost by making sure that each point is assigned to its nearest center. Effectively, this means that the clusters define a Voronoi partition, with each center as a site, and all points within a Voronoi cell being assigned to the corresponding site. First of all, this means that the number of possible outcomes reduces from k^n (the number of ways of assigning each of n items to one of k groups) to n^{kd} (since the number of Voronoi partitions is much smaller than the number of arbitrary partitions). Secondly, it suggests a simple heuristic for improvement: take a given clustering, reassign points so that each point is assigned to its nearest cluster center, and then recompute the center for the cluster associated with all points in the (erstwhile) Voronoi cell. Rinse and repeat....

This heuristic is the basis of the k-means algorithm, and is also the basis for a heuristic for the k-median problem clumsily named the k-medoids algorithm.
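A sketch of that generic assign/recompute loop; plug in a different recenter routine to get either flavor (the names and interface here are my own, not standard):

```python
import random

def voronoi_iterate(points, k, d, recenter, max_iters=100):
    """The 'Voronoi trick' heuristic: assign each point to its nearest center,
    recompute each cluster's center, and repeat until nothing changes."""
    centers = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                        # assignment step
            clusters[min(range(k), key=lambda j: d(p, centers[j]))].append(p)
        new_centers = [recenter(C) if C else centers[j] for j, C in enumerate(clusters)]
        if new_centers == centers:                              # reached a local optimum
            break
        centers = new_centers
    return centers, clusters

# k-medoids: recenter(C) = min(C, key=lambda c: sum(d(c, p) for p in C))
# k-means (points as tuples in R^d): recenter(C) = tuple(sum(x) / len(C) for x in zip(*C))
```

Neither variant is guaranteed to find the global optimum; each step only moves downhill in the corresponding cost.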

Notes:
  • all the above algorithms have the parameter k for the number of clusters. It's obvious why: all the above cost measures can be minimized by placing each item in a cluster by itself, but the resulting answer is meaningless, and "no man is an island"
  • In a general metric space defined over a finite set of points, the cluster centers will by default be elements of the input. This is often a desirable property, if you want representatives to come from the input. But in a smooth space like a Euclidean space, there's no reason to limit yourself to data points, and this creates a dichotomy between the so-called discrete and continuous variants of a clustering problem. Often, you can use the triangle inequality to argue that the answers don't differ significantly in cost, but it's important to ask yourself which variant you care about. The discrete problem can often be "easier" to solve, because you can enumerate over the choices, but this "easier" can be expensive: for example, minimizing the sum of squares is easy in a (continuous) Euclidean space, but is more expensive in a (discrete) metric space.
  • For some unknown reason, it's common to describe clustering problems by the algorithm used to "solve" them, which causes no end of confusion. I've lost count of the number of times I've had to point out that k-means is an algorithm that attempts to optimize a specific cost function. This is more than just pedantry, because unless you realize the difference between a problem and a specific solution, you'll never be able to conceive of an alternate approach, and you'll never even try to abstract out what's special about your problem. To theoreticians, this might be obvious, but it isn't really obvious IRL.
Next up: everything you ever wanted to know about k-means.

p.s. This is written in a much breezier style than I'd use for any formal document, but the organization and thinking reflect how I'd like to structure the material. Comments, as always, are welcome.

Friday, June 26, 2009

"Bloggers", the media and science reporting.

Sometimes I read an article that I just cannot understand. The words make sense, and I can understand the logical flow, but the entire premise just escapes me. Consider the following exhibit: an article in Nature News on the "problem" of blogging from a conference. A snippet from the article (which I recommend reading in its entirety):
By disseminating scientific results far beyond the lecture hall, blogging and social networking blurs the line between journalists and researchers. Scientists in competitive fields may be more reluctant to discuss new findings if they can be posted on the Internet within seconds.
and then later:
MacArthur's comprehensive postings were read by many scientists but they irked journalists attending the meeting. The meeting rules stated that reporters had to seek permission from speakers before publishing material on their work, rules that Cold Spring Harbor instituted in part because some journals, such as Nature, discourage scientists from talking to the press before their work is published. But those rules didn't apply to scientist-bloggers like MacArthur and, after he posted details from a few talks, reporters contacted Stewart for clarification on the policies. The complaint was a wake-up call: "For the first time, I became aware that people were blogging about the data at the meeting," Stewart says.
If I understand this correctly, the premise of the article is that blogging from a conference is bad because it
  • "scoops" the poor journalists under embargo ?
  • disseminates information faster than someone might like ?
to which my only response is "HUH ?" Isn't the whole point of a PUBLIC presentation to disseminate results to - you know - THE PUBLIC ? And how on earth does it matter how fast this happens ? And although I feel for the journalists who agree to weird embargo rules by control-freak journals, why on earth should I, as a researcher who happens to blog from conferences, care about this ?

Sometimes I think the media needs to get over itself....

Thursday, June 25, 2009

Clustering: An Occasional Series

Clustering is one of those areas that epitomizes the 'ooze' principle in data mining: it's 3 miles wide and an inch deep. Every year, copious ink gets spilt on some clustering method or another, and yet it's not really clear what key ideas are driving the field forward.

I ran a seminar this spring on clustering. I wanted to see if there was a way of organizing material on clustering in a way that was broad without necessarily being exhaustive, but that hit the "eigenvectors" of the field, so to speak. I was surprised that for a topic as important and broad-based as clustering, there was little out there in the way of textbook-level source material to work from. Most texts that I did find were too specialized for my taste, and didn't have the kind of broad perspective of clustering that I felt was critical to really understanding the area.

So I organized my material, and was surprised to find that there actually was a way of collecting the base material on the topic in a way that could be covered in a semester format without leaving crucial pieces out. What I did not try to do:
  • Cover the details of the best/greatest/fastest algorithm for each problem. In other words, I made a conscious decision to shy away from theoretical overkill.
  • Try to cover all the variants of different clustering methods. I felt that as long as one knew what k-means was, and what spectral clustering was, there was no real need to study papers that tried to put them together in strange ways.
What I did try to do was:
  • Where possible, try to evaluate the practical significance of the methods (so a linear time clustering algorithm that actually had a running time doubly exponential in epsilon would not necessarily make the cut, unless it had some other redeeming features)
  • Try to view the methods from the perspective of the user: given some data, with some structure, what are your options in terms of clustering algorithms ?
The results of this organization can be found here. I was relatively happy with the outcome, and I think the students were too. In fact, many of the discussions in class were interesting enough that I put them together in a collection of notes.

What I want to do in this occasional series is sketch out the "top-level" view of clustering as I see it, and as inspired by our seminar. My vague idea was that these notes might eventually be converted into some kind of monograph, or at the very least a web document that others might find useful. So I'm going to "test-drive" the ideas here, in the hope that my very well-informed readers will add to the discussion, and improve it (as usually happens).

First up: the basic trinity - k-(center/median/means).

Sunday, June 21, 2009

"I, for one, welcome our new TCS overlords"

Via Richard Ladner:
I am pleased to announce the new SIGACT officers for the term beginning July 1, 2009 and ending in three years, June 30, 2012.

SIGACT Chair
Lance Fortnow, Northwestern University

Members-at-large
Cynthia Dwork, Microsoft Research
Anna Lysyanskaya, Brown University
Michael Mitzenmacher, Harvard University
Mike Saks, Rutgers University

Quick hits

  • If you limit yourself to actually reading one Terry Tao post each week (I'm told this frequency prevents brain explosion), then read this one.
  • Joachim Gudmundsson wants to remind you that jausage is much tastier than sausage. Oh yes, and JoCG is open for submissions.
  • Get those SODA papers in (but not too many, I have work to do !)
  • I kid, I kid....
  • I wanted to spend more time talking about Robert Lang's talk on origami. It had deliciously beautiful pictures, and some even more juicy mathematics, and he pitched it at JUST the right level. No slides available, alas.
  • Lecture notes on streaming from a workshop in Barbados...
  • I had to explain to someone that Denmark is not a city in Sweden. And this after watching the Denmark-Sweden world cup qualifying game in a bar with a hundred roaring Danes, and feeling like I had been transported to a Viking battlefield.
And because I can't help myself, "SKÅL"

Friday, June 19, 2009

SoCG 2009: Local search, geometric approximations and PTASs

I planned on this post coming a lot sooner after the last three, but got caught up in my own version of a Danish flu. sigh....

So let's get greedy. In the last post, I talked about new developments in the epsilon-net route to approximating covering/hitting problems. The problem with these approaches though (as Saurabh Ray pointed out) is that they (at best) get you to "some" constant. They don't get you down to a specific constant (unless you can really optimize the net-construction) and they certainly don't give you a PTAS.

So what's a poor approximator to do if their livelihood depends on getting a 1+epsilon-approximation ? Two papers at SoCG 2009 answer this question independently using essentially the same idea, and there's already a third paper that applies this idea as well. You know what they say: "three applications doth a general technique make" (or at least they should say it !), and the technique is so neat that I thought I'd spend some time outlining the basic principle.

But first, a detour. What follows is my own take on the underlying argument, and I'm the only one responsible for all the inaccuracies in interpretation and exposition. Consider the problem of computing a maximum cardinality matching in a bipartite graph. As we all know, this can be solved using a max-flow algorithm. A well-known "spin" on this result is the following:
Suppose we construct a max flow F_k using augmenting paths of length at most 2k+1. Then OPT <= (1 + 1/(k+1)) F_k
Or in other words, F_k is a (1 + 1/(k+1))-approximation to the true answer. The proof is fairly easy: consider any augmenting path that could increase the total matching size. Because we've tried all augmenting paths of length at most 2k+1, this path must have length at least 2k+3, and it alternates between OPT (red) and algorithm (blue) edges, starting and ending in a red edge. In other words, if X is the number of blue edges along this path, then 2X + 1 >= 2k+3, so X >= k+1. This means that along any such path, the number of optimal edges is at most (1 + 1/(k+1)) times the number of algorithm edges. The augmenting paths are vertex-disjoint, so some algebra yields the above result.
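To spell out that last bit of algebra (my reconstruction of the standard argument, not a quote from any particular source): the symmetric difference of the algorithm's matching and an optimal one contains at least OPT - F_k vertex-disjoint augmenting paths, each carrying at least k+1 blue edges, and those blue edges are all distinct, so

```latex
\[
  F_k \;\ge\; (k+1)\,(\mathrm{OPT} - F_k)
  \qquad\Longrightarrow\qquad
  \mathrm{OPT} \;\le\; \Bigl(1 + \tfrac{1}{k+1}\Bigr) F_k .
\]
```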

Another way of viewing the above fact about alternation is that if we think of the edges as vertices of a bipartite graph (colored as the edges were), then each red set is adjacent to a reasonable number of blue vertices (in fact, if this were not the case, the analysis would break). The edge-disjointness then allows us to scale things up to the entire graph.

Notice that the process of generating longer and longer augmenting paths can be viewed as a local search operation. Starting from a given path, we modify it by constructing an alternating path using the current edges. In other words, we're able to make claims about local search yielding optimal solutions because we can control the way OPT interacts with the results of the algorithm, with the interaction parametrized by the "degree of locality".

Now let's return to the papers, keeping this in mind.

Mustafa and Ray apply this idea to get a PTAS for hitting set problems. Specifically, they show that for r-admissible regions in the plane and for halfspaces in R^3, the hitting set problem admits a PTAS that runs in time O(mn^(1/eps^2)) (m: number of regions, n: number of points). A collection of regions in the plane is r-admissible if any pair intersect in at most r points, and the differences are connected (these are often called non-piercing collections for that reason).

Chan and Har-Peled use the idea to get a PTAS for max-independent-set problems for pseudo-disks in the plane (which are 2-admissible). Their running time is similar.

Both papers run the following algorithm, parametrized by a number b. Take any feasible solution, and call it L. Now check all sets that differ from L in at most b places and see if any of them can be used to improve the quality of L by at least one unit. If so, make the swap and repeat. This process will terminate in at most n steps, and at each stage, you need to do a search over at most n^b sets. We'll call such a solution "b-local"
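Here is that loop in Python-flavored pseudocode, just to fix ideas; the set-based interface (is_feasible, value) is my own scaffolding (for MIS take value to be the size of the set; for hitting set, minus the size):

```python
from itertools import combinations

def b_local_search(universe, start, is_feasible, value, b):
    """Move to any feasible solution within symmetric difference b of the current
    one that is better by at least one unit; stop when no such move exists."""
    def improving_move(L):
        rest = [x for x in universe if x not in L]
        for r in range(b + 1):                       # how many elements to drop
            for out in combinations(L, r):
                for a in range(b - r + 1):           # how many elements to add
                    for inn in combinations(rest, a):
                        cand = (L - set(out)) | set(inn)
                        if is_feasible(cand) and value(cand) >= value(L) + 1:
                            return cand
        return None

    L = set(start)
    while True:
        nxt = improving_move(L)
        if nxt is None:
            return L                                 # L is now "b-local"
        L = nxt
```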

The idea that roughly governs both analyses is this: Consider the solution you're currently at (let's call it red) and the optimal solution (color it blue). Think of building a bipartite graph (this works slightly differently for the different problems) which contains the red and blue objects as vertices, and connects two (differently colored) objects by an edge if there's a range that contains both (for the hitting set problem) or if they intersect (for the MIS) (call this a 'locality property', as Mustafa and Ray do). They then show three things:
  1. This graph is planar
  2. It "expands": neighborhoods of blue sets of size at most b are large
  3. As a consequence of 1,2, the size of the blue set is upper bounded by (1+1/sqrt(b))(size of red set)
See the resemblance to the max flow argument ? (or maybe not ;)). A reasonably quick corollary then shows that by setting b = 1/eps^2, the desired PTAS follows (since, by 3, the optimal solution is reasonably close to the current solution). 2) holds by the locality argument, i.e., if it weren't true, then there'd be some way to locally improve the red set using a "lookahead" of size at most b. 1) is case-by-case, but works for these problems. The proof of 3) itself requires separator machinery for planar graphs.

An upcoming paper by Gibson, Kanade, Krohn and Varadarajan applies the same idea to the 1.5D terrain guarding problem (one of the few nontrivial art gallery problems that has yielded to approximation attacks). They produce a planar graph that satisfies the locality property, and then apply the rest of the local search machinery as before. What's neat is that the machinery really does apply just as before: "all they have to do" (and I'm not trying to minimize the effort here, just pointing out the smoothness of the technique) is construct the planar graph.

Some thoughts:
  • There's some room for improvement in the running time perhaps, at least down to a linear dependence on 1/epsilon in the exponent. We can't get a polynomial dependence because of strong NP-completeness.
  • I don't think the weighted versions of these problems would fall to the same "trick": indeed the Chan-Har-Peled paper considers a neat LP-based approach for the weighted MIS.
There have been numerous PTASs for geometric problems: the Arora/Mitchell TSP approximation is the most famous of course. But this technique is a relatively new kind of generic tool, and has demonstrated immediate applicability for different problems, which makes it noteworthy. Secondly, it's a way of reasoning about the efficacy of local search, which is always neat, since local search algorithms are generally not that complicated.

Saturday, June 13, 2009

SoCG 2009: Epsilon-nets and covering/hitting problems.

The 30,000 horsepower engine of approximation algorithms is linear programming. The fact that an LP gives you a bound on the cost of an NP-hard problem has been the major force behind ever more sophisticated methods for proving approximation guarantees.

Ironically enough, given the geometric nature of an LP, this engine has not been particularly useful for geometric approximation problems. The high level reason (somewhat over-simplified) is that it is generally hard to encode specifics about geometry in a linear program. What this means is that (for example) if you want to solve a geometric covering problem and can't encode the geometry in your LP, you're stuck with the log n approximation that combinatorial set cover gives you, in spite of the fact that there are numerous examples of problems where throwing geometry in actually improves the approximation bound considerably.

To put it mildly, this is a pain in the neck. It basically means that you have to design methods that work for a specific problem, and there are only a few general purpose techniques available, many of them outlined in this survey.

But if your problem can be framed as a covering/hitting problem, then there's a general body of work that has provided better and better results, (and for an added bonus, actually has an LP-formulation). The story dates back to a paper on iterative reweighting by Clarkson, and actually goes back even earlier if you jump to the ML literature: Arora, Hazan and Kale have a nice survey on this general approach.

But I digress: the paper by Bronnimann and Goodrich put the technique in the language we're familiar with today: in short,
Given a range space that admits epsilon-nets of size (1/epsilon)f(1/epsilon), you can solve the hitting set problem to an approximation of f(OPT).
In particular, range spaces that admit epsilon-nets of size 1/epsilon admit constant-factor approximations for the hitting set problem. This led to a "formula" for approximating these problems: find an epsilon-net of an appropriate size, and declare victory. There's been much work on constructing epsilon-nets since then, partly motivated by this.

Another angle to this story was put forward by Clarkson and Varadarajan in their 2005 SoCG paper: the brief summary from that paper was:
Given a range space in which you can bound the complexity of the union of a random subcollection of size r by rf(r), you can find an epsilon net of size (1/epsilon)f(1/epsilon).
This approach gave another tool for finding good approximations for covering problems (and in particular gave a nice constant factor approximation for a thorny guarding problem).

At SoCG 2009, Kasturi Varadarajan presented an improved bound along these lines: his main result improves the transfer complexity: modulo some technical details,
Given a range space in which you can bound the complexity of the union of a random subcollection of size r by rf(r), you can find an epsilon net of size (1/epsilon)log f(1/epsilon).
Although this doesn't improve results in the constant factor regime, where f() is a constant, it makes a significant difference when f() is non-constant.

The interesting story here is that Kasturi's result was simultaneous with another paper by Aronov, Ezra, and Sharir that just appeared in STOC 2009. They get the same bound without the technical details that limit his result; however, their methods are quite different.

Returning to LPs, what I failed to mention is that the epsilon-net construction technique used by Bronnimann and Goodrich can be seen as a byproduct of an LP-algorithm, as was shown by Even, Rawitz and Shahar, and by Philip Long before that (thanks, Sariel).

SoCG 2009: A neat lower bound example

I've had a number of posts queued up to write (for those who ask me "how do you get the time?", the answer is sometimes, "I don't !"). This SoCG was full of really interesting papers, and even collections of papers that together moved our understanding forward in very nontrivial ways. Definitely read David E's post from earlier today.

Gabriel Nivasch has been doing some great work on lower bound constructions. He recently had a paper analysing upper and lower bounds for Davenport-Schinzel sequences (it got a best student paper award at SODA 2009), and the kind of tight bounds he gets are very impressive (up to terms involving powers of alpha(n) in the exponent). Here, he presented joint work with Boris Bukh and Jiri Matousek on a new lower bound for weak epsilon-nets.

A central construction here, one that might be useful for other lower bound constructions, is a very fast-growing exponential grid. Roughly speaking, if we think of how it would map to a "regular grid", then the point (i_1, i_2, ...) on the [0..m] grid maps to the point whose jth coordinate is x_j = 2^{i_j * [m^{j-1} + m^{j-2} + ... + 1]d}. So things start getting really jumpy as the coordinates go up. A few observations do the heavy lifting for this grid:
  • A straight line segment between two points in the stretched grid maps to a collection of d line segments in the "regular" grid: d-1 that go almost straight up from the first (say lower) point, and another that goes across the final dimension (it's easier to think of this recursively)
  • Convex sets in the stretched grid correspond to so-called "stair-convex sets" in the regular grid
  • A diagonal in the stretched grid maps to a curve that doesn't touch many points in the regular grid.
Basically, the grid construction allows them to reason about lower bounds by going back to the regular grid. They use these ideas to prove the lower bound for weak epsilon-nets, but also for some incidence problems (using the third property). It seems to me that this construction might prove useful for other lower bounds in the future as well.

Coming up next: max flows, local search and a plethora of PTASs (try saying that fast)...

Wednesday, June 10, 2009

Probability of a graph being connected...

I ran across the following question in a problem I'm working on, and thought I'd ping the collective wisdom of the blog-reader-o-sphere.
Given an undirected graph G with (possibly different) probabilities on edges, consider the usual random process where we pick each edge independently, but with probability p_e (this is the difference from the standard Erdos-Renyi model).

What is the probability that the resulting graph is connected ?

This seems like it might have something to do with the spanning tree polytope, but it looks quite nontrivial. In general, I'm given a subset of vertices S in this graph, and want to ask for the probability that S forms a connected component. Making sure that S is disconnected from the rest of the graph is easy, but then the remainder of the probability comes from the answer to the above question.

I should add that the subset connectivity question is trivial if G is a tree: it's the higher connectivity case that makes things harder.
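I don't have an answer, but for anyone who wants to experiment, here's a throwaway Monte Carlo estimator (my own code, and obviously no substitute for an exact expression): it samples the random subgraph and checks connectivity with union-find.

```python
import random

def connected(n, edges):
    """Union-find check that an undirected graph on vertices 0..n-1 is connected."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]            # path halving
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in range(n)}) == 1

def estimate_connectivity(n, weighted_edges, trials=100000):
    """Estimate Pr[G is connected] when edge (u, v) is kept independently with probability p."""
    hits = 0
    for _ in range(trials):
        kept = [(u, v) for (u, v, p) in weighted_edges if random.random() < p]
        if connected(n, kept):
            hits += 1
    return hits / trials
```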

p.s. More SoCG posts should be on their way in the next few days after I return.

Sunday, June 07, 2009

SoCG 2009: Day 0

At least in my mind, things have been somewhat overshadowed by the sad news of yesterday, thus preventing me (thankfully) from writing the more frivolous posts that I might have penned.

Today was the 25th anniversary celebration for SoCG (1985-2009), and we had four speakers to do a retrospective on the field.

David Dobkin was first, talking about how some of the first geometric algorithms came to be. He interspersed many fun facts about the early days of SoCG:
  • One of the reasons CG is tied to the STOC/FOCS community is because when Michael Shamos first asked him where to publish a geometry paper, STOC was the first conference that came to mind.
  • The Preparata-Shamos book only came about because (after Shamos left for law school) Ron Graham brokered a deal for Franco Preparata to write a book based on Shamos' thesis.
  • SoCG was almost called the Symposium on Computer Geometry (ugh).

It was interesting to see that even way back then, there was serious intersection between numerical algorithms, the graphics and vision folks, and the nascent geometry area. Some things haven't really changed. It also explains (for me) why there is, somewhere deep down, a sense that CG doesn't sit wholly within the larger algorithms community, but has a core identity that's slightly different.

One point that David made was very interesting. He talked about how Michael Shamos essentially created a problem book from scratch, listing basic problems in the new area of computational geometry, and methodically solving them one by one. He then asserted that it was the creation of this problem book that played a major role in the unification of the area. I think that makes a lot of sense. It's not so much that a list of problems got created. It's not too hard to do that. What I think was the critical factor was the listing of problems within reach, which encouraged more people to start thinking about them. Muthu is very good at doing this with any field he jumps into, and I think he's able to jumpstart a lot of research for this exact reason. It's not easy to do: the problems have to be core to the area, but still not that difficult to solve.

Micha Sharir went next with a talk on the evolution of work in arrangements. I've always felt that work on arrangements seems to have reached the point of diminishing returns after some point: that new results were mildly interesting, but didn't necessarily hook into algorithm design in a nontrivial way. I still have somewhat of the same opinion after the talk, but it was good to see the evolution of the problems into higher dimensional surface arrangement questions.

Emo Welzl gave a nice overview of the exciting years between 1987-1992 when randomization began to bloom into a powerful tool for doing geometry. Of course, the very first randomized algorithm was a geometric one (Rabin's closest pair algorithm), but the really powerful methods came into being really with Clarkson's work, and the eps-net work of Haussler and Welzl. It was amusing that almost every single paper mentioned in Emo's talk was written or co-written by Ken: just goes to show his profound influence on the field both for new techniques, as well as new results.
For me, another great influence was Raimund Seidel's amazingly lucid exposition of backwards analysis: there's something to be said for crystal-clear writing and the impact it can have.

The final talk was by Kurt Mehlhorn, on the history that led to the development of LEDA and CGAL. I must confess that I've always felt rather intimidated by CGAL and LEDA: there's something about the phrase "traits class" that strikes fear into my heart. But Kurt's talk was immensely enjoyable: he gave several simple examples of inputs where floating point implementations of geometric algorithms fail spectacularly, and really reinforced the importance of precise implementations, something (if I may add) the rest of the algorithms community often can avoid if they're dealing with graphs or numeric data. I REALLY need to weave CGAL into the next incarnation of my geometry class: must be less lazy, must be less lazy, .....

Notes from the conference: I was fervently hoping that the organizers would lower expectations for next year, but no such luck :(. Impeccable organization, food, and free beer. Sigh......

I'll be tweeting occasionally from the conference. If you're reading this at the conference and want to do the same, please use the hashtag #socg so all tweets can be collected in one place.

Saturday, June 06, 2009

Rajeev Motwani, R.I.P

Rajeev Motwani passed away unexpectedly yesterday. He will be remembered for many things: his influence on the creation of Google, the foundational randomized algorithms book that he wrote with Prabhakar Raghavan, his role in the PCP theorem, and his extensive presence in the Bay Area investor community. Within the theory community, he will also be remembered by the legion of students he trained, including your humble blogger. Update: Ashish Goel @ Stanford has a page where you can also post comments.

The influence an advisor has on their students can be subtle: as a grad student, I spent much of my time resisting, pushing back, trying to find my own voice. But more and more, as I look back, I see how he influenced my taste in problems and the sensibility I bring to my work.

He had incredible breadth to go along with his depth. I remember scouring his web page in the early years of grad school, and seeing papers in robotics, databases, compilers, parallel systems, as well as in algorithms. Having spent time working in more applied areas, I can appreciate now how difficult it must have been to do that, and have the kinds of impact he had. That kind of incisiveness, that could cut to the heart of a modelling problem, was a hallmark of his more "applied" work.

There's something special that comes with the ability to have a great career and train so many students. There are many famous researchers in the field who do one or the other, and Rajeev had a knack of catalysing the best from his students, and helping them succeed no matter what area they happened to work in. To this day, I'll catch myself thinking, "what did Rajeev do in this situation" when I'm interacting with my own students.

As an advisor, the best way to describe him was 'laid-back'. He wasn't the kind to breathe down your neck, or demand regular updates. But I felt the terror all the same when going into his office to describe what I'd been up to. But he'd sit there with those half-closed eyes staring off into space as I spoke, and then the observations would come, carefully crafted, and enough to make me scurry off and think hard before our next meeting.

As a teacher, he was brilliant. The very first time I took a class with him, I saw this. His style of presentation, how he organized his material, and even his impeccable boardwork: these are all things I teach my students today. His lectures weren't flashy: they were solid, and profound, and in his own mellow way, he was able to convey intuition, rigor and beauty with the kind of clarity one dreams of possessing. To flip the truism about doing and teaching around, he did, and taught even better as a result of it.

I had less contact with him after graduation than I should have had. I was doing the whole 'be your own person, strike out on your own' thing. I'd bump into him occasionally at conferences, but as he got more involved in his investment efforts, he stopped attending conferences regularly (or so it seemed) and even that contact dried up. As someone just said to me in an email today morning, 'Life is fleeting'.

It's a strange feeling, to lose your advisor so suddenly like this, and to hear about it on twitter, of all things. I'm not even sure what I'm feeling, except a sense of numbness, and a feeling that something has shifted, permanently, underneath.

To his wife Asha and his family: my deepest condolences.

Wednesday, June 03, 2009

New(?) NSF policies

While everyone recovers from what appears to be a gripping STOC, I thought I'd flag two NSF-related items that came my way.

The first, via FSP, is on post-doc mentoring. Now I've always been told that the NSF frowns on inserting post-doc support in proposals, but apparently they don't mind it, and they have a new policy in place for proposals including postdoc support:
Starting this year, NSF proposals that request funding for postdoctoral researchers must include a statement about how the postdoc(s) will be mentored. For a brief time this mentoring statement was supposed to be part of the body of the proposal, but, perhaps in response to complaints, the postdoc mentoring text is now a supplementary document, up to a page in length. As with the required Broader Impacts component of proposals, NSF is serious about the mentoring statement: proposals that request funding for postdocs but that do not contain the mentoring supplement will not even be reviewed.

I'm curious as to what experience people have had with postdoc support in proposals, and whether this is really new, or just a restatement of things that were already tacitly known.

The second item affects us more broadly, and perniciously. A new policy guideline of May vintage appears to suggest (it's somewhat ambiguous) that equipment purchased for grants must be MUCH more closely tied to grant activities than previously expected. Specifically, it essentially rules out the purchase of machines for general PI use (like laptops etc), and raises a number of questions about how to purchase software for old machines, how to upgrade new machines etc. Again, I'm curious to know if people have already run afoul of this new rule, and what the deal is.
