Before you read further, do check out posts by DavidE (II), Michael Mitzenmacher (I, II), and a "stranger in a strange land" take by Hal Daume, my colleague at Utah and an ML/NLP person whose blog you should definitely follow if you're interested in such topics.
There's been a lot of work on clustering in high dimensional spaces, and I want to say a lot more about this, especially the "abstract framework" results of Kumar, Sabherwal and Sen, but that will have to wait for another post. For now, I want to highlight the paper, "Clustering for Metric and Non-Metric distance measures" by Ackermann, Blomer and Sohler. Essentially what they do is take the framework for approximate linear time clustering in high dimensions, and remove the need for either symmetry or triangle inequality in the distance measure, replacing it by a set of sampling conditions that have a more statistical feel to them. Doing this allows them to cluster with more general distance functions, including the relative entropy or Kullback-Leibler divergence. I'm a sucker for results that illustrate "what's going on", especially in the realm of approximate high-D geometry, and this is a great example of attempting to get to the core strucutural properties needed to make an algorithm work.
- After many years, I actually missed the business meeting this time, so no drinking games or updates. From what I hear, Austin is the first choice for the venue in 2010 (we're in NYC for 2009). Claire Mathieu is the PC chair for SODA 2009.
- Overall, I really enjoyed SODA this year: somehow I felt that there were far too many interesting papers (too many for me to attend the talks, that is), and the papers were of really high quality.