Sunday, January 23, 2011

ALENEX/ANALCO

A few quick hits from ALENEX, or SODA day 0:

  • Moraru and Anderson used Bloom filters in a nifty way to implement exact pattern matching where you have a large set of patterns and an even larger text. The idea was to do a first pass over the text after storing all the patterns in a Bloom filter. Every subsequence of matching text was stored in a second Bloom filter, and in a second pass, all the patterns were run over this Bloom filter to take care of false positives. A final "exact" pass did the trick (at this point both sets are small enough to be reasonable). They have a companion paper at NSDI (which is a pretty good networking conference) on using this for malware detection, and that's a good example of pairing nice algorithms engineering with some interesting applications. 
  • Chimani, Woste, and Böcker were looking at the 1-median problem on a hamming space, and showed that the simple integer programming formulation actually does great in practice, when you throw CPLEX at it. This was surprising to me on two levels: firstly, that CPLEX is actually free for academic use (who knew!) and that such a simple approach is so effective.

    It's not the "sexiest" thing in the world to solve algorithms problems in practice by using large industrial strength packages. However, both CPLEX and  SAT solvers are examples of tools that can be used in practice to solve fairly intractable problems. It still takes a lot of engineering and skill to make the heuristics work well, but it's something that we should be using as a matter of course when designing heuristics before trying to invent an algorithm from scratch.
  • Stanton and Pinar had some experimental results (and some theory) on sampling from the space of graphs that have a prescribed joint degree distribution. While degree sequences are all the rage when trying to model various "naturally occuring" graphs like router graphs or social network graphs, there's a body of work that notes that graphs with the same degree distribution can have very different properties, and that in fact statistics on the number of edges connecting nodes of certain degrees (i.e higher-order statistics on degrees) are even more relevant.They propose a simple Markov chain that allows them to sample from the space of all graphs having a prescribed joint degree distribution, and while they don't yet appear to have theoretical results on the convergence of this chain, it converges quicly in practice.
Other notes: I'll be using the hashtag #soda2011 on twitter during the day. If you're tweeting from SODA (an don't want to let the NIPS tweeters show us up!), do use this hashtag as well. 
    Post a Comment

    Disqus for The Geomblog