Sunday, February 22, 2015

STOC 2015 Call for Workshops and Tutorials

I've had two week from hell with proposal deadlines, paper deadlines, faculty candidates and ever-more demanding hungry cries of the baby chickadees  meetings with my students.

Thankfully, all that is now in the past and I can sally forth to the next deadline. Which is, you ask ?

STOC 2015 will have a workshops and tutorials day on Sunday June 14, the day before the conference starts. +Sanjeev Khanna and +Chandra Chekuri are running the event, and they want your proposals.
We invite groups of interested researchers to submit workshop or tutorial proposals. The goal of a workshop is to provide an informal forum for researchers to discuss important research questions, directions, and challenges. Connections between theoretical computer science and other areas, topics that are not well represented at STOC, and open problems are encouraged as workshop topics. Organizers are completely free to choose their workshop formats (invited speakers, panel discussions, etc.). The program for June 14th may also involve tutorials, each consisting of 1-2 survey talks on a particular area, and we welcome tutorial proposals as well.
Deadline for submissions is Mar 15, 2015. For more details on the format etc, visit the workshops/tutorials page

Monday, February 16, 2015

Research customs for the 21st century.

 I've begun to notice a number of customs that seem unique to our modern process of doing collaborative research in the 21st century. Most of them are technology-driven, and most of them involving annoying debates about the multitude of choices available for collaborating.

  • Research Meetings: whether to do them over Skype, or Google Hangouts, or or, or even with phones on a conference call. 
  • Audio/Video/Chat: if over skype/G+, whether to do audio, or video, or chat. And then the obligatory pre-videconference primping to look presentable (or the reliance on bandwidth failure to NOT go on video)
  • Coordination: careful calculations among multiple time zones and daylight savings times to plan meetings. The use of Doodle, Google calendar, or even random emails passed around haphazardly to plan said meetings.
  • Writing I: the protracted negotiations over whether to use git, svn, rcs, or cvs, or Dropbox, or random emails passed around haphazardly (and yes I've done all of this, sometimes at the same time)
  • Writing II: if using an actual modern VCS, then the equally protracted negotiations over who's hosting it, who needs account access, and where public keys need to be placed, and why CS researchers in the 21st century STILL need tutorials on ssh. 
  • Writing III: Dealing with comments on a writeup: as fixmes, as todonotes, as issues in the git repository hosting the document, or as random emails passed around haphazardly. 

And of course all the collaborator "digital" personalities that emerge irrationally and violently. Alice hates using bibtex, and Bob will only use personally crafted bibtex files. Charlie loves his own special fonts and will have raging debates over $\varepsilon$ vs $\epsilon$. Dan has never used version control and doesn't see the point. Erin handcoded her own version control system in a cutting-edge fragment of Haskell and refuses to use anything else. Frank hates online meetings, and Mallory only thinks online during a meeting. Oscar insists that Postscript is Turing-complete and is therefore sufficient for all drawing needs. Peggy insists that git rebase is Turing-complete and is therefore sufficient to fix all commit disasters... eventually.

Note to all my collaborators who I'm currently writing papers with: you are awesome and have NOTHING AT ALL to do with this list.

Wednesday, February 11, 2015

One Body Problems

It's hiring season, and we've all heard about two-body problems.

But you may not have heard of the one-body problem:
Is the place a single person takes a job in likely to have enough other single people around to facilitate searching for a partner ? 
I was 'spoken for' before I even finished my Ph.D, and never had to deal with this (rather, I had to work with the more traditional (ha!) two-body problem).

Friday, February 06, 2015

Friday chest-thumping.

  • My paper with Amirali Abdullah (or should I say Amir's paper with me) got into STOC ! One aspect not discussed in the linked blog post is the connection to Partial Match (a notoriously hard problem in the data structures literature).  In brief, our result allows for a "smooth-ish" interpolation between approximate near neighbor lower bounds in $\ell_1$ and general partial match lower bounds, reinforcing the intuition that Partial Match is an "extremely asymmetric" nearest neighbor problem. 
  • Another paper with 'the Streaming Kings' (said to the sound of a flamenco strum) of Cormode, Chakrabarti, McGregor and Thaler got into CCC (and this is my first paper at Complexity). I'll have more to say about this work in a separate post: in brief, we looked at streaming interactive proofs (of the kind first developed by Cormode, Thaler and Yi) where you have a prover and a streaming verifier and wish to verify a computation with constant rounds of communication. There's a 3-round nearest neighbor protocol in this paper, as well as a number of subtle results about the nature of multi-round communication protocols in the streaming setting. 
"Well Suresh, you have two new papers, what are you going to do next" ?

Wednesday, February 04, 2015

No one is in control...

The New Scientist has a cover this week on the way in which algorithms have taken over our lives. They interviewed some of the participants at the FATML workshop I participated in (thanks to Solon Barocas for sending the reporter our way), and Sorelle Friedler got some nice quotes in regarding the work she, Carlos Scheidegger and I have been doing.

The article is currently paywalled, but a friend of mine sent me the text. The relevant paragraph, lightly edited (the "she" here is Sorelle):

By understanding the biases inherent in the underlying data, she hopes to eliminate bias in the algorithm. Her system looks for correlations between arbitrary properties – like height or address – and demographic groupings like race or gender. If the correlation is expected to lead to unwanted bias, then it would make sense to normalise the data. It is essentially affirmative action for algorithms, she says.
I'm not sure I'd describe our work as affirmative action :), but that's a longer argument.

Monday, February 02, 2015

(the dangers of) Data Mining hits the Super Bowl (ads) and prime time comedy

I get strange looks from people in my work circles when I admit to watching American football, and stranger looks from people in my not-work circles when I admit to watching the Super Bowl for the game.

I didn't avoid the ads though. And two ads (both from Esurance) riffed off the idea of segmenting: that the business goal of data mining customer data is to segment people into little categories that can be targeted the same way. The ads are a little heavy-handed, but then I don't think most people actually know how data mining works to segment them. And getting Bryan Cranston to play a "sort-of" pharmacist was brilliant.

Esurance: "Sorta Pharmacy"


Esurance: "Sorta Mom"

And finally, an entire 30 minute show centered around the idea of a Facegoogle-like company invading our privacy by reading our texts/emails etc. If you don't watch Parks and Recreation, then it's too difficult to summarize six-odd seasons of shenanigans, but in brief: A facebook-google mashup-like company is setting up shop in a small town in Indiana, and is sending people free gifts via Amazon-like shopping drones based on analyzing their private communication, and then shenanigans ensure. If you can't stand the horror of watching network TV, then there's a short clip here:


 and the full episode is on Hulu (for now) here.

Thursday, January 29, 2015

More FOCS 2014-blogging

In the spirit of better late than never, some more updates from Amirali Abdullah from his sojourn at FOCS 2014. Previously, he had blogged about the higher-order Fourier analysis workshop at FOCS.

I'll discuss now the first official day of FOCS, with a quick digression into the food first: the reception was lovely, with some nice quality beverages, and delectable appetizers which I munched on to perhaps some slight excess. As for the lunches given to participants, I will think twice in future about selecting a kosher option under dietary restrictions. One hopes for a little better than a microwave instant meal at a catered lunch, with the clear plastic covering still awaiting being peeled off. In fairness to the organizers, once I decided to revert to the regular menu on the remaining days, the chicken and fish were perfectly tasty.

I will pick out a couple of the talks I was most interested in to summarize briefly. This is of course not necessarily a reflection of comparative quality or scientific value; just which talk titles caught my eye.

The first talk is "Discrepancy minimization for convex sets" by Thomas Rothvoss. The basic setup of a discrepany problem is this: consider a universe of $n$ elements, $[n]$ and a set system of $m$ sets ($m$ may also be infinite), $S = \{S_1, S_2, \ldots, S_m \}$, where $S_i \subset [n]$. Then we want to find a $2$-coloring $\chi : [n] \to \{-1, +1 \}$ such that each set is as evenly colored as possible. The discrepany then measures how unevenly colored some set $S_i \in S$ must be under the best possible coloring.

One fundamental result is that of Spencer, which shows there always exists a coloring of discrepancy $O(\sqrt{n})$. This shaves a logarithmic factor off of a simple random coloring, and the proof is non-constructive. This paper by Rothvoss gives the first algorithm that serves as a constructive proof of the theorem.

The first (well-known) step is that Spencer's theorem can be recast as a problem in convex geometry. Each set $S_i$ can be converted to a geometric constraint in $R^n$, namely define a region $x \in R^n : \{ \sum_{j \in S_i} | x_j | \leq 100 \sqrt{n} \}$. Now the intersection of these set of constraints define a polytope $K$, and iff $K$ contains a point of the hypercube $\{-1 , +1 \}^n$ then this corresponds to the valid low discrepancy coloring.

One can also of course do a partial coloring iteratively - if a constant fraction of the elements can be colored with low discrepancy, it suffices to repeat.

The algorithm is surprisingly simple and follows from the traditional idea of trying to solve a discrete problem from the relaxation. Take a point $y$ which is generated from the sphercial $n$-dimensional Gaussian with variance 1. Now find the point $x$ closest to $y$ that lies in the intersection of the constraint set $K$ with the continuous hypercube $[-1, +1]^n$. (For example, by using the polynomial time ellipsoid method.) It turns out some constant fraction of the coordinates of $x$ are actually tight(i.e, integer valued in $\{-1, +1 \}$) and so $x$ turns out to be a good partial coloring.

To prove this, the paper shows that with high probability all subsets of $[-1 +1]^n$ with very few tight coordinates are far from the starting point $y$. Whereas with high probability, the intersection of $K$ with some set having many tight coordinates is close to $y$. This boils down to showing the latter has sufficiently large Gaussian measure, and can be shown by standard tools in convex analysis and probabilitiy theory. Or to rephrase, the proof works by arguing about the isoperimetry of the concerned sets.

The other talk I'm going to mention from the first day is by Karl Bringmann on the hardness of computing the Frechet distance between two curves. The Frechet distance is a measure of curve similarity, and is often popularly described as follows: "if a man and a dog each walk along two curves, each with a designated start and finish point, what is the shortest length leash required?"

The problem is solvable in $O(n^2)$ time by simple dynamic programming, and has since been improved to $O(n^2 / \log n)$ by Agarwal, Avraham, Kaplan and Sharir. It has long been conjectured that there is no strongly subquadratic algorithm for the Frechet distance. (A strongly subquadratic algorithm being defined as $O(n^{2 -\delta})$ complexity for some constant $\delta$, as opposed to say  $O(n^2 / polylog(n))$.)

The work by Bringmann shows this conjecture to be true, assuming SETH (the Strongly Exponential Time Hypothesis), or more precisely that there is no $O*((2- \delta)^N)$ algorithm for CNF-SAT. The hardness result holds for both the discrete and continuous versions of the Frechet distance, as well as for any $1.001$ approximation.

The proof works on a high level by directly reducing an instance of CNF-SAT to two curves where the Frechet distance is smaller than $1$ iff the instance is satisfiable. Logically, one can imagine the set of variables are split into two halves, and assigned to each curve. Each curve consists of a collection of "clause and assignment" gadgets, which encode whether all clauses are satisfied by a particular partial assignment. A different such gadget is created for each possible partial assignment, so that there are $O*(2^{N/2})$ vertices in each curve. (This is why solving Frechet distance by a subquadratic algorithm would imply a violation of SETH.)

There are many technical and geometric details required in the gadgets which I won't go into here. I will note admiringly that the proof is surprisingly elementary. No involved machinery or complexity result is needed in the clever construction of the main result; mostly just explicit computations of the pairwise distances between the vertices of the gadgets.

I will have one more blog post in a few days about another couple of results I thought were interesting, and then comment on the Knuth Prize lecture by the distinguished Dick Lipton.

Disqus for The Geomblog