Compressed sensing is a fascinating topic that (depending on how you look at it) is influenced by approximation theory, sketching techniques, compression, and geometry. Posts by Terry Tao and Igor Carron are good places to learn more about this, and the paper repository at Rice is another great resource. What I wanted to talk about in this post however, is an interesting result by Donoho and Tanner that connects back to structures that geometers are intimately familiar with, as well as helping me understand the nature of convex hull lower bounds.

Let's briefly sketch the motivating problem. Suppose you're given a signal s that you want to express in terms of linear combinations of basis functions drawn from a dictionary. A key idea in compressed sensing is that the dictionary itself is redundant, in that there are multiple ways of representing a signal via linear combinations of basis functions. A useful analogy is that of barycentric coordinates. We can represent a point in the plane uniquely via a linear combination of three non-degenerate points (i.e a simplex), but this stops being true if we have more than three points that form the "basis".

Given that there are multiple representations, which is the best one ? The next important idea is that of sparsity: namely, that the number of coefficients needed is very small compared to the size of the dictionary. Again, returning to our analogy, it would be as if the dictionary had 100 points in it, and all we needed to determine was a representation using a linear combination of three points.

So one rendition of the compressed sensing problem is: Suppose there's a signal that you interact with via a 'measurement matrix' (which means you supply a vector of coefficients, and get the value of the inner product of the signal coefficients with the measurement coefficients; each row of the matrix is one such measurement). You retain the output values y; formally, you can write this as y = Ax, where x is the unknown signal vector. Can you find an x with the fewest number of nonzero coefficients (i.e can you minimize the l0 norm) that satisfies this equation ?

This problem is NP-hard; intuitively, you can encode SUBSET SUM in this problem, by setting up an appropriate y vector. Suppose however that we relax the optimization, and rather than asking to minimize the l0 norm, we ask to minimize the l_1 norm of the target vector x ? It turns out that this problem can be solved via linear programming.

Under what conditions then can the l_1 solution yield the l_0 solution ? Suppose that we could reconstruct the vector x uniquely from y. In that case, it doesn't really matter what cost function we use: if there's only one solution, then we'll find it either way.

This is where neighborly polytopes enter the picture. A polytope in R^d is said to be m-neighborly if any m vertices lie on a face of the polytope. Note that the simplex is always d-neighborly: essentially this definition tries to capture the idea of a "simplicial core" of a polytope: crudely put, an m-neighborly polytope has an m-simplex encoded within it somewhere.

If we think of the measurement y as being a point reconstructed from m coefficients, then if y lies on the m-simplex, there are unique coefficients we can assign to the vertices of the simplex to reconstruct it. Donoho and Tanner showed that this is essentially sufficient: namely that the polytope AT^d (where T^d is the d-simplex) is m-neighborly if and only if there's a unique reconstruction of y (implying that the l_1 optimization yields the desired sparse vector).

This is quite an elegant result in its own right: further, they generalize this to a "robust" variant, where the polytope might be approximately m-neighborly, in which case they show that with high probability a unique solution can be reconstructed (this latter result resembles the Johnson-Lindenstrauss theorem). What's interesting about these results is that they connect a very old concept in geometry (neighborly polytopes) to a reconstruction algorithm in compressed sensing.

A side effect of understanding neighborly polytopes from this perspective is that I finally saw why cyclic polytopes "work" as a lower bound for estimating the size of a convex hull. We can ask what the largest value of m can be for which some d-dimensional polytope is m-neighborly. It turns out that the answer is

Why is this interesting ? If we go back to the intuition of m-neighborly polytopes containing an m-dimensional simplicial core, it becomes clear. All vertices of a simplex lie on the convex hull. If a polytope is m-neighborly, it should come as no surprise that its convex hull has size n^m. We thus recover the lower bound for a convex hull size in d dimensions from this fact.