Wednesday, June 06, 2007

SoCG 2007: Geometric Views of Learning

I've been fairly remiss in my SoCG blogging; I'll blame it on having session chair duties, and not wanting to lug my laptop around.

The invited talk was by Partha Niyogi from the University of Chicago on 'A Geometric Perspective on Machine Learning'. You may remember his work from my earlier post on the estimation of the surface area of a convex body (read the comments). More importantly, he is part of the group that developed a method known as Laplacian Eigenmaps for learning a manifold from a collection of data.

Manifold learning is a new set of problems in machine learning that has interesting connections to algorithms and geometry. The basic problem is as follows. Given a collection of (unlabelled) data inhabiting some high dimensional space, can you determine whether they actually lie on some lower dimensional manifold in this space ?

Why do we care ? The trouble with any data analysis problem in high dimensions is the rather poetically named 'curse of dimensionality', which roughly says that many interesting data analysis algorithms need time (or data) exponential in the dimension of the input. For data that lives in hundreds of dimensions, this is rather bad news.

However, "data that lives in 100 dimensions" is really an artifact of the way we represent data, slapping on dimensions willy-nilly for every attribute that might be relevant. What one often expects is that data doesn't really lie in 100 dimensions, but in some lower dimensional manifold of it. A beautiful example of this was given by Partha in his talk: he described the problem of inferring a sound wave signal generated at one of a tube by listening in at the other hand. By Fourier analysis, you can think of both signals as living in an infinite dimensional space, but suppose we now vary the tube length, for a fixed input signal. Then the output signal varies smoothly along a curve (i.e a 1-d manifold) in this infinite dimensional space.

"So what ?" one might ask. The problem is that the standard method of doing data analysis is to translate the problem of interest into some property of the distances between points in the high dimensional space. If the data really lies on some kind of curved surface, then the "distance in ambient space" does not reflect the true structure of the data. What we really need is "distance along the manifold", which we could do if we could reconstruct the manifold !

The key idea of the Laplacian Eigenmaps work is this: if you set up an appropriate weighted graph on the data points (where each edge has a weight that decays exponentially with the distance between the points) and compute the Laplacian of this graph, you get an approximation that converges (as the data size increases) to the Laplacian (the Laplace-Beltrami operator) of the underlying manifold !! This assumes that the data was sampled uniformly (or mostly uniformly) from the manifold. Moreover, the convergence happens at a rate that depends only on the dimension of the manifold, rather than the dimension of the ambient space.
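For concreteness, here is a bare-bones sketch of that recipe: heat-kernel weights on a nearest-neighbour graph, then the eigenvectors of the graph Laplacian for the smallest nontrivial eigenvalues as embedding coordinates. The parameter choices and the toy data set are my own illustrations, not anything prescribed in the talk.

```python
# A minimal Laplacian-eigenmaps-style embedding: weights w_ij = exp(-d_ij^2/t)
# on a symmetrized k-NN graph, then eigenvectors of L v = lambda D v for the
# smallest nontrivial eigenvalues. k and t are illustrative choices.
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_components=2, k=10, t=1.0):
    D2 = cdist(X, X, 'sqeuclidean')
    # Symmetrized k-nearest-neighbour adjacency.
    idx = np.argsort(D2, axis=1)[:, 1:k+1]
    A = np.zeros_like(D2, dtype=bool)
    A[np.repeat(np.arange(len(X)), k), idx.ravel()] = True
    A = A | A.T
    # Heat-kernel weights on the retained edges only.
    W = np.where(A, np.exp(-D2 / t), 0.0)
    deg = np.diag(W.sum(axis=1))
    L = deg - W                               # unnormalized graph Laplacian
    vals, vecs = eigh(L, deg)                 # generalized eigenproblem
    return vecs[:, 1:n_components + 1]        # skip the constant eigenvector

# Toy usage: points sampled from a "Swiss roll"-style surface in 3-d,
# flattened into the plane.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 4 * np.pi, 500)
X = np.c_[theta * np.cos(theta), rng.uniform(0, 5, 500), theta * np.sin(theta)]
Y = laplacian_eigenmaps(X, n_components=2, t=5.0)
```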

There are many ramifications of this idea, connecting to shape modelling, homology, and even volume estimation. But it reinforces the idea of the Laplacian as a key geometric construct that can be modelled combinatorially in a consistent way.

3 comments:

  1. For a more general extension of the idea, check out the work of Lafon and Maggioni (among others) on diffusion maps and diffusion geometry.

  2. "This assumes that that the data was sampled uniformly (or mostly uniformly) from the manifold."

    In this context, is uniform sampling the same as random sampling?
    I'm not very familiar with manifold learning methods -- from what I have read, random linear projections from a high dimensional space to a lower dimensional subspace approximately preserve distances and angles (the Johnson-Lindenstrauss lemma). The Laplacian, by contrast, is a non-linear map. Is there something similar to the JL lemma for non-linear embeddings?

  3. I'd view these as fundamentally different operations. In the Laplacian case, you're hoping that the data (as presented to you) came from a well-behaved sampling process. In the case of JL-type transformations, you construct the random transform yourself (via Gaussians, or orthogonal transforms, or whatever); a small sketch of such a projection follows below.

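Since the question came up, here is a minimal sketch of the JL-style construction: a random Gaussian matrix mapping high dimensional points down to a much lower dimensional subspace, with pairwise distances approximately preserved. All the dimensions below are just illustrative.

```python
# Johnson-Lindenstrauss-style random projection: a random Gaussian linear map
# from 1000 dimensions down to 200, scaled so that pairwise Euclidean
# distances are approximately preserved. Dimensions here are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n, D, d = 100, 1000, 200
X = rng.normal(size=(n, D))               # some high dimensional points

G = rng.normal(size=(D, d)) / np.sqrt(d)  # random projection matrix
Y = X @ G

# A pair of points before and after projection: the two distances should
# agree up to a small relative error.
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))
```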
