## Wednesday, May 28, 2008

### P vs NC IV: Linear PRAMs I

1. Prelude

Taken together, the P vs NC paper is quite intimidating in its size and scope. And yet, when it's cut down to bite-sized blogging chunks, the pieces seem to follow inexorably in a manner that leaves one asking, 'how else could it have been'. There's the key inspiration of using parametric complexity, the magic trick we'll see more of today. But the rest of it uses the kind of geometric arguments that are familiar to anyone who's read papers on the complexity of arrangements. Make no mistake: taking care of issues of bit complexity requires delicate arguments. But the main thrust of the approach is surprisingly easy to explain, ONCE we've accepted the magic trick.

In this post and the next, we'll begin to see glimmers of the WHY for the parametric argument. In brief, using parametric complexity allows us to navigate an affine subspace of the space of all possible inputs, and then we let geometry take over and guide us.

2. Linear PRAMs

We're close to seeing the main killer argument at the heart of the paper. But as with many arguments, it's easier if we look at a skeleton argument applied to a somewhat simpler case. We already saw a toy example of this argument, and now we'll move to a more nontrivial model, that of linear PRAMs.

In the first post, we defined a linear PRAM as a PRAM-without-bitops in which all multiplications are by a constant. We'll see a slightly stronger lower bound for computations in this model. Specifically, we will show that for a problem with parametric complexity $\phi(n, \beta(n))$ and bitsize $\beta(n)$, there exists a large enough constant b such that the problem cannot be solved on a linear PRAM with p processors in time $\log \phi/(b \log p)$. This generalizes the bound that will hold for the stronger model: namely, time $\sqrt{\log \phi}/b$ with $2^{\sqrt{\log \phi}/b}$ processors.

Substituting what we've already seen for the parametric complexity of max-flow, we recover a lower bound of $n/(b\log p)$ time using p processors, for an n-node network.

Notes:
• For clarity, I'm skipping over the specification of bitlengths: these will fold in later, but throwing in all these numbers right now confuses the flow.
• As before, the lower bound applies for approximations and randomization: how this happens will be discussed later on.
3. Bounding the shape and number of branchings

The high-level proof strategy first requires us to bound the number of possible branchings of an algorithm in this model, and then requires us to describe the "cells" of these branchings in some geometric manner. As we did earlier, we'll say that two inputs x, x' are t-equivalent if the machine executes the same set of instructions on x and x' for the first t steps of the computation.
This defines an equivalence relation, and our goal will be to estimate the number of equivalence classes in this relation (as a function of t).

Now let's introduce the parametrization. Suppose that each input x is a function of d parameters $z_1, \ldots z_d$, where each particular input variable $x_i$ is expressed as a linear function of the parameters. We'll further assume that these linear forms are integral, and the parameters range over integer values with some bounded bitlength.

The partitioning of the inputs x into equivalence classes induces a partitioning of the parameters in the obvious way (two parameter vectors z, z' are t-equivalent if the corresponding x, x' are). Let $\sigma(t)$ denote the number of t-equivalent classes in the parameter space. The main result we can show now is that:
For all t,$\sigma(t)$ is bounded by $O(p^{dt})$
Here's the proof argument (and for those of you who've waded through parametric search, this will be familiar): First of all, remember that since all operations are linear (since one term of each multiplication is a constant), each arithmetic operation is some linear function of the input. From this, and the assumption that memory pointers are not functions of the input (this is the circuit view of the computation), it follows that the contents of each memory location are a linear function of the equivalence class the input lies in, and the time t.

Now imagine simulating the execution of the algorithm on an as-yet unspecified parameter vector z (as we do in parametric search). What this means is that arithmetic operations are linear forms defined over the symbolic value of the parameter. When a branch occurs, the branch outcome is determined by comparing two linear forms, which reduces to testing the sign of a linear expression. Let's take some z in a fixed t-equivalence class C. From the above remarks, each branch involves testing at most p linear forms on z, each of which can be viewed as a hyperplane in $R^d$. If we now look at the arrangement of these hyperplanes, then each cell represents a fixed set of decisions for the branch outcomes. By standard arguments, the complexity of this arrangement is $O(p^d)$.

Now how does this arrangement evolve at the next time step ? Clearly, any single piece of the equivalence class C can be split by the arrangement into smaller pieces, based on decisions that are made at the next time step. But from above, there are only $p^d$ such splits possible. Therefore, $\sigma(t+1) \le O(p^d)\sigma(t)$, yielding the desired bound.

A corollary is that if we track any particular t-equivalence class, it is split by at most pt linear forms (one for each time step and each processor), and so it is sufficient to check the signs of these forms to determine whether any particular z belongs to C.

Finally, taking a union over all possible equivalence classes, we get the following result:
If a set of inputs parametrized by d parameters is accepted in the linear PRAM model with p processors in t time, then there is a way of partitioning$R^d$ using$O(p^{dt}pt)$ hyperplanes, so that each cell of the resulting arrangement can be labelled 0 or 1 so that an input z is accepted iff it lies in a cell labelled with a 1.
In the next part of this argument, we will show that a problem with high parametric complexity cannot be represented this way: that is, there must be some cell that contains inputs in the language, and inputs not in the language.