Tuesday, July 12, 2005

Word Algebras and Strange Grammars

Ars Mathematica recently announced the issue of Algebraic Combinatorics on Words, by the anonymous group of authors called 'M. Lothaire'. This is a sequel to the original Combinatorics on Words.

It may come as a surprise to some (at least it did to me) that you can define interesting algebraic structures on words (sequences of letters from a (finite) alphabet). Combinatorics on Words is a great book that builds up the theory of free algebras over words. It was in this book that I first came across a beautiful proof of the precursor of the Robertson-Seymour program:

Consider any infinite sequence of finite-lengthwords w1, w2, ..., over a finite alphabet. There always exists an i and a j > i such that wi is a subsequence of wj.
Viewed another way, what this says is that if we define a partial ordering on strings such that one string is greater than another if it contains the second as a substring, then

there is no infinite antichain

Why is this a precursor of the Robertson-Seymour program ? Well, the above fact shows that the set of strings with the given order is a well-quasi order, which in turn means that any filter (i.e any set that is "upward" closed, or for each element x contains all elements y greater than x) has a finite basis, which means that there is a finite set of strings whose upward closure defines the filter). Conversely, this means that any ideal (a set of strings closed under the substring operation) has as complement a filter, and thus membership in the ideal can be described in terms of "forbidden" strings (the elements of the finite basis)

The Robertson-Seymour program can be stated as: graphs with the order defined by minors (G is greater than G' if G' is a minor of G) form a well-quasi order, which thus means that any upward closed family has a finite basis, and thus any idea (any minor-closed family) has a finite set of excluded graphs that defines it.

The result on strings is not hard to prove: I used it in this paper to show that membership for a certain grammar was decidable. Unfortunately, we were never able to prove a stronger result, and the grammar seemed so simple that we felt there must be a more efficient membership result.

Here's the grammar, for your entertainment.

All production rules are of the form
a -> ba
cd -> d

where a, b, c, d are elements of the alphabet. There is a special symbol x, and the language of such a grammar is defined as all strings w such that wx ->* x. Notice that in the productions, a symbol either inserts or deletes a new symbol to its left. We called these grammers (somewhat clunkily) "leftist grammars".



Update (12/23/05): Leftist grammars are not context-free. In FCT 2005, Tomasz Jurdzinski and Krzysztof Lorys present a context-sensitive language that can be expressed as a leftist grammer.

Categories
Post a Comment

Disqus for The Geomblog