Thursday, February 06, 2014

On "reverse engineering" algorithms, contextual readings, and the invisible choices that add value to research.

tl;dr: Understanding the value in a piece of work (a paper, an algorithm, a system) is not just a matter of understanding the various parts and how they fit. It is also a matter of understanding why the problem is decomposed that way. This observation has implications for how we read, write and evaluate research. 

Nick Seaver is an cultural anthropologist at UC Irvine.  He just wrote an article at Medium on reverse engineering algorithms. In it he distinguished between the factual reconstruction of an algorithm ("how does it work") from a more probing examination of the way it was designed ("why it is broken down in a particular way").

He comes to a conclusion that is quite sound and not at all controversial:
I want to suggest here that, while reverse engineering might be a useful strategy for figuring out how an existing technology works, it is less useful for telling us how it came to work that way. Because reverse engineering starts from a finished technical object, it misses the accidents that happened along the way — the abandoned paths, the unusual stories behind features that made it to release, moments of interpretation, arbitrary choice, and failure. Decisions that seemed rather uncertain and subjective as they were being made come to appear necessary in retrospect. Engineering looks a lot different in reverse.
But along the way, he makes an insightful observation about the very idea of structuralism as applied to algorithm design: namely, the idea that by breaking down the parts of the algorithm and understanding the pieces and how they fit together we can "understand" the algorithm at some higher level.
When you break an object down into its parts and put it back together again, you have not simply copied it — you’ve made something new. A movie’s set of microtags, no matter how fine-grained, is not the same thing as the movie. It is, as Barthes writes, a “directed, interested simulacrum” of the movie, a re-creation made with particular goals in mind. If you had different goals — different ideas about what the significant parts of movies were, different imagined use-cases — you might decompose differently. There is more than one way to tear apart content. (emphasis added)
In other words, the value of a reductionist analysis of a piece of work is not just in understanding the parts and how they fit, but in understanding the choices that led to that particular decomposition.

I think there are important lessons here for anyone involved in the creation and evaluation of new work. In particular:

Reading: While this is mostly advice for students, it applies whenever you're reading unfamiliar material. The particular form of the paper -- how the proofs are structured, how the system is built, or how the algorithm components work -- is a choice made by the authors and should not be left unexamined or unquestioned. All too often a student will read a paper as a factual news report, rather than reading it as a specific interpretation of a problem/algorithm/theorem that could lend itself to multiple interpretations (just a few days ago I was discussing some JL results with a colleague and realized that we had totally distinct and valid interpretations of some recent work in the area).

Reviewing: It's very easy (and common) to read a paper, understand the proofs, and then be completely underwhelmed by it: the dreaded "too simple" feeling that leads papers to get rejected from unnamed theory conferences. This is especially true when you're not familiar with an area, and don't realize that it was the choices the author(s) made that made the paper look simple. And so your assessment has to factor that choice in, rather than taking it for granted.

Writing/Presenting: Of course all of this impacts how you as a creator choose to present your work. There is not one way to tell a story (in writing or in a talk). But you do make choices about how to tell it. And so it's important to make those choices visible (either by explaining why they are needed, or why different choices don't work, or how they get around certain roadblocks) so that your work can receive a fair evaluation.

This can be excruciating, especially when (like many good papers) your work admits multiple interpretations, and you have to gamble on the right one to please the fickle and time-stretched reviewer. But be mindful of these choices in your work.

p.s On a separate note, it's intriguing to me how so many tools from the study of literature (narrative, contextual analysis, critical reading) show up in other forms of writing far removed from the humanities. This is yet another reason why I'm saddened (even though I'm on the "right" team) by the relentless focus on STEM in opposition to the liberal arts. It's a particularly virulent form of divide-and-conquer (the British colonial technique, not the algorithmic paradigm).

p.p.s Has it really been three months since I wrote anything ? I must be working !!

Disqus for The Geomblog