“Real Artists Ship”

Colin Johnson’s blog

Archive for the ‘Making’ Category

The Extensional Revolution

Wednesday, July 16th, 2014

We are on the threshold of an extensional revolution.

Philosophers draw a distinction between two ways of describing collections of objects. Intensional descriptions give some abstract definition, whereas extensional descriptions list all examples. For example, consider the difference between “the set of all polyhedra that can be made by joining together a number of identical, regular, convex polygons with the same number of polygons meeting at each vertex” (intensional), and “the set {tetrahedron, cube, octahedron, dodecahedron, icosahedron}” (extensional).
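To make the distinction concrete, here is a small Python sketch of the same set described both ways. It rests on assumptions of mine that are not in the text above: each solid is encoded as a pair (p, q), meaning p-sided faces with q faces meeting at each vertex, and the intensional rule is the standard requirement that the face angles around a vertex total less than 360 degrees, which works out to (p - 2)(q - 2) < 4. The names and the search bound of 10 are purely illustrative.

```python
# Extensional description: simply list the members.
# (p, q) = p-sided regular faces, q of them meeting at each vertex.
PLATONIC_EXTENSIONAL = {
    (3, 3),  # tetrahedron
    (4, 3),  # cube
    (3, 4),  # octahedron
    (5, 3),  # dodecahedron
    (3, 5),  # icosahedron
}


def is_platonic(p, q):
    """Intensional description: a rule that decides membership.

    The q face angles at a vertex must sum to less than 360 degrees,
    which simplifies to (p - 2) * (q - 2) < 4.
    """
    return p >= 3 and q >= 3 and (p - 2) * (q - 2) < 4


# Generate the members from the rule (10 is an arbitrary search bound).
PLATONIC_INTENSIONAL = {
    (p, q) for p in range(3, 10) for q in range(3, 10) if is_platonic(p, q)
}
```

Both descriptions pick out the same five solids; the difference is whether you hold a rule or a list.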

Despite its claims to be (amongst other things) the science of data, computer science has been very intensional in its thinking. Programs are treated as realisations of descriptive specifications, satisfying certain mathematically-described properties.

As more data becomes available, we can start to think about doing things in an extensional way. The combination of approximate matching and the availability of large numbers of examples is a very powerful paradigm for doing computing. We have started to see this already in some areas. Machine translation of natural language is a great example. For years, translation was dominated by attempts to produce ever more complex models of language, with the idea that eventually these models would be able to represent the translation process. More recently, the dominant approach has been “statistical machine translation”, where correlations between large-scale translated corpora are used to make decisions about how a particular phrase is to be translated. Instead of feeding the phrase to be translated through some engine that breaks it down and translates it via some complex human-built model, a large number of approximations and analogies are found in a corpus and the most dominant comparison is used. (I oversimplify, of course.)
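As a toy illustration of "approximate matching plus lots of examples", the sketch below looks up a phrase in a tiny parallel corpus, tolerates small differences in the input, and returns the translation seen most often. Everything here is an assumption for illustration: the four-pair corpus, the function names, and the 0.8 similarity cutoff are all made up, and real statistical translation systems are far more elaborate than a lookup table.

```python
from collections import Counter, defaultdict
from difflib import get_close_matches

# Hypothetical, hand-made parallel corpus of (English, French) pairs.
PARALLEL_CORPUS = [
    ("good morning", "bonjour"),
    ("good morning", "bonjour"),
    ("good morning", "bon matin"),
    ("good night", "bonne nuit"),
]


def build_table(pairs):
    """Count how often each source phrase was paired with each translation."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate(phrase, table, cutoff=0.8):
    """Return the most frequent translation among approximately matching phrases."""
    matches = get_close_matches(phrase, list(table), n=3, cutoff=cutoff)
    votes = Counter()
    for match in matches:
        votes.update(table[match])
    # No sufficiently similar example in the corpus: give up.
    return votes.most_common(1)[0][0] if votes else None
```

Calling translate("good mornin", build_table(PARALLEL_CORPUS)) finds "good morning" as the closest known phrase despite the typo and returns its most common rendering. There is no model of language here at all, only examples and a similarity measure.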

More simply, we can see how a task like spellchecking can be carried out by sheer force of data. If I am wavering between two possible spellings of a word, I just put them both into Google and see which comes out with more hits.
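The same trick can be sketched in a few lines of Python. Since I do not want to assume anything about a search engine's interface, word counts from a small in-memory text stand in for hit counts; the sample text and the function names are hypothetical.

```python
import re
from collections import Counter

# Stand-in for "the web": a tiny made-up sample of text.
SAMPLE_TEXT = (
    "Keep the two files in separate folders. "
    "That is a separate issue. "
    "Someone once wrote seperate by mistake."
)


def word_counts(text):
    """Count occurrences of each lower-cased word in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def pick_spelling(candidates, counts):
    """Choose the candidate spelling that occurs most often in the data."""
    return max(candidates, key=lambda word: counts[word])
```

With this sample, pick_spelling(["seperate", "separate"], word_counts(SAMPLE_TEXT)) chooses "separate", because it appears twice and the misspelling only once. The method knows nothing about spelling rules; it just trusts the majority.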

Once you start thinking extensionally, different approaches to complex problems start appearing. Could visual recognition problems be solved not by trying to find the features within the image that are relevant, but by finding all the images from a vast collection (like Flickr) that approximately match the target, and then processing the metadata? Could a problem like robot navigation or the self-driving car be solved by taking a vast collection of human-guided trajectories and just picking the closest one second-by-second (perhaps this corpus could be gained from a game, or from monitoring a lot of car journeys)? Can we turn mathematical problems from manipulations of definitions into investigations driven by artificially created data (at least for a first cut)?
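The "pick the closest trajectory" idea is essentially a nearest-neighbour lookup, and a minimal sketch might look like the following. The state encoding (lateral offset and heading error), the three recorded examples, and the action labels are all invented for illustration; whether anything like this would cope with real driving is exactly the open question.

```python
# Hypothetical recordings of human driving: (state, action) pairs, where a
# state is (lateral offset from lane centre, heading error).
RECORDED = [
    ((0.0, 0.0), "straight"),
    ((1.0, 0.0), "steer left"),
    ((-1.0, 0.0), "steer right"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_action(state, recorded):
    """Copy the action from the recorded state closest to the current one."""
    _, action = min(recorded, key=lambda pair: squared_distance(pair[0], state))
    return action
```

For a current state of (0.8, 0.1), the closest recorded state is (1.0, 0.0), so nearest_action returns "steer left". Repeating the lookup at every time step gives the second-by-second behaviour described above, with the quality resting entirely on how densely the corpus covers the situations encountered.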

The possibilities appear endless.

Making (1)

Wednesday, November 27th, 2013

“What have you been doing?” “I’ve been Sugruing a duckhead”