Organizing the world’s information, one reference at a time


Desk Set, 1957

In his book Weaving the Web, Tim Berners-Lee writes:

I was excited about escaping from the straightjacket of hierarchical documentation systems…. By being able to reference everything with equal ease, the web could also represent associations between things that might seem unrelated but for some reason did actually share a relationship. This is something the brain can do easily, spontaneously. … The research community has used links between paper documents for ages: Tables of content, indexes, bibliographies and reference sections… On the Web… scientists could escape from the sequential organization of each paper and bibliography, to pick and choose a path of references that served their own interest.

With this one imaginative leap, Berners-Lee moved beyond a major stumbling block for all previous information retrieval systems: the pre-defined classification system at their core. This insight was so counter-intuitive that even during the early years of the Web, attempts were still made to do exactly what he had left behind: to classify all the information on the Web and organize it in pre-defined taxonomies.

Google’s founders were the first to seize on Berners-Lee’s insight and build their information retrieval business on closely tracking cross-references (i.e., links between pages) as they were happening and correlating relevance with the quantity of cross-references (i.e., the popularity of pages as judged by how many other pages linked to them). This was what set Google apart from its competitors (Yahoo had a Chief Ontologist on staff).
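To make the idea of correlating relevance with the quantity of cross-references concrete, here is a minimal sketch in Python. It is only an illustration of counting inbound links, not Google's actual method (PageRank also weighs each link by the importance of the page it comes from), and the page names and the `rank_by_inbound_links` helper are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}


def rank_by_inbound_links(links):
    """Return (page, inbound_count) pairs, most-linked page first."""
    # Start every known page at zero so unlinked pages still appear.
    inbound = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):   # count each linking page only once
            if target != source:      # ignore a page linking to itself
                inbound[target] += 1
    # Sort by count (descending), then by name to keep ties stable.
    return sorted(inbound.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_inbound_links(LINKS)
for page, count in ranking:
    print(page, count)
```

No one classifies the pages in advance here: the ordering emerges entirely from the references that the pages make to one another.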

Berners-Lee’s insight is frequently linked to Vannevar Bush, who wrote in 1945, “Our ineptitude at getting at the record is largely caused by the artificiality of systems of indexing… Selection [i.e., information retrieval] by association, rather than by indexing may yet be mechanized.” But I prefer to start the history of the Web (and of organizing information) with what was, to my knowledge, the earliest use of cross-references.

This was Ephraim Chambers’ Cyclopaedia, published in London in 1728. While lacking the worldwide platform for “crowd-sourcing” references that Berners-Lee invented, Chambers shared with him (and Bush) a dislike for hierarchical, alphabetical indexing systems. Here is how Chambers explained his innovative system of cross-references in the Preface:

Former lexicographers have not attempted anything like Structure in their Works; nor seem to have been aware that a dictionary was in some measure capable of the Advantages of a continued Discourse. Accordingly, we see nothing like a Whole in what they have done…. This we endeavoured to attain, by considering the several Matters [i.e., topics] not only absolutely and independently, as to what they are in themselves; but also relatively, or as they respect each other. They are both treated as so many Wholes, and so many Parts of some greater Whole; their Connexion with which is pointed out by a Reference. So that by a Course of References, from Generals to Particulars; from Premises to Conclusions; from a Cause to Effect; and vice versa, i.e., in one word, from more to less complex, and from less to more: A Communication is opened between the several parts of the Work; and the several Articles are in some measure replaced in their natural Order of Science, out of which the Technical or Alphabetical one had remov’d them.

Chambers’ Cyclopaedia was the earliest attempt to link by association all the articles in an encyclopedia or, in more general terms, everything we know at a given point in time. And like the World Wide Web, it moved some people to voice the kind of concern we hear today about what Google is doing to our brains. The supplement to the 1758 edition of the Cyclopaedia says:

Some few however condemn the use of all such dictionaries, on the first pretence, that, by lessening the difficulties of attaining knowledge, they abate our diligence in the pursuit of it; and by dazzling our eyes with superficial shew, seduce us from digging solid riches in the mine itself.

The fear of what tools for organizing information could do to our thinking (and livelihood) was renewed manifold with the advent of modern computers. “They can’t build a machine to do our job; there are too many cross-references in this place,” says the head librarian (Katharine Hepburn) to her anxious colleagues in the research department when a “methods engineer” (Spencer Tracy) is hired to “improve workman-hour relationship” in a large corporation. By the end of the film, Desk Set (released in 1957), she proves her point by winning not only the engineer’s heart but also a contest with the ominous-looking “Electronic Brain” (aka Computer).