Best of 2019: 60 Years of Progress in AI

New Zealand flatworm

[January 8, 2019] Today is the first day of CES 2019 and artificial intelligence (AI) “will pervade the show,” says Gary Shapiro, chief executive of the Consumer Technology Association. One hundred and thirty years ago today (January 8, 1889), Herman Hollerith was granted a patent titled “Art of Compiling Statistics.” The patent described a punched card tabulating machine which heralded the fruitful marriage of statistics and computer engineering—called “machine learning” since the late 1950s, and reincarnated today as “deep learning,” or more popularly as “artificial intelligence.”

Commemorating IBM’s 100th anniversary in 2011, The Economist wrote:

In 1886, Herman Hollerith, a statistician, started a business to rent out the tabulating machines he had originally invented for America’s census. Taking a page from train conductors, who then punched holes in tickets to denote passengers’ observable traits (e.g., that they were tall, or female) to prevent fraud, he developed a punch card that held a person’s data and an electric contraption to read it. The technology became the core of IBM’s business when it was incorporated as Computing Tabulating Recording Company (CTR) in 1911 after Hollerith’s firm merged with three others.

In his patent application, Hollerith explained the usefulness of his machine in the context of a population survey and the statistical analysis of what we now call “big data”:

The returns of a census contain the names of individuals and various data relating to such persons, as age, sex, race, nativity, nativity of father, nativity of mother, occupation, civil condition, etc. These facts or data I will for convenience call statistical items, from which items the various statistical tables are compiled. In such compilation the person is the unit, and the statistics are compiled according to single items or combinations of items… it may be required to know the numbers of persons engaged in certain occupations, classified according to sex, groups of ages, and certain nativities. In such cases persons are counted according to combinations of items. A method for compiling such statistics must be capable of counting or adding units according to single statistical items or combinations of such items. The labor and expense of such tallies, especially when counting combinations of items made by the usual methods, are very great.

In Before the Computer, James Cortada describes the results of the first large-scale machine learning project:

The U.S. Census of 1890… was a milestone in the history of modern data processing…. No other occurrence so clearly symbolized the start of the age of mechanized data handling…. Before the end of that year, [Hollerith’s] machines had tabulated all 62,622,250 souls in the United States. Use of his machines saved the bureau $5 million over manual methods while cutting sharply the time to do the job. Additional analysis of other variables with his machines meant that the Census of 1890 could be completed within two years, as opposed to nearly ten years taken for fewer data variables and a smaller population in the previous census.

But some considered the machine’s efficient output “fake news.” In 1891, the Electrical Engineer reported (quoted in Patricia Cline Cohen’s A Calculating People):

The statement by Mr. Porter [the head of the Census Bureau, announcing the initial count of the 1890 census] that the population of this great republic was only 62,622,250 sent into spasms of indignation a great many people who had made up their minds that the dignity of the republic could only be supported on a total of 75,000,000. Hence there was a howl, not of “deep-mouthed welcome,” but of frantic disappointment. And then the publication of the figures for New York! Rachel weeping for her lost children and refusing to be comforted was a mere puppet-show compared with some of our New York politicians over the strayed and stolen Manhattan Island citizens.

More than a century later, no matter how much more efficiently machines learned, they were still accused of creating and disseminating fake news. On March 24, 2011, the U.S. Census Bureau delivered “New York’s 2010 Census population totals, including first look at race and Hispanic origin data for legislative redistricting.” In response to the census data showing that New York has about 200,000 fewer people than originally thought, Senator Chuck Schumer said, “The Census Bureau has never known how to count urban populations and needs to go back to the drawing board. It strains credulity to believe that New York City has grown by only 167,000 people over the last decade.” Mayor Bloomberg called the numbers “totally incongruous” and Brooklyn borough president Marty Markowitz said “I know they made a big big mistake.” [The results of the 1990 census were also disappointing and were unsuccessfully challenged in court, according to the New York Times].

Complaints by politicians and other people have not slowed the continuing advances in using computers in ingenious ways for increasingly sophisticated statistical analysis. In 1959, Arthur Samuel experimented with teaching computers how to beat humans at checkers, calling his approach “machine learning.”

Later applied successfully to modern challenges such as spam filtering and fraud detection, the machine-learning approach relied on statistical procedures that found patterns in the data or classified the data into different buckets, allowing the computer to “learn” (e.g., optimize the performance—accuracy—of a certain task) and “predict” (e.g., classify or put in different buckets) the type of new data that is fed to it. Entrepreneurs such as Norman Nie (SPSS) and Jim Goodnight (SAS) accelerated the practical application of computational statistics by developing software programs that enabled the widespread use of machine learning and other sophisticated statistical analysis techniques.

In his 1959 paper, Samuel described machine learning as particularly suited for very specific tasks, in contrast to the “Neural-net approach,” which he thought could lead to the development of general-purpose learning machines. The neural networks approach was inspired by a 1943 paper by Warren S. McCulloch and Walter Pitts in which they described networks of idealized and simplified artificial “neurons” and how they might perform simple logical functions, leading to the popular description of today’s neural networks as “mimicking the brain.”
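The kind of idealized unit McCulloch and Pitts described can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions, not the notation of the 1943 paper: inputs and output are binary, active excitatory inputs are counted against a fixed threshold, and any active inhibitory input blocks firing. The function names (`mp_neuron`, `and_gate`, `or_gate`, `not_gate`) are hypothetical, chosen for illustration.

```python
from typing import Sequence


def mp_neuron(excitatory: Sequence[int], inhibitory: Sequence[int], threshold: int) -> int:
    """Idealized threshold unit with binary (0/1) inputs and output.

    It fires (returns 1) only if no inhibitory input is active and the count
    of active excitatory inputs reaches the threshold.
    """
    if any(inhibitory):  # any active inhibitory input vetoes firing
        return 0
    return 1 if sum(excitatory) >= threshold else 0


def and_gate(a: int, b: int) -> int:
    return mp_neuron([a, b], [], threshold=2)  # both inputs must be on


def or_gate(a: int, b: int) -> int:
    return mp_neuron([a, b], [], threshold=1)  # one active input is enough


def not_gate(a: int) -> int:
    # No excitatory inputs and threshold 0: fires unless input a inhibits it.
    return mp_neuron([], [a], threshold=0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
    print("NOT 0:", not_gate(0), "NOT 1:", not_gate(1))
```

Each logical function comes purely from the choice of threshold (and, for NOT, an inhibitory connection); nothing is learned from data, which is one reason later models such as the Perceptron added adjustable weights.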

Over the years, the popularity of “neural networks” has gone up and down through a number of hype cycles, starting with the Perceptron, a 2-layer neural network that was considered by the US Navy to be “the embryo of an electronic computer that... will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.” In addition to failing to meet these lofty expectations (similar in tone to today’s perceived threat of “super-intelligence”), neural networks suffered from fierce competition from the academics who coined the term “artificial intelligence” in 1955 and preferred the manipulation of symbols to computational statistics as a sure path to creating a human-like machine.

It didn’t work, and an “AI Winter” set in. With the invention and successful application of “backpropagation” as a way to overcome the limitations of simple neural networks, statistical analysis was again in the ascendant, now cleverly labeled “deep learning.” In Neural Networks and Statistical Models (1994), Warren Sarle explained to his worried and confused fellow statisticians that the ominous-sounding artificial neural networks

are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software… like many statistical methods, [artificial neural networks] are capable of processing vast amounts of data and making predictions that are sometimes surprisingly accurate; this does not make them “intelligent” in the usual sense of the word. Artificial neural networks “learn” in much the same way that many statistical algorithms do estimation, but usually much more slowly than statistical algorithms. If artificial neural networks are intelligent, then many statistical methods must also be considered intelligent.
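Sarle’s point can be made concrete with a small sketch under stated assumptions: a single artificial “neuron” with a sigmoid activation, trained by gradient descent on cross-entropy loss, is the same model as logistic regression, a standard statistical method. The toy data, learning rate, epoch count, and function names below are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function: the neuron's 'activation', the regression's link."""
    return 1.0 / (1.0 + math.exp(-z))


def train_single_neuron(
    xs: List[float], ys: List[int], lr: float = 0.3, epochs: int = 10_000
) -> Tuple[float, float]:
    """Fit one sigmoid 'neuron' with weight w and bias b by batch gradient
    descent on mean cross-entropy loss.

    The model P(y = 1 | x) = sigmoid(w * x + b) is a logistic regression:
    the 'weight' and 'bias' are what a statistician calls a slope
    coefficient and an intercept.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus observed label
            grad_w += error * x
            grad_b += error
        # Step against the gradient of the mean loss.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


if __name__ == "__main__":
    # Hypothetical toy data: one input "feature" (a statistician's "variable")
    # and a binary outcome; the classes overlap, so the fitted values stay finite.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [0, 0, 1, 0, 1, 1]
    w, b = train_single_neuron(xs, ys)
    print(f"weight (slope) = {w:.3f}, bias (intercept) = {b:.3f}")
    for x in xs:
        print(x, round(sigmoid(w * x + b), 3))
```

When training finishes, the fitted weight and bias satisfy (up to rounding) the same likelihood equations that a statistics package solves when it fits a logistic regression, so the “neuron” has done what a statistician would call estimation. A neural network stacks many such units with nonlinear activations, which is why Sarle could describe it as nonlinear regression.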

Sarle provided his colleagues with a handy dictionary translating the terms used by “neural engineers” to the language of statisticians (e.g., “features” are “variables”). In anticipation of today’s “data science” and predictions of algorithms replacing statisticians (and even scientists), Sarle reassured them that no “black box” can substitute for human intelligence:

Neural engineers want their networks to be black boxes requiring no human intervention—data in, predictions out. The marketing hype claims that neural networks can be used with no experience and automatically learn whatever is required; this, of course, is nonsense. Doing a simple linear regression requires a nontrivial amount of statistical expertise.

In his April 2018 congressional testimony, Mark Zuckerberg agreed that relying blindly on black boxes is not a good idea: “I don’t think that in 10 or 20 years, in the future that we all want to build, we want to end up with systems that people don’t understand how they’re making decisions.” Still, Zuckerberg used the aura, the enigma, the mystery that masks inconvenient truths, everything that has been associated with the hyped marriage of computers and statistical analysis, to assure the public that the future will be great: “Over the long term, building AI tools is going to be the scalable way to identify and root out most of this harmful content.”

Facebook’s top AI researcher Yann LeCun is “less optimistic, and a lot less certain about how long it would take to improve AI tools.” In his assessment, “Our best AI systems have less common sense than a house cat.” That is an accurate description of today’s not very intelligent machines, and it recalls what Samuel said in his 1959 machine learning paper:

Warren S. McCulloch has compared the digital computer to the nervous system of a flatworm. To extend this comparison to the situation under discussion would be unfair to the worm since its nervous system is actually quite highly organized as compared to [the most advanced artificial neural networks of the day].

Over the past sixty years, artificial intelligence has advanced from being not as smart as a flatworm to having less common sense than a house cat.

Originally published on