Best of 2019: The Misleading Language of Artificial Intelligence


[September 27, 2019]

Language is imprecise, vague, context-specific, sentence-structure-dependent, full of fifty shades of gray (or grey). It’s what we use to describe progress in artificial intelligence, in improving computers’ performance in tasks such as accurately identifying images or translating between languages or answering questions. Unfortunately, vague or misleading terms can lead to inaccurate and misleading news.

Earlier this month we learned from the New York Times that “…in just the past several months researchers have made significant progress in developing A.I. that can understand languages and mimic the logic and decision-making of humans.” The NYT article reported on “A Breakthrough for A.I. Technology” with the release of a paper by a team of researchers at the Allen Institute for Artificial Intelligence (AI2), summarizing their work on Aristo, a question answering system. While three years ago the best AI system scored 59.3% on an eighth-grade science exam challenge, Aristo recently answered correctly more than 90% of the non-diagram, multiple-choice questions on an eighth-grade science exam and exceeded 83% on a 12th-grade science exam.

No doubt this is remarkable and rapid progress for the AI sub-field of Natural Language Understanding (NLU) or, more specifically, as the AI2 paper states, “machine understanding of textbooks…a grand AI challenge that dates back to the ’70s.” But does Aristo really “read,” “understand,” and “reason,” as one might infer from the language used in the paper and in similar NLU papers?

“If I could go back to 1956 [when the field of AI was launched], I would choose a different terminology,” says Oren Etzioni, CEO of AI2. Calling the field’s “anthropomorphizing” an “unfortunate history,” Etzioni clearly states his position about the language of AI researchers:

“When we use these human terms in the context of machines that’s a huge potential for misunderstanding. The fact of the matter is that currently machines don’t understand, they don’t learn, they aren’t intelligent—in the human sense… I think we are creating savants, really really good at some narrow task, whether it’s NLP or playing GO, but that doesn’t mean they understand much of anything.”

Still, “human terms,” misleading or not, are what we have to describe what AI programs do, and Etzioni argues that “if you look at some of the questions that a human would have to reason his or her way to answer, you start to see that these techniques are doing some kind of rudimentary form of reasoning, a surprising amount of rudimentary reasoning.”

The AI2 paper elaborates further on the question “to what extent is Aristo reasoning to answer questions?” While stating that currently “we do not have a sufficiently fine-grained notion of reasoning to answer this question precisely,” it points to a recent shift in the understanding by AI researchers of “reasoning” with the advent of deep learning and “machines performing challenging tasks using neural architectures rather than explicit representation languages.”

Similar to what has happened recently in other AI sub-fields, question answering has gotten a remarkable boost with deep learning, applying statistical analysis to very large data sets, finding hidden correlations and patterns, and leading to surprising results, described sometimes in misleading terms.

What current AI technology does is “sophisticated pattern-matching, not what I would call ‘understanding’ or ‘reasoning,’” says TJ Hazen, Senior Principal Research Manager at Microsoft Research.* Deep learning techniques, says Hazen, “can learn really sophisticated things from examples. They do an incredible job of learning specific tasks, but they really don’t understand what they’re learning.”

What deep learning and its hierarchical layers of complex calculations, plus lots of data and compute power, brought to NLU (and other AI specialties) is an unprecedented level of efficiency in designing models that “understand” the task at hand (e.g., answering a specific question). Machine learning used to require deep domain knowledge and a deep investment of time and effort in coming up with what its practitioners call “features,” the key elements of the model (called “variables” in traditional statistical analysis; professional jargon is yet another challenge for both human and machine language understanding). By adding more layers (steps) to the learning process and using vast quantities of data, deep learning has taken on more of the model design work.

“Deep learning figures out what are the most salient features,” says Hazen. “But it is also constrained by the quality and sophistication of the data. If you only give it simple examples, it’s only going to learn simple strategies.”

AI researchers, at Microsoft, AI2, and other research centers, are aware of deep learning’s limitations when compared with human intelligence, and most of their current work, while keeping within the deep learning paradigm, is aimed at addressing these limitations. “In the next year or two,” says Etzioni, “we are going to see more systems that work not just on one dataset or benchmark but on ten or twenty and they are able to learn from one and transfer to another, simultaneously.”

Jingjing Liu, Principal Research Manager at Microsoft Research, also highlights the challenge of “transfer learning” or “domain adaptation,” warning about the hype regarding specific AI programs’ “human parity.” Unlike humans, who transfer knowledge acquired in performing one task to a new one, a deep learning model “might perform poorly on a new unseen dataset or it may require a lot of additional labeled data in a new domain to perform well,” says Liu. “That’s why we’re looking into unsupervised domain adaptation, aiming to generalize pre-trained models from a source domain to a new target domain with minimum data.”

Real-world examples, challenges, and constraints help researchers address the limitations of deep learning and offer AI solutions to specific business problems. A company may want to use a question answering system to help employees find what they need in a long and complex operations manual or a travel policy document.

Typically, observes Hazen, the solution is a FAQ document, yet another document to wade through. “Right now, most enterprise search mechanisms are pretty poor at this kind of tasks,” says Hazen. “They don’t have the click-through info that Google or Bing have. That’s where we can add value.” Deploying a general-purpose “reading comprehension” model in a specific business setting, however, requires successful “transfer learning”: adapting the model using only hundreds of company-specific examples, not tens of thousands or even millions.

Microsoft researchers encounter these real-world challenges when they respond to requests from Microsoft’s customers. A research institute such as AI2 does not have customers, so it created a unique channel for its researchers to interact with real-world challenges: the AI2 Incubator, which invites technologists and entrepreneurs to establish their startups with the help of AI2 resources. One of these startups offers NLU software that organizes and reads contracts and extracts the specific terms employees need for their work.

Unfortunately, human ambition (hubris?) hasn’t stopped at solving specific human challenges as sufficient motivation for AI research. Achieving “human-level intelligence” has been the ultimate goal for AI research for more than six decades. Indeed, it has been an unfortunate history, as a misleading goal has led to misleading terms which in turn lead to unfounded excitement and anxiety.

Fortunately, many AI researchers continue to expand what computers can do in the service of humanity. Says TJ Hazen: “I prefer to think about the work I’m doing as something that will help you do a task but it may not be able to do the full task for you. It’s an aid and not a replacement for your own capabilities.” And Oren Etzioni: “My favorite definition of AI is to think of it as Augmented Intelligence. I’m interested in building tools that help people be much more effective.”

*Opinions expressed by Microsoft’s researchers do not necessarily represent Microsoft’s positions.

Originally published on