Big Data Self-Delusion

The most compelling story told in the new documentary “The Human Face of Big Data” (PBS, February 24) is about the collection and analysis of data to predict the onset of potentially deadly infection in premature babies.

By the time these babies show physical signs of infection, they are already very ill, and a caregiver could not have predicted this by looking at their charts, where vital signs are recorded only once an hour. “What shocked me was the amount of data loss,” says Dr. Carolyn McGregor of the University of Ontario Institute of Technology. The solution was Project Artemis, in which computers collect all relevant data continuously and watch for certain changes in vital signs. “If something starts to go wrong with that baby we have the ability to intervene [before physical symptoms appear],” says Dr. McGregor.

It’s a story of how more data may lead to better outcomes, in this case even saving lives. In recent years, the term “Big Data” has come to represent this promise, a potential that can be realized only if we clearly establish what we want to achieve by collecting more data and why more data is better than less data in each particular case.

In our technology-obsessed world, new technologies and new technology applications sometimes become buzzwords that are hyped, celebrated, and often discussed irresponsibly by technology vendors and the media. Unfortunately, “The Human Face of Big Data” by and large falls into this trap: the fascination (self-delusion?) with the idea that we are living through a momentous time in history thanks to technology. Going beyond “big data,” it is a paean to information technology and computerization, as Jay Walker of TEDMED declares in the film:

Billions and billions of people who have been excluded from the discussion, who couldn’t afford to step into the world of being connected, step into the world of information, learn things they could never learn, are suddenly on the network… Suddenly the world has a lot more minds connected in the simplest, least expensive possible way to make the world better… I don’t think there’s any question that we are at a moment in human history that we will look back on in fifty or a hundred years and we’ll say right around 2000 or so it all changed.

I’ll go out on a limb and venture to predict that a hundred years from now, the time around 2000 will be marked by Americans’ loss of the security they have enjoyed since the end of the Second World War, not by the rise of the Internet. And it will be clear to most observers, as it is clear to many today, that some of the additional minds now connected to the Internet do not see it as a tool “to make the world better” (or they may see it that way, but I’m guessing Walker doesn’t agree with their definition of a better world).

“The Human Face of Big Data” demonstrates that giving more people access to the Internet does not automatically include them in “the discussion.” China has more people connected to the Internet than any other country, but there is no one from China among the two dozen “experts” identified by name in the film; all are based in the U.S. There is no one from Russia, India, Japan, or Brazil, countries where one may find talking heads or, even better, data scientists who may represent a different point of view about the role of technology, the Internet, and big data. It would have enriched this documentary tremendously if we had heard their take on the pros and cons of big data, how they define it, what it means to them, and what specific types of data collection and analysis will make a difference in their countries. (This is in line with Anil Dash’s response to Mark Zuckerberg’s post regarding the recent decision by India’s telecom regulator: “What about pausing the Internet Basics effort and spending some time on a real effort to listen to Indian voices about what would help them have connectivity on their own terms, in a way they find acceptable?”)

The lack of diversity in the voices and opinions heard in the film and its relentless emphasis on accentuating the positive and the speculative (the two segments discussing the negative aspects of big data last a total of 7 minutes) are particularly astonishing, given that there has been no shortage of intelligent discussion of the potential pitfalls in the rush to collect and analyze data.

Take, for example, Kate Crawford’s list of myths associated with big data, which includes the belief that bigger data is always better data, that correlation is as good as causation, that big data eliminates biases, and that it doesn’t invade our privacy. Instead of using these and similar objections to prompt a rich discussion and debate, the documentary (promoted by PBS as an examination of “the promises and perils of this unstoppable force”) deals only with the issue of privacy, a post-Snowden requirement.

Here and there, the documentary almost gets into what could have been turned into an engaging and educational discussion of big data, only to stop in its tracks for fear of losing its shiny, sunny, positive packaging.

One example is the discussion of Google Flu Trends, which “accurately predicts flu outbreaks up to two weeks before the CDC,” based on flu-related Google searches. To its credit, the documentary then shows Stephen Downs of the Robert Wood Johnson Foundation talking about the “flip side,” the time Google Flu Trends “got it way wrong” because media coverage of the flu season got people to search for “flu” even if they were not sick. But then the film moves on to the next topic. It is a missed opportunity to talk about what Google has learned from its failure and about the dangers of blind faith in big data and algorithms, to say nothing of raising the question of how “world changing” it is to find out about flu outbreaks two weeks before the federal government, and whether that justifies the generalized claim we hear from Rick Smolan that “now we can see in real-time what’s going on and respond to it.”

Another missed opportunity for an intelligent discussion comes when we hear Tim O’Reilly say, “I am optimistic but not blindly foolishly optimistic. Remember, the financial crisis was brought to us by big data people.” Finally, you hope, we are going to get an interesting discussion of the empirical perils of big data (what has actually happened and why, not what may happen) and of its practical perils. But you quickly find out that (at least in this regard) you were foolishly optimistic, because all we get are platitudes such as “we have to earn our future… we have to make the right choices.”

The missed opportunities are compounded by outright fiction. Here is some of the data we discover in this documentary about big data:

  • All the data processing we did in the last 2 years is more than all the data processing we did in the last three thousand years;
  • We are now being exposed to as much information in a single day as our 15th century ancestors were exposed to in their entire lifetime;
  • Every two days the human race is now generating as much data as was generated from the dawn of humanity through the year 2003 (from the PBS website).

Really? What big data time-machine tells us exactly how much “information” and “data” and “data processing” there was in the last 3000 years or the 15th century or at the dawn of humanity?

The documentary provides a definition of big data, something that is often missing from discussions of the topic. While poetic, it is quite meaningless: big data is a nervous system for the planet. This global definition leads to discussions in the film that have more to do with the Internet than with big data.

For example, in the segment titled “Data: The future of revolution,” Joi Ito cites the “Arab Spring,” which started with a photo shared on Facebook that was then picked up by Al Jazeera and broadcast on TV, as an example of linking activists, social media, and mainstream media. “Technology has fundamentally changed the way people interact with everything,” says Ito. If big data is the planet’s nervous system, then every interaction is big data. QED.

In the same segment, Ito also comments, “that’s one of the challenges of big data—it has so much opportunity for both good and also for screwing up our system.” But it is not clear (at least to this viewer) why he says that in this context. As with the other voices we hear in the documentary, there may have been something else there that got lost in the editing process. The impression the filmmakers wanted to leave with the viewer is summarized by John Battelle and quoted in the press release: “The era of Big Data is an important inflection point in human history and represents a critical moment in our civilization’s development.”

The theme of we-are-living-in-a-historic-moment-because-of-technology-and-we-have-to-make-critical-decisions-because-it-may-turn-negative-but-let’s-accentuate-the-positive has been the hallmark of technology talk for a while, moving rapidly from one hype cycle to the next, with little connection to reality (big data has already been eclipsed as the buzzword of the day by the Internet of Things, Artificial Intelligence, and Virtual Reality). There’s no escape from this escapist, technology-centric, US-centric myth-making, shared and promoted by the global chattering classes. Here’s danah boyd reporting on last month’s meeting of the World Economic Forum in Davos, Switzerland:

I started to sense that what the tech sector was doing at Davos was putting on the happy smiling blinky story that they’ve been telling for so long, exuding a narrative of progress: everything that is happening, everything that is coming, is good for society, at least in the long run.

Shifting from “big data,” because it’s become code for “big brother,” tech deployed the language of “artificial intelligence” to mean all things tech, knowing full well that decades of Hollywood hype would prompt critics to ask about killer robots. So, weirdly enough, it was usually the tech actors who brought up killer robots, if only to encourage attendees not to think about them.

Not only did any nuance get lost in this conversation, but so did the messy reality of doing tech. It’s hard to explain to political actors why, just because tech can (poorly) target advertising, this doesn’t mean that it can find someone who is trying to recruit for ISIS. Just because advances in AI-driven computer vision are enabling new image detection capabilities, this doesn’t mean that precision medicine is around the corner. And no one seemed to realize that artificial intelligence in this context is just another word for “big data.” Ah, the hype cycle.

It’s going to be a complicated year geopolitically and economically. Somewhere deep down, everyone seemed to realize that. But somehow, it was easier to engage around the magnificent dreams of science fiction. And I was disappointed to watch as tech folks fueled that fire with narratives of tech that drive enthusiasm for it but are so disconnected from reality as to be a distraction on a global stage.

Similarly, veteran tech observer Steven Levy says that the virtual and augmented reality demos at TED 2016 were redundant because “At TED, you are already immersed in a kind of artificial reality.” Is there a tech backlash brewing? Are we finally going to have more sober and multi-dimensional discussions of technology?

I don’t think so. I don’t think we (especially in the U.S.) will let go of soothing escapism. Expect to see in a few years, when we will have already moved on to other buzzwords, a PBS documentary titled “The Human Face of Artificial Intelligence.”

4 Responses to Big Data Self-Delusion

  2. Tim Wessels says:

    Well, the tech industry is notorious for breathing its own exhaust, and its pronouncements around big data are true to form. Yeah, machines and people do generate a lot of data, and now that we have the means to store it, there will be efforts to analyze it or mine it for various purposes, some good and some bad.

    You can be sure that if you partake in social media and buy stuff online, the businesses involved will use your data to get you to provide more personal data about yourself or buy more of whatever it is they are selling. This is how Google, Facebook and Amazon have made their fortunes. Practically the only way to avoid this is to never use social media and pay by cash or check for everything you buy, which is admittedly more cumbersome than clicking on a screen or sliding a card through a reader.

    The data generated by machines apparently now surpasses the data generated by human beings. The analysis of this form of big data could be useful to people responsible for the safety of machines, like airplanes, trains and elevators. Pretty boring stuff, but potentially valuable, especially if there are serious liabilities for damage when a machine fails.

    Then there is the “spying” big data collection that goes with the use of cameras on public streets, and the NSA gathering all of the phone numbers you make calls to. This has perhaps been the most abusive and threatening aspect of big data. You can probably make a public safety argument for cameras on public streets, but not for phone calls you make unless a judge has specifically issued a warrant for the calls you, as an individual, are making. The warrantless bulk collection of this so-called phone metadata is a violation of the 4th Amendment, but our political system and the courts have been weak defenders of personal freedom until Mr. Snowden enlightened a few of them. The jury is still out on how this data should be disposed of and who should hold it until it is disposed of.

    The writer is correct that big data is not an unmitigated blessing, and the tech industry does us no service by failing to honestly face the personal, social and political issues that surround it.


