Big Data Is Neither An Atomic Bomb Nor A Holy Grail

Albert-László Barabási, a physicist and well-known expert in network analysis, published last week an op-ed calling for his fellow scientists to spearhead “the ethical use of big data.” Barabási accuses the NSA of breaking “the traditional model governing the bond between science and society” and argues that big data is like other dual-use “breakthrough technologies” such as atomic energy and genetic engineering. “Powered by the right type of big data, data mining is a weapon,” he says. “It can be just as harmful, with long-term toxicity, as an atomic bomb.”

Drawing an analogy with the established model of nuclear nonproliferation, Barabási argues that “science can counteract spying overreach by developing tools and technologies” that lock in the principles of data ownership (control of personal data by the individual) and safe passage (protection of data transmission).

Sounds like a timely call-to-arms. And I like Barabási’s observation that “identifying terrorist intent is more difficult than finding a needle in a haystack—it’s more like spotting a particular blade of hay.” As I wrote before in this column, the first question about the NSA activities should be the question of its effectiveness.

Unfortunately, Barabási’s good intentions are marred by his op-ed’s contribution to the proliferation of certain obstinate myths about big data. It is far less powerful than he would like us to believe. Moreover, big data, the NSA overreach, and privacy concerns have nothing to do with science.

“Big Data is the Holy Grail: It promises to unearth the mathematical laws that govern society at large,” says Barabási. The claim that big data is going to change our poor state of knowledge of the “laws” that govern society, mathematical or otherwise, is simply astounding given the examples he provides: “My team confirmed the promise of Big Data by quantifying the predictability of our daily patterns, the threat digital viruses pose to mobile phones and even the reaction people have when a bomb goes off beside them.” These are the findings that lead us to the promised “laws”? Why say “confirmed” before you deliver?

The obvious answer is that the people who support academic research fall for the “promise,” especially when it is accompanied by dazzling mathematics. No wonder that the good people of the NSA were also dazzled by the “promise” of big data. Or as Barabási says, apparently not seeing the connection to his own promises: “A political leadership, intoxicated by the power of these tools, failed to keep their use within the strict limits of the Constitution.”

Exactly. What the NSA is doing is in the realm of politics, of the rights of U.S. citizens and the obligations of their government. It has nothing to do with science and even less with big data—U.S. government surveillance overreach predates the invention of computers and, later, of Hadoop.

But Barabási chastises “scientists whose work fueled these advances” (that facilitated the NSA’s comprehensive data collection) because they “failed to forcefully articulate the collateral dangers their tools pose.” I’m sure that the engineers at Google and Cloudera and many startups and those who joined the NSA to further develop the tools in-house are happy to be called “scientists.” But software development or data cleaning or data analysis are not science.

Many scientists today use these tools to help them answer questions that will help us better understand empirical reality, just as scientists have done, without these tools, for four centuries. The progress of science has been driven by scientists’ imagination, not the amount of data—big or small—at their disposal. This is why the 2013 Nobel Prize in Physics was awarded to the two scientists who, in 1964, theorized the existence of a special kind of particle, the Higgs particle, and not to CERN, which confirmed, thanks to its big data processing capabilities, the scientists’ imaginative speculation almost half a century later.

No harm is done when software engineers and/or statisticians call themselves “data scientists,” just as there is no harm in engineers calling themselves “computer scientists.” But we should not be carried away by the words (or buzzwords) we use, and we should understand the distinction between a scientific activity and a very valuable, but non-scientific, activity. The confusion could lead, as in this case, to falsely accusing “science” of aiding and abetting government overreach, the type of accusation that may eventually help fray “the bond between science and society.”

Politics is also where the question of privacy concerns lies. If we don’t like big data, or if we want to defend its beneficial applications, it’s up to all of us to do something about it, as we have done in other society-wide debates and decisions regarding what we can and cannot do and what organizations serving us can or cannot do. Who knows, we may even go through a “data collection prohibition” period and then repeal it.

That the private is political is made clear by the sometimes inane reactions to the supposed “violations” of privacy by scientists. The work of Barabási demonstrates this well. He and his colleagues bent over backwards to anonymize the identity of the people whose cell phone records they analyzed to come up with the aforementioned discovery of the “predictability of our daily patterns” or, more specifically, that people don’t travel that much. It’s obvious to any rational observer, I think, that the researchers were genuinely driven by their passion for finding the “laws” that govern us and (just maybe) by the desire to obtain the next research grant, not by any evil intention to find out the travel habits of specific individuals.

But this was the reaction of bioethicist Arthur Caplan at the University of Pennsylvania: “There is plenty going on here that sets off ethical alarm bells about privacy and trustworthiness.” The 2008 story from which this quote was taken led with “Researchers secretly tracked the locations of 100,000 people outside the United States through their cellphone use… The first-of-its-kind study… raises privacy and ethical questions for its monitoring methods, which would be illegal in the United States.”

Some call it double- and triple-blind research; others call it “monitoring.” That’s the nature of political, non-scientific, debate. Privacy concerns will be resolved (or not) in the messy political domain. Which means we should try to have an intelligent debate about what is done with our personal data, and specifically, what is done with it after we agree to give up our ownership of the data when we sign up for free online services. The group that should lead this discussion—and hopefully come up with workable solutions—is not scientists or regulators or data scientists, but the service providers, i.e., Google, Amazon, Facebook, etc. It’s in their long-term business benefit, the best motivation in our market-driven world.

Decisions based on unbiased data are almost always better than decisions that are not based on data. That’s the promise of big data, or data analysis. There is no need to exaggerate its potential or to attribute to scientists powers and intentions that are beyond them. Better to focus on small steps where the collection and analysis of data measurably and demonstrably lead to better allocation of resources and improved quality of life.

[Originally published on]