Big Data Quotes: Data Science is Like Crack

“…data science is more like crack. Businesses want more not when it’s held entirely out of their reach but when they’ve been allowed to successfully smoke a little. Then they’re hooked… Data scientists’ job security is bound up in industry perceptions of the value of data science… Let’s go ahead and find ways to get people hooked before they move on to the next fad”–John Foreman

“Messy, ambiguous data and hidden biases underscore a growing need to hire and train data vigilantes to watch over and ask ‘why?’ about our every interpretation from big data. Big data kitsch promotes a world of blissful ignorance in its focus on correlation without explanation. But the data vigilantes do need to understand ‘why,’ sometimes to debug a spurious correlation or systemic failure (like we saw with Google Flu Trends), and other times to be able to develop a smarter method to measure the thing that we really want to measure.

It can be tempting to use data as a crutch in decision-making: ‘The data says so!’ But sometimes the data lets us down and that exciting correlation you found is just a by-product of a messy, biased sample. More advanced algorithms can sometimes help cut through the mess and correct the sample, and smart skeptics can help step back, reflect, and ask if what the data is ‘saying’ actually fits with what you know and expect about the world. Hiring and training these data vigilantes as well as inculcating a healthy dose of data skepticism throughout your culture and team can only help bolster the quality of decisions you ultimately make”–Nicholas Diakopoulos

“Data, big or small, is only as useful as the questions you ask it.

Big data dulls the strategic senses. It produces a myopia, or ‘tyranny of small decisions,’ where responding to the short-term pressures that ultra-fine data produces lays the groundwork for much bigger problems later. It shows you the missiles, and lets you ignore the greater historical reality. That’s how Ford, GM and Chrysler got into trouble beginning in the 1970s: they obsessively watched each other instead of the truly strategic threats of Toyota and Nissan. All the data in the world doesn’t help if the right questions aren’t asked, and big data does not generate such questions, or even contribute to their formulation. One of the biggest data sets in the world is that which is available about US public companies’ financial performance: if big data in itself is an unalloyed blessing, how does one explain the uneven performance of fund managers and stock pickers?

Nevertheless, organizations keep accumulating data and there is no denying that it can be useful. But how? As Peter Drucker remarked, ‘Executives who make effective decisions know that one does not start with facts. One starts with opinions… To get the facts first is impossible. There are no facts unless one has a criterion of relevance.’ Hence while sometimes data analysis can highlight some patterns, it is essential to question and make explicit the core assumptions held by analysts and managers, because these assumptions will drive the search for and analysis of data. And these assumptions are a reflection of our personality, biases and beliefs. Put otherwise, how you see the world will determine the questions you ask and those you don’t ask, and therefore, what blindsides your organization, no matter how much ‘big data’ you accumulate.

What does that mean? Certainly big data is useful when it is non-ambiguous: Web site connection logs for fraud detection, gene sequencing, sales statistics, etc. But when you deal with social facts such as trends, new markets, disruptions and socio-political change, don’t get caught in the lamppost syndrome. Big data will keep your engineers busy, and make IT vendors rich, but because data is always backward looking, it will not deliver a vision of the future”–Milo Jones and Philippe Silberzahn