Big Data Observations: The Science of Asking Questions

“I am a firm believer that without speculation there is no good and original observation”—Charles Darwin

“It is the theory that determines what we can observe”—Albert Einstein

“I suspect, however, like as it is happening in many academic fields, the NSA is sorely tempted by all the data at its fingertips and is adjusting its methods to the data rather than to its research questions. That’s called looking for your keys under the light”—Zeynep Tufekci

“Large open-access data sets offer unprecedented opportunities for scientific discovery—the current global collapse of bee and frog populations are classic examples. However, we must resist the temptation to do science backwards by posing questions after, rather than before, data analysis. A scant understanding of the context in which data sets were collected can lead to poorly framed questions and results, and to conclusions that are plain wrong. Scientists intending to make use of large composite data sets need to work closely with those responsible for gathering the data. Standard scientific principles and practice then demand that they first frame the important questions, then design and execute the data analyses needed to answer them”—David B. Lindenmayer and Gene E. Likens

“The wonderful thing about being a data scientist is that I get all of the credibility of genuine science, with none of the irritating peer review or reproducibility worries… I thought I was publishing an entertaining view of some data I’d extracted, but it was treated like a scientific study… I’ve enjoyed publishing a lot of data-driven stories since then, but I’ve never ceased to be disturbed at how the inclusion of numbers and the mention of large data sets numbs criticism”—Pete Warden