Big Data Bytes of the Week: What’s a Data Scientist?

What’s a Data Scientist? Joshua Konkle, Vice President at DCIG, quoted a few definitions earlier this week:

I’m a big fan of Hilary Mason, chief scientist at, so I’ll cite her definition: a data scientist is someone who can obtain, scrub, explore, model and interpret data, blending hacking, statistics and machine learning. Daniel Tunkelang, Principal Data Scientist at LinkedIn
By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo – starry eyed explorers and skeptical detectives. Monica Rogati, Senior Data Scientist, LinkedIn

A data scientist is someone who analyzes an organization’s big data to discover actionable trends that lead to business results. Data scientists look at what questions business people need to ask to remain competitive. They work directly with C-level executives, advising them on how to drive maximum value from big data and integrate new information. In many ways, a data scientist serves as a change agent in today’s workforce, pushing organizational collaboration and information integration. Anjul Bhambhri, IBM’s vice president of Big Data Products

More definitions were offered in the two-part report by National Public Radio’s Yuki Noguchi, “Following Digital Breadcrumbs To ‘Big Data’ Gold” and “The Search For Analysts To Make Sense Of ‘Big Data’.” A good data scientist, according to the University of Washington’s Oren Etzioni, “can write algorithms that filter data, understand what they’re telling you, and then graphically represent the information. The end result is like getting a bird’s-eye view of a vast territory of information.” Greylock Partners’ DJ Patil boils it down to one trait: “An intense curiosity to understand what’s behind the data.”

Patil says his successful recruits have included an oceanographer and a neurosurgeon, as well as people who barely graduated from high school but were brilliant at math. He approaches math majors the way baseball scouts look for young stars. “I have a list that I track. There’s one student who I think is phenomenal; I’ve been tracking him since he was 16.”

That data scientist in the making is Dylan Field, now 19 and a junior at Brown University. His goal? To start a big data company.

The non-start-ups, the large companies trying to mine the benefits of the big data wave, not only have to compete hard for data scientists but also have to make sure that they have clean data to work with. According to Gartner (quoted here), more than 35 per cent of companies’ IT budgets will be outside of the IT department’s control in 2012 and beyond. “Increasing volumes of data, while a boon in terms of customer insight and engagement, will create challenges around managing and interpreting that information. As budgetary control shifts, there is a concern that the IT department will no longer be able to guarantee just how accurate or consistent that data is.”

Well, I always said consistency is overrated. And besides, that data scientist you just hired may turn any data, accurate or otherwise, into music. That’s what Domenico Vicinanza did at SC11, turning big data into a concert of electronic music.