Big Data Quotes of the Week: August 10, 2012

“With big data, you have only two concerns, but they are, naturally, big ones: where the data will come from and what your company will do with it. Solve these and you have big data licked… IT projects have to be fully buzzword-compliant or they’ll fail. For a big data project, this means Hadoop. If you don’t want to invest staff time and energy learning this technology, do what my client did: Build a virtual server, install MySQL on it, and assign the name “Hadoop” to the server. When your BDSC (big data steering committee) asks if you’ve installed Hadoop, you can answer in the affirmative with a clear conscience”—Bob Lewis   

“We’re in the middle of a Big Data and Hadoop hype cycle, and it’s time for the Big Data bubble to burst… Bursting the Big Data bubble starts with appreciating certain nuances about its products and patterns”–Stefan Groschupf, CEO, Datameer

“The disruptor du décade is called Big Data and it involves the collection, slicing and dicing of fragments of information that can be rapidly assembled to identify subtle macro trends or create actionable profiles that precisely target unique individuals”–Alan D. Mutter

“We believe that Big Data, like the PC revolution of the ’80s, the emergence of the Internet in the ’90s, and Web 2.0 in the 2000s, represents a several-hundred-billion-dollar wealth creation opportunity”—DCVC

“It’s hard to generalize from this data, as it all came within the context of a platform. You can think of Kaggle as a fisherman that has gradually invested in better technology and better bait. Over time, more data scientists and more companies have been ‘caught,’ but whether that reflects the better bait, or the fact that the number of fish in the ocean is increasing, it is hard to say.

So what can we conclude? A few key items jump off the page:

  1. People will enter the field of data science, but only if they can find something interesting/rewarding to work on. We see a lot of active unique entrants in a few competitions that have low barriers to entry or offer commensurately high rewards. We also see a rising amount of new users surrounding particularly interesting competitions.
  2. Problems that are less exciting, or perhaps less accessible, may need to be reformulated to appeal to the mainstream data community, and crossovers from other fields. If a company wants to attract high quality talent, they need to interest and engage them. We see a lot of competitions get very little traction.
  3. The amount of new users on Kaggle seems fairly steady. This may indicate that demand may soon outstrip supply, as more competitions are run without a commensurate increase in the number of participants, but it does seem like the number of participants and competition count is pretty correlated.

The fact that there is a constant stream of new users is also encouraging, because, anecdotally, most people in the data community heard about Kaggle months ago. This indicates that both existing data scientists are always looking for interesting problems to tackle, and that new people are moving into data science as they see interesting problems.

  4. Corporate interest in data science overall seems to be increasing more quickly than the supply of new data scientists”–Vik Paruchuri