Jim Kaskade’s crystal ball recently showed him a list of upcoming big data acquisitions: EMC will buy MapR, Oracle will buy Cloudera, and Teradata will buy Hortonworks. Kaskade, the newly appointed CEO of Infochimps, believes CIOs are ready to embrace open source big data software and that the established IT players, lacking open source experience, will have to buy their way into the market.
In a wide-ranging talk with Kaskade, I got the impression that an acquisition could also be the planned exit for Infochimps. It may also be the main reason for putting Kaskade in charge of communicating to Fortune 1000 enterprises the value of what Infochimps offers and its unique position in the rapidly evolving big data landscape.
Infochimps opened for business in 2009 as a data marketplace, but earlier this year it started to sell what it had developed for its own use: a platform for doing big data. Says Kaskade: “I just love companies who build technologies for particular use cases that are actually tackling key problems that a broader audience has.”
It was fun talking to Kaskade because he has the kind of perspective I like, that of someone who has been there before. As an engineer with Teradata in the 1990s, he witnessed firsthand what I call the Small Big-Data Bang and, as a result, can draw interesting parallels with today’s Big Big-Data Bang. The explosion of data in the early 1990s, according to Kaskade, drove the need for “a single point of truth,” the one version of the data everybody in the enterprise could rely on. Once all enterprise data was consolidated in a single data warehouse, executives could see, for the first time, what was happening throughout the enterprise. They could ask questions across geographies or query the data across business lines. Today, says Kaskade, “we are asking the same business questions, we are just saying ‘what if I add in 5 to 10 more data elements and get more intelligent answers.’ We are not talking about a new architecture, we are just talking about putting more data to work.”
With all the new data sources, there is once again a need for “a single point of truth.” The key difference is that the old-new architecture, based on open source software, has dramatically lowered the cost per terabyte at a time when businesses would like to tap into and analyze many more terabytes of new data. Kaskade claims that “every one of the CIOs I’ve met [recently] were very comfortable now with open source technology,” and he seems to be supported by a recent survey and by the actions of a number of big company IT executives, as reported by CIO Journal. (Quotable quote from Bill Ruh of GE’s new global software center: “Our goal is not to use open source. Our goal is to be able to develop applications in a three-to-five month time frame.”)
But adoption of open source software does not necessarily mean adoption of open source-based big data technologies. Still, Kaskade is unfazed: “Hadoop is far from perfect but it has so much momentum that it’s like SQL was back in the day.” He sees three trends at work in support of the big momentum:
- The democratization of computing: “‘I can’t compute this question’ is something you are no longer going to hear from people”;
- The democratization of access to data: “bridging the gap between the data and the business analyst. The only reason we have this is because Yahoo wanted to have a sandbox so all their people could come up with proposals for improving the business”; and
- The democratization of sophisticated analytics: “let’s get rid of the data scientists.”
Get rid of the data scientists? “The politically correct way to say it,” says Kaskade, “is that I will turn your business users and application developers into data scientists. Put the business users and application developers in touch with the data and make it so simple that they don’t need other people to get the job done.”
Infochimps wants to make it simple and, most important, safe for enterprises to experiment with big data and discover new insights. Its secure, cloud-based platform-as-a-service is what allows the company “to go after the Fortune 1000 as opposed to the Fortune 20,000.” And Kaskade is confident about what makes Infochimps special: “No one is looking at this opportunity in a managed private cloud setting like we do. We’re ahead of others in thinking about how truly to build this out for the enterprise.”