The day the Hadoop bubble started to quiver

By Gil Press
December 31st, 2014
- Big Data Bubble

After a bubble peaks, we look back at the year or two before the crash and say “what were we thinking?” The term originated with the South Sea Bubble, which peaked in 1720, when the share price of the South Sea Company rose over the course of a single year from about £100 to £1,000. “Among the many companies to go public in 1720,” Wikipedia quotes Charles Mackay’s Extraordinary Popular Delusions and the Madness of Crowds, “is—famously—one that advertised itself as ‘a company for carrying out an undertaking of great advantage, but nobody to know what it is.’” *

In a year or two we may look back at November 10, 2014 as the beginning of the end of the Hadoop Bubble. On that day, “Hortonworks, the big data processing platform built on top of the open source Apache Hadoop filed its S-1 paperwork… as its first step toward an initial public offering,” TechCrunch reported, adding: “In doing so it beat competitors Cloudera and MapR to the punch.”

Three-year-old Hortonworks has raised $248 million in venture funding to date and was most recently valued at $1 billion. Six-year-old Cloudera has raised $1.2 billion and is valued at $4 billion, and five-year-old MapR has raised $174 million to date. Earlier this year, Wikibon estimated Cloudera’s 2013 revenues at $73 million, growing 30% from $56 million (revised down from a previous estimate of $61 million), and MapR’s 2013 revenues at $35 million, growing 52% from $23 million.

Wikibon estimated that Hortonworks had 2013 revenues of $55 million, tripling from $18 million in 2012. But unlike Cloudera and MapR, we now have official numbers from Hortonworks. Its S-1 filing does not give the calendar year revenues for 2012 and 2013, but it reports $4.8 million for the “eight months ended December 31” for 2012 and $17.9 million for the same period in 2013. It’s hard to believe that the company managed to collect $37 million in the first four months of 2013, so we can safely assume that Wikibon’s estimates were wide of the mark. Is this also true for Cloudera and MapR?
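The arithmetic behind that doubt can be sketched in a few lines of Python. This is only an illustration: the variable names are made up for the example, the figures are the ones quoted above (in millions of dollars), and it assumes Wikibon’s $55 million estimate refers to calendar year 2013.

```python
# Figures quoted above, in millions of US dollars.
wikibon_2013_estimate = 55.0       # Wikibon's estimate of 2013 revenues
s1_last_eight_months_2013 = 17.9   # S-1: eight months ended December 31, 2013

# Assumption: Wikibon's estimate covers calendar year 2013, so whatever the
# S-1 period does not account for must have come from January through April.
implied_first_four_months = wikibon_2013_estimate - s1_last_eight_months_2013

# Average monthly revenue in each part of the year.
implied_monthly_early = implied_first_four_months / 4
reported_monthly_late = s1_last_eight_months_2013 / 8

print(f"Implied Jan-Apr 2013 revenue: ${implied_first_four_months:.1f}M")
print(f"Implied monthly average, Jan-Apr: ${implied_monthly_early:.1f}M")
print(f"Reported monthly average, May-Dec: ${reported_monthly_late:.1f}M")
```

Under that assumption, Hortonworks would have had to bring in roughly four times as much revenue per month in the first four months of 2013 as it reported, on average, for the remaining eight.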

It’s possible that past estimates (fueled no doubt by vendor-generated hype) are not important and we should only look at this year’s actual numbers as a leading indicator of where the Hadoop market is heading. Hortonworks reports that for the first 9 months of 2013 it had revenues of $15.9 million, growing about 109% to $33.4 million for the same period in 2014.

So, is the future of Hadoop bright? Forrester Research certainly thinks so. It predicts that in 2015, Hadoop will become “a cornerstone of your business technology agenda.” Forrester does not mince words: “The jury is in. Hadoop has been found not guilty of being a hyped-up open source platform. Hadoop has proven real value in any number of use cases… All these use cases are powered by what Forrester calls ‘Hadooponomics’—its ability to linearly scale both data storage and data processing and leverage pay-per-use public cloudonomics. Many enterprises are dabbling in Hadoop to see what it can do. Many already have mission-critical capabilities running on Hadoop… The remaining minority of dazed and confused CIOs will make Hadoop a priority for 2015.”

If they do, they are in for a few surprises about Hadoop’s deployment. Wikibon conducted a survey of 303 big data practitioners in May 2014 and found that 36% have already deployed Hadoop. (As this was a web survey that included only practitioners personally familiar with big data technologies, I assume that the percentage of actual Hadoop users in the larger population of enterprises is even smaller.)

“The majority of respondents using Hadoop, 64%, are doing so in proof-of-concept environments,” Wikibon reports. What is preventing these projects from moving to full-scale production supporting mission-critical applications? The top barriers, according to Wikibon, are concerns about a lack of enterprise-grade backup and recovery in Hadoop (53%); concerns about a lack of enterprise-grade high-availability in Hadoop (48%); and concerns about maintaining performance at scale in Hadoop (45%).

As for the future fate of Hortonworks, Cloudera, and MapR, even Forrester’s enthusiasm for “Hadooponomics” does not necessarily bode well for them. If Hadooponomics indeed becomes the new economics of IT, I would assume that many IT vendors will get into their market, lured by the transformation of Hadoop into a “hot… multi-function, enterprise application platform.”

Indeed, Forrester predicts that a number of large enterprise IT vendors will create their own Hadoop distribution (similar to the current offerings from IBM and Pivotal). In addition, Forrester predicts that in 2015 we will see other Hadoop distributions from startups focused on the public and private clouds and on specific industries. Furthermore, “the market is primed for a bold, new startup to take on the entire Hadoop world by declaring that Apache Spark, not Hadoop, is the future of computing.”

To this we may add the following observations from the Wikibon report: “We also asked Hadoop practitioners about how they sourced the technology. Only 25% of Hadoop practitioners are paying customers of one or another Hadoop vendor. 24% use a free distribution provided by a vendor, but the majority, 51%, roll their own Hadoop downloaded from the Apache Software Foundation.”

Here are a few things to ponder when considering the potential success of the current leading Hadoop vendors and whether Hadoop in general is in the first stage of a rapid market expansion or the last stage of a bubble inflating:

You don’t make money from open source: “The only open source company that’s ever made money (at scale) is RedHat,” says Dave Kellogg, CEO of Host Analytics, citing (in addition to his own analysis) Andreessen Horowitz partner Peter Levine’s article Why There Will Never Be Another RedHat.

Cloud-based Hadoop negates the need for “distribution”: “Enterprises that implement static on-premises [Hadoop] clusters,” says Forrester, “may suffer the cost of inefficiency if they run jobs sporadically or even on a regular basis but only for a fraction of the day. Hadoop cloud services offer enterprises a way to manage the resources much more efficiently.”

Established enterprise software vendors will incorporate Hadoop in their offerings, making it enterprise-ready: “All parties agree that Hadoop-based data management and governance solutions have a ways to go before they provide the functionality sophisticated enterprises expect from their app platforms” says Forrester. “We believe enterprise software vendors will beat the open source community to the punch.” Forrester also predicts that Hadoop will be added for free to Linux and Windows operating systems. “This would disrupt the existing model used by Hadoop distribution vendors to charge $2,000 to $3,000 per node per year for their current distributions.”

There will be no market for Hadoop services: Vendors supporting open-source software derive a substantial portion of their revenues from services, supporting users in navigating unfamiliar territory. Here too, Forrester has some bad news for Hadoop vendors: “The shortage of Hadoop skills will quickly disappear as enterprises turn to their existing application development teams to implement projects such as filling data lakes and developing MapReduce jobs using Java… CIOs won’t have to hire high-priced Hadoop consultants to get projects done.”

You can do big data without Hadoop: Before the invention of Hadoop, companies such as Reuters and LexisNexis processed large amounts of unstructured data with their own in-house software. Today, there are other alternatives to Hadoop, the most prominent (hyped?) being “in-memory” databases. There’s even “in-chip” technology from SiSense, which claims to be 10 times faster than in-memory.

Hadoop is so 2004 (at least at Google): At Google I/O 2014, Google announced: “A decade ago, Google invented MapReduce to process massive datasets using distributed computing. Since then, more devices and information require more capable analytics pipelines — though they are difficult to create and maintain. Today at Google I/O, we are demonstrating Google Cloud Dataflow for the first time. Cloud Dataflow is a fully managed service for creating data pipelines that ingest, transform and analyze data in both batch and streaming modes. Cloud Dataflow is a successor to MapReduce.”

Hadoop is an example of a much larger trend—information technology R&D has been largely done for the last ten years by the web-based “digital natives” such as Google, Facebook, Yahoo, Amazon and Netflix. Unlike traditional IT vendors, however, they have no “legacy” products to protect and they keep evolving their software with the rapidly changing requirements placed on them by their millions of users and new data formats and characteristics. And unlike traditional IT vendors, they are happy to share their innovative software with the world as open-source code.

Earlier this week, Hortonworks set its IPO price, filing a $78 million IPO with 6 million shares at a price range of $12-14 per share. TechCrunch: “At the midpoint of the proposed range, Hortonworks would command a fully diluted market value of $659 million, Renaissance Capital reported. It’s worth noting that the amount seems low when compared to the valuation reported just last March when the company raised $100 million on $1 billion valuation.”
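For readers who want to check the numbers, here is a minimal sketch of the offering arithmetic in Python. The variable names are illustrative only, and the inputs are the figures quoted above.

```python
# Offering terms quoted above.
shares_offered = 6_000_000
price_low, price_high = 12.0, 14.0   # proposed price range, dollars per share

# Proceeds if the shares price at the midpoint of the range.
midpoint_price = (price_low + price_high) / 2
proceeds_at_midpoint = shares_offered * midpoint_price

# Fully diluted market value at the midpoint (as reported by Renaissance
# Capital) versus the $1 billion valuation of the earlier funding round.
market_value_at_midpoint = 659_000_000
prior_round_valuation = 1_000_000_000
discount = 1 - market_value_at_midpoint / prior_round_valuation

print(f"Midpoint price: ${midpoint_price:.2f}")
print(f"Proceeds at midpoint: ${proceeds_at_midpoint / 1e6:.0f} million")
print(f"Discount to prior valuation: {discount:.1%}")
```

The midpoint price of $13 accounts for the $78 million figure, and the reported midpoint market value sits about 34% below the $1 billion valuation.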

Hortonworks’ rush to IPO at what could be a reduced valuation may be an attempt to get a willing buyer (HP?) to jump in with an offer to buy the company. As Dave Kellogg notes, a number of startups selling open-source software have had successful exits in the past.

When the price of the South Sea Company’s shares reached £1,000 in early August 1720, Charles Mackay wrote in Extraordinary Popular Delusions and the Madness of Crowds, “The bubble was then full-blown and began to quiver and shake preparatory to its bursting.”

Update: Hortonworks’ shares rose 65% in their public trading debut on December 12, 2014, valuing the company at $1.1 billion, Bloomberg reports. The IPO price valued Hortonworks at about 60 times sales for its 2013 fiscal year, which ended in April 2013, compared with 2.06 for IBM and 4.97 for Oracle, according to data compiled by Bloomberg. Red Hat Inc., another publicly traded open-source technology company, trades at a sales multiple of 7.3.

* I believe that the attribution in the Wikipedia article is incorrect and that the correct reference is Edward Chancellor’s The Devil Takes the Hindmost: A History of Financial Speculation.

[Originally published on Forbes.com]

Last updated on December 31st, 2014.