Best of 2019: Big Data AI


[July 1, 2019]

In December 2014, I asked whether we were at the beginning of “the end of the Hadoop bubble.” I kept updating my Hadoop bubble watch through the much-hyped IPOs of Hortonworks and Cloudera. The question was whether an open-source distributed storage technology modeled on systems Google invented (and quickly replaced with better tools) could survive as a business proposition at a time when enterprises were moving rapidly to adopt the cloud and “AI”—advanced machine learning or deep learning.

In January 2019, perennially unprofitable Hortonworks closed an all-stock $5.2 billion merger with Cloudera. In May 2019, another Hadoop-based provider, MapR, announced that it would shut down if it were unable to find a buyer or a new source of funding. On June 6, 2019, Cloudera’s stock declined 43% after it cut its revenue forecast and announced that its CEO was leaving the company. Valued at $4.1 billion in 2014, Cloudera now has a market cap of $1.4 billion.

Is this just the end of Hadoop or is it the death of Big Data? Was our fascination with lots and lots of data only a temporary bubble?

The news last month was not all negative for the Data is Eating the World phenomenon. Google announced its intent to acquire data discovery and analytics startup Looker for $2.6 billion, and Salesforce announced its intent to acquire data visualization and analytics leader Tableau for $15.7 billion.

“The addition of Looker to Google Cloud,” said an Alphabet press release, “will provide customers with a more comprehensive analytics solution — from ingesting and integrating data to gain insights, to embedded analytics and visualizations — enabling enterprises to leverage the power of analytics, machine learning and AI.” The Google Cloud blog explained that “A fundamental requirement for organizations wanting to transform themselves digitally is the need to store, manage, and analyze large quantities of data from a variety of sources… The addition of Looker to Google Cloud will help us offer customers a more complete analytics solution from ingesting data to visualizing results and integrating data and insights into their daily workflows.”

Digital transformation is finding out what data can do to your business decisions and actions. It’s focusing your company on mining and benefiting from its second-most important resource after its people: Data. While digital-born, Web-native, data-driven companies such as Google and Salesforce have been doing this for twenty years, many other businesses around the world, large and small, are now in full digital transformation mode, exploring the power of data eating the world. In the process, they tap into IT resources and data science tools in the cloud and experiment with advanced machine learning or deep learning. The remarkable and rapid progress in computer vision and natural language processing capabilities over the last 7 years has been enabled by big data—lots of tagged and labeled online data. Deep learning is Big Data AI.

Here’s what two CEOs of startups providing data mining services have to say about where we are in the evolution of Big Data to Big Data AI:

“The value of the data analytics market can’t be ignored. The Looker and Tableau acquisitions demonstrate that even the biggest tech players are snapping up data analytics companies with big price tags, clearly demonstrating the value these companies have in the larger cloud ecosystem. And in terms of what this means for the evolution of AI, we’ve reached a point where we have more than enough anonymized data to train the system, and now it’s a matter of honing how we use the AI to extract the maximum value from data”—Amir Orad, CEO, Sisense

“The Google Cloud/Looker and Salesforce/Tableau acquisitions are a direct reaction to the rate at which analytics workloads have been shifting to the cloud over the past few years. The state of AI is a reflection of this shift as machine learning, AI and analytics have become the primary growth opportunities for the cloud today. Yet, it’s this same growth that is causing a barrier to success as AI projects overwhelmingly face the same problem — data quality”—Adam Wilson, CEO, Trifacta

Sisense is a business intelligence startup providing “a complete solution for preparing, analyzing and visualizing big data.” It has raised $174 million over 5 rounds and, in May 2019, acquired Periscope Data. Trifacta, which focuses on data preparation, has raised $124.3 million over 6 rounds. It announced today a partnership with IBM to develop a new data preparation tool.

A search for “big data” in the Crunchbase database results in close to 15,000 entries. A search for “AI” results in close to 12,000 entries. There is probably a huge overlap between those two categories. And the real-world overlap will only intensify in the near future.

How many of the hundreds of “big data” startups will merge with one another or be acquired by established data-driven companies as “big data” evolves into “big data AI”?

Originally published on