In today’s New York Times, Steve Lohr surveys the rise of big data as a term and as a marketing tool, from “the confines of technology” to the mainstream. “The Big Data story is the making of a meme,” says Lohr, “and two vital ingredients seem to be at work here. The first is that the term itself is not too technical, yet is catchy and vaguely evocative. The second is that behind the term is an evolving set of technologies with great promise, and some pitfalls.”  

Over at Barron’s, Tiernan Ray writes: “One of my responsibilities as a Barron’s tech columnist is to be the keeper of buzzwords. Buzzwords are the fuel for elaborate stock theses on Wall Street, and they can drive the interest in stocks sometimes more than, say, revenue and earnings. The buzzword du jour is ‘big data,’ the information that piles up in the databases of companies everywhere and that those companies would like to manage and understand better.”

Big data has certainly become a mainstream meme in 2012. Does this mean an opportunity for IT professionals, the people who manage “the information that piles up”? Or will it become a threat, just as the other recent technology-related meme, “cloud computing,” promised to drive IT into obsolescence as companies came to rely (so the argument went) more and more on outside (“Public Cloud”) IT services?

It seems to me that big data represents the ultimate manifestation of a new facet of yet another meme, the “consumerization of IT,” albeit one limited to IT circles and the “trade press.” The term is widely used to describe “the growing tendency for new information technology to emerge first in the consumer market,” and only later to be adopted, sometimes reluctantly, by enterprises and their IT functions.

But we see now, with the mainstream press discussing cloud computing and big data, that it can also work in the opposite direction: what used to be the preoccupation of a select group of people managing “data centers” has become a topic of discussion for everybody. The work of IT, if not IT itself and IT professionals, has been in the limelight like never before in recent years. Words like “gigabyte” or even “petabyte,” previously uttered only by a few “information handlers,” are now understood, more or less, by anyone with a computer or access to the Internet (note that Lohr didn’t bother to explain “exabytes, zettabytes and yottabytes,” a common practice in the mainstream press not so long ago). “Backup” used to indicate “move backwards” or “traffic congestion ahead.” These are no longer the first meanings that come to mind, even for people who do not work in IT or the IT industry.

IT is in the limelight now, and it could use this big opportunity to demonstrate leadership that is based on deep experience with and understanding of what data, big or small, is all about—its management, its analysis, its productive use. Or maybe not.

I would like to offer a few scenarios for the impact of big data on IT, or for what could be IT’s impact on big data. These scenarios focus not on the technologies of big data, but on the people at the vanguard of using them for the benefit of individuals, enterprises, and society: data scientists. This new breed of data handlers is more important than the exciting big data technologies they use, because their success and usefulness will determine whether the big data meme will outlast the typical two-year half-life of technology-related memes. (I would argue that even if they succeed in proving the usefulness of big data, the big data meme will very probably give way in a few years to new memes. But I would also argue that their new discipline, data science, buttressed by professional certification and university-based training, will survive for a very long time.)

The question, then, boils down to the relationship between data scientists and IT. Here are five scenarios for how this relationship could evolve:

  1. IT will continue to play a supporting, infrastructure-related role and will not get involved with data science. Data scientists will work in their own, advanced R&D-type function, reporting to a chief strategy officer, chief technology or research officer, or even the CEO. Tom Davenport has argued in support of this scenario here and here. Or see my interview with Mok Oh for how PayPal’s data scientists work in an organization that is not part of IT.
  2. IT will hire and train data scientists who will work in collaboration with data scientists in the enterprise’s business units. See my interview with EMC’s CIO for a description of how it’s done there.
  3. IT will move beyond being just the custodian of the data to becoming the key function responsible for leading the data-driven transformation of the enterprise, with its data scientists leading the charge. See Beth Schultz’s spirited defense of IT. I called this scenario “IT is the new Intel Inside.”
  4. The question is irrelevant, because IT will be absorbed by marketing. With so much of the data coming from outside of the enterprise and used to manage both customer-related and product-related activities, this will be the ultimate incarnation of the “consumerization of IT.” See here for a prediction that CMOs will spend more on IT than CIOs by 2017.
  5. The question is irrelevant, because data science will evolve to become a service-only business, provided to enterprises by companies with large teams of data scientists and access to vast stores of public data and/or proprietary data collections. This scenario may become the new “IT doesn’t matter.”

“Rising piles of data have long been a challenge,” says Steve Lohr. Indeed, my own surveys of the history of big data and the history of data science found that the term “data science” was first used in the 1960s (and in its current meaning at least a decade ago). The term “big data” was already in use in computer science circles in the late 1990s, in the context of the large amounts of data generated by computer visualization.

What we see today is the continuation of what began six decades ago and was summarized eloquently in a 2008 paper cited by Lohr, written by three prominent computer scientists who called big data “the biggest innovation in computing in the last decade. We have only begun to see its potential to collect, organize, and process data in all walks of life.”

The same could be said about any previous stage in the evolution of IT, a steady progress that could not have happened without IT professionals developing and maintaining the foundation for innovative and productive uses of data. So what will IT do with big data? Lead, follow, or get out of the way?