Paul McFedries in IEEE Spectrum:
When Gartner released its annual Hype Cycle for Emerging Technologies for 2014, it was interesting to note that big data was now located on the downslope from the “Peak of Inflated Expectations,” while the Internet of Things (often shortened to IoT) was right at the peak, and data science was on the upslope. This felt intuitively right. First, although big data—those massive amounts of information that require special techniques to store, search, and analyze—remains a thriving and much-discussed area, it’s no longer the new kid on the data block. Second, everyone expects that the data sets generated by the Internet of Things will be even more impressive than today’s big-data collections. And third, collecting data is one significant challenge, but analyzing and extracting knowledge from it is quite another, and the purview of data science.
Just how much information are we talking about here? Estimates vary widely, but big-data buffs sometimes speak of storage in units of brontobytes, a term that appears to be based on brontosaurus, one of the largest creatures ever to rattle the Earth. That tells you we’re dealing with a big number, but just how much data could reside in a brontobyte? I could tell you that it’s 1,000 yottabytes, but that likely won’t help. Instead, think of a terabyte, which these days represents an average-size hard drive. Well, you would need 1,000,000,000,000,000 (a thousand trillion) of them to fill a brontobyte. Oh, and for the record, yes, there’s an even larger unit tossed around by big-data mavens: the geopbyte, which is 1,000 brontobytes. Whatever the term, we’re really dealing in hellabytes, that is, a helluva lot of data.
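The arithmetic behind those figures can be checked with a minimal sketch. It assumes the decimal convention the column uses, in which each unit is 1,000 times the one before it, and it treats "brontobyte" and "geopbyte" as informal names rather than standard SI prefixes. The names `UNITS`, `BYTES_PER_UNIT`, and `how_many` are hypothetical, chosen only for illustration.

```python
# Decimal byte units as used in the column: each step is a factor of 1,000.
# "brontobyte" and "geopbyte" are informal names, not standard SI prefixes.
UNITS = [
    "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte",
    "exabyte", "zettabyte", "yottabyte", "brontobyte", "geopbyte",
]

# Map each unit name to its size in bytes (kilobyte = 1000**1, and so on).
BYTES_PER_UNIT = {name: 1000 ** (i + 1) for i, name in enumerate(UNITS)}


def how_many(small: str, large: str) -> int:
    """Return how many `small` units fit into one `large` unit."""
    return BYTES_PER_UNIT[large] // BYTES_PER_UNIT[small]


# Terabyte drives needed to hold one brontobyte (a thousand trillion).
print(f"{how_many('terabyte', 'brontobyte'):,}")
```

Under these assumptions the script prints 1,000,000,000,000,000, matching the thousand-trillion figure above, and `how_many('yottabyte', 'brontobyte')` returns 1000.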
Wrangling even petabyte-size data sets (a petabyte is 1,000 terabytes) and data lakes (data stored and readily accessible in its pure, unprocessed state) is a job for professionals. As a result, not only are listings for big-data-related jobs thick on the ground, but the job titles themselves now display a pleasing variety: companies are looking for data architects (specialists in building data models), data custodians and data stewards (who manage data sources), data visualizers (who can translate data into visual form), data change agents and data explorers (who change how a company does business based on analyzing company data), and even data frackers (who use enhanced or hidden measures to extract or obtain data).
But it’s not just data professionals who are taking advantage of Brobdingnagian data sets to get ahead. Nowhere is that more evident than in the news, where a new type of journalism has emerged that uses statistics, programming, and other digital data and tools to produce or shape news stories. This data journalism (or data-driven journalism) is exemplified by Nate Silver’s FiveThirtyEight site, a wildly popular exercise in precision journalism and computer-assisted reporting (or CAR).
And no one, professional or amateur, has the luxury of dealing with just “big” data anymore. Now there is also thick data (which combines quantitative and qualitative analysis), long data (which extends back in time hundreds or thousands of years), hot data (which is used constantly, meaning it must be easily and quickly accessible), and cold data (which is used relatively infrequently, so it can be less readily available).
In the 1980s we were told we needed cultural literacy. Perhaps now we need big-data literacy, not necessarily to become proficient in analyzing large data sets but to become aware of how our everyday actions—our small data—contribute to many different big-data sets and what impact that might have on our privacy and security. Let’s learn how to become custodians of our own data.