I disrupt, you disrupt, we all disrupt

From the blurb to the book No Ordinary Disruption: The Four Global Forces Breaking All the Trends

Based on years of research by the directors of the McKinsey Global Institute, No Ordinary Disruption: The Four Forces Breaking all the Trends is a timely and important analysis of how we need to reset our intuition as a result of four forces colliding and transforming the global economy: the rise of emerging markets, the accelerating impact of technology on the natural forces of market competition, an aging world population, and accelerating flows of trade, capital and people.

Gradually evolving and highly predictable demographic trends are now a “disruption”? You can’t sell a business book without having a disruptive title?

Posted in Predictions, Stats | Leave a comment

The Rise of the Outsiders and the Future of the IT Industry

You probably heard the news about the CFO of Morgan Stanley becoming the new Google CFO and the ongoing migration of Wall Street bankers to Silicon Valley (here and here). But did you know that this is just a surface manifestation of an underlying secular trend, nothing less than a new restructuring of the IT industry and the emergence of completely new developers and sellers of information technology products and services?

You could argue that at any point in the last half century you could have said about the fast-changing IT industry that it is not your father’s IT industry. But remarkably, the structure of the IT industry has been very stable, with only one major redrawing of its landscape throughout its history. Today, the “IT industry” is being redefined with two major clusters of companies increasingly competing with the incumbents and reshaping the old IT world order.

First, what did your father’s IT industry look like? Until about 1990, the industry was dominated by IBM and a few smaller companies (e.g., DEC, HP, Wang, Prime). These companies were offering one-stop-shopping, providing to enterprises the entire IT stack, from hardware platforms (including proprietary chips) to peripherals to operating systems to networking to applications and to services.

The advent of the PC and, in particular, the networking of PCs in the 1980s, gave rise to a new tidal wave of digital data. As a result, between 1990 and 2000, the industry’s structure expanded to include large vendors focused on one layer of the IT stack: Intel in semiconductors, EMC in storage, Cisco in networking, Microsoft in operating systems, Oracle in databases. IBM saved itself from the fate of DEC, Wang, and Prime by focusing on services. DEC could have saved itself if it had realized in time the value of focusing on its biggest competitive advantage, networking, instead of letting a number of focused players enter this market (which Cisco eventually dominated). At the same time, a number of focused PC vendors, primarily Compaq and Dell, carved a larger role for themselves by gradually expanding their reach into other layers of the IT stack, emulating the old model of a vertically-integrated IT vendor. By and large, they were less successful than the new, horizontally-focused IT vendors.

The restructured IT industry, and specifically, the focused, “best-in-class” vendors answered a pressing business need. Digitization and the rapid growth of data unleashed new business requirements and opportunities that called for new ways to sell and buy IT. The new competitive and business pressures to keep more and more data online and for longer duration, to mind and mine the data, to share and move it around, all contributed to the demand for a flexible IT infrastructure where buyers assemble together the pieces of their IT infrastructure from different vendors. Most important, in the late 1980s and early 1990s, businesses fundamentally changed their attitude towards and the scope of what they did with data: From primarily an internal, back-office bookkeeping, “how did we do last quarter?” focus on the past, to external, customer-, partner-, and supplier-oriented data creation, collection, and mining, with a focus on the present and “let’s understand how to serve our stakeholders better.”

Now, how is today’s or tomorrow’s IT industry different from what it was just fifteen years ago? We still have more or less the same dominant players—IBM, Cisco, EMC, Oracle, HP, a reinvigorated Dell, etc. If there has been any marked change, it has been a shift back to a vertically-integrated model, with all of these players providing the whole IT stack (or what they call, with their insatiable appetite for new buzzwords, “converged infrastructures”).

I would argue that this “back to the future” model is on its last legs and the real future belongs to two clusters of companies: Digital natives (e.g., Google, Facebook, Amazon) and large, IT-intensive enterprises (e.g., financial services companies).

In the mid-2000s, a new business need emerged: “Let’s make sense of the mountains of data we continue to accumulate,” focusing on the future and the mining of data to understand better the likely results of different courses of action and decisions. Unlike the previous shift in the fortunes of the IT industry, the companies driving this shift were not old or new “IT vendors.” They were the companies born on the Web, the digital natives, with data (and ever more data) at the core of everything they did. Google, Facebook, Amazon, and Netflix all built their IT infrastructure from scratch to their own specifications, inventing and re-imagining in the process the entire IT stack. They were the first outsiders insisting that IT was such a core competency for them that they could not trust it to “IT vendors” and could do a more efficient and effective job on their own, thank you.

Google and Amazon later became insiders by offering their infrastructure as a service to small and large businesses. But other companies born on the Web, such as Netflix, simply continued to amass unparalleled knowledge about state-of-the-art IT, develop innovative IT solutions which they then made available to the world as open source software, and hire (or train) the best and the brightest IT professionals. This became a new IT ecosystem, a whole new world of IT far surpassing what was happening in the “IT industry” on any measure of innovation and competitiveness.

Some of the large enterprises which used to be the most important customers of the traditional IT vendors are now joining this new IT ecosystem, reinforcing the reinvention of how IT is developed, sold and bought. Consider these recent news items:

  • Santander is the first global bank to offer cloud data storage services to corporate customers. “As I think how I am going to compete with all these new technology players, I can offer the same services as some of these big guys,” Santander’s chairman, Ana Botín told the Financial Times.
  • By 2018, Bank of America plans to have 80% of its workloads running on software-defined infrastructure inspired by Web companies, a process that began in 2013. “The transformation at Bank of America reflects the migration of state-of-the art information technology developed by Internet companies into the broader economy,” reports the Wall Street Journal.
  • Facebook announced a new type of server, a collaboration with Intel based on its own design which it hopes other companies will adopt as well. The announcement was one of many at a recent gathering of the Open Compute Project, a nonprofit group formed by Facebook in 2011 to adapt principles of open-source software to hardware. “Members develop and share designs for servers, networking gear and storage devices that any company can build and sell, creating competition that helps hold down hardware costs,” reports the Wall Street Journal.
  • Fidelity Investments reconfigured its data centers to better fit its business needs, engaging its engineering team in redesigning a revolutionary new rack, and reducing energy consumption by 20%. The announcement from the Open Compute Project reads in part: “Fidelity’s Enterprise Infrastructure team also wanted to transform the way its members worked. Instead of maintaining a closed shop, the team was looking to open up and engage with an external community of engineers as a way to keep up with the latest developments and innovate in-house… The Open Bridge Rack is a convertible datacenter rack that can physically adjust to hold any size server up to 23 inches across. It’s a key link in helping enterprises make the switch from proprietary gear to Open Compute… Fidelity designed (patent pending) and donated it to the Open Compute Foundation, making it available to different manufacturers.”
  • Apple has acquired speedy and scalable NoSQL database company FoundationDB, TechCrunch reports. With this acquisition, Apple aims to bolster the reliability and speed of its cloud services, and possibly provide video streaming for its rumored TV service.

The new “IT vendors” are companies that not only see IT as an integral part of their business strategy but go even further and view IT, and the data they collect and mine, as what their company is all about. This new breed of IT vendors has emerged over the last decade from the ranks of digital natives, and they are joined now by large companies that want to get further return, and possibly new sources of revenue, from their large investments in IT. Furthermore, just like the digital natives, these large established companies don’t want to be beholden anymore to the lock-in tactics of traditional IT vendors.

The size of the IT industry worldwide in 2010 was about $1.5 trillion. Gartner predicts that the industry (including telecom) will grow to $3.8 trillion this year. The real “IT industry,” however, is much larger than that, given all the IT-related activities happening outside the traditional boundaries of the industry. And if you include in the “IT industry” everything that’s being digitized (content, communications, consumer electronics, commerce), we are looking at an industry that will grow to at least $20 trillion by 2020. In this greatly expanded industry there will probably still be room for the traditional IT players. But the growth spurts, innovation, and new skills will come from today’s outsiders, from the companies whose core competency is the collection, processing, analysis, distribution, and use of data.

 Originally published on Forbes.com

Posted in IT industry | Leave a comment

The History and Impact of the Hashtag

Posted in Social Media | Leave a comment

Data Science at Zillow (Slideshare)

Posted in Data Science, Data Science Careers, Data Scientists | Leave a comment

Bruce Schneier’s Must-Read Book on Security and Privacy

“The surveillance society snuck up on us,” says Bruce Schneier in Data and Goliath: The Hidden Battles to Capture Your Data and Control Your World. It’s a thought-provoking, absorbing, and comprehensive guide to our new big data world. Most important, it’s a call for a serious discussion and urgent action to stop the harms caused by the mass collection and mining of data by governments and corporations. To paraphrase Schneier’s position on anonymity: we either need to develop more robust techniques for preserving our freedom, or give up on the idea entirely.

An expert on computer security, Schneier has written over a dozen books in the last 20 years on the subject, some highly technical, but this one is a call to action addressed to a mainstream audience. The impetus for writing such a book, it seems, was the 2013 revelations of the NSA mass surveillance. Schneier worked with The Guardian’s Glenn Greenwald, helping in the analysis of some of the more technical documents that were leaked by Edward Snowden.

Schneier divides his guide to our big data world into three parts. The first covers the surveillance society: The massive amounts of data about ourselves we generate when we use computers, what governments and corporations do with this data separately and together, and the important difference between targeted and mass surveillance. The second part of the book is about the damage caused by government and corporate surveillance, including economic damage to U.S. businesses, and how the actions that are meant to protect us actually degrade privacy and security. The last part consists of a list of principles “to guide our thinking,” policy recommendations regarding government and corporate surveillance, and prescriptions for defensive behavior by individuals, ending with a general discussion of the trade-offs between big data’s value to society and its misuse and abuse.

“I’m not, and this book is not, anti-technology,” says Schneier. Declaring himself not even being anti-surveillance, he suggests that we need to design new ways for the NSA to perform its job while protecting our privacy. From that position, he proceeds to debunk and clarify some of the myths and misinformation spread by defenders of surveillance and to alert us to the consequences of our relaxed attitude towards what is being done with our data.

Corporate and government interests, and their hunger for data, have converged in our time, Schneier argues. In exchange for free services from corporations and for protection from terrorists, we’ve agreed to mass surveillance. The convergence of government and corporate interests is now amplified by politicians who are lured by big data’s promises of targeted messaging and effective get-out-the-vote campaigns.

The problem is that mass surveillance doesn’t work as advertised, in both the public and private sectors. Schneier: “There’s no actual proof of any real successes against terrorism as a result of mass surveillance and significant evidence of harm” and, on targeted advertising, “what’s unclear is how much more data helps.” The NSA simply tapped into the “massive Internet eavesdropping system” already built by corporations, but failed to see that it’s a very ineffective way to catch terrorists.

“Data mining works best when you are searching for a well-defined profile, when there are a reasonable number of events per year, and when the cost of false alarms is low,” says Schneier. Alas, terrorists do not have a common profile (see U.S. Authorities Struggle to Find a Pattern Among Aspiring Islamic State Members), each attack is unique, and terrorists would do their best to avoid detection. “When you are watching everything, you are not seeing anything,” concludes Schneier.
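To make the false-alarm point concrete, here is a back-of-the-envelope sketch in Python. Every number in it is a made-up assumption for illustration, not a figure from Schneier’s book or from any real program, and the function name is my own. It simply counts how many innocent people a hypothetical detector would flag alongside the real targets.

```python
def flagged_precision(population, true_targets, hit_rate, false_alarm_rate):
    """Share of flagged people who are real targets (Bayes' rule on raw counts)."""
    true_hits = true_targets * hit_rate
    false_alarms = (population - true_targets) * false_alarm_rate
    return true_hits / (true_hits + false_alarms)


# Hypothetical inputs, chosen only to show the shape of the problem.
population = 300_000_000      # people being watched
true_targets = 1_000          # actual plotters among them
hit_rate = 0.99               # detector flags 99% of real targets
false_alarm_rate = 0.001      # detector wrongly flags 0.1% of everyone else

false_alarms = (population - true_targets) * false_alarm_rate
precision = flagged_precision(population, true_targets, hit_rate, false_alarm_rate)

print(f"Innocent people flagged: {false_alarms:,.0f}")
print(f"Share of flags that are real targets: {precision:.2%}")
```

Even with a detector this generous, the flags are dominated by roughly 300,000 innocent people, and well under 1% of them point at a real target. When each false alarm is costly to investigate, that is the situation Schneier describes.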

Mass surveillance by the NSA is not only ineffective, it also ensures reduced security and loss of privacy. All computer users today use basically the same hardware and software and when the NSA hacks into any of the components of the global computer network, it makes it more vulnerable. Schneier: “Because we all use the same products, technologies, protocols, and standards, we either make it easier for everyone to spy on everyone, or harder for anyone to spy on anyone.”

Data and Goliath is also a comprehensive guide to what’s to be done about our data. Here’s a sample of recommendations: Apply the same transparency principles that traditionally have governed law enforcement in the U.S. to national security; make government officials personally responsible for illegal behavior; overturn the “antiquated” third-party doctrine, recognizing that our information is our property and not the property of the service provider; reduce the NSA’s funding to pre-9/11 levels; establish an independent U.S. data protection agency; make intelligence-related whistleblowing a legal defense in the U.S.; block mass surveillance by encrypting your hard drive, chats, email, everything; and engage in the political process by noticing and talking about surveillance and by “giving copies of this book to all your friends as gifts.”

That the book is a great gift to one and all has already been recognized by many readers, as evidenced by the fact that it has made the New York Times Best Sellers list. But will it manage to make a dent in the complacency of the American public? Will it motivate all of us to do the work we need to do to stop mass surveillance?

Recent results of a Pew Research Center survey titled “Americans’ Privacy Strategies Post-Snowden” are not very encouraging. Almost nine-in-ten respondents say they have heard at least a bit about the government surveillance programs to monitor phone use and internet use (31% say they have heard a lot). But only 30% of all adults have taken at least one step to hide or shield their information from mass surveillance and many are not aware of the commonly available tools that could make their online activities more private.

While 57% say it is unacceptable for the government to monitor the communications of U.S. citizens, majorities support monitoring those particular individuals who use words like “explosives” and “automatic weapons” in their search engine queries (65% say that) and those who visit anti-American websites (67% say that). 46% describe themselves as “not very concerned” or “not at all concerned” about the surveillance. The lack of concern about mass surveillance is even more pronounced when people are asked about “electronic surveillance in various parts of their digital lives.”

I’m a good example of the general apathy about what is being done with our data. I’ve never bothered to look at the notices I often receive from banks and credit cards regarding their privacy policies. It so happened that I received one while reading Data and Goliath, a notice that I’m sure is a typical example of the privacy policies of many financial institutions as it simply follows what is allowed by U.S. federal law. The “privacy policy” (a more apt title would be “we do basically whatever we want to do with your personal information”) is outrageous in its entirety but this statement takes the cake: “When you are no longer our customer, we continue to share your information as described in this notice.”

In addition to demonstrating how many U.S. businesses couldn’t care less about their customers, these types of privacy policy notices, coming from staid, pre-Internet financial institutions, also show that while the scale of mass surveillance has reached unprecedented levels today, our indifference to what’s being done with our data has remained stable, and widespread, for ages. U.S. corporations were legally collecting and sharing our data long before Google appeared on the scene, and our data has been a fountain of enthusiasm for government officials and business executives for a long time. Here’s what Arthur R. Miller wrote in his 1971 The Assault on Privacy:

Too many information handlers seem to measure a man by the number of bits of storage capacity his dossier will occupy… The new information technologies seem to have given birth to a new social virus – ‘data mania.’ Its symptoms are shortness of breath and heart palpitations when contemplating a new computer application, a feeling of possessiveness about information and a deep resentment toward those who won’t yield it, a delusion that all information handlers can walk on water, and a highly advanced case of antistigmatism that prevents the affected victim from perceiving anything but the intrinsic value of data.

Today’s “data mania” is called big data. Schneier’s book helped convince me that the first step, the first line of defense against the data deluge drowning our freedoms is to expose, explain and eradicate from public discourse the false tenets of big data religion. These include the incredible effectiveness of lots and lots of data, machines are better than humans in making data-driven decisions, let the data ask the questions, sampling is so 19th century, and privacy is dead, get over it. After 9/11, the NSA converted to the big data religion and went after “the whole haystack” because it provided a comforting set of rituals, I mean, action plans (see here and here for some of my previous discussions of big data religion).

Schneier devotes the last pages of his book to “the big data trade-off,” which he calls “the fundamental issue of the information age”: How do we make use of our data to benefit society as a whole while at the same time protecting our privacy? “Our data has enormous value when we put it all together,” says Schneier. But that flies in the face of his well-argued contentions that more data does not lead to better outcomes, either in targeted advertising or in protecting us from terrorists.

What he is talking about is the promise of big data, our hopes that more data, lots more data, will improve our lives. But why trade our security and privacy—and our present freedoms—for an unproven promise of some vague future benefit?

“I don’t think anyone can comprehend how much humanity will benefit from putting all our health data in a single database and letting researchers access it,” says Schneier. Indeed, it sounds plausible, at least intuitively, that big data could serve as a remedy to what ails us. I remember that growing up with a father who was a physician in private practice, I often thought about the wasted valuable data about his patients he meticulously recorded in his index card file, data that was never shared and analyzed with other physicians’ data. (Obviously, I started the big data conversation, at least with myself, very early on).

Data has facilitated progress in medicine, science, and other areas of inquiry and practice. But the collection and analysis of data has evolved in tandem with the development of tools that help ensure that only non-biased data is collected and that our questions drive this data collection (as opposed to the data driving our questions). The hype surrounding the availability of data generated by our wearables and its presumed big promise for healthcare, for example, almost ensures that medical enquiry will forgo some of its critical and proven foundations: carefully designed samples, control groups, longitudinal studies, etc.

I used to be a data optimist. Here’s what I wrote in Big Data is Neither an Atomic Bomb Nor a Holy Grail: “Decisions based on non-biased data are almost always better than decisions that are not based on data. That’s the promise of big data or data analysis. No need to exaggerate its potential… Better focus on small steps where the collection and analysis of data measurably and demonstrably lead to better allocation of resources and improved quality of life.”

Bruce Schneier has made me a data radical. I don’t believe in trade-offs anymore; I’m firmly convinced that the risks associated with mass surveillance far outweigh any potential big data benefits. Instead of believing that decisions based on data are almost always better than decisions that are not based on data, I must admit now that decisions based on data are frequently dangerous, disruptive, ineffective and just plain stupid.

Don’t fall for the “promise” of big data. Better focus on the present and start getting our freedoms back, first and foremost by finally making our data our property. Occupy Data, anyone?

Originally published on Forbes.com

Posted in Privacy, Security | Leave a comment

Profiles in Data: 262 Female Writers on Data Science and Big Data Analytics

Meta S. Brown: “Don’t miss out on half the talent in the analytics field! When you’re looking for analytics talent, look for women.”

Posted in Big Data Analytics, Big Data Practice, Data Science | Leave a comment

The Happiness Statistics (Infographic)

Posted in Stats | 1 Comment