Michael Jordan on the coming big data winter and the state of machine learning

Great interview in IEEE Spectrum with machine learning expert, UC Berkeley Professor, and IEEE Fellow Michael Jordan:

“…people continue to infer… that deep learning is taking advantage of an understanding of how the brain processes information, learns, makes decisions, or copes with large amounts of data. And that is just patently false.”

“There is progress at the very lowest levels of neuroscience. But for issues of higher cognition—how we perceive, how we remember, how we act—we have no idea how neurons are storing information, how they are computing, what the rules are, what the algorithms are, what the representations are, and the like. So we are not yet in an era in which we can be using an understanding of the brain to guide us in the construction of intelligent systems.”

“…with big data, it will take decades, I suspect, to get a real engineering approach, so that you can say with some assurance that you are giving out reasonable answers and are quantifying the likelihood of errors.”

“The main [adverse consequences if we remain on the big data trajectory we are on] will be a ‘big-data winter.’ After a bubble, when people invested and a lot of companies overpromised without providing serious analysis, it will bust. And soon, in a two- to five-year span, people will say, ‘The whole big-data thing came and went. It died. It was wrong.’ I am predicting that. It’s what happens in these cycles when there is too much hype, i.e., assertions not based on an understanding of what the real problems are or on an understanding that solving the problems will take decades, that we will make steady progress but that we haven’t had a major leap in technical progress. And then there will be a period during which it will be very hard to get resources to do data analysis. The field will continue to go forward, because it’s real, and it’s needed. But the backlash will hurt a large number of important projects.”

Note that Jordan took issue with the title and the lead-in to the IEEE Spectrum article.

More on what needs to be done to avoid a big data winter is in Jordan’s Reddit AMA and in the Frontiers in Massive Data Analysis report from the US National Research Council’s Committee on the Analysis of Massive Data (which Jordan chaired).


Posted in Big Data Backlash, Data Science | 1 Comment

How Data Travels Around the World

Posted in Big Data Analytics, Data growth | Leave a comment

Is Privacy Becoming a Luxury Good? Julia Angwin Keynote at Strata + Hadoop 2014 (Video)

We are being watched – by companies, by the government, by our neighbors. Technology has made powerful surveillance tools available to everyone. And now some of us are investing in counter-surveillance techniques and tactics. Julia Angwin discusses how much she has spent trying to protect her privacy, and raises the question of whether we want to live in a society where only the rich can buy their way out of ubiquitous surveillance.

Julia Angwin is an award-winning investigative journalist at the independent news organization ProPublica. From 2000 to 2013, she was a reporter at The Wall Street Journal, where she led a privacy investigative team that was a Finalist for a Pulitzer Prize in Explanatory Reporting in 2011 and won a Gerald Loeb Award in 2010. Her book, Dragnet Nation: A Quest for Privacy, Security and Freedom in a World of Relentless Surveillance, was published by Times Books in 2014. In 2003, she was on a team of reporters at The Wall Street Journal that was awarded the Pulitzer Prize in Explanatory Reporting for coverage of corporate corruption. She is also the author of “Stealing MySpace: The Battle to Control the Most Popular Website in America” (Random House, March 2009).

Posted in Big Data Analytics, Big Data Backlash, Privacy | Leave a comment

Recruiting Data Scientists to Mine the Data Explosion



Wes Hunt, Chief Data Officer (CDO) at Nationwide Mutual Insurance Co. on recruiting data scientists:

Finding talent is my largest challenge. Someone who understands our business, who has quantitative skills, who has the technical skills to create the models, and who is able to persuade others that the insights they’ve come up with are ones you can trust and take action on. The hardest part is persuasion. You get the quantitative skills, but there’s a struggle in that ability to communicate effectively. We’ll often pair people together, but we’d really like to grow the talent.

When I was in marketing, we put a focus on liberal-arts-educated individuals, because abstract thinking where there are ambiguous data sets is an area where they are comfortable. Ph.D.s in psychology were a great recruiting pool. A psych Ph.D. has a fair amount of statistical training. We created a program to recruit Ph.D.s.

There’s not yet an educational discipline and curriculum that produces data scientists at the scale that would clear the market. So the way we’ve focused on it is to find people with innate curiosity and critical thinking. You can teach the other skills. On my team, I have a pathologist, a bioengineering student who trained in doing heart research, an M.B.A., and someone who is trained in traditional data architecture. I also have a landscape construction engineer and a psychology Ph.D.


Posted in Data growth, Data Science, Data Science Careers | 1 Comment

Data & The New Era of Interactive Storytelling–Strata+Hadoop 2014 (Video)


Data is an evolving story. It’s not a static snapshot of a point-in-time insight. With data from internal and external sources constantly updating, we are evolving from rear-view-mirror dashboard views into an era of interactive storytelling. Data storytelling is both a visual art and a method of interpreting analytic results. Data stories yield insights every minute, every hour, every day, every week. This keynote will discuss how data dashboards are no longer adequate and how companies are using interactive storytelling to discover insights faster across many disparate data sources.

About Sharmila Shahani-Mulligan:
Sharmila has spent 18+ years building game-changing software companies in a variety of markets. She has been EVP & CMO at numerous software companies, including Netscape, Kiva Software, AOL, Opsware, and Aster Data. She drove the creation of several multi-billion dollar market categories, including application servers, data center automation and big data analytics. She is on the board of Hadapt and Lattice Engines, advisor to numerous companies, large and small, and an active investor in early stage companies.

Posted in Big Data Analytics, data visualization | 1 Comment

Statistics Without the Agonizing Pain: John Rauser Keynote at Strata + Hadoop 2014 (Video)

There are two essential skills for the data scientist: engineering and statistics. A great many data scientists are very strong engineers but feel like impostors when it comes to statistics. In this talk John will argue that the ability to program a computer gives you special access to the deepest and most fundamental ideas in statistics. John’s goal is to convince the non-statistician engineers in the audience that the road to statistical fluency is much, much shorter than they think.
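As one illustration of how programming can stand in for statistical formulas, here is a minimal sketch of a permutation test in Python. It is an assumption about the kind of idea the abstract points to, not a reproduction of the talk; the function name `permutation_test` and the sample numbers are hypothetical and made up for illustration.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Estimate a one-sided p-value for mean(group_a) > mean(group_b).

    Instead of looking up a test statistic in a table, shuffle the group
    labels many times and count how often chance alone produces a
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            at_least_as_extreme += 1

    return observed, at_least_as_extreme / n_resamples


# Hypothetical measurements for two groups (illustrative only).
treatment = [27, 20, 21, 26, 27, 31, 24, 21]
control = [21, 22, 15, 12, 21, 16, 19, 15]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference in means: {observed}")
print(f"estimated one-sided p-value: {p_value}")
```

The whole test is a shuffle inside a loop: a small estimated p-value means that random relabeling rarely reproduces a gap as large as the observed one.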

About John Rauser:
John has been extracting value from large datasets for over 20 years at hedge funds, small data-driven startups, Amazon, and now Pinterest. He has deep experience in machine learning, data visualization, on-line experimentation, website performance and real-time fault analysis. An empiricist at heart, “Just do the experiment!” is his favorite call to arms.

Posted in Big Data Analytics, Data Science, Statistics | Leave a comment

Doug Cutting on Hadoop, October 2014 (Video)

Posted in Hadoop | Leave a comment