From the most recent edition of the tech bible: Moore’s Law begat faster processing and cheap storage, which begat machine learning and big data, which begat deep learning and today’s AI Spring. In her opening keynote at the Intel Analytics Summit, which was mostly about machine learning, Intel’s executive vice president Diane Bryant said that we are now “reaching a tipping point where data is the game changer.” (Disclosure: Intel paid my travel expenses.)
With the rapid growth of machine-to-machine data exchange we should expect more, according to Bryant: Autonomous vehicles will produce 4 terabytes of data each day, a connected plane will transmit 40 terabytes of data, and the automated, connected factory will generate one petabyte (one million gigabytes) daily.
Another presenter, CB Bohn, Senior Database Engineer at Etsy, the online marketplace, speculated that the tipping point has already happened—when the value of the data exceeded the cost of its storage. Historical data has lots of value left in it, so “why throw it away?” asked Bohn. Cheap storage, added Debora Donato, Director of R&D at Mix Tech, a content discovery platform, has changed the attitudes of businesses towards data and what they can do with it.
Leading-edge enterprises today apply machine learning algorithms to mine the ever-expanding data store for insights. Jason Waxman, corporate vice president and general manager of the data center solutions group at Intel, described how Penn Medicine is improving patient care using Intel’s TAP open analytics platform. One pilot study focused on sepsis, or blood infection, which affects more than a million Americans annually and is the ninth leading cause of disease-related deaths and the #1 cause of deaths in intensive care units, according to the Centers for Disease Control and Prevention (CDC). Penn Medicine was able to correctly identify about 85% of sepsis cases (up from 50%), and made such identifications as much as 30 hours before the onset of septic shock, as opposed to just two hours prior using traditional identification methods.
Candid is a recently launched app that applies AI to the challenges previous anonymous social platforms could not overcome. CEO Bindu Reddy explained how machine learning helps identify and remove “bad apples” (both inappropriate content and abusers) and recommend relevant groups to Candid users.
Clear Labs differentiates itself by conducting DNA tests that are untargeted and unbiased, aiming to index the world’s food supply and set worldwide standards for “food integrity.” Maria Fernandez Guajardo, vice president of product, described how their molecular analysis of 345 hot dog samples from 75 brands and 10 retailers discovered that 14.4% of the products tested were “problematic in some way,” mostly because of added ingredients that did not show up on the label. Some consumers, she reported, were especially concerned about hot dogs that claimed to be vegetarian but actually contained meat.
In answer to a question on the future of machine learning from O’Reilly Media’s Ben Lorica, moderator of a panel on distributed analytics, Intel fellow Pradeep Dubey suggested focusing on deep learning as it has been demonstrably successful recently. Michael Franklin of UC Berkeley recommended focusing on machine learning approaches that are usable, understandable and robust, whether of the deep or shallow kind. If an automated system is going to make a decision, he said, “You’d better understand what are the assumptions that went into the data and the algorithms, where does the data you collected differ from those assumptions and how robust is the answer that popped out of the system.”
This was, I believe, a swipe at some of the deep learning practitioners who have admitted publicly that they don’t really understand how their systems come up with their successful results (e.g., Yoshua Bengio: “very often we are in a situation where we do not understand the results of an experiment”). But nothing succeeds like success, whether it is understood or not, and over the last few years deep learning has become a force of climate change, transforming the AI Winter into the AI Spring.
Pedro Domingos of the University of Washington, in his talk at the event, put the recent resurgence of deep learning in the historical perspective of five different approaches (and solutions) to artificial intelligence: Symbolists (inverse deduction), Connectionists (backpropagation—popular with the deep learning crowd), Evolutionaries (genetic programming), Bayesians (probabilistic inference) and Analogizers (kernel machines). Domingos’ book, The Master Algorithm, is a rallying cry for finding the best of all worlds, the one algorithm that will unite all approaches and provide the answer to life, the universe, and everything.
Before we get to the time when the big algorithm will tell us what to do, whether we understand it or not, humans are still needed to make sense of all the data they, and the machines, generate. The last panel of the Analytics Summit was, appropriately, a discussion of educating future data scientists. The panel was moderated by Edd Wilder-James of Silicon Valley Data Science; the panelists, executives with Coursera (Emily Glassberg Sands), Kaggle (Anthony Goldbloom), Continuum Analytics (Travis Oliphant), Metis (Rumman Chowdhury), and Galvanize (Ryan Orban), represented the burgeoning world of data science education.
The good news is that they appear to be training a vastly expanded pool of people, with very diverse backgrounds and experiences, who either want to become proficient in data analysis or want to be able to speak, as general business managers, the data scientist’s language. The challenge today is not so much the widely discussed shortage of data scientists as the failure of many companies to effectively integrate and support the work of data scientists. The right internal champion, the panelists agreed, one who understands the potential of analytics and machine learning and knows how to get the required resources, is key to the success of the data science team.
Originally published on Forbes.com