“Business leaders want ‘the answer,’” says Bob Rogers, Chief Data Scientist for Big Data Solutions at Intel. But data scientists must understand what “the answer” means in the specific business context and communicate the expected impact in the language of the business executives. They need to explain the results of their analysis in “terms of the risk to the business” and “translate uncertainty into outcomes,” says Rogers. “If you show error bars on a number in a business presentation, you are probably going down the wrong path.”
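To make Rogers’ point concrete, here is a minimal sketch, with entirely hypothetical numbers, of what “translating uncertainty into outcomes” can look like in practice: rather than presenting a forecast with error bars, state the probability that the business misses its target.

```python
from scipy.stats import norm

# Hypothetical example: a quarterly revenue forecast with a point
# estimate and a standard deviation, standing in for any model
# output that carries uncertainty.
forecast_mean = 10.4   # $M, point estimate
forecast_std = 0.9     # $M, model uncertainty
target = 9.5           # $M, the business commitment

# Instead of showing "10.4 +/- 0.9" with error bars, state the risk:
# the probability that revenue comes in below the target.
p_miss = norm.cdf(target, loc=forecast_mean, scale=forecast_std)
print(f"Roughly a {p_miss:.0%} chance of missing the ${target}M target.")
```

The number itself is unchanged; only the framing moves from statistical notation to a statement of business risk.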
When the data scientist emerged as a new business role about a decade ago, the emphasis was on how it combined two disciplines and skill sets: computer science and statistics. More recently, the discussion of this evolving role has followed the lines of Rogers’ observation, describing it as one combining technical and business expertise and emphasizing the importance of communications skills. Drew Conway’s 2010 definition of a data scientist as a Venn diagram of computer science, statistics, and domain expertise has since been updated to include communications as a stand-alone set of required skills.
“Statisticians have missed the initial boat of data science,” says Rogers. “They tend to be very specific about the way they discuss data, ways that are not necessarily amenable to a broader discussion with a business audience.”
What we have here is a redefinition of what was previously perceived as a highly technical job into a more generalized business role. The rise of the Sexiest Job of the 21st Century has spawned numerous undergraduate and graduate programs focused on imparting technical skills and knowledge, aiming to fill the widely discussed shortage of experts in managing and mining the avalanche of big data. We now see business schools (e.g., Wharton) establishing a major in analytics, combining data science training with general business education. The next step, I would argue, will be the complete integration of the two types of training: business education as data science education.
Rogers’ varied work experience over the last twenty-five years is a prime example of the amalgam of skills and expertise that will be the hallmark of successful business leaders in the years to come. It’s a unique combination of scientific curiosity and acumen, facility with computer programming and data manipulation, entrepreneurial drive, and experimental inclination. All of these are wrapped in a deep understanding, derived from direct experience, of the business context—the requirements, challenges, human motivations, and attitudes that drive business success.
Like some of the leading data scientists of recent vintage, Rogers started his working life after earning a PhD in Physics. But in 1991, when he got his degree from Harvard University, there was not much data to support his thesis work in astrophysics, so he and others like him “were doing a lot more simulations.” Today, “there is a lot of data associated with cosmology,” says Rogers, but then and now, knowing how “to model the data” has been a crucial requirement in this and other scientific fields. A new training ground today for budding data scientists, according to Rogers, is computational neuroscience, where the “amount and shape of data” coming from functional MRI requires “advanced modeling thinking.”
While doing a post-doc at a research institute, Rogers drew on his experience with computer modeling and simulations to co-author a book on using artificial neural networks for time series forecasting. All of a sudden he was getting phone calls from people asking him about forecasting the stock market, a subject he didn’t know much about.
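The book’s specific techniques are beyond the scope of this article, but the general approach it addresses, training a neural network to predict the next value of a series from a sliding window of past values, can be sketched with invented data (using scikit-learn as a modern stand-in for the tools of the early 1990s):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# A toy series: this illustrates the technique, not Rogers' models.
rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.standard_normal(400)

# Sliding window: predict the next value from the previous `lags` values.
lags = 10
X = np.array([series[i:i + lags] for i in range(len(series) - lags)])
y = series[lags:]

# A small feed-forward network trained on the first 300 windows.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(X[:300], y[:300])
print("one-step-ahead forecast:", model.predict(X[300:301])[0])
```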
Serendipity plays a major role in many illustrious careers and Rogers’ was no exception. The husband of a friend of his wife’s owned a trading firm in Chicago, and with his help, Rogers started a company rather than pursue an academic career, just like many latter-day data scientists. “I was 28 at the time,” he explained when I asked him why he made such a risky career switch.
In another similarity to today’s data scientists, Rogers did not limit his involvement with the startup to developing forecasting models for the Chicago futures market, but also got down and dirty building a research platform for collecting data on transactions and the back-office systems for executing trades, accounting, and other functions.
This went on for about a dozen years, during the last four of which Rogers switched from R&D work to selling the company’s services when it opened up to new—international—investors.
“What was really profound for me as a data scientist,” says Rogers, “was actually the marketing side—I started to appreciate that there was a huge difference between having a technology that performed well and having a product that was tailored to fit the specific business needs of the customer. International investors had very specific needs around how the product was configured.”
Recalling his own experience leads Rogers to yet another observation about how understanding the business context and being able to communicate with business leaders are such important components of the data scientist’s job today:
“What I’ve seen changed between the pre-data science period and the current era is that analytics in the enterprise used to be very focused on a business leader asking a business analyst for a report on X—that was the process. Now, it’s much more of a conversation. Storytelling skills, sensitivity to what the business needs are—successful data scientists tend to have this conversation.” In addition, there is more sensitivity to the uncertainty associated with data—“awareness that a number is not just a number”—even data that comes from a structured database should be handled with care.
By 2006, it was time to move on and “get into something that was more personally satisfying to me,” says Rogers, as “our computational and technological advantages have started to decline.” Healthcare turned out to be the more personally satisfying domain, and he became the global product manager for the Humphrey Visual Field Analyzer, widely used in glaucoma care.
In yet another application of adding a time dimension to data, Rogers worked with a research team to move beyond a single, one-time measurement of the patient’s peripheral vision and compute the rate of change and the progression of blindness over time. “It became an important tool for tracking these patients and their response to therapy,” he says. And in yet another immersion in practical, hands-on computing, the solution involved adding networking software (licensed from Apple) to multiple devices in a clinic to facilitate the collection of data from past measurements.
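The article gives no implementation details, but the core idea, turning repeated measurements into a rate of change, can be sketched in a few lines of Python with invented numbers:

```python
import numpy as np

# Hypothetical longitudinal data: mean deviation (dB) from repeated
# visual field tests, with exam dates in years since the first visit.
years = np.array([0.0, 0.5, 1.1, 1.6, 2.2, 2.9])
mean_deviation = np.array([-2.1, -2.4, -2.9, -3.1, -3.8, -4.2])

# A least-squares line through the measurements turns a series of
# one-time snapshots into a rate of progression (dB per year).
slope, intercept = np.polyfit(years, mean_deviation, 1)
print(f"Estimated progression: {slope:.2f} dB/year")
```

The clinical value comes from the trend, not any single reading: the slope is what lets a doctor judge whether a therapy is slowing the progression.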
Better access to data, Rogers understood from that experience, was crucial for improving healthcare. In 2009, when the US federal government started to give incentives for healthcare providers to use Electronic Medical Records (EMR) systems, he saw how the original paper silos were simply replaced with electronic silos, with each EMR system becoming a stand-alone database. Not only was there no physical connection, there was no interoperability “from a semantic point of view—descriptions in one system could not be directly compared with those in another.”
The solution was a cloud-based system that pulled data from a variety of sources and machine learning software that constructed a table of all the codes and concepts in the clinical data and mapped them to each other. “The more data we got, the better we got at mapping these concepts and building a robust set of associations,” says Rogers. “That allowed us to build a clinically intelligent search engine.”
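Apixio’s actual algorithms are not described here, but as a minimal sketch of the general idea, with invented codes and visit records, codes from different systems that repeatedly co-occur for the same patient encounter are natural candidates for mapping to the same concept:

```python
from collections import Counter, defaultdict

# Hypothetical visit records drawn from two EMR systems (all codes
# and visits are invented for illustration). Each visit lists the
# local codes recorded together for the same patient encounter.
visits = [
    {"sysA:hf", "sysA:dyspnea", "sysB:chf", "sysB:edema"},
    {"sysA:hf", "sysB:chf", "sysB:bnp_test"},
    {"sysA:dm2", "sysB:t2dm", "sysB:a1c_test"},
    {"sysA:dm2", "sysB:t2dm"},
]

# Count how often each system-A code co-occurs with each system-B code.
cooc = defaultdict(Counter)
for visit in visits:
    a_codes = {c for c in visit if c.startswith("sysA:")}
    b_codes = {c for c in visit if c.startswith("sysB:")}
    for a in a_codes:
        cooc[a].update(b_codes)

# The most frequent cross-system partner is the candidate mapping.
# More data sharpens these counts, which is why "the more data we
# got, the better we got at mapping these concepts."
for a, counts in cooc.items():
    best, n = counts.most_common(1)[0]
    print(f"{a} -> {best} (co-occurred {n} times)")
```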
You may think that this is “big data” in a nutshell—more data equals better learning, or, as some have called it, “the unreasonable effectiveness of data.” But you may want to reconsider admiring data quantity for quantity’s sake, given what Rogers and his colleagues found out while mining electronic medical records.
“63% of the key information that the doctor needs to know about you is not in your coded data at all,” says Rogers. “And 30% of the time, if you have a heart failure in your code, it’s not heart failure” and could have been a mistake or a related entry (e.g., a test for heart failure) in the billing system. As a result, most of the learning in Rogers’ machine learning system was dedicated to analysis of the text to “understand what information about the patient is actually correct.” An important big data lesson, or what one may call the unreasonable effectiveness of data quality.
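The production system’s text analytics are likewise not detailed in the article; a deliberately crude sketch, with hypothetical records and keyword patterns, shows why the note text can overrule the billing code (a real system would rely on trained clinical NLP models rather than keyword rules):

```python
import re

# Hypothetical records pairing a billing code with the clinical note.
# All data and patterns here are invented for illustration.
records = [
    {"code": "heart_failure",
     "note": "Pt with chronic CHF, EF 35%, started on furosemide."},
    {"code": "heart_failure",
     "note": "BNP ordered to rule out heart failure. Result normal."},
]

# Phrases that support the diagnosis vs. phrases that negate it.
CONFIRM = re.compile(r"\b(chf|heart failure|reduced ef)\b", re.IGNORECASE)
NEGATE = re.compile(r"\b(rule out|ruled out|no evidence of|normal)\b",
                    re.IGNORECASE)

for r in records:
    note = r["note"]
    confirmed = bool(CONFIRM.search(note)) and not NEGATE.search(note)
    print(f"{r['code']}: confirmed by note text? {confirmed}")
```

In this toy example, the second record carries a heart failure code that the narrative explicitly rules out, exactly the kind of discrepancy Rogers describes.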
That system became the foundation of another startup, Apixio, which has recently raised $19.3 million in Series D venture capital funding. After serving there as Chief Scientist for 5 years, Rogers moved on again, in January 2015, this time from the world of startups to the corporate world and his current role at Intel.
As Chief Data Scientist, he works internally on product road maps, providing input related to his expertise and the trends he sees. Externally, he works with the customers of Intel’s customers, helping them in “conceptualizing their entire analytics pipeline.” Providing free advice to consumers of analytics “helps keep Intel at the center of the computational world” and helps keep Rogers abreast of the latest data mining trends and developments. He learns about ongoing concerns regarding whether a “new architecture” is required to accommodate the most recent data science tools and observes the rise of new challenges such as “monitoring many different real-time data streams.” And he reports that recently there has been a lot of interest in deep learning. Here, too, a key concern is integration: is it possible to build these new capabilities within the existing big data infrastructure?
Rogers’ role as a trusted advisor also includes working with partners. One example is the Collaborative Cancer Cloud, an Intel-developed precision medicine analytics platform currently used by the Knight Cancer Institute at Oregon Health & Science University, the Dana-Farber Cancer Institute, and the Ontario Institute for Cancer Research to securely share patient genomic, imaging, and clinical data to accelerate their research into potentially lifesaving discoveries.
Extrapolating from his current and previous work, Rogers sees the future of AI as “the development of machine learning systems that are good at figuring out the context.” A lot of the recent AI news has been about what he perceives as immature work—“image captioning is a sort of parlor trick,” says Rogers. “We will start to see an emerging AI capability” when we have machine or deep learning capable of identifying the context of the image.
Unlike others who see the machines as potentially replacing humans, Rogers envisions human-machine collaboration: “AI capabilities are most interesting when they are used to amplify human capabilities. There are things that we are good at cognitively but we cannot do at scale. We [should use machine learning] to surface the information from a large volume of data so we can do the next level of inference ourselves,” he says.
Understanding the context. Accepting and managing uncertainty. Linking pieces of data to uncover new insights. Like good data scientists, future business leaders will not look for “the answer.” With the right attitude, experience, and training, they will actively search for data to refute their assumptions, question their most beloved initiatives, and challenge their established career trajectories.
Originally published on Forbes.com