I still think that Hacking Skills, Math & Statistics Knowledge and Substantive Expertise (shortened to “Programming”, “Statistics” and “Business” for legibility) are important… but I think that the role of Communication is important, too. All the insights you derive by leveraging your hacking, stats and business expertise won’t make a bit of a difference unless you can communicate them to people who may not have that unique blend of knowledge. You may need to explain your statistical insights to a business manager who needs to be convinced to spend money or change processes. Or to a programmer who doesn’t think statistically.
So here is the new data science Venn diagram, which also includes communication as one indispensable ingredient.
Davenport and Patil describe data scientists as curious, self-directed and innovative, i.e., they are not limited by the tools available and, when needed, fashion their own tools and even conduct academic-style research. Not surprisingly, people with this combination of skills and characteristics are rare: as rare and as much in demand as computer programmers were in the 1990s.
This rarity and the high demand for data science skills have meant that statisticians, machine learners, data miners, data analysts, DBAs and quantitative analysts, i.e., people with any data or analytics skills, have re-badged themselves as data scientists so that they are more marketable. This is not unlike the pre-Y2K hype, when computer operators and users of PCs re-badged themselves as computer programmers.
The term “data scientist” itself has become so diffuse that it covers anybody from database administrators, to analysts doing simple summaries in Excel spreadsheets, to data engineers setting up Hadoop infrastructure, to advanced analytics practitioners who discover valuable insights from data using existing tools, as well as those, like the data scientists at Google and Facebook, who derive insights from data using their own enhanced toolkits.
So, is the name really relevant? Apparently not, since Google’s career pages advertise for Decision Support Analysts, Statisticians, Quantitative Analysts, and Data Scientists and they all mean the same thing. Over the last 50 years, many people have been working as the data scientists described by Davenport and Patil, discovering insights from large volumes of diverse data using existing tools as well as new tools that they fashioned. They have been labelled statisticians, artificial intelligence researchers, data miners, machine learners, advanced analytics experts and the list goes on.
What is relevant is to understand where an individual’s interest lies in the broad data science church and where the needs of the organisation are. The individual’s interest may be developing innovative algorithms to solve a new problem (the high-end data scientist described by Davenport and Patil), identifying new business problems that can be solved with existing tools, or distributed programming for Hadoop. The key is to match the organisation’s needs with an individual’s interest and not be bothered with the position title or the candidate’s label.
Finally, as for finding this rare species, let me point out that the characteristics of curiosity, self-direction and innovation are required in all scientific research. Fashioning tools to overcome a challenge has always been the hallmark of a research scientist. Didn’t Newton invent infinitesimal calculus when the mathematical tools at his disposal were insufficient to calculate instantaneous speed? Furthermore, training in scientific research through a PhD ensures that such people are able to teach themselves new skills.
So, instead of looking to graduates from the newly designed data science majors, develop your own data scientists by first finding someone with a PhD or Master’s in a quantitative science such as physics, mathematics, statistics or computer science, and then providing them with data, time and autonomy. It worked for LinkedIn with Jonathan Goldman and for many other data-driven companies, and it can work for you too!