Domain Expertise vs. Machine Learning: The Debate Continues

By starting to rank all the data scientists participating in its competitions, Kaggle today furthered its argument that data science is a generic set of skills that can be applied to any problem without prior domain expertise. Talking to The New York Times’ Quentin Hardy, Jeremy Howard, Kaggle’s president and chief scientist, said that “it makes little difference for a top performer if the problem is public health or essays in Arabic. The argument that great data science is just about letting the data talk holds true.”

For a (short, recent) history of the debate, see Mike Driscoll’s summary of the panel that debated machine learning versus domain expertise at the recent Strata conference (video here), the results of a KDnuggets poll, and Mike Loukides’ passionate defense of expertise, which concludes by naming “the real value of a subject matter expert: not just asking the right questions, but understanding the results and finding the story that the data wants to tell. Results are good, but we can’t forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data.”