Thanks Gil – some interesting figures here. No surprise that 2/3 of data scientists view cleaning and organizing data as their #1 challenge :) I’d be interested to see that figure for Hadoop adopters in particular. We find many customers make integration and organization somewhat easier by consolidating raw data in its native formats onto a Hadoop Data Lake, for which no rigid metadata schemas are required. Data scientists can apply their custom schema when they actually run queries.
Many customers still struggle with the coding requirements and duplicate copies of conventional loading technologies. Those who use automated data integration software can speed up the process and free up resources to actually focus on analytics.
– Kevin Petrie, Senior Director and Technology Evangelist, Attunity. http://www.attunity.com
Very interesting data. The popularity of Excel surprises me. Most of the rest I agree with. I find it very ironic that, to present data about Data Scientists, the graphics violate nearly every “rule” a good Data Scientist would follow, based on experts like Tufte & Few.
@Bill Lyon – agreed. I’m thinking this is either based on a tickbox question of “what have you used previously?” rather than “what do you use commonly?”, or we’re including people who simply can’t be called Data Scientists with any real conviction.
Interesting infographic indeed. I agree with most of it.