3 Recent Books on Data Mining, Data Science and Big Data Analytics

Now that most of the hype around big data has died down, overtaken by the buzz over the Internet of Things, we are sometimes treated to serious discussions of the state-of-the-art (or science, for that matter) in data analysis. If you are planning a career as a data scientist or you are a business executive trying to understand what the data scientists are telling you, three recent books provide excellent and accessible overviews:

The Analytics Revolution: How to Improve Your Business By Making Analytics Operational In The Big Data Era by Bill Franks

Data Mining For Dummies by Meta S. Brown

Data Science For Dummies by Lillian Pierson

Bill Franks is the Chief Analytics Officer for Teradata, and his specialty is translating complex analytics into terms that business users can understand. The Analytics Revolution follows Franks’ Taming the Big Data Tidal Wave, which appeared on Tom Peters’ 2014 list of “Must Read” books.

“With all the hype around big data, it is easy to assume that nothing of interest was happening in the past if you don’t know better from experience,” says Franks. The over-excitement about big data caused many organizations to re-create solutions that already existed and to build new groups dedicated to big data analysis, separate from their traditional analytics functions. As a correction, Franks advocates “a new, integrated, and evolved analytics paradigm,” combining traditional analytics on traditional data with big data analytics on big data.

The focus of this new approach, and of the book, is operational analytics. It takes us beyond the descriptive and predictive analytics of traditional and big data analytics to prescriptive analytics. It pays close attention to the numerous decisions and actions, mostly tactical, taking place every day in your business. Most important, it places great emphasis on the process of analytics, on embedding it everywhere, and on automating the required response to events and changing conditions.

“Of course,” says Franks, “it takes human intervention to decide that an operational analytics process is needed and to build the process.” But once the process is designed and turned on, it accesses data, performs analysis, makes decisions, and then actually causes actions to occur. And humans remain crucial to the success of this new brand of automated analytics, not only in the design phase but also in the ongoing monitoring and tweaking of the process.

An example of operational analytics is the development of an improved maintenance schedule using sensor data. There will be no value in the Internet of Things without an automated process for data analysis and action based on that analysis. “As traditional manufacturers suddenly find themselves embedding sensors, collecting data, and producing analytics for their customers, industry lines blur. Not only are new competencies needed, but the reason customers choose a product may have less to do with traditional selection criteria than with the data and analytics offered with the product,” says Franks.
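Franks describes this loop in prose, not code. Purely as an illustration, the sketch below shows how the four steps (access data, analyze, decide, act) might fit together in the maintenance example. Everything in it is a hypothetical assumption rather than anything from the book: the vibration sensor field, the 7.0 threshold, the three-reading rolling average, and the list standing in for a work-order system.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List

# Hypothetical tuning parameters; in practice humans would set and adjust these.
VIBRATION_LIMIT = 7.0  # assumed alert threshold
WINDOW = 3             # assumed number of recent readings to average


@dataclass
class Reading:
    machine_id: str
    vibration: float  # hypothetical sensor metric


def analyze(history: List[float], window: int = WINDOW) -> float:
    """Analysis step: rolling mean of the most recent readings."""
    return mean(history[-window:])


def decide(score: float, limit: float = VIBRATION_LIMIT) -> bool:
    """Decision step: flag maintenance when the score exceeds the limit."""
    return score > limit


def run_cycle(readings: Iterable[Reading],
              history: Dict[str, List[float]],
              work_orders: List[str]) -> List[str]:
    """One pass of the loop: access data, analyze, decide, act."""
    for r in readings:  # access data
        history.setdefault(r.machine_id, []).append(r.vibration)
        score = analyze(history[r.machine_id])  # analyze
        if decide(score) and r.machine_id not in work_orders:  # decide
            work_orders.append(r.machine_id)  # act: queue a work order
    return work_orders


if __name__ == "__main__":
    batch = [Reading("pump-1", 6.0), Reading("pump-1", 8.0),
             Reading("pump-1", 9.0), Reading("fan-2", 2.0)]
    print(run_cycle(batch, {}, []))  # ['pump-1']
```

Keeping the threshold and window as explicit parameters reflects Franks’ point that humans stay involved after launch: monitoring the process and tweaking those settings is where their judgment comes in.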

The practical advice Franks provides in the book ranges from setting up an analytics organization, to developing and maintaining a corporate culture dedicated to discovery (finding new insights in the data and quickly acting on them), to implementing operational analytics. The Analytics Revolution is an excellent guide to the new business world of blurred industry lines and innovative data products.

If you are ready to move on from understanding the why of analytics today, and how to think about it in a broad business and organizational context, to a more specific understanding of the how of analyzing data, Data Mining For Dummies by Meta Brown should be your first step. The book was written for “average business people,” showing them that they don’t need to be data scientists: “you don’t need to be an expert in statistics, a scientist, or a computer programmer to be a data miner.”

Brown is a consultant, speaker and writer with hands-on experience in business analytics. She’s the creator of the Storytelling for Data Analysts and Storytelling for Tech workshops. In Data Mining For Dummies, Brown tells the story of what data miners do.

It opens with a day in the life of a data miner and goes on to discuss, in clear, easy-to-understand prose, all the key data mining concepts: planning and organizing for data mining, getting data from internal, public and commercial sources, preparing data for exploration and predictive modeling, building predictive models, and selecting software and dealing with vendors. Data Mining For Dummies is an excellent step-by-step guide to understanding data mining and how to become a data miner.

If you are ready to move on from understanding data mining and being a data miner to more advanced tools and applications for data analysis, Data Science For Dummies by Lillian Pierson should be your next step. The book was written for readers with some technical and math skills and experience, but it aims to provide a general introduction to one and all: “Although data science may be a new topic for many, it’s a skill that any individual who wants to stay relevant in her career field and industry needs to know.”

Pierson is a data scientist and environmental engineer and the founder of Data-Mania, a start-up that focuses mainly on web analytics, data-driven growth services, data journalism, and data science training services. “Data scientists,” she explains, “use coding, quantitative methods (mathematical, statistical, and machine learning), and highly specialized [domain] expertise in their study area to derive solutions to complex business and scientific problems.”

Data Science For Dummies is an excellent practical introduction to the fundamentals of data science. It provides a guided tour of today’s data science landscape: data engineering and processing tools such as Hadoop and MapReduce; supervised and unsupervised machine learning; statistics and mathematical modeling; open-source tools such as Python and the R statistical programming language; resources for publicly available data; and data visualization techniques for showcasing the results of your analysis. Stressing the importance of domain expertise for data scientists, Pierson provides detailed examples of applying data science in specific domains such as journalism, environmental intelligence, and e-commerce.

“A lot of times,” says Pierson, “data scientists get caught up analyzing the bark of the trees that they simply forget to look for their way out of the forest.” The three books reviewed here provide a handy map to the maze of data analysis and a safe-conduct pass for business executives, IT staff, and students, ensuring that they successfully get in and out of the data forest. Remember, as ones and zeros eat the world, data is the new product, and operational analytics, data mining, and data science together form the new process of innovation.