Artificial Intelligence Machines to Replace Physicians and Transform Healthcare



10 New Big Data Observations from Tom Davenport

The term “big data” has become nearly ubiquitous. Indeed, it seems that every day we hear new reports of how some company is using big data and sophisticated analytics to become increasingly competitive. The topic first began to take off in late 2010 (at least according to search results from Google Trends) and, now that we’re approaching a five-year anniversary, perhaps it’s a good time to take a step back and reflect on this major approach to doing business. This article describes 10 of my observations about big data.

See also Tom Davenport’s Guide to Big Data


41 Tech Startups Disrupting The Car Industry


CB Insights:

Connected Car

One of the more popular categories for consumer-focused auto tech startups is the connected car, in which companies exploit car data and cloud software to give drivers and third-party services insight into driving habits, car usage, and maintenance metrics. Companies like Zubie and Automatic market devices that capture data and allow drivers to track and improve their driving habits, paired with an API for third-party services built on top of the device and software. Metromile uses data from your car to create a new type of “per-mile,” usage-based auto insurance.
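
As a rough illustration of how such device-plus-API offerings fit together (the endpoint, token, and field names below are hypothetical, not any particular vendor’s API), a third-party service might pull trip data and compute simple habit metrics like this:

```python
# Hypothetical sketch of pulling trip data from a connected-car API.
# The endpoint, token, and field names are illustrative, not any vendor's actual API.
import requests

API_BASE = "https://api.example-connectedcar.com/v1"  # hypothetical
TOKEN = "YOUR_ACCESS_TOKEN"

def fetch_trips(vehicle_id: str):
    """Return a list of recent trips with basic driving metrics."""
    resp = requests.get(
        f"{API_BASE}/vehicles/{vehicle_id}/trips",
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. [{"distance_km": 12.4, "hard_brakes": 2, "avg_speed_kph": 38}, ...]

def summarize(trips):
    """Aggregate simple habit metrics an insurer or driver app might use."""
    total_km = sum(t["distance_km"] for t in trips)
    hard_brakes_per_100km = 100 * sum(t["hard_brakes"] for t in trips) / max(total_km, 1e-9)
    return {"total_km": total_km, "hard_brakes_per_100km": hard_brakes_per_100km}
```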

Fleet Telematics

Fleet telematics caught on relatively early as a hot category for auto tech startups. Companies like Telogis (first funded in 2009), Greenroad (which received a Series B in 2005), and Vnomics use installed hardware — and more recently, sometimes, smartphones — to capture data about the driving habits and fuel efficiency of truck drivers. Since trucks typically transport asset-intensive goods, businesses are highly incentivized to use products like these to improve savings and manage their inventory. Most of these installed systems transmit and organize data about fleets, provide in-vehicle coaching and notes to drivers, and handle billing for transport as part of a suite of offerings.

Vehicle-to-Vehicle Communication

Vehicle-to-vehicle communication allows cars to make decisions based on their surroundings and context, including the distance, speed, and directional movement of other vehicles, underpinning self-driving and safety applications as well as traffic management and driving efficiency use cases. Autotalks and Cohda Wireless market suites of hardware and software solutions for vehicle-to-vehicle communication (as well as vehicle-to-infrastructure communication). And Peloton deploys this technology primarily in trucks, which would theoretically allow self-driving trucks to travel in close “platoons” and reduce the need for constant driver control.
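
To make the idea concrete, here is a minimal sketch of how a receiving vehicle might estimate the gap and time-to-collision to a vehicle ahead from broadcast position and speed data; the message format is hypothetical (real deployments use standardized safety messages), and the filtering a production system would add is omitted.

```python
# Minimal sketch: estimating gap and time-to-collision from broadcast V2V messages.
# The message format is hypothetical; real systems use standardized safety messages.
import math
from dataclasses import dataclass

@dataclass
class V2VMessage:
    x_m: float          # position east of a shared reference point, meters
    y_m: float          # position north of the reference point, meters
    speed_mps: float    # speed, meters per second
    heading_deg: float  # direction of travel, degrees clockwise from north

def time_to_collision(own: V2VMessage, other: V2VMessage) -> float:
    """Return estimated seconds until collision with the vehicle ahead,
    or infinity if the gap is not closing."""
    gap = math.hypot(other.x_m - own.x_m, other.y_m - own.y_m)
    closing_speed = own.speed_mps - other.speed_mps  # positive means we are catching up
    if closing_speed <= 0:
        return math.inf
    return gap / closing_speed

# A trailing truck 20 m behind a platoon leader that is 3 m/s slower closes the gap fast:
ttc = time_to_collision(V2VMessage(0, 0, 25, 0), V2VMessage(0, 20, 22, 0))
print(f"time to collision: {ttc:.1f} s")  # ~6.7 s -> trigger braking assistance
```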

Vehicle Cybersecurity

Vehicle cybersecurity is a small but emerging field. As more cars become connected to the cloud, to infrastructure, and to other vehicles, more possible entry points exist that need to be protected against hackers’ exploits. Argus, which has raised $30M, is a cybersecurity company specifically focused on automobiles. Towersec aims to protect not just the vehicle itself, but also telematics and in-vehicle infotainment systems.

Driver-Safety Tools

Driver safety and collision-prevention are among the most important immediate applications auto tech is attacking. Insurance companies are particularly interested in this category. Various approaches exist: Navdy is using heads-up projections to display relevant information so that drivers don’t look at their smartphones while driving; Cambridge Mobile Telematics analyzes your driving habits using your smartphone, and provides coaching/analytics on how to improve; and Lytx uses dashcam technology to provide feedback based on visual cues, combined with driving habits.
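
As an illustrative sketch of how a smartphone-based telematics app might flag risky behavior from raw sensor data (this is not any vendor’s actual method, and the thresholds are made up), consider detecting harsh-braking events in a stream of longitudinal-acceleration samples:

```python
# Illustrative sketch (thresholds made up): flagging harsh-braking events
# from longitudinal accelerometer samples captured by a phone.
from typing import List, Tuple

HARSH_BRAKE_MPS2 = -3.0   # illustrative threshold: decelerating harder than ~0.3 g
MIN_EVENT_SAMPLES = 3     # require the deceleration to persist for a few samples

def harsh_brake_events(accel_mps2: List[float], hz: float = 10.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) intervals where deceleration exceeds the threshold."""
    events, start = [], None
    for i, a in enumerate(accel_mps2 + [0.0]):  # sentinel closes a trailing event
        if a <= HARSH_BRAKE_MPS2 and start is None:
            start = i
        elif a > HARSH_BRAKE_MPS2 and start is not None:
            if i - start >= MIN_EVENT_SAMPLES:
                events.append((start / hz, i / hz))
            start = None
    return events

# Example: a hard stop in the middle of an otherwise smooth trace
trace = [0.2, 0.1, -0.5, -3.4, -3.8, -4.1, -3.6, -0.8, 0.0, 0.1]
print(harsh_brake_events(trace))  # [(0.3, 0.7)]
```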

Driver Assistance/Automated Car

These companies are using networks of sensors and powerful software to provide driver-assistance features. While companies like Cruise are retrofitting older cars, Robot of Everything has an entire lab dedicated to improving the different facets of automated driving in its “robocars.” And it’s not just cars: RoboCV is working to automate warehouse vehicles, which navigate smaller, more constrained spaces.







$124,000: Median salary for professionals with big data expertise


Louis Columbus, Forbes, based on data from WANTED Analytics, a CEB Company

Louis Columbus:

The median advertised salary for professionals with big data expertise is $124,000 a year. Sample jobs in this category include Software Engineer, Big Data Platform Engineer, Information Systems Developer, Platform Software Engineer, Data Quality Director, and many others.



A Visual History of Human Knowledge (TED Talk)

Manuel Lima is the author of The Book of Trees: Visualizing Branches of Knowledge. In this TED Talk, Lima explores the thousand-year history of mapping data — from languages to dynasties — using trees of information.


Google’s RankBrain Outranks the Best Brains in the Industry

Bloomberg recently broke the news that Google is “turning its lucrative Web search over to AI machines.” Google revealed to the reporter that for the past few months, a very large fraction of the millions of search queries Google responds to every second have been “interpreted by an artificial intelligence system, nicknamed RankBrain.”

The company that has tried hard to automate its mission to organize the world’s information was happy to report that its machines have again triumphed over humans. When Google search engineers “were asked to eyeball some pages and guess which they thought Google’s search engine technology would rank on top,” RankBrain had an 80% success rate compared to “the humans [who] guessed correctly 70 percent of the time.”

There you have it. Google’s AI machine RankBrain, after only a few months on the job, already outranks the best brains in the industry, the elite engineers that Google typically hires.

Or maybe not. Is RankBrain really “smarter than your average engineer” and already “living up to its AI hype,” as the Bloomberg article informs us, or is this all just, well, hype?

Desperate to find out how far our future machine overlords are already ahead of the best and the brightest (certainly not “average”), I asked Google to shed more light on the test, e.g., how do they determine the “success rate”?

Here’s the answer I got from a Google spokesperson:

“That test was fairly informal, but it was some of our top search engineers looking at search queries and potential search results and guessing which would be favored by users. (We don’t have more detail to share on how that’s determined; our evaluations are a pretty complex process).”

I guess both RankBrain and Google search engineers were given possible search results for a given query, and RankBrain outperformed the humans in guessing which were the “better” results, according to some undisclosed criteria.

I don’t know about you, but my TinyBrain is still confused. Wouldn’t Google’s search engine, with or without RankBrain, outperform any human being, including the smartest people on earth, in “guessing” which search results “would be favored by users”? Hasn’t Google been mining the entire corpus of human knowledge for more than fifteen years and, by definition, produced a search engine that “understands” relevance better than any individual human being?

The key to the competition, I guess, is that the “search queries” used in it were not just any search queries but complex queries containing words that have different meanings in different contexts. These are the kinds of queries that stump most human beings, and it’s quite surprising that Google engineers scored 70% on search queries that presumably require deep domain knowledge across all human endeavors, in addition to search expertise.

The only example of a complex query given in the Bloomberg article is “What’s the title of the consumer at the highest level of a food chain?” The word “consumer” in this context is a scientific term for something that consumes food, and the label (the “title”) at the highest level of the food chain is “predator.”

This explanation comes from search guru Danny Sullivan who has come to the rescue of perplexed humans like me, providing a detailed RankBrain FAQ, up to the limits imposed by Google’s legitimate reluctance to fully share its secrets. Sullivan: “From emailing with Google, I gather RankBrain is mainly used as a way to interpret the searches that people submit to find pages that might not have the exact words that were searched for.”

Sullivan points out that a lot of work done by humans is behind Google’s outstanding search results (e.g., creating a synonym list or a database with connections between “entities”—places, people, ideas, objects, etc.). But Google now needs to respond to some 450 million new queries per day, queries that have never before been entered into its search engine.

RankBrain “can see patterns between seemingly unconnected complex searches to understand how they’re actually similar to each other,” writes Sullivan. In addition, “RankBrain might be able to better summarize what a page is about than Google’s existing systems have done.”
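
A toy sketch of the general idea (the word vectors below are tiny and made up; this is not Google’s actual model): represent each query as the average of its word vectors and compare queries by cosine similarity, so that differently worded but related searches land close together.

```python
# Toy sketch of comparing queries by meaning rather than by shared words.
# The word vectors here are tiny and made up; a real system learns them from huge text corpora.
import numpy as np

word_vectors = {
    "consumer": np.array([0.9, 0.1, 0.3]),
    "predator": np.array([0.8, 0.2, 0.4]),
    "food":     np.array([0.7, 0.3, 0.2]),
    "chain":    np.array([0.6, 0.2, 0.1]),
    "apex":     np.array([0.8, 0.1, 0.5]),
    "title":    np.array([0.1, 0.9, 0.2]),
}

def query_vector(query: str) -> np.ndarray:
    """Average the vectors of the words we know; unknown words are skipped."""
    vecs = [word_vectors[w] for w in query.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

q1 = query_vector("title of the consumer at the top of a food chain")
q2 = query_vector("apex predator")
print(f"similarity: {cosine(q1, q2):.2f}")  # high similarity despite no shared words
```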

Finding out the “unknown unknowns,” discovering previously unknown (to humans) links between words and concepts is the marriage of search technology with the hottest trend in big data analysis—deep learning. The real news about RankBrain is that it is the first time Google applied deep learning, the latest incarnation of “neural networks” and a specific type of machine learning, to its most prized asset—its search engine.

Google has been doing machine learning since its inception. The first published paper listed in the AI and machine learning section of its research page is from 2001, and, to use just one example, Gmail is so good at detecting spam because of machine learning. But Google hadn’t applied machine learning to search. We learn that there was internal opposition to doing so from a summary of a 2008 conversation between Anand Rajaraman and Peter Norvig, co-author of the most popular AI textbook and leader of Google search R&D since 2001. Here’s the most relevant excerpt:

The big surprise is that Google still uses the manually-crafted formula for its search results. They haven’t cut over to the machine learned model yet. Peter suggests two reasons for this. The first is hubris: the human experts who created the algorithm believe they can do better than a machine-learned model. The second reason is more interesting. Google’s search team worries that machine-learned models may be susceptible to catastrophic errors on searches that look very different from the training data. They believe the manually crafted model is less susceptible to such catastrophic errors on unforeseen query types.

This was written three years after Microsoft had applied machine learning to its search technology. But now, Google has gotten over its hubris. 450 million unforeseen query types per day are probably too much for “manually crafted models,” and Google has decided that a “deep learning” system such as RankBrain provides good enough protection against “catastrophic errors.”

Deep learning has taken the computer science community by storm since it was used to win an image recognition competition in 2012, performing better than traditional approaches to teaching computers to identify images.

With deep learning, the computer “learns” by putting together the pieces of a puzzle (e.g., an image of a cat), moving up a hierarchy created by the computer scientist from simple concepts to more complex ones. Decades ago this idea got the unfortunate name “neural networks,” under the misguided (and hype-generating) notion that the computer networks were “mimicking the brain” (what they were mimicking were speculations about how neurons work in the human brain). The hype did not produce the promised results, but starting about ten years ago, with greater computing power, much larger data sets, and more sophisticated algorithms, neural networks were reincarnated as deep learning.
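
To make the “simple concepts to more complex ones” point concrete, here is a heavily simplified two-layer neural network written in plain NumPy and trained on a toy problem; real deep-learning systems stack many more layers and train on vastly larger data.

```python
# A heavily simplified two-layer neural network in plain NumPy, learning XOR.
# Real deep-learning systems stack many more layers and train on far larger data sets.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # first layer: simple features
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # second layer: combines them

sigmoid = lambda z: 1 / (1 + np.exp(-z))

lr = 1.0
for step in range(10000):
    h = np.tanh(X @ W1 + b1)          # hidden layer: lower-level "concepts"
    out = sigmoid(h @ W2 + b2)        # output layer: higher-level decision
    # gradient of the cross-entropy loss w.r.t. the output logits
    d_out = (out - y) / len(X)
    d_h = (d_out @ W2.T) * (1 - h ** 2)   # backpropagate through the tanh layer
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]
```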

In 2012, Google engineers made their first deep learning splash when they announced that Google computers had detected the image of a cat after processing zillions of unlabeled still frames from YouTube videos.

In their post on this deep learning experiment, Jeff Dean, a Google Fellow, and Andrew Ng, a Stanford professor on leave at Google at the time, wrote:

“And this isn’t just about images—we’re actively working with other groups within Google on applying this artificial neural network approach to other areas such as speech recognition and natural language modeling.”

And in 2013, Google engineers announced an open source toolkit called word2vec “that aims to learn the meaning behind words.” They wrote: “Now we apply neural networks to understanding words by having them ‘read’ vast quantities of text on the web. We’re scaling this approach to datasets thousands of times larger than what has been possible before, and we’ve seen a dramatic improvement of performance — but we think it could be even better.”
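
For readers who want to experiment, the word2vec approach is easy to try with the open-source gensim library; here is a minimal sketch (assuming the gensim 4.x API and a trivially small corpus, so the “learned meanings” will be crude):

```python
# Minimal word2vec sketch using the open-source gensim library (gensim 4.x API assumed).
# The corpus here is trivially small, so the learned vectors are only illustrative;
# Google trained on web-scale text.
from gensim.models import Word2Vec

corpus = [
    ["the", "predator", "sits", "at", "the", "top", "of", "the", "food", "chain"],
    ["a", "consumer", "eats", "other", "organisms", "in", "the", "food", "chain"],
    ["search", "engines", "rank", "pages", "for", "each", "query"],
    ["rankbrain", "helps", "interpret", "ambiguous", "search", "queries"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=200, seed=1)

# Words that appear in similar contexts end up with similar vectors.
print(model.wv.most_similar("predator", topn=3))
print(model.wv.similarity("search", "query"))
```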

2013 was also the year Google hired Geoffrey Hinton of the University of Toronto, “widely known as the godfather of neural networks,” according to Wired. But the two other widely known members of the (self-labeled) “deep learning conspiracy” went to Google’s competitors: Yann LeCun to Facebook (leading a new AI research lab) and Yoshua Bengio to IBM (teaching Watson a few deep learning tricks).

Then there’s Apple, Yelp, Twitter and others—all of Google’s competitors are rushing to adopt deep learning.

This creates serious competition for talent: for all the graduate students who, three or four years ago, switched their dissertation topics to something related to deep learning, and for all the others who have recently joined this “computers can learn on their own” movement. Hence the need to tell the world via Bloomberg that Google is in the game, and for Google’s CEO to insist on its latest earnings call that “machine learning is a core transformative way by which we are rethinking everything we are doing.”

But beyond PR and prestige, future profits could be the most important incentive for Google to add deep learning to its search technology. It’s not only about reducing costs by relying less on humans and their “manually crafted models.” It’s also about search quality, the reason Google has become the dominant search engine and a verb.

A Search Engine Land columnist, Kristine Schachinger, sheds further light on RankBrain in the context of search quality and Google’s shift in 2013 (the “Hummingbird” overhaul of their search algorithms) from providing search results based on words (strings of letters) to search results based on its knowledge of “things” (entities, facts):

Google has become really excellent at telling you all about the weather, the movie, the restaurant and what the score of last night’s game happened to be. It can give you definitions and related terms and even act like a digital encyclopedia. It is great at pulling back data points based around entity understanding.

Therein lies the rub. Things Google returns well are known and have known, mapped or inferred relationships. However, if the item is not easily mapped or the items are not mapped to each other, Google has difficulty in understanding the query…

While Google has been experimenting with RankBrain, they have lost market share — not a lot, but still, their US numbers are down. In fact, Google has lost approximately three percent of share since Hummingbird launched, so it seems these results were not received as more relevant or improved (and in some cases, you could say they are worse)…

Google might have to decide whether it is an answer engine or a search engine, or maybe it will separate these and do both.

I will go even further and speculate that Google is seeing the end of search as we know it (and as they perfected it): the possibility that in the future we will not enter search queries into search boxes but will rely on “knowledge navigators” (to use the term Apple coined in 1986) that go beyond today’s answer engines to communicate with us, provide relevant information and news, and anticipate our needs by linking things in our past, present, and future.

Now, is it possible that with Facebook’s investment in AI and deep learning, it will be the first to provide us with a futuristic knowledge navigator? What will happen to Google’s advertising revenues if the social network consists not only of people but also of deep learning machines?

Given its past performance and the competitive people running it (and its parent company), it’s obvious that RankBrain is just one of the many investments Google is making in “disrupting itself before others do” (I’m pretty sure that’s how they talk about it). Google will continue to provide outstanding, free, advertising-supported service to its users, no matter what form this service will take in the future.

Or maybe not. Being a devoted and admiring Google search user, I was a bit skeptical when I read Schachinger’s words quoted above that Google’s search results “were not received as more relevant or improved (and in some cases, you could say they are worse).” But one very surprising search result I recently got from Google led me to think that, indeed, sometimes when you invest in the future, you sacrifice the present.

I Googled the address “75 Amherst Street, Cambridge, MA 02139.” What I got (a number of times, over three days) at the top of the search results was a map of 75 Amherst Alley, Cambridge, MA 02139.

There is such a place, but I have never heard about it or ever been there. What’s more, 75 Amherst Street is the home of MIT’s Media Lab, so this is not only a very simple query but also one that probably has been entered into Google numerous times (the Media Lab’s contact page appears as the second result, just under the erroneous map).

Time to invest in more humans working diligently on “manually crafted models”?


Cloud traffic will grow at an annual rate of 33% over the next 5 years, Cisco predicts

The new version of the Cisco Cloud Index computes the rapid expansion of today’s stampede to the cloud. “We have never seen anything like this in terms of speed of customer adoption,” Oracle Co-CEO Mark Hurd said recently, describing how his corporate customers have enthusiastically embraced the cloud.

One of them, General Electric, has moved, in just the last 18 months, 10% more of its computing load into the cloud, and expects to run 70% of its applications in the cloud by 2020. In their latest quarterly financial reports, Amazon reported that its cloud business has surged 79% year-over-year and Microsoft announced that its cloud business has “more than doubled.”

Here are the highlights of Cisco’s ongoing study of the growth of global data center and cloud-based data traffic.

Almost all of the work of IT will be done in cloud data centers

Based on its hands-on knowledge of the movement of data over global computer networks, Cisco predicts that cloud traffic will grow at an annual rate of 33% over the next 5 years, quadrupling from 2.1 zettabytes (2.1 trillion gigabytes) in 2014 to 8.6 zettabytes by the end of 2019. By 2019, 86% of workloads will be processed by cloud data centers and only 14% by traditional data centers.
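
As a quick sanity check of the arithmetic, a 33% compound annual growth rate over the five years from 2014 to 2019 does indeed roughly quadruple the 2014 figure:

```python
# Sanity check: 2.1 zettabytes growing at a 33% compound annual growth rate for 5 years.
start_zb, cagr, years = 2.1, 0.33, 5
end_zb = start_zb * (1 + cagr) ** years
print(f"{end_zb:.1f} ZB")           # ~8.7 ZB, in line with Cisco's ~8.6 ZB projection
print(f"{end_zb / start_zb:.1f}x")  # ~4.2x, i.e. roughly quadrupling over the period
```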

Figure: Data center and cloud traffic growth

Source: Cisco Global Cloud Index, 2014–2019

Cloud traffic is expected to account for 83% of total data center traffic by 2019. Cloud traffic is a subset of data center traffic and is generated by cloud services accessible through the Internet from scalable, virtualized cloud data centers. Total data center traffic, which Cisco projects will reach 10.4 zettabytes by the end of 2019, comprises all traffic traversing within and between data centers as well as traffic to end users.

10.4 trillion gigabytes is the equivalent of 144 trillion hours of streaming music or 6.8 trillion hours of high-definition (HD) movies viewed online. Ones and zeros are eating the world, and the companies providing consumers with digital entertainment and other services have been at the forefront of the migration to the cloud. Indeed, The Wall Street Journal recently reported that Netflix has shut down the last of its data centers, moving the last piece of its IT infrastructure to the public cloud.

The public cloud will grow faster than the private cloud

Source: Cisco Global Cloud Index, 2014–2019

While overall cloud workloads will grow at a CAGR of 27% from 2014 to 2019, public cloud workloads are going to grow at a 44% CAGR over that period, and private cloud workloads (where cloud services are delivered to corporate users by their own IT departments) will grow at a slower pace of 16%. By 2019, there will be more workloads in the public cloud (56%) than in private clouds (44%).

New sources of data, especially the Internet of Things, will keep the clouds very busy

Source: Cisco Global Cloud Index, 2014–2019

The total volume of stored data on client devices and in data centers will more than double, reaching 3.5 zettabytes by 2019. Most stored data resides on client devices today and will continue to do so over the next 5 years, but more data will move to the data center over time, representing 18% of all stored data in 2019, up from 12% in 2014.

In addition to larger volumes of stored data, the stored data will be coming from a wider range of devices by 2019. Currently, 73% of data stored on client devices resides on PCs. By 2019, stored data on PCs will go down to 49%, with a greater portion of data on smartphones, tablets, and machine-to-machine (M2M) modules. Stored data associated with M2M will grow at a faster rate than any other device category at an 89% CAGR.

A broad range of Internet of Things (IoT) applications are generating large volumes of data that could reach, Cisco estimates, 507.5 zettabytes annually by 2019. That’s 49 times greater than the projected data center traffic for 2019 (10.4 zettabytes). Today, only a small portion of this content is stored in data centers, but that could change as big data analytics tools are applied to greater volumes of the data collected and transmitted by IoT applications.

The figure below maps several M2M applications by their frequency of network communication, average traffic per connection, and data analytics needs. Applications such as smart metering can benefit from real-time analytics of aggregated data that can optimize the usage of resources such as electricity, gas, and water. On the other hand, applications such as emergency services and environmental and public safety can be greatly enhanced by distributed real-time analytics that help make real-time decisions affecting entire communities. Although other applications such as manufacturing and processing could gain efficiencies from real-time analytics, their need is less immediate.

Source: Cisco Global Cloud Index, 2014–2019

More consumers will keep their data in the cloud

Cisco estimates that by 2019, 55% (2 billion) of the Internet-connected consumer population will use personal cloud storage, up from 42% (1.1 billion users) in 2014.

Source: Cisco Global Cloud Index, 2014–2019

Global consumer cloud storage traffic will grow from 14 exabytes (14 billion gigabytes) annually in 2014 to 39 exabytes by 2019 at a 23% CAGR. This growth translates to per-user traffic of 1.6 gigabytes per month by 2019, compared to 992 megabytes per month in 2014.
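
Those per-user figures follow from the totals; here is a back-of-envelope check (using the rounded user counts quoted above, so it lands near, rather than exactly on, Cisco’s published numbers):

```python
# Back-of-envelope check of per-user consumer cloud storage traffic.
# Uses the rounded totals above, so results land near (not exactly on) Cisco's figures.
def gb_per_user_per_month(exabytes_per_year: float, users_billions: float) -> float:
    gigabytes = exabytes_per_year * 1e9          # 1 EB = 1e9 GB
    return gigabytes / (users_billions * 1e9) / 12

print(f"2014: {gb_per_user_per_month(14, 1.1) * 1000:.0f} MB/month")  # ~1061 MB (Cisco: 992 MB)
print(f"2019: {gb_per_user_per_month(39, 2.0):.1f} GB/month")         # ~1.6 GB
```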

Source: Cisco Global Cloud Index, 2014–2019

Ones and zeros are eating the world and today we got fresh insights into how much, how fast, and how their movement changes the way IT services are delivered to businesses and consumers.  For more data and the study’s methodology, go to the Cisco Global Cloud Index webpage.

