DeepBench from Baidu: Benchmarking Hardware for Deep Learning

Source: Greg Diamos and Sharan Narang, “The need for speed: Benchmarking deep learning workloads,” O’Reilly AI Conference

At the O’Reilly Artificial Intelligence conference, Baidu Research announced DeepBench, an open source benchmarking tool for evaluating the performance of deep learning operations on different hardware platforms. Greg Diamos and Sharan Narang of Baidu Research’s Silicon Valley AI Lab spoke about the motivation for developing the benchmark and why faster computers are crucial to the continued success of deep learning.

The harbinger of the current AI Spring, deep learning is a machine learning method that uses “artificial neural networks,” moving vast amounts of data through many layers of processing, each layer coming up with its own representation of the data and passing what it “learned” to the next layer. As a widely publicized deep learning project demonstrated four years ago, feeding such an artificial neural network images extracted from 10 million videos can result in the computer (in this case, an array of 16,000 processors) learning to correctly identify and label an image of a cat. One of the leaders of that “Google Brain” project was Andrew Ng, who is today the Chief Scientist at Baidu and the head of Baidu Research.

Research areas of interest to Baidu Research include image recognition, speech recognition, natural language processing, robotics, and big data. Its Silicon Valley AI Lab has deep learning and systems research teams that work together “to explore the latest in deep learning algorithms as well as find innovative ways to accelerate AI research with new hardware and software technologies.”

DeepBench is an attempt to accelerate the development of the hardware foundation for deep learning by helping hardware developers optimize their processors for deep learning applications, and specifically for the “training” phase in which the system learns through trial and error. “There are many different types of applications in deep learning—if you are a hardware manufacturer, you may not understand how to build for them. We are providing a tool for people to help them see if a change to a processor [design] improves performance and how it affects the application,” says Diamos. One of the exciting things about deep learning for him (and no doubt for many other researchers) is that “as the computer gets faster, the application gets better and the algorithms get smarter.”
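
To get a feel for what benchmarking a deep learning operation means in practice, the sketch below times a single dense matrix multiplication, the kind of low-level kernel that dominates training. This is only an illustration in Python with NumPy; DeepBench itself exercises vendor libraries directly on the target processors, and the matrix sizes used here are placeholders rather than entries from its actual kernel list.

    import time
    import numpy as np

    def time_gemm(m, n, k, repeats=10):
        """Average seconds for an (m x k) by (k x n) single-precision matrix multiply."""
        a = np.random.rand(m, k).astype(np.float32)
        b = np.random.rand(k, n).astype(np.float32)
        np.dot(a, b)                                   # warm-up run, excluded from the timing
        start = time.perf_counter()
        for _ in range(repeats):
            np.dot(a, b)
        return (time.perf_counter() - start) / repeats

    seconds = time_gemm(2048, 256, 2048)               # placeholder sizes, not DeepBench's kernel list
    flops = 2 * 2048 * 256 * 2048                      # multiply-adds in one such matrix product
    print("average: %.2f ms, roughly %.1f GFLOP/s" % (seconds * 1e3, flops / seconds / 1e9))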

A case in point is speech recognition, or more specifically DeepSpeech, Baidu Research’s “state-of-the-art speech recognition system developed using end-to-end deep learning.” The most important aspect of this system is its simplicity, says Diamos, with audio on one end, text on the other end, and a single learning algorithm (a recurrent convolutional neural network) sitting in the middle. “We can take exactly the same architecture and apply it to both English and Mandarin with greater accuracy than systems we were building in the past,” says Diamos.
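
What a single audio-in, text-out network looks like can be pictured with a toy sketch. The model below (written with the PyTorch library) maps spectrogram frames to per-timestep character scores through one convolutional layer feeding recurrent layers; it is not Baidu’s DeepSpeech implementation, and every layer size and character count in it is a placeholder chosen for illustration.

    import torch
    import torch.nn as nn

    class TinySpeechModel(nn.Module):
        """Toy audio-to-characters model: convolution, recurrent layers, output layer."""
        def __init__(self, n_features=161, n_chars=29, hidden=256):
            super().__init__()
            self.conv = nn.Conv1d(n_features, hidden, kernel_size=11, stride=2, padding=5)
            self.rnn = nn.GRU(hidden, hidden, num_layers=3, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, n_chars)   # character scores per time step

        def forward(self, spectrograms):               # (batch, n_features, time)
            x = torch.relu(self.conv(spectrograms))    # (batch, hidden, time/2)
            x, _ = self.rnn(x.transpose(1, 2))         # (batch, time/2, 2*hidden)
            return self.fc(x)                          # (batch, time/2, n_chars)

    model = TinySpeechModel()
    scores = model(torch.randn(8, 161, 300))  # 8 clips, 161 spectrogram bins, 300 frames
    print(scores.shape)                       # torch.Size([8, 150, 29])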

In Mandarin, the system transcribes audio to text more accurately than native speakers, who may have difficulty understanding what is said because of noise or accent. Indeed, the data set used by DeepSpeech is very large because it was created by mixing hours of synthetic noise with the raw audio, explains Narang. The largest publicly available data set contains about 2,000 hours of audio recordings, while the one used by DeepSpeech clocks in at 100,000 hours, or 10 terabytes of data.
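
The noise-mixing idea Narang describes is straightforward to sketch: overlay noise on a clean recording at a chosen signal-to-noise ratio, so each clean hour can yield many augmented hours. The snippet below shows one common way to do that scaling; it is an assumption made for illustration, not a description of Baidu’s actual pipeline.

    import numpy as np

    def mix_noise(clean, noise, snr_db):
        """Overlay noise on a clean clip (both 1-D float arrays) at a target SNR in decibels."""
        noise = np.resize(noise, clean.shape)          # loop or trim the noise to the clip length
        clean_power = np.mean(clean ** 2)
        noise_power = np.mean(noise ** 2) + 1e-12      # avoid division by zero for silent noise
        scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
        return clean + scale * noise

    rng = np.random.default_rng(0)
    clean_clip = rng.standard_normal(16000)   # one second of stand-in audio at 16 kHz
    noise_clip = rng.standard_normal(8000)    # a shorter stand-in noise recording
    augmented = mix_noise(clean_clip, noise_clip, snr_db=10)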

The approach taken by the developers of DeepSpeech is superior to other approaches, argue Narang and Diamos. Traditional speech recognition systems, built around a “hand-designed algorithm,” get more accurate with more data but eventually saturate, requiring a domain expert to develop a new algorithm. The hybrid approach adds a deep convolutional neural network to the traditional pipeline; the result is better scaling, but performance again eventually saturates. DeepSpeech uses deep learning as the entire algorithm and achieves continuous improvement in performance (accuracy) with larger data sets and larger models (more and bigger layers).

Bigger is better. But to capitalize on this feature (pun intended) of deep learning, you need faster computers. “The biggest bottleneck,” says Narang, “is training the model.” He concludes: “Large data sets, a complex model with many layers, and the need to train the model many times are slowing down deep learning research. To make rapid progress, we need to reduce model training time. That’s why we need tools to benchmark the performance of deep learning training. DeepBench allows us to measure the time it takes to perform the underlying deep learning operation. It establishes a line in the sand that will encourage hardware developers to do better by focusing on the right issues.”

Originally published on Forbes.com