Big data center operators say they are seeing a steady stream of new architectures for accelerating deep learning neural networks—and the flow is just getting started, according to comments at last week’s AI Hardware Summit. One analyst pegged the number of established and startup companies designing AI accelerators at a whopping 130.
“The machine-learning revolution has reopened the opportunity for new architectures…let a thousand flowers bloom,” said Alphabet Chairman and former Stanford President John Hennessy in an opening keynote at the event. Such domain-specific chips don’t have to be compatible with legacy object code so the industry “can introduce new architectures faster than in general-purpose computing,” he added.
Potential users from Alibaba, Facebook, Google, and Uber said the chip vendors need to show their benchmark scores, make their software easy to use, and conform to emerging standards.
“We are sampling a few vendors’ upcoming products, and one issue is using their software correctly…it takes a long time to vet hardware and a lot of time to bring new software into our ecosystem,” said Linjie Xu, director of applied AI architecture at Alibaba Cloud, speaking on a panel.
“China has a huge market for AI acceleration…but we want to see something real,” he said, noting Alibaba released an open-source version of one of its preferred benchmarks.
“A lot of new accelerators are coming next year or after, each for a different workload, but we’re limited by the resources we have” to evaluate them, said Samar Dalal, a senior manager of architecture and design at Uber. The lack of “a mature software stack is a limiter for deploying” new hardware, he agreed.
Panelists gave some examples of variations in their emerging workloads as a sign of how accelerators, too, will come in many flavors over time. For example, Alibaba does significant work in optimizing traffic patterns for smart cities, natural-language processing is a big focus for Uber, and Google recently started an initiative in digital health using AI.
Hard and Soft Standards
To give the many chips a single socket, Whitney Zhao, a hardware engineer in Facebook’s infrastructure group, led the design of the open-source accelerator module (called the OAM), as well as a motherboard that can accommodate a handful of them. The next step for her group is to design or specify “standard tooling and utilities to do management and monitoring” for the systems, Zhao said on the panel.
“For different workloads, accelerator solutions may be different, so we defined a one-size-fits-all platform, and now more than 30 companies support it with [accelerator startup] Habana Labs being one of the first,” she said.
The hyperscalers said vendors need to show performance benchmarks for their chips, starting with the metrics set by the MLPerf group, and ideally extended to some of their own company-specific workloads. MLPerf’s new inference benchmarks open the door to discussions of possible standards for which of several popular precision formats to use.
Xu of Alibaba encouraged chip designers to join those discussions. “New precision formats need to be something others will want. We’ve seen cases of great ideas that didn’t materialize…this is a really new area,” he said.
The so-called bfloat16 format Google helped define is gaining traction—Intel even re-spun its first Nervana training accelerator to support it. “It’s probably time to standardize that [format], so we should talk to the IEEE” group that handles floating-point specs, said Cliff Young, a Google software engineer who moderated the panel.
Among other standards efforts, Xu applauded Habana’s use of Ethernet as an interconnect for its accelerators. In the future, such an approach could enable big data centers to virtualize and use pools of accelerators across their far-flung networks so they won’t need to maintain separate installations of training and inference systems, he added.
Overall, the tech landscape is wide open for the hyperscalers who are hungry to drive AI today. For example, Google focused all its jobs on x86 processors until deep learning came along and it was forced to adopt GPUs to get the performance it needed.
“That broke all our [software] models, and that enabled us to [make and adopt] TPUs [Google’s own accelerators] more easily,” said Young. “Now we ask what the customer wants and try to virtualize it. Six years ago, Google was early in this space and we had to set our own [internal] standards–that’s not the case now,” he added.
The frameworks used to describe neural network models are, for the moment, relatively stable with a lot of momentum behind Google’s TensorFlow. “It is the #1 framework used in Alibaba, but we use many others as well as our own framework built on top of others” to handle the needs of specific workloads, said Xu.
However, developers “are willing to change frameworks quickly,” said Young. Frameworks are not like programming languages such as Fortran that developers once expected they might use for their whole career, he noted.
The new accelerators face many challenges, according to Karl Freund, an analyst with Moor Insights & Strategy who spoke at the event. They require a lot of time and money to design and make.
The new ASICs will have a relatively hard time keeping up with changing neural-network models. “Data center AI demands programmability,” Freund said.
Today Nvidia’s programmable GPUs dominate the market for chips that accelerate the training of neural nets in big data centers. Intel dominates the market for running inference jobs with the trained models, also in big data centers.
Both companies will be hard to beat, Freund said. Nvidia is delivering good performance and has an extensive network of tools and partners. Intel has acquired three accelerator makers (Mobileye, Movidius, and Nervana) and has a new GPU called Xe in the works.
Meanwhile “a lot of other companies don’t have working silicon yet–it takes time,” Freund said.
The winners will be companies that strike a compelling balance of delivering high performance on specific workloads while being able to run software for other jobs, said Hennessy. They also will solve specific problems like handling both dense and sparse linear algebra well, he added.
The good news is the new AI style of computing is reviving a market for processors whose old tricks, such as advanced speculation and large caches, were running out of gas.
“There is no simple way forward for getting performance. Only path left is…to do just a few tasks, but do them extremely well…it’s a whole new world,” Hennessy said.