feature article
Subscribe Now

AI Sparks Hyper-Competition

Summit Points to the Future for Accelerators

Big data center operators say they are seeing a steady stream of new architectures for accelerating deep learning neural networks—and the flow is just getting started, according to comments at last week’s AI Hardware Summit. One analyst pegged the number of established and startup companies designing AI accelerators at a whopping 130. 

“The machine-learning revolution has reopened the opportunity for new architectures…let a thousand flowers bloom,” said Alphabet Chairman and former Stanford President John Hennessy in an opening keynote at the event. Such domain-specific chips don’t have to be compatible with legacy object code so the industry “can introduce new architectures faster than in general-purpose computing,” he added.

Potential users from Alibaba, Facebook, Google, and Uber said the chip vendors need to show their benchmark scores, make their software easy to use, and conform to emerging standards.

“We are sampling a few vendors’ upcoming products, and one issue is using their software correctly…it takes a long time to vet hardware and a lot of time to bring new software into our ecosystem,” said Linjie Xu [[CQ]], director of applied AI architecture at Alibaba Cloud, speaking on a panel.

“China has a huge market for AI acceleration…but we want to see something real,” he said, noting Alibaba released an open-source version of one of its preferred benchmarks.

“A lot of new accelerators are coming next year or after, each for a different workload, but we’re limited by the resources we have” to evaluate them, said Samar Dalal, a senior manager of architecture and design at Uber. The lack of “a mature software stack is a limiter for deploying” new hardware, he agreed.

Panelists gave some examples of variations in their emerging workloads as a sign of how accelerators, too, will come in many flavors over time. For example, Alibaba does significant work in optimizing traffic patterns for smart cities, natural-language processing is a big focus for Uber, and Google recently stared an initiative in digital health using AI.

Hard and Soft Standards

To give the many chips a single socket, Whitney Zhao, a hardware engineer in Facebook’s infrastructure group, led the design of the open-source accelerator module (called the OAM), as well as a motherboard that can accommodate a handful of them. The next step for her group is to design or specify “standard tooling and utilities to do management and monitoring” for the systems, Zhao said on the panel.

“For different workloads, accelerator solutions may be different, so we defined a one-size-fits-all platform, and now more than 30 companies support it with [accelerator startup] Habana Labs being one of the first,” she said.

The hyperscalers said vendors need to show performance benchmarks for their chips, starting with the metrics set by the MLPerf group, and ideally extended to some of their own company-specific workloads. The MLPerf’s new inference benchmarks open a door to discussions of possible standards for which of several popular precision formats to use. 

Xu of Alibaba encouraged chip designers to join those discussions. “New precision formats need to be something others will want. We’ve seen cases of great ideas that didn’t materialize…this is a really new area,” he said.

The so-called bfloat16 format Google helped define is gaining traction—Intel even re-spun its first Nervana training accelerator to support it. “It’s probably time to standardize that [format], so we should talk to the IEEE” group that handles floating-point specs, said Cliff Young, a Google software engineer who moderated the panel.

Among other standards efforts, Xu applauded Habana’s use of Ethernet as an interconnect for its accelerators. In the future, such an approach could enable big data centers to virtualize and use pools accelerators across their far-flung networks so they won’t need to maintain separate installations of training and inference systems, he added.

Overall, the tech landscape is wide open for the hyperscalers who are hungry to drive AI today. For example, Google focused all its jobs on x86 processors until deep learning came along and it was forced to adopt GPUs to get the performance it needed.

“That broke all our [software] models, and that enabled us to [make and adopt] TPUs [Google’s own accelerators] more easily,” said Young. “Now we ask what the customer wants and try to virtualize it. Six years ago, Google was early in this space and we had to set our own [internal] standards–that’s not the case now,” he added.

The frameworks used to describe neural network models are, for the moment, relatively stable with a lot of momentum behind Google’s TensorFlow. “It is the #1 framework used in Alibaba, but we use many others as well as our own framework built on top of others” to handle the needs of specific workloads, said Xu.

However, developers “are willing to change frameworks quickly,” said Young.  Frameworks are not like programming languages such as Fortran that developers once expected they might use for their whole career, he noted.

Market Dynamics

The new accelerators face any challenges, according to Karl Friend, an analyst with Moor Insights & Strategy who spoke at the event. They require a lot of time and money to design and make.

The new ASICs will have a relatively hard time keeping up with changing neural-network models. “Data center AI demands programmability,” Freund said.

Today Nvidia’s programmable GPUs dominate the market for chips to accelerate training neural nets in big data centers. Intel dominates the market for running inference jobs using the models also in big data centers.

Both companies will be hard to beat, Freund said. Nvidia is delivering good performance and has an extensive network of tools and partners. Intel has acquired three accelerators makers—Mobileye, Movidius and Nervana and has a new GPU called Xe in the works.

Meanwhile “a lot of other companies don’t have working silicon yet–it takes time,” Freund said. 

The winners will be companies that strike a compelling balance of delivering high performance on specific workloads while being able to run software for other jobs, said Hennessy. They also will solve specific problems like handling both dense and sparse linear algebra well, he added.

The good news is the new AI style of computing is reviving a market for processors whose old tricks—like using advanced speculation and large caches–were running out of gas.

“There is no simple way forward for getting performance. Only path left is…to do just a few tasks, but do them extremely well…it’s a whole new world,” he said.

Leave a Reply

featured blogs
Jan 17, 2022
Today's interview features Dajana Danilovic, an application engineer based near Munich, Germany. In this video, Dajana shares about her pathway to becoming an engineer, as well as the importance of... [[ Click on the title to access the full blog on the Cadence Community sit...
Jan 13, 2022
See what's behind the boom in AI applications and explore the advanced AI chip design tools and strategies enabling AI SoCs for HPC, healthcare, and more. The post The Ins and Outs of AI Chip Design appeared first on From Silicon To Software....
Jan 12, 2022
In addition to sporting a powerful processor and supporting Bluetooth wireless communications, Seeed's XIAO BLE Sense also boasts a microphone and a 6DOF IMU....

featured video

Synopsys & Samtec: Successful 112G PAM-4 System Interoperability

Sponsored by Synopsys

This Supercomputing Conference demo shows a seamless interoperability between Synopsys' DesignWare 112G Ethernet PHY IP and Samtec's NovaRay IO and cable assembly. The demo shows excellent performance, BER at 1e-08 and total insertion loss of 37dB. Synopsys and Samtec are enabling the industry with a complete 112G PAM-4 system, which is essential for high-performance computing.

Click here for more information about DesignWare Ethernet IP Solutions

featured paper

Building Automation and Control Systems (BACS)

Sponsored by Analog Devices

Analog Devices' industrial communication products provide building automation engineers with a broad range of Analog IO, Digital IO, Isolation, and communication interfaces that combine low power, robust performance, and improved diagnostics in the smallest possible form factors.

Click here to read more

featured chalk talk

ROHM's KX132-1211 & KX134-1211 Accelerometers

Sponsored by Mouser Electronics and ROHM Semiconductor

Machine health monitoring is a key benefit in the Industry 4.0 revolution. Integrating data from sensors for vibration detection, motion detection, angle measurement and more can give a remarkably accurate picture of machine health, and timely warning of impending failure. In this episode of Chalk Talk, Amelia Dalton chats with Alex Chernyakov of ROHM Semiconductor about the key considerations in machine health monitoring, and how a new line of accelerometers for industrial applications can help.

Click here for more information about Kionix / ROHM Semiconductor KX134 & KX132 Tri-axis Digital Accelerometers