Trains BERT in Record-Setting 53 Minutes and Slashes Inference to 2 Milliseconds; Enables Microsoft, Others to Use State-of-the-Art Language Understanding in Large-Scale Applications

NVIDIA today announced breakthroughs in language understanding that allow businesses to engage more naturally with customers using real-time conversational AI.

NVIDIA’s AI platform is the first to train one of the most advanced AI language models — BERT — in less than an hour and complete AI inference in just over 2 milliseconds. This groundbreaking level of performance makes it possible for developers to use state-of-the-art language understanding for large-scale applications they can make available to hundreds of millions of consumers worldwide.

Early adopters of NVIDIA’s performance advances include Microsoft and some of the world’s most innovative startups, which are harnessing NVIDIA’s platform to develop highly intuitive, immediately responsive language-based services for their customers.

Limited conversational AI services have existed for several years. But until this point, it has been extremely difficult for chatbots, intelligent personal assistants and search engines to operate with human-level comprehension due to the inability to deploy extremely large AI models in real time. NVIDIA has addressed this problem by adding key optimizations to its AI platform — achieving speed records in AI training and inference and building the largest language model of its kind to date.

“Large language models are revolutionizing AI for natural language,” said Bryan Catanzaro, vice president of Applied Deep Learning Research at NVIDIA. “They are helping us solve exceptionally difficult language problems, bringing us closer to the goal of truly conversational AI. NVIDIA’s groundbreaking work accelerating these models allows organizations to create new, state-of-the-art services that can assist and delight their customers in ways never before imagined.”

Fastest Training, Fastest Inference, Largest Model

AI services powered by natural language understanding are expected to grow exponentially in the coming years. The number of digital voice assistants alone is anticipated to climb from 2.5 billion to 8 billion within the next four years, according to Juniper Research. Gartner also predicts that by 2021, 15% of all customer service interactions will be handled entirely by AI, an increase of 400% from 2017.

Helping lead this new era, NVIDIA has fine-tuned its AI platform with key optimizations that have resulted in three new natural language understanding performance records:

  • Fastest training: Running the large version of one of the world’s most advanced AI language models, Bidirectional Encoder Representations from Transformers (BERT), an NVIDIA DGX SuperPOD™ with 92 NVIDIA DGX-2H™ systems and 1,472 NVIDIA V100 GPUs cut the typical training time for BERT-Large from several days to just 53 minutes. NVIDIA also trained BERT-Large on a single NVIDIA DGX-2 system in 2.8 days, demonstrating the scalability of NVIDIA GPUs for conversational AI.
  • Fastest inference: Using NVIDIA T4 GPUs running NVIDIA TensorRT™, NVIDIA ran BERT-Base inference on the SQuAD dataset in only 2.2 milliseconds, well under the 10-millisecond processing threshold for many real-time applications and a sharp improvement over the more than 40 milliseconds measured with highly optimized CPU code.
  • Largest model: With a focus on developers’ ever-increasing need for larger models, NVIDIA Research built and trained the world’s largest language model based on Transformers, the technology building block used for BERT and a growing number of other natural language AI models. NVIDIA’s custom model, with 8.3 billion parameters, is 24 times the size of BERT-Large.
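As a rough sanity check on the three records above, the short Python sketch below derives a few ratios from the quoted figures. The variable names are illustrative, and the derived values are back-of-the-envelope estimates rather than numbers NVIDIA published. In particular, the single-system run used a DGX-2 while the SuperPOD used DGX-2H systems, so the scaling estimate is only approximate, and the CPU latency is quoted as "over 40 milliseconds", so the inference ratio is a lower bound.

```python
# Figures quoted in the list above (names are illustrative).
DGX2H_SYSTEMS = 92
TOTAL_V100_GPUS = 1_472
SUPERPOD_TRAIN_MINUTES = 53
SINGLE_DGX2_TRAIN_DAYS = 2.8
GPU_INFERENCE_MS = 2.2
CPU_INFERENCE_MS = 40.0           # quoted as "over 40 ms": a lower bound
CUSTOM_MODEL_PARAMS = 8.3e9
SIZE_RATIO_VS_BERT_LARGE = 24

# GPUs per system implied by the two quoted counts.
gpus_per_system = TOTAL_V100_GPUS / DGX2H_SYSTEMS

# Training speedup of the 92-system run over the single-system run.
single_system_minutes = SINGLE_DGX2_TRAIN_DAYS * 24 * 60
training_speedup = single_system_minutes / SUPERPOD_TRAIN_MINUTES

# Rough scaling estimate: speedup divided by the number of systems.
# Assumption: treats DGX-2 and DGX-2H as comparable, which they are not exactly.
approx_scaling = training_speedup / DGX2H_SYSTEMS

# Inference latency improvement over the quoted CPU figure.
inference_speedup = CPU_INFERENCE_MS / GPU_INFERENCE_MS

# BERT-Large parameter count implied by "24 times the size".
implied_bert_large_params = CUSTOM_MODEL_PARAMS / SIZE_RATIO_VS_BERT_LARGE

print(f"GPUs per system:            {gpus_per_system:.0f}")
print(f"Training speedup:           {training_speedup:.1f}x")
print(f"Approx. scaling estimate:   {approx_scaling:.0%}")
print(f"Inference speedup (min.):   {inference_speedup:.1f}x")
print(f"Implied BERT-Large params:  {implied_bert_large_params / 1e6:.0f} million")
```

Under these assumptions, the quoted counts work out to 16 GPUs per system, a training speedup of roughly 76 times on 92 times the systems, and an inference latency at least 18 times lower than the CPU baseline.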

Ecosystem Adoption

Hundreds of developers worldwide are already using NVIDIA’s AI platform to advance their own language understanding research and create new services.

Microsoft Bing is using the power of its Azure AI platform and NVIDIA technology to run BERT and drive more accurate search results.

“Microsoft Bing relies on the most advanced AI models and computing platform to deliver the best global search experience possible for our customers,” said Rangan Majumder, group program manager, Microsoft Bing. “In close collaboration with NVIDIA, Bing further optimized the inferencing of the popular natural language model BERT using NVIDIA GPUs, part of Azure AI infrastructure, which led to the largest improvement in ranking search quality Bing deployed in the last year. We achieved two times the latency reduction and five times throughput improvement during inference using Azure NVIDIA GPUs compared with a CPU-based platform, enabling Bing to offer a more relevant, cost-effective, real-time search experience for all our customers globally.”

Several startups in NVIDIA’s Inception program, including Clinc, Passage AI and Recordsure, are also using NVIDIA’s AI platform to build cutting-edge conversational AI services for banks, car manufacturers, retailers, healthcare providers, travel and hospitality companies, and more.

Clinc has made NVIDIA GPU-enabled conversational AI solutions accessible to more than 30 million people globally through a customer roster that includes leading car manufacturers, healthcare organizations and some of the world’s leading financial institutions, including Barclays, USAA and Turkey’s largest bank, Isbank.

“Clinc’s leading AI platform understands complex questions and transforms them into powerful, actionable insights for the world’s leading brands,” said Jason Mars, CEO of Clinc. “The breakthrough performance that NVIDIA’s AI platform provides has allowed us to push the boundaries of conversational AI and deliver revolutionary services that help our customers use technology to engage with their customers in powerful, more meaningful ways.”

Optimizations Available Today

NVIDIA has made the software optimizations used to accomplish these breakthroughs in conversational AI available to developers.

NVIDIA’s implementation of BERT is an optimized version of the popular Hugging Face repository.
