
NVIDIA Inference Breakthrough Makes Conversational AI Smarter, More Interactive From Cloud to Edge

TensorRT 8 Provides Leading Enterprises Across Healthcare, Automotive and Finance with World’s Fastest AI Inference Performance

SANTA CLARA, Calif., July 20, 2021 (GLOBE NEWSWIRE) — NVIDIA today launched TensorRT™ 8, the eighth generation of the company’s AI software, which slashes inference time in half for language queries — enabling developers to build the world’s best-performing search engines, ad recommendations and chatbots and offer them from the cloud to the edge.

TensorRT 8’s optimizations deliver record-setting speed for language applications, running BERT-Large, one of the world’s most widely used transformer-based models, in 1.2 milliseconds. In the past, companies had to reduce their model size, which resulted in significantly less accurate results. Now, with TensorRT 8, companies can double or triple their model size to achieve dramatic improvements in accuracy.

“AI models are growing exponentially more complex, and worldwide demand is surging for real-time applications that use AI. That makes it imperative for enterprises to deploy state-of-the-art inferencing solutions,” said Greg Estes, vice president of developer programs at NVIDIA. “The latest version of TensorRT introduces new capabilities that enable companies to deliver conversational AI applications to their customers with a level of quality and responsiveness that was never before possible.”

In five years, more than 350,000 developers across 27,500 companies in wide-ranging areas, including healthcare, automotive, finance and retail, have downloaded TensorRT nearly 2.5 million times. TensorRT applications can be deployed in hyperscale data centers, embedded or automotive product platforms.

Latest Inference Innovations
In addition to transformer optimizations, TensorRT 8’s breakthroughs in AI inference are made possible through two other key features.

Sparsity is a new performance technique in NVIDIA Ampere architecture GPUs to increase efficiency, allowing developers to accelerate their neural networks by reducing computational operations.
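The sparsity support in Ampere GPUs is based on a 2:4 structured pattern, in which two of every four consecutive weights are zero so the Tensor Cores can skip them. The NumPy sketch below only illustrates that pruning pattern; it is not NVIDIA's tooling, and the magnitude-based selection and function name are assumptions for the example.

```python
import numpy as np

def prune_2_4(weights: np.ndarray) -> np.ndarray:
    """Illustrative 2:4 structured pruning: zero the two smallest-magnitude
    values in every consecutive group of four weights, the pattern that
    Ampere Tensor Cores can skip at inference time."""
    groups = weights.reshape(-1, 4).copy()             # view weights as groups of 4
    drop = np.argsort(np.abs(groups), axis=1)[:, :2]   # two smallest |w| per group
    np.put_along_axis(groups, drop, 0.0, axis=1)       # zero them out
    return groups.reshape(weights.shape)

dense = np.random.randn(2, 8).astype(np.float32)
print(prune_2_4(dense))  # each run of 4 weights now contains exactly 2 zeros
```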

Quantization aware training enables developers to use trained models to run inference in INT8 precision without losing accuracy. This significantly reduces compute and storage overhead for efficient inference on Tensor Cores.
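Quantization aware training works by inserting "fake quantization" into the forward pass, so the model learns weights that stay accurate once inference runs entirely in 8-bit integers. The snippet below is a minimal, generic illustration of symmetric INT8 quantize/dequantize, not TensorRT's API; the per-tensor scale choice is an assumption for the example.

```python
import numpy as np

def fake_quantize_int8(x: np.ndarray) -> np.ndarray:
    """Simulate symmetric INT8 quantization ("fake quantization"): round values
    onto the signed 8-bit grid, then map back to float so training sees the
    rounding error that real INT8 inference will introduce."""
    scale = max(np.abs(x).max() / 127.0, 1e-8)      # per-tensor symmetric scale
    q = np.clip(np.round(x / scale), -127, 127)     # quantize to the INT8 range
    return (q * scale).astype(np.float32)           # dequantize for the forward pass

activations = np.random.randn(4, 8).astype(np.float32)
print(fake_quantize_int8(activations))
```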

Broad Industry Support
Industry leaders have embraced TensorRT for their deep learning inference applications in conversational AI and across a range of other fields.

Hugging Face is an open-source AI leader relied on by the world’s largest AI service providers across multiple industries. The company is working closely with NVIDIA to introduce groundbreaking AI services that enable text analysis, neural search and conversational applications at scale.

“We’re closely collaborating with NVIDIA to deliver the best possible performance for state-of-the-art models on NVIDIA GPUs,” said Jeff Boudier, product director at Hugging Face. “The Hugging Face Accelerated Inference API already delivers up to 100x speedup for transformer models powered by NVIDIA GPUs. With TensorRT 8, Hugging Face achieved 1ms inference latency on BERT, and we’re excited to offer this performance to our customers later this year.”

GE Healthcare, a leading global medical technology, diagnostics and digital solutions innovator, is using TensorRT to help accelerate computer vision applications for ultrasound, a critical tool for the early detection of disease. This enables clinicians to deliver the highest quality of care through the company’s intelligent healthcare solutions.

“When it comes to ultrasound, clinicians spend valuable time selecting and measuring images. During the R&D project leading up to the Vivid Patient Care Elevated Release, we wanted to make the process more efficient by implementing automated cardiac view detection on our Vivid E95 scanner,” said Erik Steen, chief engineer of Cardiovascular Ultrasound at GE Healthcare. “The cardiac view recognition algorithm selects appropriate images for analysis of cardiac wall motion. TensorRT, with its real-time inference capabilities, improves the performance of the view detection algorithm and it also shortened our time to market during the R&D project.”

Availability
TensorRT 8 is now generally available and free of charge to members of the NVIDIA Developer program. The latest versions of plug-ins, parsers and samples are also available as open source from the TensorRT GitHub repository.
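For orientation, the sketch below shows the typical shape of a TensorRT 8 build step in Python: parsing an ONNX model and serializing an optimized engine. It assumes a local file named model.onnx and the TensorRT Python bindings; the samples and parsers in the GitHub repository are the authoritative reference.

```python
import tensorrt as trt  # TensorRT 8 Python bindings

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# "model.onnx" is a placeholder for your own exported model
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # enable reduced precision where supported

# build_serialized_network (introduced in TensorRT 8) returns a serialized engine
serialized_engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(serialized_engine)
```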

About NVIDIA
NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market and has redefined modern computer graphics, high performance computing and artificial intelligence. The company’s pioneering work in accelerated computing and AI is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others. More information at https://nvidianews.nvidia.com/.
