industry news
Subscribe Now

Deci Achieves Record-Breaking Inference Speed on NVIDIA GPUs at MLPerf

Deci achieves the highest inference speed ever to be published at MLPerf for NLP, while also delivering the highest accuracy.
[Tel Aviv, Israel, April 5th, 2023] – Deci, the deep learning company harnessing Artificial Intelligence (AI) to build better AI, today announced results for its Natural Language Processing (NLP) model submitted to the MLPerf Inference v3.0 benchmark suite under the open submission track. Notably, the NLP model, generated by Deci’s Automated Neural Architecture Construction (AutoNAC) technology, dubbed DeciBERT-Large, delivered a record-breaking throughput performance of more than 100,000 queries per second on 8 NVIDIA A100 GPUs while also delivering improved accuracy. Also, Deci delivered unparalleled throughput performance per TeraFLOPs, outperforming competing submissions made on even stronger hardware setups.
Running successful inference at scale requires meeting various performance criteria such as latency, throughput, and model size, among others. Optimizing inference performance after a model has already been developed is an especially cumbersome and costly process, often leading to project delays and failures. Accounting for the inference environment and production constraints early in the development lifecycle can significantly reduce the time and cost of fixing potential obstacles to trying to deploy models.
“These results demonstrate once again the power of Deci’s AutoNAC technology, which is leveraged today by leading AI teams to develop superior deep learning applications, faster,” said Prof. Ran El-Yaniv, Deci’s chief scientist and co-founder. “With Deci’s platform, teams no longer need to compromise either accuracy or inference speed, and achieve the optimal balance between these conflicting factors by easily applying Deci’s advanced optimization techniques”. Deci’s model was submitted under the offline scenario in MLPerf’s open division in the BERT 99.9 category. The goal was to maximize throughput while keeping the accuracy within a 0.1% margin of error from the baseline, which is 90.874 F1 (SQUAD).
AI Inference Efficiency Translates into Bottom Line Results 
For the submission, Deci leveraged its deep learning development platform powered by its proprietary AutoNAC engine. The AutoNAC engine empowers teams to develop hardware aware model architectures tailored for reaching specific performance targets on their inference hardware. Models built and deployed with Deci typically deliver up to 10X increase in inference performance with comparable or higher accuracy relative to state of the art open source models. This increase in speed translates into a better user experience and a significant reduction in inference compute costs.
In this case, AutoNAC was used by Deci to generate model architectures tailored for various NVIDIA accelerators and presented unparalleled performance on the NVIDIA A30 GPU, NVIDIA A100 GPU (1 & 8 unit configurations), and the NVIDIA H100 GPU.
The below chart illustrates the throughput performance per TeraFLOPs as achieved by Deci and other submitters within the same category. Deci delivered the highest throughput per TeraFLOPs while also improving the accuracy. This inference efficiency translates into significant cost savings on compute power and a better user experience. Instead of relying on more expensive hardware, teams using Deci can now run inference on NVIDIA’s A100 GPU, achieving 1.7x faster throughput and +0.55 better F1 accuracy, compared to when running on NVIDIA’s H100 GPU. This means a 68%* cost savings per inference query.
Other benefits of Deci’s results include the ability to migrate from multi-gpu to a single GPU and lower inference cost and reduced engineering efforts. For example, ML engineers using Deci can achieve a higher throughput on one H100 card than on 8 NVIDIA A100 cards combined. In other words, with Deci, teams can replace 8 NVIDIA A100 cards with just one NVIDIA H100 card, while getting higher throughput and better accuracy (+0.47 F1).
On the NVIDIA A30 GPU, which is a more affordable GPU, Deci delivered accelerated throughput and a 0.4% increase in F1 accuracy compared to an FP32 baseline.
By using Deci, teams that previously needed to run on an NVIDIA A100 GPU can now migrate their workloads to the NVIDIA A30 GPU and achieve 3x better performance then they previously had for roughly a third of the compute price. This means dramatically better performance for significantly less inference cloud cost.
About Deci
Deci enables deep learning to live up to its true potential by using AI to build better AI. With the company’s deep learning development platform, AI developers can build, optimize, and deploy faster and more accurate models for any environment including cloud, edge, and mobile, allowing them to revolutionize industries with innovative products. Deci’s deep learning development platform equips teams with the tools and visibility they need in order to adopt a production-aware model development approach, eliminating the risks of development and shortening time to market. Founded by Yonatan Geifman, Ph.D, Professor Ran El-Yaniv, and Jonathan Elial, Deci’s team of deep learning engineers and scientists are dedicated to eliminating production-related bottlenecks across the AI lifecycle.

One thought on “Deci Achieves Record-Breaking Inference Speed on NVIDIA GPUs at MLPerf”

Leave a Reply

featured blogs
Jul 20, 2024
If you are looking for great technology-related reads, here are some offerings that I cannot recommend highly enough....

featured video

Larsen & Toubro Builds Data Centers with Effective Cooling Using Cadence Reality DC Design

Sponsored by Cadence Design Systems

Larsen & Toubro built the world’s largest FIFA stadium in Qatar, the world’s tallest statue, and one of the world’s most sophisticated cricket stadiums. Their latest business venture? Designing data centers. Since IT equipment in data centers generates a lot of heat, it’s important to have an efficient and effective cooling system. Learn why, Larsen & Toubro use Cadence Reality DC Design Software for simulation and analysis of the cooling system.

Click here for more information about Cadence Multiphysics System Analysis

featured paper

Navigating design challenges: block/chip design-stage verification

Sponsored by Siemens Digital Industries Software

Explore the future of IC design with the Calibre Shift left initiative. In this paper, author David Abercrombie reveals how Siemens is changing the game for block/chip design-stage verification by moving Calibre verification and reliability analysis solutions further left in the design flow, including directly inside your P&R tool cockpit. Discover how you can reduce traditional long-loop verification iterations, saving time, improving accuracy, and dramatically boosting productivity.

Click here to read more

featured chalk talk

Autonomous Robotics Connectivity Solutions
Sponsored by Mouser Electronics and Samtec
Connectivity solutions for autonomous robotic applications need to include a variety of orientations, stack heights, and contact systems. In this episode of Chalk Talk, Amelia Dalton and Matthew Burns from Samtec explore trends in autonomous robotic connectivity solutions and the benefits that Samtec interconnect solutions bring to these applications.
Jan 22, 2024