[August 22, San Francisco, CA]: Today, d-Matrix, the leader in high-efficiency generative AI compute for data centers, announced Jayhawk II, the next generation of its highly anticipated generative AI compute platform. The new silicon features an enhanced version of the company's digital in-memory compute (DIMC) engine with chiplet interconnect. This industry-first silicon pairs a DIMC architecture with the OCP Bunch of Wires (BoW) PHY interconnect standard for low-latency inference on large language models (LLMs), from data-center-scale models like ChatGPT to more focused models like Meta's Llama 2 or Falcon from the Technology Innovation Institute.
Cloud and enterprise business leaders are eager to deploy generative AI applications but are encountering major hurdles: the cost of running inference, inference latency and throughput, and the availability of chips and compute power that scales for LLMs. Jayhawk II is designed to address each of these challenges by combining a DIMC architecture with a chiplet-based interconnect. The d-Matrix silicon delivers a 40x improvement in memory bandwidth compared to state-of-the-art high-end GPUs. Jayhawk II's higher memory bandwidth translates to higher throughput and lower latency for generative inference applications while minimizing total cost of ownership (TCO).
“With the announcement of Jayhawk II, our customers are a step closer to serving generative AI and LLM applications with much better economics and a higher quality user experience than ever before,” said Sid Sheth, CEO and co-founder of d-Matrix. “We’re working with a range of companies large and small to evaluate the Jayhawk II silicon in real-world scenarios and the results are very promising.”
The Jayhawk II silicon follows the original Jayhawk, announced earlier in 2023, which demonstrated 2 Tbps bi-directional die-to-die connectivity and outperformed competitors in bandwidth, energy efficiency and cost-effectiveness. Today's announcement builds on the original release and addresses the main challenges of running generative AI-specific LLMs:
- DIMC engine that scales from 30 TOPS/W to 150 TOPS/W using a 6nm process technology
- Supports floating point and block floating point numerics across a range of precisions
- Supports compression and sparsity approaches, enabling prompt caching for generative AI models
- Handles 10–20x more generative inferences per second for LLMs ranging from 3B to 40B parameters compared to incumbent state-of-the-art GPU solutions
- Delivers 10–20x better TCO for generative inference compared to these GPU solutions
Jayhawk II is now available for demos and evaluation. To learn more, visit d-matrix.ai.
d-Matrix is leading the data center architecture shift to Digital In-Memory Computing (DIMC) to address the growing demand for transformer and generative AI inference acceleration. d-Matrix creates flexible solutions for inference at scale using innovative circuit techniques, a chiplet-based architecture, high-bandwidth BoW interconnects and a full stack of machine learning and large language model tools and software. Founded in 2019, the company is backed by top investors and strategic partners including Playground Global, M12 (Microsoft Venture Fund), SK Hynix, Nautilus Venture Partners, Marvell Technology and Entrada Ventures.
Visit d-matrix.ai for more information and follow d-Matrix on LinkedIn for the latest updates.