[August 22, San Francisco, CA]: Today, d-Matrix, the leader in high-efficiency generative AI compute for data centers, announced Jayhawk II, the next generation of its highly anticipated generative AI compute platform. The new silicon features an enhanced version of the company's digital in-memory compute (DIMC) engine with chiplet interconnect. This industry-first silicon pairs a DIMC architecture with the OCP Bunch of Wires (BoW) PHY interconnect standard for low-latency inference on large language models (LLMs), from data-center-scale models like ChatGPT to more focused models like Meta's Llama 2 or Falcon from the Technology Innovation Institute.
Cloud and enterprise business leaders are eager to deploy generative AI applications but are encountering major hurdles: the cost of running inference, inference latency and throughput, and the availability of chips and compute power that scales for LLMs. Jayhawk II is designed to address each of these challenges by combining a DIMC architecture with a chiplet-based interconnect. The d-Matrix silicon delivers a 40x improvement in memory bandwidth compared to state-of-the-art high-end GPUs. Jayhawk II's higher memory bandwidth translates to higher throughput and lower latency for generative inference applications while minimizing total cost of ownership (TCO).
“With the announcement of Jayhawk II, our customers are a step closer to serving generative AI and LLM applications with much better economics and a higher quality user experience than ever before,” said Sid Sheth, CEO and co-founder of d-Matrix. “We’re working with a range of companies large and small to evaluate the Jayhawk II silicon in real-world scenarios and the results are very promising.”
The Jayhawk II silicon follows the original Jayhawk, announced earlier in 2023, which demonstrated 2 Tbps bi-directional die-to-die connectivity and outperformed competitors in bandwidth, energy efficiency and cost-effectiveness. Today's announcement builds on the original release and addresses the main challenges of running generative AI-specific LLMs:
- DIMC engine that scales from 30 TOPS/W to 150 TOPS/W using a 6nm process technology
- Supports floating point and block floating point numerics across a range of precisions
- Supports compression and sparsity approaches, enabling prompt caching for generative AI models
- Handles 10–20x more generative inferences per second for LLMs ranging from 3B to 40B parameters compared to incumbent state-of-the-art GPU solutions
- Delivers 10–20x better TCO for generative inference compared to these GPU solutions
Jayhawk II is now available for demos and evaluation. To learn more, visit d-matrix.ai.
d-Matrix is leading the data center architecture shift to Digital In-Memory Computing (DIMC) to address the growing demand for transformer and generative AI inference acceleration. d-Matrix creates flexible solutions for inference at scale using innovative circuit techniques, a chiplet-based architecture, high-bandwidth BoW interconnects and a full stack of machine learning and large language model tools and software. Founded in 2019, the company is backed by top investors and strategic partners including Playground Global, M12 (Microsoft Venture Fund), SK Hynix, Nautilus Venture Partners, Marvell Technology and Entrada Ventures.
Visit d-matrix.ai for more information and follow d-Matrix on LinkedIn for the latest updates.