
High Bandwidth, Modest Capacity

Micron Launches GDDR6; Cadence Talks AI/Crypto Memory

Aaaaand… it’s memory time again. I don’t keep up with every release of memory (who could keep up with that without dedicating their lives to nothing but that?), but here and there we have either technology or application angles to new-memory stories. So, in that vein, we address memory in automotive and AI. Yes, two critical keywords in any tech article these days.

Automotive Moves to Graphics

I chatted with Micron Technology about its latest GDDR6 release. The name of the game here is bandwidth: data rates start at 14 Gbps per pin and move to 16 Gbps (with 20 Gbps working in the lab now). Device densities run from 8 to 32 Gb.
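To put those per-pin numbers in device terms, here’s a minimal sketch. It assumes a 32-bit total data width (the two 16-bit channels described below) and computes only theoretical peak bandwidth, ignoring refresh, command overhead, and bus turnaround. The function names are illustrative, not from Micron.

```python
# Peak-bandwidth arithmetic for one GDDR6 device. Assumes a 32-bit total
# data width (both 16-bit channels) and ignores refresh and command overhead.

def peak_bandwidth_gbytes(gbps_per_pin: float, data_pins: int = 32) -> float:
    """Theoretical peak bandwidth in GB/s (decimal gigabytes)."""
    return gbps_per_pin * data_pins / 8  # 8 bits per byte


def density_gbytes(gbits: float) -> float:
    """Convert device density from gigabits to gigabytes."""
    return gbits / 8


# Per-pin rates from the article: shipping, next step, and lab.
for rate in (14, 16, 20):
    print(f"{rate} Gbps/pin x 32 pins -> {peak_bandwidth_gbytes(rate):.0f} GB/s")

# Density range from the article.
for gbits in (8, 32):
    print(f"{gbits} Gb device -> {density_gbytes(gbits):.0f} GB")
```

So, under those assumptions, one device at 14 Gbps per pin peaks at 56 GB/s, and the 8 to 32 Gb range works out to 1 to 4 GB per device.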

There are two independent channels in this memory. In fact, you could pretty much think of them as two separate co-packaged dice. Each channel has its own memory array, read/write access, and refresh, and the two can run asynchronously with respect to each other. If you want to use the part as a single memory, you can gang the signals and buses together externally.

One of the things that’s changing with high-performance memory is the size of the payload: it’s shrinking. (How often do you hear about future workloads getting smaller?) So each channel has a 16-bit bus, which handles those smaller transfers more efficiently. If you gang the two channels together, you get the more standard 32-bit bus.
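Why would a narrower channel suit smaller payloads? Here’s a rough sketch. It assumes a burst length of 16 transfers per access (BL16), which is an assumption on my part rather than a figure from Micron; the `ChannelConfig` class is purely illustrative. With the same 32 pins either way, two independent x16 channels each fetch a smaller chunk, and they can fetch from two different places at once.

```python
from dataclasses import dataclass

# Assumption: each access is a burst of 16 transfers (BL16).
BURST_LENGTH = 16


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical model of how the device's data pins are organized."""
    name: str
    channels: int    # independently addressable channels
    width_bits: int  # data-bus width per channel

    def bytes_per_access(self) -> int:
        """Smallest chunk one channel moves in a single burst."""
        return self.width_bits * BURST_LENGTH // 8

    def total_width_bits(self) -> int:
        return self.channels * self.width_bits


independent = ChannelConfig("two independent x16 channels", channels=2, width_bits=16)
ganged = ChannelConfig("channels ganged as one x32", channels=1, width_bits=32)

for cfg in (independent, ganged):
    print(f"{cfg.name}: {cfg.total_width_bits()} bits total, "
          f"{cfg.bytes_per_access()} bytes per access")
```

Same pin count, but the independent configuration moves half as many bytes per access per channel, so less of each burst is wasted when the useful payload is small.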

So… OK… memory for graphics… um… where does the promised automotive thing enter the picture? Did we toss the word in here just to come up in more searches? Nope. It turns out that automotive designs, traditionally users of LPDDR or plain-ol’ DDR, need more bandwidth for ADAS applications and, in particular, for high-definition displays at L4/L5 autonomy (in other words, the highest levels). Micron has worked with several other companies to assemble the circuits needed for an entire system: Rambus for the PHY, Northwest Logic for the controller, and Avery Design Systems for verification IP.

But, of course, adapting to the functional needs of cars can be a deal with the devil when it comes to operational requirements, including reliability. Like, 7 to 10 years of reliability. According to Micron, that’s tough to achieve with other memories, which makes its GDDR6 a better fit.

Memory for AI and Crypto

Next we look at two other application areas that share one characteristic with automotive. The applications are AI and cryptography, and what they share is the smaller transaction. But they still need super-fast access.

This topic came up at DAC in a conversation with Cadence’s Marc Greenberg. We didn’t focus on specific new products, but rather on developments in system design and how they’re translating into possible future memory solutions (whenever any resulting products materialize).

With AI, you’re storing the weights for a neural-net engine; with cryptography, you’re storing hashes. According to Cadence, designers are looking for novel memory structures to give them higher bandwidth without necessarily delivering higher capacity. HBM2 and GDDR6 are examples of such newer memories that are up for consideration.

The reason for seeking out something new is a capacity gap in the standard memory options available today. Given that these are working-memory tasks, the options are SRAM and DRAM. AI memories tend to need on the order of 10 GB of capacity (plus or minus), which isn’t nothing, but it’s less than DRAM tends to deliver. That said, it’s way more than a cache, with space for a few MB of data, can handle. So there’s this Goldilocks capacity region that these designers are jonesing for.
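To see how wide that Goldilocks gap is, here’s a back-of-the-envelope sketch. The 8 MiB cache and 10 GiB working set are hypothetical stand-ins for “a few MB” and “~10 GB”; the device densities are the 8 and 32 Gb endpoints quoted for GDDR6 above, treated as binary gigabits.

```python
import math

# Hypothetical sizes: 8 MiB stands in for "a few MB" of on-chip cache,
# and 10 GiB stands in for the "~10 GB" AI working set.
CACHE_BYTES = 8 * 2**20
WORKING_SET_BYTES = 10 * 2**30


def devices_needed(target_bytes: int, device_gbits: int) -> int:
    """Whole DRAM devices needed to hold target_bytes (binary densities)."""
    device_bytes = device_gbits * 2**30 // 8
    return math.ceil(target_bytes / device_bytes)


print(f"Working set / cache: {WORKING_SET_BYTES // CACHE_BYTES}x")
for gbits in (8, 32):
    print(f"{gbits} Gb devices to hold it: {devices_needed(WORKING_SET_BYTES, gbits)}")
```

Under these assumptions the working set is over a thousand times the cache, yet only a handful of off-chip devices cover it: too big for on-chip, small by DRAM-subsystem standards.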

You might anticipate that SRAM-based cache would draw more power than DRAM. After all, an SRAM bit cell is always burning power; as bit cells go, SRAM cells are considered pretty power-hungry. Of course, you get speed in the bargain, but it would be understandable if you thought (as I would have) that SRAM is the higher-power solution.

Not so, according to Cadence. Yes, the SRAM bit cell does draw more power, but that’s not what dominates power usage: data movement does. With cache, you’re moving data at most millimeters across a die. With DRAM, you’re going out through pins, across wires, and into other pins, and the power cost of doing so makes DRAM the overall higher-power solution.
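Cadence’s argument boils down to a two-term sum: energy per access is bit-cell energy plus data-movement energy. The toy model below shows the shape of that reasoning. Every pJ/bit figure in it is a hypothetical placeholder picked only so the SRAM cell costs more than the DRAM cell; none are measured values, and folding SRAM’s always-on leakage into a per-bit number is itself a simplification.

```python
from dataclasses import dataclass

# Toy model: energy = bits * (cell energy + movement energy).
# ALL pJ/bit values below are HYPOTHETICAL placeholders, not measurements.


@dataclass(frozen=True)
class MemoryPath:
    name: str
    cell_pj_per_bit: float      # hypothetical storage/bit-cell energy
    movement_pj_per_bit: float  # hypothetical wire + I/O energy

    def energy_pj(self, bits: int) -> float:
        return bits * (self.cell_pj_per_bit + self.movement_pj_per_bit)

    def movement_share(self) -> float:
        """Fraction of the per-bit energy spent moving data."""
        total = self.cell_pj_per_bit + self.movement_pj_per_bit
        return self.movement_pj_per_bit / total


# The SRAM cell is made costlier than the DRAM cell, per the article's premise,
# but the off-chip path carries a much larger movement term.
on_die_sram = MemoryPath("on-die SRAM cache", cell_pj_per_bit=0.5, movement_pj_per_bit=0.5)
off_chip_dram = MemoryPath("off-chip DRAM", cell_pj_per_bit=0.2, movement_pj_per_bit=10.0)

bits = 64 * 8  # one 64-byte transfer
for path in (on_die_sram, off_chip_dram):
    print(f"{path.name}: {path.energy_pj(bits):.1f} pJ, "
          f"{path.movement_share():.0%} spent moving data")
```

Change the placeholders however you like; as long as the off-chip movement term dwarfs the cell-energy difference, DRAM comes out as the higher-energy path.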

Is that solved with HBM2 and GDDR6? Not clear. GDDR6 power is lower than GDDR5 due to a lower VDD. HBM2 power is lower than HBM for the same reason. And, as far as I can tell, HBM2 runs with less power than GDDR6. But are they meeting the power needs of these non-graphics, smaller-payload applications?
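Why does a lower VDD help so much? CMOS switching power goes as P = α·C·V²·f, so the voltage enters squared. The sketch below isolates just that term, assuming nominal supplies of 1.5 V for GDDR5 and 1.35 V for GDDR6 (assumed values, not figures from Micron or Cadence) and holding activity, capacitance, and frequency fixed.

```python
# Switching power scales as P = alpha * C * V**2 * f. With alpha, C, and f
# held fixed, the supply ratio alone sets the power ratio.
# Assumed nominal supplies: 1.5 V (GDDR5) and 1.35 V (GDDR6).


def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio of switching power at v_new vs v_old, all else equal."""
    return (v_new / v_old) ** 2


ratio = dynamic_power_ratio(1.35, 1.5)
print(f"GDDR6/GDDR5 switching power at equal C and f: {ratio:.2f}")
```

That’s roughly a 19% cut from voltage alone. Keep in mind that a faster interface runs at a higher f, which can claw some of that back at the device level.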

I checked back in with Cadence, and Mr. Greenberg clarified that power isn’t the driving feature here: bandwidth is. The catch is that, as noted, capacity needs are modest. These applications require more memory than can economically be included on-chip, so an off-chip solution is required. HBM2 and GDDR6 fit this space; their relatively low power compared to alternatives or past generations certainly helps reduce the overall power of the solution, but it’s not the main story.
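Here’s one way to see how bandwidth, not capacity, can end up setting the memory budget. In this sketch the workload is hypothetical (2 GB of weights streamed once per inference, 100 inferences per second), and each device is assumed to be a 32-bit GDDR6 part at 14 Gbps per pin with 8 Gb (treated as 1 GB) of capacity.

```python
import math

# Hypothetical workload: each inference pass streams every weight once.
WEIGHT_GBYTES = 2.0       # hypothetical model size
PASSES_PER_SEC = 100      # hypothetical inference rate

# Assumed device: 32-bit GDDR6 at 14 Gbps/pin, 8 Gb (~1 GB) density.
DEVICE_GBYTES_PER_SEC = 14 * 32 / 8
DEVICE_CAPACITY_GBYTES = 1.0


def required_bandwidth(weight_gbytes: float, passes_per_sec: float) -> float:
    """GB/s needed to re-read the full weight set on every pass."""
    return weight_gbytes * passes_per_sec


need = required_bandwidth(WEIGHT_GBYTES, PASSES_PER_SEC)
devices = math.ceil(need / DEVICE_GBYTES_PER_SEC)
print(f"Need {need:.0f} GB/s -> {devices} devices, "
      f"{devices * DEVICE_CAPACITY_GBYTES:.0f} GB of capacity for "
      f"{WEIGHT_GBYTES:.0f} GB of weights")
```

In this made-up case, the bandwidth requirement dictates the device count, and the capacity simply comes along for the ride, which is the “high bandwidth, modest capacity” pattern in a nutshell.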

Sooo… what’ll it be? HBM2 or GDDR6? Or both? From what I can find, HBM2 may have the power advantage, but it appears to carry a significant cost disadvantage. Where bandwidth matters most, as in gaming (where you see most of the HBM2 discussion), HBM2 can win. Its market has certainly evolved more slowly than some expected, but new offerings suggest that it’s still moving forward.

The DDR franchise, with its LP and G variants, contains more familiar names, so you might expect those parts to have an easier go of it. And high pricing is never welcome in the automotive market. But what about AI, or crypto? Well, it depends on where the system sits. In the cloud? In a local server? Or in a gadget?

Acceptable price, performance, footprint, and power points will depend strongly on where the memory finds itself. AI, in particular, is new enough that it has a lot of settling out to do before we know whether it pervades absolutely everything or remains confined to more limited platforms. So it will be a while before we know exactly what’s going to be required where.

 

More info:

Micron GDDR6

