
A Bid to Simplify Flash Subsystem Design

Flash memory, once exotic and expensive, has followed in DRAM’s footsteps to become a familiar everyday technology. Perhaps even more so than DRAM: when was the last time you went to a drug store and picked up a DRAM card while you were there?

As with DRAM, this has been driven by falling prices: the price per megabyte of Flash is dropping by roughly half every year, and volume has responded with a compound annual growth rate of about 170%, according to Denali Software, Inc. Couple this with the fact that Flash is non-volatile, and, well, it’s no surprise that it’s found not only in camera storage cards and cell phones, but is now becoming part of the memory subsystem in computing platforms.
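As a quick sanity check on what those rates compound to, here's a minimal sketch; the growth figures are from the article, the function is just illustrative arithmetic:

```python
def project(value, annual_rate, years):
    """Compound an annual growth rate: -0.5 halves each year, 1.7 is 170% growth."""
    return value * (1 + annual_rate) ** years

# Price per megabyte falling ~50% per year means 1/8 the cost in three years:
print(project(1.00, -0.5, 3))   # 0.125
# Volume growing ~170% per year is a 2.7x multiple annually, so roughly 7.3x in two years:
print(project(1.0, 1.7, 2))
```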

One technology advance that has made this reduction in price possible is the development of the Multi-Level Cell (MLC). Traditional Flash used a Single-Level Cell (SLC), which is the standard kind of memory bit we’re used to – one bit per cell, either on or off. With MLC technology, each cell can take on more states than just on and off. Current technology allows four levels, meaning that two bits’ worth of data can be stored in a single cell. This literally doubles the capacity of an array. Of course, it’s not that simple. With an SLC, you can be relatively sloppy when reading the cell data – near ground for a zero, and near the rail for a one, for example. With an MLC, you’re dividing that voltage range into finer slices, meaning the sensing circuitry has to be much more discerning, and data retention becomes less forgiving. This has complicated both the internal read circuitry and the Flash controllers that have to interpret which bits go where. The next step to four-bit/cell technology will make the challenge even greater – sixteen voltage levels per cell.
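The level-to-bit relationship, and why the read margins shrink, can be captured in a couple of lines. This is a hedged sketch; the 3.0 V sensing window is a made-up number for illustration, not a figure from any datasheet:

```python
import math

def bits_per_cell(levels):
    """A cell with N distinguishable levels stores log2(N) bits."""
    return int(math.log2(levels))

def read_margin(v_range, levels):
    """Spacing between adjacent threshold levels across a fixed voltage window."""
    return v_range / (levels - 1)

# SLC: 2 levels -> 1 bit; MLC: 4 levels -> 2 bits; next step: 16 levels -> 4 bits
print(bits_per_cell(2), bits_per_cell(4), bits_per_cell(16))   # 1 2 4
# With a hypothetical 3.0 V sensing window, the margin shrinks fast:
print(read_margin(3.0, 2), read_margin(3.0, 4), read_margin(3.0, 16))  # 3.0 1.0 0.2
```

Going from 4 to 16 levels cuts the spacing between thresholds by a factor of five in this toy model, which is why the sensing circuitry gets so much harder.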

Makers of specialty components have the luxury of calling their own shots when it comes to how their parts work, are pinned out, are timed, etc. The problem comes about when the components aren’t so specialized anymore, when they become commoditized, but with multiple vendors having different interfaces to their chips. In the case of Flash, the interfaces have been similar, but close doesn’t count, meaning extra work supporting multiple Flash vendors in a system.

The Open NAND Flash Interface (ONFi – love the lower-case “i”… At least they didn’t go all hacker on us and call it oNfI) was formed to unify the interface at the chip level. Now there’s a group working at the other side of things on a common API for software: the Non-Volatile Memory Host Controller Interface (NVMHCI), which is close to being approved in its first iteration.

All of this makes it easier to put Flash memory subsystems together and allows interchangeability (from an interface standpoint) of different Flash memories. It is in this environment, then, that Denali is announcing a new FlashPoint™ platform intended to simplify the configuration of a Flash subsystem using PCI-Express as the data pipe.

Denali has historically concentrated on generating IP for use in System-on-Chip (SoC) designs, starting with DDR memory, and following on with NAND Flash and PCI-Express. So it’s somewhat natural that they would move up a level of abstraction and automate the pulling together of the PCI-Express, Flash, and control blocks. They see the use of this both for generating PCI-Express Flash cache chips and for integrating further into an SoC.

The package contains the hardware IP and a full software stack that talks to the NVMHCI interface. Since that interface is new, they also provide an NVMHCI driver for customers that haven’t yet incorporated it. They also support the ONFi interface, allowing the subsystems to be built with existing Flash devices from a variety of manufacturers.

The contents of the FlashPoint™ platform have been disclosed only at a high level so far. It consists of a 32-bit RISC processor, RAM, ROM, and a number of control blocks, assembled in an architecture that they have tuned for performance. Within this environment, you can size the Flash memory and scale the number of Flash controllers according to the needs of the application. There is also an encryption engine that can be bypassed as well as a security block that supports passwords and partitioning. The hardware IP is provided in RTL. Not clear yet is whether this customization happens through a wizard-like tool or through tighter interaction with Denali on a project-by-project basis.

On the software/controller side, Denali’s existing Flash system has four layers, starting with hardware, then hardware abstraction, a Flash Translation Layer, and an OS/RTOS layer. The FlashPoint platform adds modules to the existing set. So at the hardware level, they have added the processor and RAM, a command pipeline, an auto-config block, and an NVMHCI block. At the firmware level, they have added power management, reliability monitors, and a memory map. And at the software level, they provide a protocol stack, command ordering, a task engine, and a system initialization block.
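The Flash Translation Layer in that stack is the piece that hides NAND's erase-before-write behavior from the layers above. Here is a purely illustrative sketch of what an FTL does (hypothetical names and toy logic, not Denali's implementation):

```python
class TinyFTL:
    """Toy Flash Translation Layer: remaps logical blocks to fresh physical
    pages on every write, since NAND pages can't be overwritten in place."""

    def __init__(self, num_pages):
        self.mapping = {}                   # logical block addr -> physical page
        self.free = list(range(num_pages))  # erased, writable pages

    def write(self, lba, data, flash):
        page = self.free.pop(0)             # always write to a fresh page
        flash[page] = data
        # a real FTL would queue the old page for garbage collection and erase,
        # and handle wear leveling and bad-block management along the way
        self.mapping[lba] = page

    def read(self, lba, flash):
        return flash[self.mapping[lba]]

flash = [None] * 4
ftl = TinyFTL(num_pages=4)
ftl.write(0, b"v1", flash)
ftl.write(0, b"v2", flash)    # rewrite lands on a new physical page
print(ftl.read(0, flash))     # b'v2'
print(ftl.mapping[0])         # 1  (logical block 0 now lives on physical page 1)
```

The layers above the FTL (the OS/RTOS layer, or application software talking over NVMHCI) see a simple block device; all the remapping happens underneath.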

Customization on the software side can be achieved by choosing which modules to include in a system. Customizing the behavior of the modules themselves is apparently theoretically possible, but isn’t really intended. First of all, there’s the issue of the memory footprint and trying to jam any new firmware into it. Then there’s the more practical matter of the software being available in binary only, complicating integration of anything new into the bundle. In reality, their intent is that the software modules remain intact, with any other functionality added by higher-level software that communicates with the Denali software via the NVMHCI API.

Denali promises bandwidth of 160 to 200 MB/s. While the platform can operate in any of the standard applications that use Flash, much attention is being focused on computer cache applications. Flash is moving into position as another piece of the memory architecture, to the point where Microsoft has provided new capabilities in Windows Vista® – the so-called ReadyBoost™ and ReadyDrive™ features – to support Flash as a cache to a hard drive or as an outright Solid State Drive (SSD), respectively. Denali expects the cost/capacity of Flash to make SSDs mainstream by mid-2009. They specifically point to having been able to build a 300 MB cache operating at 15K I/O Operations per second (IOPS), and a smaller laptop cache at 10K IOPS.
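For a rough sense of how the IOPS and bandwidth figures relate, sustained throughput is just operations per second times transfer size. A quick sketch; the 4 KiB transfer size is an assumption for illustration, not a figure from Denali:

```python
def throughput_mb_s(iops, transfer_bytes):
    """Sustained throughput in MB/s for a given IOPS rate and transfer size."""
    return iops * transfer_bytes / 1_000_000

# 15K IOPS at an assumed 4 KiB per operation:
print(throughput_mb_s(15_000, 4096))   # 61.44
```

At small random transfers like these, throughput sits well below the quoted 160–200 MB/s peak; larger sequential transfers would be what approaches the bandwidth number.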

Greater Flash usage can mean both higher performance and lower power, an attractive (and unusual) combination that suggests Flash will find its way beyond the keychain and into more of the devices we use. If Flash subsystems get easier to design, hopefully that will happen sooner rather than later.
