
A Bid to Simplify Flash Subsystem Design

Flash memory, once exotic and expensive, has followed in DRAM’s footsteps to become a familiar everyday technology. Perhaps even more so than DRAM: when was the last time you went to a drug store and picked up a DRAM card while you were there?

As with DRAM, this has been driven by price decreases: the price per megabyte of Flash is falling by roughly half every year, and volume has responded with a compound annual growth rate of about 170%, according to Denali Software, Inc. Couple this with the fact that Flash is non-volatile, and, well, it’s no surprise that it is showing up not only in camera storage cards and cell phones but also in the memory subsystems of computing platforms.
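To get a feel for how those two rates compound, here is a minimal Python sketch. It assumes both rates hold constant from year to year and that "volume" is measured in megabytes shipped; neither assumption comes from Denali, and the function name `project` is purely illustrative.

```python
def project(years, price_factor=0.5, volume_cagr=1.70):
    """Return (relative price per MB, relative volume) after `years` years.

    price_factor=0.5 models "falling by roughly half every year";
    volume_cagr=1.70 models a 170% compound annual growth rate,
    i.e. volume multiplies by 2.7 each year.
    Both are treated as constant, which is an assumption.
    """
    price = price_factor ** years
    volume = (1.0 + volume_cagr) ** years
    return price, volume


for year in range(4):
    price, volume = project(year)
    # If volume is in megabytes, price * volume tracks relative spending.
    print(f"year {year}: price x{price:.3f}, "
          f"volume x{volume:.2f}, spend x{price * volume:.2f}")
```

Under those assumptions, a 170% growth rate means volume multiplies by 2.7 each year, so even with prices halving, total spending on Flash would still grow by about 35% annually.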

One technology advance that has made this reduction in price possible is the development of the Multi-Level Cell (MLC). Traditional Flash used a Single-Level Cell (SLC), which is the standard kind of memory bit we’re used to: one bit per cell, either on or off. With MLC technology, each cell can have more states than just on and off. Current technology allows four levels, meaning that two bits’ worth of data can be stored in a single cell. This literally doubles the capacity of an array.

Of course, it’s not that simple. With an SLC, you can be relatively sloppy when reading the cell data: near ground for a zero and near the rail for a one, for example. To get an MLC, you divide that voltage range into finer divisions, meaning the sensing circuitry has to be much more discerning and data retention becomes less forgiving. This has complicated both the internal read circuitry and the Flash controllers that have to interpret which bits go where. The next step, to four-bit-per-cell technology, will make the challenge even greater: sixteen voltage levels per cell.
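The relationship between bits per cell and voltage levels can be sketched in a few lines of Python. This is an idealized illustration, not how any particular device works: the 3.0 V range, the evenly spaced levels, and the plain binary encoding are all assumptions, and real parts may use different encodings and non-uniform thresholds.

```python
def levels_for_bits(bits_per_cell):
    """Number of distinct levels needed to store `bits_per_cell` bits."""
    return 2 ** bits_per_cell


def level_spacing(bits_per_cell, v_max=3.0):
    """Gap between adjacent nominal levels (v_max is an assumed range)."""
    return v_max / (levels_for_bits(bits_per_cell) - 1)


def read_cell(voltage, bits_per_cell, v_max=3.0):
    """Map a sensed voltage to a bit string by snapping to the nearest level."""
    levels = levels_for_bits(bits_per_cell)
    index = round(voltage / level_spacing(bits_per_cell, v_max))
    index = max(0, min(levels - 1, index))  # clamp out-of-range readings
    return format(index, f"0{bits_per_cell}b")


for bits in (1, 2, 4):
    print(f"{bits} bit(s)/cell: {levels_for_bits(bits)} levels, "
          f"{level_spacing(bits):.2f} V between levels")
```

With these assumed numbers, the gap between adjacent levels shrinks from 3.0 V for SLC to 1.0 V at two bits per cell and 0.2 V at four bits per cell, which is why the sensing circuitry has so much less room for error.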

Makers of specialty components have the luxury of calling their own shots when it comes to how their parts work, are pinned out, are timed, etc. The problem comes about when the components aren’t so specialized anymore, when they become commoditized, but with multiple vendors having different interfaces to their chips. In the case of Flash, the interfaces have been similar, but close doesn’t count, which means extra work to support multiple Flash vendors in a system.

The Open NAND Flash Interface (ONFi – love the lower-case “i”… At least they didn’t go all hacker on us and call it oNfI) was formed to unify the interface at the chip level. Now there’s a group working at the other side of things on a common API for software: the Non-Volatile Memory Host Controller Interface (NVMHCI), which is close to being approved in its first iteration.

All of this makes it easier to put Flash memory subsystems together and allows interchangeability (from an interface standpoint) of different Flash memories. It is in this environment, then, that Denali is announcing a new FlashPoint™ platform that is intended to simplify the configuration of a Flash subsystem using PCI-Express as the data pipe.

Denali has historically concentrated on generating IP for use in System-on-Chip (SoC) designs, starting with DDR memory, and following on with NAND Flash and PCI-Express. So it’s somewhat natural that they would move up a level of abstraction and automate the pulling together of the PCI-Express, Flash, and control blocks. They see the use of this both for generating PCI-Express Flash cache chips and for integrating further into an SoC.

The package contains the hardware IP and a full software stack that talks to the NVMHCI interface. Since that interface is new, they also provide an NVMHCI driver for customers that haven’t yet incorporated it. They also support the ONFi interface, allowing the subsystems to be built with existing Flash devices from a variety of manufacturers.

The contents of the FlashPoint™ platform have been disclosed only at a high level so far. It consists of a 32-bit RISC processor, RAM, ROM, and a number of control blocks, assembled in an architecture that they have tuned for performance. Within this environment, you can size the Flash memory and scale the number of Flash controllers according to the needs of the application. There is also an encryption engine that can be bypassed as well as a security block that supports passwords and partitioning. The hardware IP is provided in RTL. Not clear yet is whether this customization happens through a wizard-like tool or through tighter interaction with Denali on a project-by-project basis.

On the software/controller side, Denali’s existing Flash system has four layers, starting with hardware, then hardware abstraction, a Flash Translation Layer, and an OS/RTOS layer. The FlashPoint platform adds modules to the existing set. So at the hardware level, they have added the processor and RAM, a command pipeline, an auto-config block, and an NVMHCI block. At the firmware level, they have added power management, reliability monitors, and a memory map. And at the software level, they provide a protocol stack, command ordering, a task engine, and a system initialization block.
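For readers unfamiliar with what a Flash Translation Layer does in a stack like this, the following Python toy shows the core idea: the layers above see stable logical page numbers, while the layer below writes each update to a fresh physical page because Flash pages cannot be overwritten in place. This is a generic sketch, not Denali's implementation; the class `TinyFTL` and all of its members are hypothetical, and garbage collection and wear leveling are left out.

```python
class TinyFTL:
    """Toy Flash Translation Layer: maps logical pages to physical pages."""

    def __init__(self, physical_pages):
        self.flash = [None] * physical_pages  # stand-in for the hardware layer
        self.l2p = {}                         # logical page -> physical page
        self.stale = set()                    # physical pages awaiting erase
        self.next_free = 0

    def write(self, logical_page, data):
        if self.next_free >= len(self.flash):
            # A real FTL would garbage-collect stale pages here.
            raise RuntimeError("no free pages left")
        old = self.l2p.get(logical_page)
        if old is not None:
            self.stale.add(old)               # old copy is now obsolete
        self.flash[self.next_free] = data     # always write to a fresh page
        self.l2p[logical_page] = self.next_free
        self.next_free += 1

    def read(self, logical_page):
        physical = self.l2p.get(logical_page)
        return None if physical is None else self.flash[physical]


ftl = TinyFTL(physical_pages=4)
ftl.write(7, b"first")
ftl.write(7, b"second")   # lands on a new physical page
print(ftl.read(7), ftl.l2p, ftl.stale)
```

The OS or RTOS layer only ever asks for logical page 7; the remapping underneath is invisible to it.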

Customization on the software side can be achieved by choosing which modules to include in a system. Customizing the behavior of the modules themselves is apparently possible in theory, but isn’t really intended. First of all, there’s the issue of the memory footprint and trying to jam any new firmware into it. Then there’s the more practical matter of the software being available in binary only, complicating integration of anything new into the bundle. In reality, their intent is that the software modules remain intact, with any other functionality added by higher-level software that communicates with the Denali software via the NVMHCI API.

Denali promises bandwidth of 160 to 200 MB/s. While the platform can operate in any of the standard applications that use Flash, much attention is being focused on computer cache applications. Flash is moving into position as another piece of the memory architecture, to the point where Microsoft has provided new capabilities in Windows Vista® (the so-called ReadyBoost™ and ReadyDrive™ features) to support Flash as a cache to a hard drive or as an outright Solid State Drive (SSD), respectively. Denali expects the cost/capacity of Flash to make SSDs mainstream by mid-2009. They specifically point to having been able to build a 300 MB cache operating at 15K I/O operations per second (IOPS), and a smaller laptop cache at 10K IOPS.
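One rough way to relate the bandwidth and IOPS figures is to ask what average transfer size would be implied if both were reached at once. The article does not say they apply to the same configuration or workload, so treat the following Python sketch as back-of-the-envelope arithmetic under that assumption, with 1 MB taken as 1,000,000 bytes; `bytes_per_op` is an illustrative name.

```python
def bytes_per_op(bandwidth_mb_s, iops):
    """Average bytes moved per I/O operation, assuming 1 MB = 1,000,000 bytes."""
    return bandwidth_mb_s * 1_000_000 / iops


for bandwidth in (160, 200):
    size = bytes_per_op(bandwidth, 15_000)
    print(f"{bandwidth} MB/s at 15K IOPS -> about {size / 1000:.1f} KB per operation")
```

If the numbers do line up this way, each operation would move somewhere between roughly 10.7 KB and 13.3 KB on average.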

Greater Flash usage can mean both higher performance and lower power, an attractive (and unusual) combination that suggests that Flash will find its way beyond the keychain and into more of the devices we use. If Flash subsystems get easier to design, hopefully that will happen sooner rather than later.
