A Bid to Simplify Flash Subsystem Design

Flash memory, once exotic and expensive, has followed in DRAM’s footsteps to become a familiar everyday technology. Perhaps even more familiar than DRAM: when was the last time you went to a drug store and picked up a DRAM card while you were there?

As with DRAM, this has been driven by falling prices: the price per megabyte of Flash is dropping by roughly half every year, and volume has responded with a compound annual growth rate of about 170%, according to Denali Software, Inc. Couple this with the fact that Flash is non-volatile, and, well, it’s no surprise that it’s found not only in camera storage cards and cell phones but also, increasingly, in the memory subsystem of computing platforms.

One technology advance that has made this reduction in price possible is the development of the Multi-Level Cell (MLC). Traditional Flash used a Single-Level Cell (SLC), which is the standard kind of memory bit we’re used to: one bit per cell, either on or off. With MLC technology, each cell can have more states than just on and off. Current technology allows four levels, meaning that two bits’ worth of data can be stored in a single cell. This literally doubles the capacity of an array.

Of course, it’s not that simple. With an SLC, you can be relatively sloppy when reading the cell data: near ground for a zero, and near the rail for a one, for example. With an MLC, you’re dividing that same voltage range into finer divisions, meaning the sensing circuitry has to be much more discerning, and data retention becomes less forgiving. This has complicated both the internal read circuitry and the Flash controllers that have to interpret which bits go where. The next step, to four bits per cell, will make the challenge even greater: sixteen voltage levels per cell.
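To make the shrinking margins concrete, here is a minimal Python sketch of level sensing. It assumes a hypothetical 0 to 3 V sensing window split into evenly spaced levels, with the levels simply numbered from the bottom up. Real devices use their own voltage ranges, threshold placement, and bit encodings, none of which are specified here, and the function names are illustrative only.

```python
def thresholds(bits_per_cell, v_min=0.0, v_max=3.0):
    """Evenly spaced read thresholds splitting [v_min, v_max] into 2**bits levels.

    The 0-3 V window and the even spacing are assumptions for illustration.
    """
    levels = 2 ** bits_per_cell
    step = (v_max - v_min) / levels
    return [v_min + step * i for i in range(1, levels)]


def read_cell(voltage, bits_per_cell, v_min=0.0, v_max=3.0):
    """Return the level index (0 .. 2**bits - 1) that the sensed voltage falls into."""
    # Counting how many thresholds lie below the voltage gives the level index.
    return sum(voltage > t for t in thresholds(bits_per_cell, v_min, v_max))


def level_width(bits_per_cell, v_min=0.0, v_max=3.0):
    """Width of each voltage band: the margin the sensing circuitry has to resolve."""
    return (v_max - v_min) / 2 ** bits_per_cell


if __name__ == "__main__":
    for bits in (1, 2, 4):
        print(f"{bits} bit(s)/cell: {2 ** bits} levels, "
              f"{len(thresholds(bits))} thresholds, "
              f"{level_width(bits):.4f} V per level")
```

Under these assumptions, each added bit halves the width of every voltage band, which is why the sensing circuitry and data retention get less forgiving as the bit count per cell rises.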

Makers of specialty components have the luxury of calling their own shots when it comes to how their parts work, are pinned out, are timed, etc. The problem arises when the components aren’t so specialized anymore: they become commoditized, but multiple vendors still have different interfaces to their chips. In the case of Flash, the interfaces have been similar, but close doesn’t count, which means extra work for any system that has to support multiple Flash vendors.

The Open NAND Flash Interface (ONFi – love the lower-case “i”… At least they didn’t go all hacker on us and call it oNfI) was formed to unify the interface at the chip level. Now there’s a group working at the other side of things on a common API for software: the Non-Volatile Memory Host Controller Interface (NVMHCI), which is close to being approved in its first iteration.

All of this makes it easier to put Flash memory subsystems together and allow interchangeability (from an interface standpoint) of different Flash memories. It is in this environment, then, that Denali is announcing a new FlashPoint™ platform that is intended to simplify the configuration of a Flash subsystem using PCI-Express as the data pipe.

Denali has historically concentrated on generating IP for use in System-on-Chip (SoC) designs, starting with DDR memory, and following on with NAND Flash and PCI-Express. So it’s somewhat natural that they would move up a level of abstraction and automate the pulling together of the PCI-Express, Flash, and control blocks. They see the use of this both for generating PCI-Express Flash cache chips and for integrating further into an SoC.

The package contains the hardware IP and a full software stack that talks to the NVMHCI interface. Since that interface is new, they also provide an NVMHCI driver for customers that haven’t yet incorporated it. They also support the ONFi interface, allowing the subsystems to be built with existing Flash devices from a variety of manufacturers.

The contents of the FlashPoint™ platform have been disclosed only at a high level so far. It consists of a 32-bit RISC processor, RAM, ROM, and a number of control blocks, assembled in an architecture that they have tuned for performance. Within this environment, you can size the Flash memory and scale the number of Flash controllers according to the needs of the application. There is also an encryption engine that can be bypassed as well as a security block that supports passwords and partitioning. The hardware IP is provided in RTL. Not clear yet is whether this customization happens through a wizard-like tool or through tighter interaction with Denali on a project-by-project basis.

On the software/controller side, Denali’s existing Flash system has four layers, starting with hardware, then hardware abstraction, a Flash Translation Layer, and an OS/RTOS layer. The FlashPoint platform adds modules to the existing set. So at the hardware level, they have added the processor and RAM, a command pipeline, an auto-config block, and an NVMHCI block. At the firmware level, they have added power management, reliability monitors, and a memory map. And at the software level, they provide a protocol stack, command ordering, a task engine, and a system initialization block.

Customization on the software side can be achieved by choosing which modules to include in a system. Customizing the behavior of the modules themselves is apparently possible in theory, but isn’t really intended. First of all, there’s the issue of the memory footprint and trying to jam any new firmware into it. Then there’s the more practical matter of the software being available in binary only, which complicates integrating anything new into the bundle. In reality, their intent is that the software modules remain intact, with any other functionality added by higher-level software that communicates with the Denali software via the NVMHCI API.
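As a rough illustration of what "choosing which modules to include" could look like, here is a small Python sketch. The module names are the ones listed above; the dictionary layout and the `build_config` helper are hypothetical, and nothing here reflects Denali’s actual tooling or configuration format, which has not been disclosed.

```python
# Modules the FlashPoint platform adds at each level, as named in the article.
PLATFORM_MODULES = {
    "hardware": ["processor and RAM", "command pipeline",
                 "auto-config block", "NVMHCI block"],
    "firmware": ["power management", "reliability monitors", "memory map"],
    "software": ["protocol stack", "command ordering",
                 "task engine", "system initialization"],
}


def build_config(selected):
    """Group the selected module names by level (hypothetical helper).

    Raises ValueError if a name is not one of the known modules, mirroring
    the idea that modules are picked from a fixed set rather than modified.
    """
    selected = set(selected)
    known = {m for mods in PLATFORM_MODULES.values() for m in mods}
    unknown = selected - known
    if unknown:
        raise ValueError(f"unknown modules: {sorted(unknown)}")
    return {level: [m for m in mods if m in selected]
            for level, mods in PLATFORM_MODULES.items()}


if __name__ == "__main__":
    print(build_config(["power management", "task engine", "NVMHCI block"]))
```

The sketch treats modules as opaque, selectable units, which matches the stated intent that they stay intact while extra functionality lives above the NVMHCI API.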

Denali promises bandwidth of 160 to 200 MB/s. While the platform can operate in any of the standard applications that use Flash, much attention is being focused on computer cache applications. Flash is moving into position as another piece of the memory architecture, to the point where Microsoft has provided new capabilities in Windows Vista® (the so-called ReadyBoost™ and ReadyDrive™ features) to support Flash as a cache to a hard drive or as an outright Solid State Drive (SSD), respectively. Denali expects the cost/capacity of Flash to make SSDs mainstream by mid-2009. They specifically point to having been able to build a 300 MB cache operating at 15K I/O operations per second (IOPS), and a smaller laptop cache at 10K IOPS.
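Bandwidth and IOPS figures are related through the size of each transfer: throughput is roughly IOPS multiplied by bytes per operation. The Python sketch below works through that arithmetic. The 4 KiB transfer size is an assumption chosen for illustration, and the article does not say that the bandwidth and IOPS numbers describe the same configuration or workload, so treat the results as a way of reading the figures rather than as a statement of Denali’s performance.

```python
def throughput_mb_s(iops, transfer_bytes):
    """Throughput in MB/s (1 MB = 10**6 bytes) for a given IOPS and transfer size."""
    return iops * transfer_bytes / 1_000_000


def transfer_bytes_for(target_mb_s, iops):
    """Average bytes per operation needed to reach target_mb_s at the given IOPS."""
    return target_mb_s * 1_000_000 / iops


if __name__ == "__main__":
    ASSUMED_TRANSFER = 4096  # hypothetical 4 KiB per I/O, not from the article
    for iops in (10_000, 15_000):
        mb_s = throughput_mb_s(iops, ASSUMED_TRANSFER)
        print(f"{iops} IOPS x {ASSUMED_TRANSFER} B = {mb_s:.2f} MB/s")
    for target in (160, 200):
        size = transfer_bytes_for(target, 15_000)
        print(f"{target} MB/s at 15000 IOPS needs about {size:.0f} B per operation")
```

Under the 4 KiB assumption, small random I/O at these rates would use only part of the quoted peak bandwidth, which is typical of how sequential-bandwidth and random-IOPS figures differ.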

Greater Flash usage can mean both higher performance and lower power, an attractive (and unusual) combination that suggests that Flash will find its way beyond the keychain and into more of the devices we use. If Flash subsystems get easier to design, hopefully that will happen sooner rather than later.
