It feels just a little bit like a long arc that may actually be looping back onto its origin. Back to the future. Or perhaps ahead to the past.
Our topic for today is NOR flash memory, the introverted and slightly older twin to the better-known NAND flash that powers all those thumb drives spilling out of my drawer. Named for their internal architectures, both memories provide non-volatile storage by holding charge on a floating gate, but NAND has won out for data storage; NOR has been more effective for code execution.
At least, that’s how the story started. One of the primary uses of NOR flash was “execute in place” (or XIP): point the processor to the ROM during boot-up. Given that it was talking to a memory interface, the processor was none the wiser as it blithely executed the non-volatile flash code. What happened after that program was executed depended on the system; for the most part, you’d load your system up and then point somewhere in DRAM to continue on past boot.
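In XIP terms, "pointing the processor to the ROM" is nothing more exotic than a reset vector that lands in the flash's memory-mapped address range. Here's a toy illustration (names and the address are hypothetical; the flash is simulated with an ordinary array so it runs anywhere):

```c
/* Toy illustration of execute-in-place (XIP): the "flash" holds a
 * vector table, and booting means calling through it directly, with
 * no copy to RAM. On real hardware, flash_vectors would sit at a
 * fixed memory-mapped address (say, 0x08000000 - hypothetical). */
typedef void (*boot_fn)(void);

static int booted = 0;
static void boot_entry(void) { booted = 1; } /* stand-in boot code */

/* Simulated flash image: slot 0 is the reset vector. */
static const boot_fn flash_vectors[] = { boot_entry };

void xip_boot(void) {
    flash_vectors[0](); /* execute straight out of "flash" */
}
```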
It also sounds like some low-level routines and drivers were originally intended to be executed out of the ROM during normal operation. You can tolerate some slow performance (to a point) when booting up, but ongoing performance bottlenecks are less easily forgiven.
And so, way back when (I saw a reference to Chips and Technologies, if you want something of a timestamp), it became a thing on PCs not to execute directly out of the flash memory, but rather to dump the memory contents into RAM and then execute from there in what’s called “code shadowing.”
The tradeoff here is that you lose some RAM (or have to provide additional RAM) in exchange for the better access performance that the RAM provides.
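Conceptually, the shadowing step is just a copy followed by a jump. A minimal sketch, with an ordinary array standing in for the memory-mapped flash and all names hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define IMAGE_WORDS 4 /* tiny stand-in for a real boot image */

/* Simulated NOR flash contents; on real hardware this would be a
 * memory-mapped region, e.g. (const uint32_t *)0x08000000. */
static const uint32_t flash_image[IMAGE_WORDS] = {
    0xDEADBEEFu, 0x12345678u, 0xCAFEF00Du, 0x00000000u
};

/* RAM copy that the processor will actually execute from. */
static uint32_t shadow_ram[IMAGE_WORDS];

/* Copy the boot image from (slow) flash into (fast) RAM. After this,
 * the boot sequence jumps into the RAM copy and never touches flash
 * again for instruction fetches - that's code shadowing. */
void shadow_boot_image(void) {
    memcpy(shadow_ram, flash_image, sizeof flash_image);
}
```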
Now… once you’ve done this, the distinction between NAND and NOR becomes less critical because you’re not executing out of the flash anymore. For instance, according to Micron Technology, phones stopped using NOR flash at some point, choosing instead to shadow code out of NAND flash. They “credit” this for the longer boot times of the resulting phones.
But this raises yet another tradeoff: how many kinds of memories you have in your system. NOR for boot, NAND for non-volatile data, DRAM for execution, SRAM for really fast execution… And, unless it’s embedded, it means a different chip for each one. (And if the flash ones are embedded, then it complicates the process.) It should be obvious that, in tight quarters like a phone, fewer chips is better. Might even be worth a longer boot cycle.
Given the prevalence of shadowing, a full memory interface on the NOR flash would be a distinct liability. The good news with such an interface is that you can access anything directly, rather than having to burst a bunch of neighboring data. The bad news is that it takes a lot of pins. And if you’re no longer executing directly out of the thing, then those pins aren’t nearly as critical.
And so NOR folks have developed serial EEPROMs – effectively squeezing the contents out over a single SPI wire to be arranged in DRAM, where they can be executed. A benefit of this is that you can combine it with compression and/or encryption to save memory space and/or secure the boot code (neither of which works well when executing in place).
So here we started with an execution performance issue, which drove a move away from XIP, which drove a move towards low-pin-count memory. Which is where our arc starts its loop back.
Performance might be better when executed out of DRAM, but you’ve still got to load that DRAM when booting. And a one-bit pipe is a pretty narrow way to do that. We’ve now fixed an operational performance problem but created a boot performance problem.
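To put rough numbers on that (the image size and clock here are my assumptions, not anything from Micron): streaming a 16 MiB boot image through a one-bit pipe at 50 MHz takes a while, and adding lanes divides the time accordingly.

```c
/* Rough estimate of seconds needed to stream an image over SPI:
 * one bit per lane per clock, ignoring command/address overhead
 * and any decompression time. All inputs are assumptions. */
double spi_load_seconds(double image_bytes, double clock_hz, int lanes) {
    return (image_bytes * 8.0) / (clock_hz * lanes);
}
```

At these assumed numbers, x1 takes about 2.7 seconds just to move the bits; four lanes cut that to roughly 0.67 seconds – which is exactly the motivation for wider serial interfaces.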
So devices are now available with up to quad SPI interfaces.
Meanwhile, someone seems to have been addressing the fundamental access speed issues of NOR. Note that NOR isn’t known for data storage partly because it has a really slow erase operation (and not great write times). But it does pretty well if you simply want to read what’s already there – in other words, treat it as a true ROM rather than as reprogrammable.
Micron makes a number of different flash configurations, so it can compare widely without bashing the competition (assuming the competition isn’t sitting on better numbers that go unmentioned). They lined up five 256-Mb devices with different configurations. Two were parallel NOR, having more I/Os for direct access. One ran at 100 Mbps, the other at 266 Mbps.
That’s better bandwidth than the two serial options they showed: a quad and a dual-quad, at 83 and 166 Mbps, respectively. But the parallel devices come in 64-pin packages, the serial in 24 pins. This sets up the current tradeoff: speed with 64 pins, or less speed with 24 pins.
Then comes their just-announced XTRMflash series: configurable as x1 or x4 – or with a DDR-like interface that takes 4 more pins and delivers 400 Mbps – in a 24-pin package. It’s backwards pin-compatible with existing quad devices, making use of 4 no-connect pins in the existing pinout. The interface is something they’re calling XTRMbus, and it follows an octal DDR protocol.
“But how about execution performance?” you ask. Here, obviously, the DDR configuration is going to win, and it’s what gives the 400 Mbps. Given that the device favors sequential access, there are two critical times: latency for first-word access and then subsequent-word access speed. The earlier four devices show latency of 70 and 96 ns for the parallel devices and 130-157 ns for the serial devices. The new XTRMflash device takes 83 ns to get the first word – not the fastest, but not a giant penalty, depending on how many random accesses you plan on. (How spaghetti is your code?)
On the flip side, every subsequent 8 bits takes… 2.5 ns to read. That compares to 6 ns on the fastest twin-quad device (the others range as high as 20 ns). The point being, as long as you don’t have too many jumps in your code, this could provide some not-so-sluggish XIP opportunities.
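Back-of-envelope, those figures hang together: 8 bits every 2.5 ns works out to 400 million bytes per second of streaming bandwidth, and the 83 ns first-word latency gets amortized over however long each sequential burst runs. A sketch of the arithmetic (the fixed-latency-plus-streaming burst model is my simplification):

```c
/* Streaming bandwidth implied by a per-byte read time. */
double stream_bytes_per_sec(double ns_per_byte) {
    return 1e9 / ns_per_byte;
}

/* Effective bandwidth for one burst: pay the first-word latency once,
 * then stream the bytes at the per-byte rate. Short, jumpy bursts let
 * the latency dominate; long sequential runs approach the streaming
 * rate. */
double effective_bytes_per_sec(double latency_ns, double ns_per_byte,
                               int burst_bytes) {
    double total_ns = latency_ns + burst_bytes * ns_per_byte;
    return burst_bytes * 1e9 / total_ns;
}
```

At 2.5 ns per byte, the streaming rate is 400e6 bytes/s; a 64-byte burst behind an 83 ns first access still nets roughly 263 million bytes/s, while very short bursts give the latency a much bigger say. Hence the spaghetti-code question.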
And we come full circle.
One practical note for anyone who hasn’t managed these configurations before. By default, the device powers up into x1 SPI mode. There are instructions that enable it to change to different configurations. But… you haven’t booted yet – who’s issuing those instructions?
Turns out that’s not the only way to set the configuration. You can program non-volatile registers at the same time as you program any ROM during production, before mounting the chip to the board. Alternatively, chipsets that support XTRMflash can be configured to boot the memory in x8 mode.
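In command terms, the production-time option boils down to a write-enable frame followed by a non-volatile register write. The opcodes below are placeholders in the spirit of common SPI-NOR command sets, not values taken from a Micron datasheet; a real driver would use the part's documented commands and byte order:

```c
#include <stdint.h>
#include <stddef.h>

/* Placeholder opcodes - replace with the datasheet's actual values. */
#define CMD_WRITE_ENABLE    0x06
#define CMD_WRITE_NV_CONFIG 0xB1

/* Build the two SPI frames that program a 16-bit non-volatile
 * configuration register (e.g., selecting an x8/DDR boot mode).
 * Returns the number of bytes placed in buf; a real driver would send
 * each frame with chip-select toggled in between. Byte order here is
 * LSB-first, which is an assumption. */
size_t build_nv_config_frames(uint8_t *buf, uint16_t value) {
    size_t n = 0;
    buf[n++] = CMD_WRITE_ENABLE;        /* frame 1: write enable   */
    buf[n++] = CMD_WRITE_NV_CONFIG;     /* frame 2: register write */
    buf[n++] = (uint8_t)(value & 0xFF); /* register low byte       */
    buf[n++] = (uint8_t)(value >> 8);   /* register high byte      */
    return n;
}
```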
And, as a last quick tip of the hat to everybody’s favorite IoT worry, it bears noting that the device also has a number of security features – like the ability to freeze – or “ROM” – sectors. Pretty much can’t get anywhere without security these days. Just like flying…
(Top image courtesy Micron.)
[Editor note: per convo below, edited first paragraph to clarify.]
10 thoughts on “Micron’s New NOR Flash Circles Back”
Does Micron’s new fast parallel interface make XIP viable again?
“NAND works better for random storage; NOR is more efficient for sequential access”
This sounds backwards to me. NAND is actually pretty good (well OK) for sequential access (think of solid state disk), while NOR is much better at random access, and faster all around. NAND’s real gain is in bits per unit area. Don’t look for 32 Gb NOR flash chips.
You know… there are those moments when you do a bunch of research and form a mental model and start writing, and a little voice screams from the distance that something seems inconsistent but you’re on deadline so you keep on writing…
Yeah, one of those times. I could have thought that through better and worded it differently – with a different meaning. With a full memory interface, NOR should do pretty well with random access. (Not so much in a serial version…)
Perhaps it’s more accurate to say that NAND tends to get used more for random storage of stuff. NOR certainly seems to be faster for sequential read access, which is good for code that doesn’t involve jumps. And as far as I could tell, NOR also has horrible erase/write times, making it worse for on-the-fly storage, making it better for code that’s not changing.
Conflating random-stuff-storage vs random access, and sequential vs code (much of which is sequential)…