posted by Bryon Moyer
QuickLogic has been focusing hard on sensor fusion for a while, and we have looked at their ArcticLink solution in the past. The first versions combined hard logic, a dedicated microcode processor, and FPGA fabric, with the emphasis squarely on low power.
Apparently this has done well enough to make them double down on the investment. But their new SoC is different enough that they gave it a new name. While the old devices were called ArcticLink, they’ve now dubbed their new device EOS (after the goddess of the dawn).
It targets both phones and wearables, and it’s predicated on the fact that new algorithms (voice trigger, indoor navigation, motion-compensated heart rate, etc.) require in the range of 10-20 MIPS each (additive as you add more functions), and that 10 mW is the magic cutoff that OEMs don’t want to exceed. So the question is, how many such functions can be implemented and still stay within the power budget?
Let’s start at a high level. The old devices were simply slaves in a larger system; you needed a host in the system. EOS, by contrast, can live on its own – a feature they tout for wearables, where the number of components and the power they consume are particularly delicate.
So EOS consists of the stuff that ArcticLink has – indicated in various shades of green below – plus a number of other blocks that allow the device to be autonomous. Functionally, whether or not it accompanies another device, it's intended to act as the "always-on" manager for the system, since they claim it will consume less power than any microcontroller-based sensor hub. So even if EOS sits in a phone next to an application processor with an integrated sensor hub, it's still intended to handle the always-on sensing, since it will use less energy than that integrated hub.
(Image courtesy QuickLogic)
Summarizing the new elements in the block diagram:
- The blocks in yellow are for running general-purpose software and an OS if desired. Because both phones and wearables are likely to have large amounts of external flash (for things like data logging), the EOS has no embedded flash. The 32-bit CPU (with floating-point unit) runs at up to 80 MHz; power consumption is 100 µW/MHz, or 80 µW/DMIPS, and it contains 512 Kb of SRAM (see the back-of-envelope sketch after this list).
- The blue blocks are for always-on voice detection. They handle only the first part of voice activation, where the system decides that a sound may be a voice. That step is well enough understood to cast into hardware for lower power; higher-level decisions and command parsing would be handled by software, either in the EOS's Cortex-M4 or in an external processor.
- The gray portions are system support blocks – including an LDO, which means the device doesn’t need an accompanying power management IC (PMIC).
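That list, combined with the numbers at the top of the article, is enough for a back-of-envelope answer to the "how many functions fit in 10 mW?" question. The sketch below is only a rough sanity check: it assumes every function runs entirely on the M4 at the quoted 80 µW/DMIPS and treats MIPS and DMIPS as interchangeable, both of which are simplifications.

```c
/* Back-of-envelope: how many 10-20 MIPS always-on functions fit in 10 mW?
 * Assumes all of the work runs on the M4 at the quoted 80 uW/DMIPS and
 * treats MIPS and DMIPS as interchangeable -- both are simplifications. */
#include <stdio.h>

int main(void) {
    const double uw_per_dmips = 80.0;   /* quoted CPU efficiency           */
    const double budget_mw    = 10.0;   /* the OEMs' always-on power limit */

    for (double mips = 10.0; mips <= 20.0; mips += 10.0) {
        double mw_per_function = mips * uw_per_dmips / 1000.0;   /* uW -> mW */
        double functions       = budget_mw / mw_per_function;
        printf("%4.0f MIPS/function -> %.1f mW each -> ~%.0f functions in %.0f mW\n",
               mips, mw_per_function, functions, budget_mw);
    }
    return 0;
}
```

By that crude math, each function costs on the order of 0.8-1.6 mW, so something like 6-12 of them could coexist under the 10 mW ceiling, before accounting for sensors, memories, radios, and everything else drawing from the same budget.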
The audio subsystem, all implemented in hardware, bears further discussion. The different formats and interfaces can get a bit confusing for those of us with less audio background, so here’s an attempt at a Reader’s Digest Condensed version.
Audio is typically digitized in microphones into one of two formats: pulse-density modulation (PDM) or pulse-code modulation (PCM). The latter is probably the more familiar of the two: it's simply a set of integers representing the digitized audio, with framing (so you can tell where the integers start and stop). An I2S bus is typically used to connect audio components using this format; higher-level processing and codecs (used for things like echo cancellation, beam forming, etc.) typically expect PCM.
PDM represents the signal by the density of pulses within a given timeframe. It's a stream of pulses with no framing, and it's typically carried on a two-wire bus (data and clock). Stereo sound can be carried on one signal, with the left channel sampled on one clock edge and the right channel on the other (although, at present, EOS supports only a single channel).
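If it helps to see how the two formats relate, here's a minimal sketch of reducing a PDM bit stream to PCM samples. Real converters use proper multi-stage decimation filters (a CIC followed by an FIR, for example); the simple pulse counting and the 64:1 ratio below are illustrative assumptions, not a description of QuickLogic's hardware.

```c
/* Toy PDM-to-PCM conversion: count the 1s in each window of PDM bits to
 * produce one PCM sample. Real hardware uses multi-stage decimation
 * filters; this only illustrates how the two formats relate. */
#include <stdint.h>
#include <stdio.h>

#define DECIMATION 64   /* PDM bits per PCM sample (illustrative ratio) */

/* Convert n_bits of 1-bit PDM (stored one bit per byte for clarity)
 * into signed 16-bit PCM samples; returns the number of samples made. */
static size_t pdm_to_pcm(const uint8_t *pdm, size_t n_bits, int16_t *pcm) {
    size_t n_samples = n_bits / DECIMATION;
    for (size_t s = 0; s < n_samples; s++) {
        int32_t ones = 0;
        for (int b = 0; b < DECIMATION; b++)
            ones += pdm[s * DECIMATION + b] & 1;
        /* Map pulse density 0..DECIMATION onto the signed 16-bit range. */
        pcm[s] = (int16_t)((ones * 65535 / DECIMATION) - 32768);
    }
    return n_samples;
}

int main(void) {
    uint8_t pdm[128];
    int16_t pcm[2];
    /* First window: all zeros (full negative); second: all ones (full positive). */
    for (int i = 0; i < 128; i++) pdm[i] = (uint8_t)(i >= 64);
    size_t n = pdm_to_pcm(pdm, sizeof pdm, pcm);
    for (size_t i = 0; i < n; i++) printf("PCM[%zu] = %d\n", i, pcm[i]);
    return 0;
}
```

A PCM microphone does the equivalent of this internally and then adds the I2S framing; a PDM microphone leaves that work to whatever it's connected to.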
Some microphones can generate PCM; others (typically cheaper) generate PDM. That makes the following EOS scenarios possible (all of which assume higher-level processing is implemented on a different device; the EOS's internal Cortex-M4 is an alternative for all of them):
- A PDM microphone is connected to EOS; EOS converts it to PCM, and if the voice detector triggers, the PCM signal is forwarded out the I2S bus for higher-level processing (this flow is sketched in code after the figure below).
- A PCM microphone is connected via the I2S bus. If voice is detected, then the higher-level processor is alerted, and it consumes the signal already on the I2S bus.
- A PCM microphone is connected via the I2S bus; if voice is detected, then the PCM signal could be sent back out onto the same I2S bus using a different channel (since an I2S frame can contain several channels; the drawing below illustrates the original microphone signal on Channel 1 and the forwarded signal on Channel 2). The higher-level processor would need to know to listen to the appropriate channel. This would appear to have no advantage over scenario 2 unless the PCM stream were somehow modified by EOS.
- A PCM mic comes in via the I2S bus; if voice is detected, then the (presumably modified) PCM stream is sent back out onto a separate I2S bus that connects the higher-level processor. The EOS supports two I2S busses.
(Some drawing elements courtesy QuickLogic)
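At the firmware level, scenario 1 boils down to a simple always-on pattern: sleep, watch the hardware detector, and start forwarding PCM only once it fires. The sketch below is purely conceptual; the hardware hooks are faked so that it compiles and runs, and none of the names correspond to an actual QuickLogic API.

```c
/* Conceptual trigger-and-forward loop for scenario 1: a PDM mic feeds the
 * hardware voice detector; when it fires, converted PCM is forwarded over
 * I2S to a higher-level processor. All hardware hooks below are fakes. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* --- Hypothetical hardware hooks, stubbed for illustration --- */
static int ticks = 0;
static bool voice_detector_triggered(void) { return ++ticks == 3; } /* fires once */
static size_t pdm_read_pcm(int16_t *buf, size_t max) {
    for (size_t i = 0; i < max; i++) buf[i] = 0;          /* pretend silence */
    return max;
}
static void i2s_forward(const int16_t *buf, size_t n) {
    (void)buf;
    printf("forwarded %zu PCM samples over I2S\n", n);
}
static void host_wake(void) { puts("host interrupt raised"); }

int main(void) {
    int16_t pcm[256];
    for (int iteration = 0; iteration < 5; iteration++) {  /* stand-in for an endless low-power loop */
        if (!voice_detector_triggered())
            continue;                                      /* sound wasn't voice-like: stay asleep */
        host_wake();                                       /* give the host time to spin up */
        for (int frame = 0; frame < 2; frame++) {          /* forward PCM while the session lasts */
            size_t n = pdm_read_pcm(pcm, 256);
            i2s_forward(pcm, n);
        }
    }
    return 0;
}
```

The other scenarios differ only in where the audio comes from and which bus or channel it goes back out on; the wake-then-forward structure stays the same.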
These different options mean that the EOS can be used in a device with a codec, like a phone, or without a codec, like a wearable. They’ve partnered with Sensory for many of the audio functions (hardware and software libraries for execution on the M4).
On a separate note, you may recall that some of the sensor fusion functions in the ArcticLink library came courtesy of Sensor Platforms – which has since been acquired by Audience. QuickLogic decided that they needed a bit more control over their own code, so they hired a substantial team to build libraries internally. That was made a bit easier by the fact that Sensor Platforms kept their code close to the vest, meaning that QuickLogic wasn't contaminated and therefore didn't need to do a cleanroom project.
Of course… now they’re in the same boat with Sensory…
You can get more information from their announcement.
posted by Bryon Moyer
CogniVue recently made a roadmap announcement that puts Mobileye on notice: CogniVue is targeting Mobileye’s home turf.
We looked at Mobileye a couple years ago; their space is Advanced Driver Assistance Systems (ADAS). From an image/video processing standpoint, they apparently own 80% of this market. According to CogniVue, they’ve done that by getting in early with a proprietary architecture and refining and optimizing over time to improve their ability to classify and identify objects in view. And they’ve been able to charge a premium as a result.
What’s changing is the ability of convolutional neural networks (CNNs) to move this capability out of the realm of custom algorithms and code, opening it up to a host of newcomers. And, frankly, making it harder for players to differentiate themselves.
According to CogniVue, today's CNNs are built on GPUs and are huge. And those GPUs don't have the kind of low-power profile that would be needed for mainstream automotive adoption. CogniVue's announcement debuts their new Opus APEX core, which they say can support CNNs in a manner that can translate to practical commercial use in ADAS designs. The Opus power/performance ratio is 5-10 times better than that of their previous G2 APEX core.
You can find more commentary in their announcement.
Updates: Regarding the capacity for Opus to implement CNNs, the original version stated, based on CogniVue statements, that more work was needed to establish that Opus supports CNNs well. CogniVue has since said that they've demonstrated this through "proprietary benchmarks at lead Tier 1s," so I removed the qualifier. Also, it turns out that the APEX core in a Freescale device (referenced in the original version) isn't Opus, but rather the earlier G2 version – the mention in the press release (which didn't specify G2 or Opus) was intended not as a testament to Opus specifically, but to convey confidence in Opus based on experience with G2. The Freescale reference has therefore been removed, since it doesn't apply to the core being discussed.
posted by Bryon Moyer
Cadence has announced a couple of major upgrades over the last month or two. They’re largely unrelated to each other – one is synthesis, the other formal verification – so we’ll take them one at a time.
A New Genus Species
First, synthesis: they announced a new synthesis engine called Genus – another in a recent line of new products ending in "-us." (Upcoming motto: "One of -us! One of -us!")
There are a couple of specific areas of focus for the new tool, but they tend to revolve around the notion that, during the day, designers work on their units; at night, the units are assembled for a block- or chip-level run.
At the chip level, physical synthesis has been available to reduce iterations between synthesis and place & route, but unit synthesis has remained more abstract, being just a bit of logic design without a physical anchor. For both chip and unit levels, accuracy and turn-around time matter. Without physical information, unit-level accuracy suffers, forcing more implementation iterations.
To address turn-around time, significant effort was placed on the ability to split up a full chip for processing on multiple computers. They use three levels of partitioning – into chunks of about 100,000 instances, then 10,000 instances, and then at the algorithm level.
(Image courtesy Cadence)
The challenge in generating a massively parallel partition is that multiple machines are needed. Within a single machine, multiple cores can share memory, but that's not possible between machines. That means the communication required to keep views of shared information coherent between machines can completely bog the process down on a poorly conceived partition.
To help with this, they partition by timing, not by physical location, so the partitions may actually physically overlap. They also try to cut only non-critical lines when determining boundaries, making it less likely that multiple converging iterations will be needed.
That said, they still do several iterations of refinement, cutting and reassembling, then cutting again and reassembling. And then once more.
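To make the "cut only non-critical lines" idea concrete, here's a toy version of that kind of boundary decision: instances joined by timing-critical nets are forced into the same group (via union-find), and only nets with comfortable slack are allowed to become cut lines between machines. It's a deliberate oversimplification with made-up slack numbers, not a description of Cadence's actual partitioner.

```c
/* Toy timing-driven partitioning: instances connected by critical nets
 * (slack below a margin) are merged into one group; only nets with
 * comfortable slack may be cut between groups/machines. */
#include <stdio.h>

#define N_INST 8

typedef struct { int a, b; double slack_ns; } Net;

static int parent[N_INST];
static int find(int x) { return parent[x] == x ? x : (parent[x] = find(parent[x])); }
static void unite(int x, int y) { parent[find(x)] = find(y); }

int main(void) {
    /* A tiny netlist: instance pairs plus the slack on the net joining them. */
    Net nets[] = {
        {0, 1, 0.05}, {1, 2, 0.02}, {2, 3, 1.50},   /* 2-3 has plenty of slack */
        {3, 4, 0.01}, {4, 5, 0.03}, {5, 6, 2.10},   /* 5-6 has plenty of slack */
        {6, 7, 0.04},
    };
    const double margin_ns = 0.10;   /* nets tighter than this must not be cut */
    const size_t n_nets = sizeof nets / sizeof nets[0];

    for (int i = 0; i < N_INST; i++) parent[i] = i;

    /* Glue together everything joined by a critical net... */
    for (size_t i = 0; i < n_nets; i++)
        if (nets[i].slack_ns < margin_ns)
            unite(nets[i].a, nets[i].b);

    /* ...then report the groups and which (non-critical) nets got cut. */
    for (int i = 0; i < N_INST; i++)
        printf("instance %d -> group %d\n", i, find(i));
    for (size_t i = 0; i < n_nets; i++)
        if (find(nets[i].a) != find(nets[i].b))
            printf("cut net %d-%d (slack %.2f ns)\n",
                   nets[i].a, nets[i].b, nets[i].slack_ns);
    return 0;
}
```

A real partitioner also has to balance group sizes (the 100,000- and 10,000-instance targets mentioned above) and revisit cuts that turn out to matter after all, which is where those repeated cut-and-reassemble passes come in.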
At the lowest level of partition, you have algorithms, and these are small enough to manage within a single machine. Which is good, because shared memory is critical here to optimize performance, power, and area (PPA).
Larger IP blocks can undergo microarchitecture optimization, where the impacts of several options are evaluated and analytically solved to pick the best PPA result.
Once an assembled chip has been synthesized with physical information, the contributing units are annotated with that physical information so that subsequent unit synthesis can proceed more accurately, again reducing overall iterations.
They’re claiming 5X better synthesis time, iterations between unit and block level cut in half, and 10-20% reduction in datapath area and power.
You can find other details about the Genus synthesis engine in their announcement.
Jasper Gets Incisive
Meanwhile, the integration of their acquired Jasper formal analysis technology is proceeding. Cadence had their own Incisive formal tools, but, with a few exceptions, those are giving way to the Jasper versions. But they’re trying to make it an easy transition.
One way they’ve done this is to maintain the Incisive front end with the new JasperGold flow. So there are two ways to get to the Jasper technology: via the Jasper front end (no change for existing Jasper users) and with the Incisive front end, making the transition faster.
(Image courtesy Cadence)
Under the hood, they’ve kept most of the Jasper engines based on how well they work, but they also brought over a few of the Incisive engines that performed well.
We talked about the whole engine-picking concept when we covered OneSpin’s LaunchPad platform recently. In Cadence’s case, they generally try all engines and pick the best results, but they also allow on-the-fly transitions from engine to engine. This is something they initiated a couple years ago with Incisive; it now accrues to the Jasper engines as well.
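As a rough illustration of that engine-picking concept, think of a portfolio loop: give each engine a slice of effort on a property and move on if it hasn't concluded, so that an "on-the-fly transition" amounts to handing the problem to the next engine. The code below is a generic sketch of the idea using fake engines; it says nothing about how JasperGold actually schedules its engines.

```c
/* Generic portfolio-of-engines sketch: rotate through several formal
 * engines, giving each a growing effort budget on a property, until one
 * returns a definite answer. The engines here are fakes. */
#include <stdio.h>

typedef enum { UNDECIDED, PROVEN, FALSIFIED } Result;
typedef Result (*Engine)(int effort);

/* Stand-ins for engines with different strengths (BDD, SAT, etc.). */
static Result quick_but_shallow(int effort) { return effort >= 8 ? PROVEN : UNDECIDED; }
static Result bug_hunter(int effort)        { (void)effort; return UNDECIDED; }
static Result slow_but_deep(int effort)     { return effort >= 4 ? PROVEN : UNDECIDED; }

int main(void) {
    Engine engines[]    = { quick_but_shallow, bug_hunter, slow_but_deep };
    const char *names[] = { "quick_but_shallow", "bug_hunter", "slow_but_deep" };
    const int n_engines = (int)(sizeof engines / sizeof engines[0]);

    /* Round-robin with a doubling budget: moving to the next engine when a
     * slice runs out is the "on-the-fly transition" in miniature. */
    for (int effort = 1; effort <= 16; effort *= 2) {
        for (int e = 0; e < n_engines; e++) {
            Result r = engines[e](effort);
            printf("engine %-17s effort %2d -> %s\n", names[e], effort,
                   r == PROVEN ? "proven" :
                   r == FALSIFIED ? "counterexample" : "undecided");
            if (r != UNDECIDED) return 0;   /* first definite answer wins */
        }
    }
    puts("still undecided: escalate effort or hand off to another technique");
    return 0;
}
```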
They've also got formal-assisted simulation for deeper bug detection; they compile IP constraints first for faster bug detection (at the expense of some up-front time); and they've provided assertion-based VIP for emulation to replace non-synthesizable checkers.
And the result of all of this is a claimed 15X improvement in performance.
You can get more detail on these and other aspects in their announcement.