After DSP’s annus mirabilis in 1948, another three decades would pass before actual, practical DSP chips would appear. DSP bits and pieces like TRW’s MPY016H hardware multiplier and TI’s TMC0280 LPC speech chip teased – real, integrated DSPs were just around the corner – but it was not until the 1980s that semiconductor technology advanced enough to make programmable DSP chips practical. The number of single-chip DSPs exploded during the 1980s and 1990s. Then, after 20 years, the era of the single-chip DSP came to an abrupt end. (Note: This article is the second half of “A Brief History of the Single-Chip DSP.”)
Wally Rhines was working for Texas Instruments (TI) in the 1970s, and he desperately wanted to leave TI’s site in Lubbock, Texas. When an opportunity arose for him to manage TI’s microprocessor operation in Houston, he took the position because he found Houston a far more attractive place to live. Besides, no one else wanted the job. TI’s 16-bit 9900 microprocessor was dead in the water due to its uncompetitive 16-bit address space. Having thus failed to capture a piece of the general-purpose microprocessor market, Rhines’ newly adopted microprocessor team at TI in Houston created a four-pronged application-specific processor strategy. The four prongs of TI’s forked strategy were:
- The TMS320 DSP family
- The TMS340 family of graphics processors
- The TMS360 mass-storage processor (which quickly went nowhere)
- The TMS380 token-ring LAN processor for IBM’s networking architecture
Of these, the TMS320 DSP family became the rock star prong in the strategy. As Rhines said in an interview, “…it teaches a lesson: desperation is the mother of innovation.” After a couple of gestational years, TI rolled out the first TMS320 DSPs in April, 1982. However, just building the chip was not sufficient for a new technology like this. TI evangelized DSP and supported its new DSPs with software development tools and training for years before seeing significant success with the parts. According to Rhines, it took another five or six years before TI started to see some real revenue from the products.
TI Wasn’t the First
However, TI’s DSP chips were certainly not the first in the market. Intel had sprinted to an early lead by introducing the ill-fated 2920 Analog Signal Processor in 1979, but another of the company’s products, the 16-bit 8086 microprocessor, caught fire when its little brother with the 8-bit external data bus – the 8088 microprocessor – became the heart of the IBM PC. The Intel 2920 sank from sight, quite possibly because Intel’s full attention was being drawn to the general-purpose microprocessor markets.
TI was only one of several semiconductor companies preparing to enter the DSP arena in the early 1980s. According to Will Strauss, President of Forward Concepts and a DSP analyst for many decades, the first “true” single-chip DSPs with hardware multiplier/accumulators to be announced were the AT&T DSP-1 – developed by Bell Labs and first sampled within AT&T in May, 1979 – and the NEC µPD7720, which was announced at the IEEE Solid State Circuits Conference in February 1980. AT&T incorporated the DSP-1 into its groundbreaking 5ESS electronic switching system for its telephone network. AT&T then continued to evolve the device for a few generations, which included the DSP16 and the DSP32 (the first floating-point DSP chip). However, the AT&T DSP-1 and its successors remained captive within the Bell System, never to become commercially available to other systems companies.
The NEC µPD7720 had a 16×16-bit multiplier and two 16-bit accumulators, so it was a true single-chip DSP. Although NEC announced the device in early 1980, it didn’t become commercially available along with the required development tools until 1981. Strauss notes that the NEC µPD7720 found its greatest success in Japan, as happens with so many programmable ICs from Japan, and it was also popular in Europe.
Motorola Semiconductor became another early contender in the battle for DSP chip dominance during the 1980s, starting with the DSP56000 processor introduced in 1986. The Motorola DSP56000 had a 24-bit hardware multiplier and two 48-bit accumulators that could be extended by another 8 bits using a pair of extension registers. This large data-word capability gave the Motorola DSP56000 the ability to handle high-precision audio, so the Motorola DSP56000 quickly became popular with developers of high-end audio systems.
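The value of those extra accumulator bits can be shown with a little arithmetic. The sketch below is only an illustration under simplifying assumptions: it treats operands as plain two’s-complement integers (real DSPs often use a fractional number format, which this ignores), and the function name is hypothetical rather than anything taken from Motorola’s documentation. It counts how many worst-case 24×24-bit products can be summed before a signed accumulator of a given width overflows.

```python
def max_worst_case_sums(operand_bits, accumulator_bits):
    """Count the worst-case products a signed accumulator can sum without overflow.

    Assumption: operands are plain two's-complement integers, so the
    largest product magnitude is (-2**(n-1)) squared.
    """
    worst_product = (1 << (operand_bits - 1)) ** 2
    accumulator_max = (1 << (accumulator_bits - 1)) - 1
    return accumulator_max // worst_product


if __name__ == "__main__":
    # 48-bit accumulator alone versus 48 bits plus 8 extension bits.
    print(max_worst_case_sums(24, 48))
    print(max_worst_case_sums(24, 56))
```

Under these assumptions, the bare 48-bit accumulator has room for only a single worst-case product, while the 8 extension bits leave room for hundreds, which is the headroom a long audio filter needs.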
Duking It Out In The 1990s
The major participants in the DSP arena battled for dominance during the 1980s and 1990s. They produced multiple generations of increasingly powerful devices with multiple hardware multipliers, floating-point hardware multipliers, and larger amounts of on-chip memory. By the late 1990s, TI, Motorola, and Philips had developed DSP monster processors with VLIW architectures, multiple multiplier/accumulators, and additional function units for special operations such as bit swizzling.
Development of bigger and more powerful standalone DSP chips came to an abrupt halt when a competing chip technology veered out of nowhere and blindsided the DSP vendors. Just as the Chicxulub asteroid wiped out the dinosaurs 66 million years ago and left a thin layer of iridium in the rock strata as a calling card, FPGAs crashed the single-chip DSP party at the turn of the millennium.
The combination of one fundamental principle of DSP and some history explains how and why FPGAs quickly wiped out single-chip DSPs as a vibrant processor category. First, the principle: DSP is all math, and DSP performance relies on the ability to perform a ton of multiply/accumulate operations (MACs) very quickly. That’s why the latest single-chip DSPs featured multiple hardware multiplier/accumulator units and additional function units to route non-MAC operations away from the multiplier/accumulators. The more MAC units a device has, the faster it can perform DSP operations because most DSP algorithms contain a lot of inherent parallelism that multiple MAC units can exploit.
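To make the MAC principle concrete, here is a minimal sketch of a direct-form FIR filter, one of the most common DSP algorithms. It is an illustration, not any vendor’s implementation: the function names are hypothetical, and the cycle estimate assumes an idealized machine in which every MAC unit completes one multiply/accumulate per cycle with no memory or routing bottlenecks.

```python
from math import ceil


def fir_filter(samples, coeffs):
    """Direct-form FIR filter that also counts the MACs it performs."""
    outputs = []
    mac_count = 0
    for n in range(len(samples)):
        acc = 0  # the accumulator for this output sample
        for k, coeff in enumerate(coeffs):
            if n - k >= 0:  # skip taps that reach before the first sample
                acc += coeff * samples[n - k]  # one multiply/accumulate
                mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


def cycles_per_output(taps, mac_units):
    """Idealized cycles per output sample when taps are spread over MAC units."""
    return ceil(taps / mac_units)


if __name__ == "__main__":
    print(fir_filter([1, 2, 3, 4], [1, 1]))
    # A 64-tap filter on 1, 4, and 144 MAC units (idealized).
    for units in (1, 4, 144):
        print(units, cycles_per_output(64, units))
```

The products inside each output sample do not depend on one another, so they can be handed to separate MAC units. Under the idealized assumption above, a 64-tap filter drops from 64 cycles per output on one MAC unit to 16 cycles on four, and to a single cycle once there are at least 64 units.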
Now, the history: FPGAs first appeared on the scene in 1984 when Xilinx introduced the XC2064. That first FPGA was little more than a bunch of very slow gates (actually programmable logic blocks based on lookup tables) surrounded by a lot of programmable interconnect. This early architectural design allowed the FPGA to gobble up many TTL chips’ worth of logic on a board design. But the earliest FPGAs were pretty slow; they didn’t threaten the processors of the day and certainly didn’t impinge on DSP territory. Not at first, anyway.
The FPGA Age of Expansion
As intended from the start, FPGAs rode Moore’s Law, growing from the paltry 64 logic blocks in the original Xilinx XC2064 FPGA to tens of thousands of logic blocks by the year 2000. In an article published in Proceedings of the IEEE titled “Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology,” former Xilinx Fellow Steve Trimberger called the period of FPGA growth during the 1990s the “Age of Expansion.” During this era, FPGAs grew larger and larger by incorporating more and more programmable logic blocks. However, when compared to ASICs, the circuits built with programmable logic within an FPGA are relatively slow, inefficient with respect to silicon usage, and more expensive. So MACs built with programmable logic are relatively slow and costly.
Later, during Trimberger’s “Age of Accumulation” – when FPGAs added hardened MAC blocks – FPGAs suddenly became serious DSP competitors. And FPGAs didn’t add just one or two hardware multipliers; enabled by the largesse of Moore’s Law, they added dozens of them.
The first FPGA device family to incorporate fast hardware multipliers was the Xilinx Virtex-II FPGA family. In July, 2001, Xilinx announced that it had already shipped a million dollars’ worth of Virtex-II XC2V6000 FPGAs, each with 144 hardened, on-chip 18×18-bit multipliers. So the first FPGA to incorporate hardware multipliers could already outperform every single-chip DSP that existed at the time, and likely every single-chip DSP that ever will exist.
Altera followed Xilinx and announced its first generation of Stratix FPGAs with 36×36-bit hardware multipliers in 2002. The hardware multipliers in the Stratix FPGAs were fractionable as 18×18-bit or 9×9-bit multipliers to permit even more MAC operations, albeit at lower bit resolution. In the first few years of this millennium, Xilinx and Altera FPGA families far outdistanced single-chip DSPs in the number of simultaneous MAC operations they could perform.
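One way to picture why a wide multiplier can be split into narrower ones is the schoolbook identity that builds a 36×36-bit product from four 18×18-bit partial products. The sketch below shows only that arithmetic for unsigned operands. It is not a description of the actual Stratix circuit, and the function name is hypothetical.

```python
MASK18 = (1 << 18) - 1  # selects the low 18 bits of an operand


def mul36_from_18(a, b):
    """Unsigned 36x36-bit multiply built from four 18x18-bit partial products."""
    a_hi, a_lo = a >> 18, a & MASK18
    b_hi, b_lo = b >> 18, b & MASK18
    # Each product below fits an 18x18 multiplier; shifts align the results.
    high = (a_hi * b_hi) << 36
    middle = (a_hi * b_lo + a_lo * b_hi) << 18
    low = a_lo * b_lo
    return high + middle + low


if __name__ == "__main__":
    x, y = 0x9ABCDEF01, 0x123456789  # two arbitrary 36-bit values
    print(mul36_from_18(x, y) == x * y)
```

Seen this way, a block that can form one 36×36-bit product already contains the pieces needed for several independent narrower products, which is the trade between throughput and bit resolution that the fractionable multipliers offer.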
Today’s FPGAs Have MACs a’Plenty
Today, some of the smallest FPGAs from Intel (which bought Altera in 2015) and Xilinx deliver plenty of hardware multipliers. Members of the older but still-available Intel Cyclone IV FPGA family incorporate 80 to 532 18×18-bit embedded multipliers. Similarly, the older Xilinx Spartan 6 FPGA family includes devices with 8 to 180 DSP48A1 slices, while members of the newer Xilinx Artix FPGA family incorporate as many as 740 DSP48E1 slices. Each DSP48A1 slice contains an 18×18-bit multiplier and a 48-bit accumulator, while each DSP48E1 slice contains a 25×18-bit multiplier and a 48-bit accumulator. The number of bits in the DSP48-slice multipliers seems to, er, multiply over time.
The largest FPGAs from Intel and Xilinx feature thousands of DSP blocks and are capable of delivering three orders of magnitude more MACs/second than the fastest DSP chips. For example, members of the largest Intel Stratix 10 TX FPGA family are available with 5760 variable-precision DSP blocks, each containing two 18×19-bit hardware multipliers that can be configured as one 27×27-bit multiplier. That’s as many as 11,520 hardware multipliers on one big chip. The largest Xilinx Virtex UltraScale Plus FPGAs incorporate 12,288 DSP48E2 slices, each containing a 27×18-bit multiplier and a 48-bit accumulator.
Note that Intel and Xilinx are not the only FPGA vendors cramming hardware multipliers into their FPGAs. You can get FPGAs from Achronix, Lattice, and Microchip with various amounts of DSP hardware – MACs – built into the devices. For example, the recently announced Lattice CertusPro-NX FPGA is available in two sizes, with 96 or 156 on-chip 18×18-bit multipliers. (See “Lattice Launches CertusPro-NX.”)
If you still want to write DSP code and run it on a single-chip DSP, you can. NXP, which bought Motorola Semiconductor, offers the DSP56300, DSP56700, and MSC8000 DSP families. These are the latest – and quite possibly the last – single-chip descendants of the Motorola DSP lines. In addition, you can still purchase members of the TI TMS320 DSP families off the shelf. Meanwhile, hardware multipliers have become quite common in the design of general-purpose processors, where you can find monster 512-bit SIMD vector units fully capable of delivering respectable DSP performance, and even in microcontrollers, so that you can more easily incorporate DSP into even the smallest embedded designs. For all of this, give thanks to Moore’s Law.
However, there’s simply no comparison at this point. If your high-performance DSP application requires lots of fast MAC operations, FPGAs with their hundreds or thousands of fast hardware multipliers are uniquely qualified for the job.
How about you? How do you DSP? Why not leave a comment below?