November 5, 2013

The Hard Ceiling

Power Plays the Death Card

by Kevin Morris

Moore’s Law is a maddening mistress. As our engineering community has collectively held the tail of this comet for the past forty-seven years, we’ve desperately struggled to divine its limits. Where and why will it all end? Will lithography run out of gas, bringing the exponential curve of semiconductor progress to a halt? Will packaging and IO constraints become so tight that more transistors would make no difference? Or, will economics bring the whole house of cards crashing down – putting us in a situation where there is just no profit in pushing the process envelope?

These are still questions that keep many of us employed – predicting, prognosticating, and pontificating from our virtual pedestals – trying to read the technological tea leaves and triangulate a trend line that will serve up that special insight we seek. We want to know the form of the destructor. When the exponential constants of almost fifty years make a tectonic shift and our career-long assumptions change forever, we’d appreciate some forewarning. We want to look the end of an era in the eye.

I am now ready to make a call.

When the grim reaper rides in and swipes his sickle through the silky fabric of fifty years of unbridled technological progress – the likes of which have never been experienced in the history of the human race – the head beneath the hood will be the flow of coulombs through the catacombs.

It is power that will be our ultimate undoing.

As transistors get ever smaller, we can constantly cram more of them into the same physical space. However, for the most part, those physical spaces are not changing. The size of a smartphone has varied slightly over the years, but it isn’t likely to change significantly in the foreseeable future. That, in turn, means that the size of a battery isn’t likely to expand dramatically, and we all know that battery energy density is not on any progress curve that even remotely resembles Moore’s Law. Therefore, the amount of power available for smartphone-related activities will remain fairly flat. We can shove in as many logic gates as we like, but they’ll have to divvy up the same amount of power they use today.

On the other end of the scale, server farms are packed into buildings at the maximum density for which the local utilities can supply the enormous amounts of juice required to feed the ravenous appetites of tens of thousands of von Neumann machines whirring away at multi-gigahertz speeds, along with the massive amounts of memory needed to deal with the data explosion produced by those prolific processors. Then, another good part of the power budget is used to run giant air-conditioning systems that remove the excess heat produced by all that gear. The bottom line? No matter how many transistors we can stuff into the building, they’ll still be competing for the same size power mains coming in from the electric company.

In fact, at just about every scale you can name, the hard ceiling at which our design exploration space ends is the total power budget. Whatever we design to make our machines do more work cannot increase the power consumed. Therefore, as system engineers, we are no longer in the performance business.

We are in the efficiency business.

With planar CMOS processes, each passing technology node has made us more efficient with dynamic power consumption (the power each transistor uses doing actual work) primarily because of lower supply voltage requirements. However, with each of those improvements has come an increase in the amount of leakage current that flows through each transistor when it is not operating. Thus, over time, leakage current has become the boogeyman, while active current has tracked Moore’s Law improvements.
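The voltage effect described above can be put into rough numbers. Here is a minimal sketch, assuming the textbook first-order model in which switching power is the activity factor times capacitance times supply voltage squared times clock frequency, and leakage power is supply voltage times leakage current. The function names and every numeric value are illustrative assumptions, not figures from any real process node.

```python
def dynamic_power(activity, capacitance_f, vdd, freq_hz):
    """First-order switching power in watts: activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd ** 2 * freq_hz


def leakage_power(vdd, leakage_current_a):
    """First-order static power in watts: Vdd * I_leak."""
    return vdd * leakage_current_a


# Hypothetical gate: same capacitance, activity, and clock; only Vdd changes.
old_node = dynamic_power(activity=0.1, capacitance_f=1e-15, vdd=1.2, freq_hz=1e9)
new_node = dynamic_power(activity=0.1, capacitance_f=1e-15, vdd=1.0, freq_hz=1e9)

# Dropping Vdd from 1.2 V to 1.0 V cuts switching power to about 69% of its
# old value, because the voltage term is squared.
dynamic_ratio = new_node / old_node

# Leakage falls only linearly with Vdd, so a higher leakage current at the
# new node (assumed here to triple) can swamp the voltage saving.
leak_old = leakage_power(vdd=1.2, leakage_current_a=1e-9)
leak_new = leakage_power(vdd=1.0, leakage_current_a=3e-9)
leakage_ratio = leak_new / leak_old

print(f"dynamic power ratio: {dynamic_ratio:.2f}")
print(f"leakage power ratio: {leakage_ratio:.2f}")
```

The squared voltage term is why supply scaling was so kind to active power, and the merely linear term is why it offers little relief once leakage current starts to climb.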

FinFET (or, in Intel parlance, “Tri-Gate”) technology has pushed our power problems back a bit – delivering the most toggles for the fewest watts we’ve ever seen. However, even with that impressive discontinuity, computational power efficiency is not improving as fast as transistor density. The upshot of it all? We can add as many transistors to our design as we want; we’re just not going to have the power to use them all. Think of the world’s population of transistors more than doubling every two years, but the supply of power to feed them remaining fixed. Eventually, we’re going to have a time of famine.
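To see how quickly the famine arrives, here is a back-of-the-envelope sketch. It assumes a fixed power budget, a transistor count that doubles each generation, and energy per operation that improves by a smaller factor each generation. The 1.4x efficiency gain is a hypothetical placeholder chosen only to make the arithmetic visible, not a measured figure.

```python
def active_fraction(generations, density_gain=2.0, efficiency_gain=1.4):
    """Fraction of transistors that can switch under a fixed power budget.

    Generation 0 is assumed to use the whole budget with every transistor
    active. Each later generation multiplies the transistor count by
    `density_gain` and divides per-transistor power by `efficiency_gain`.
    """
    fractions = []
    fraction = 1.0
    for _ in range(generations + 1):
        fractions.append(min(1.0, fraction))
        fraction *= efficiency_gain / density_gain
    return fractions


usable = active_fraction(4)
for generation, share in enumerate(usable):
    print(f"generation {generation}: {share:.0%} of transistors can be active")
```

Under these assumed numbers, roughly three quarters of the transistors must sit idle after four generations. Changing `efficiency_gain` moves the date of the famine but not its arrival, as long as it stays below `density_gain`.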

In a way, the end of Moore’s Law could be refreshing. It might even re-invigorate the art of engineering itself. The tidal force of exponential progress over the decades has been so great that it overwhelmed most clever or subtle advances. After all, what’s the point of spending a couple of years with a novel idea that will gain your team 20-30% when a process node leap in the same time frame is going to double your competition’s performance? Moore’s Law is a “go big or go home” proposition, where the giant companies with the resources to pour tens to hundreds of millions of dollars into bringing their design to the next process node will always win out over more clever but smaller efforts that can’t stay on the bleeding edge of semiconductor technology. If Moore’s Law is no longer the trump card, the playing field for innovation could be leveled once again, and we could see a renaissance of invention replace the current grind of spend, shrink, and pray.

The first target for this post-apocalyptic revolution should be our computing architecture itself. Von Neumann machines are very efficient in their use of gates, but very inefficient in their use of power for each computation. When gates were precious and power was cheap, that was a great tradeoff. However, with gates now being almost free, and power being the scarce resource, the venerable von Neumann may have run his course.

Vanquishing von Neumann, however, will not be an easy undertaking. A half-century of software sits squarely on top of the sequential assumptions imposed by that architecture. While the computationally power-efficient future clearly is based on parallel processing and datapath structures, the means to move existing software to such platforms has been elusive at best. Taking something like plain-old legacy C code and efficiently targeting a heterogeneous computing machine with multiple processors and some kind of programmable logic fabric is the job of some fantastic futuristic compiler – the likes of which we have only begun to imagine. This non-existent compiler is, however, the key to unlocking a quantum leap in computing efficiency – allowing us to take advantage of a much larger number of transistors and therefore to create an exponentially-faster computer without hitting the aforementioned power ceiling.

The engineering for this new era will most likely not be in the semiconductor realm. While there are certainly formidable dragons to slay – double and triple patterning, EUV, nanotubes and more – software and compiler technology may take an increasing share of the spotlight in the next era. It will be fascinating to watch.

11 thoughts on “The Hard Ceiling”

kevin says:

November 5, 2013 at 11:09 am

When do you think Moore’s Law will end? Will power play a key role in its demise?

Dwyland says:

November 5, 2013 at 12:01 pm

It already has.
“Isn’t it true to say that Moore’s Law is still working?”
“No, but it is accurate.”

Moore’s Law went into the ICU in 2004, right about the time of the shift from 130 to 90 nm. As the CTO of IBM is reported to have said, “Somewhere between 130 and 90 nm, we lost scaling.” For 40+ years, each shrink cut the die cost in half, improved the speed by 30% and cut the power by 30%. Remember 50 MHz 486s? In 2004, I had a 3.5 GHz Pentium 4 system. There has never been a 4 GHz Pentium. If it were possible, it would be an easy sell.

Now each shrink cuts the die cost by half but *decreases* the speed and *increases* the power. All the magic (hafnium oxide, strained silicon …) goes into trying to keep the speed up and power down. But they are all one-trick ponies: they can generally be back-propagated to earlier nodes and improve them, too. And they do not help you with the next node, which needs more magic.

Why? It’s not the silicon; it’s the aluminum (or copper). Up to 90 nm, you do a 2D shrink – like projecting a smaller picture by adjusting the lens. At 90 nm and down, you have to do a 3D shrink on the interconnect metal. The traces have become very tall and skinny. If you do not shrink the height, they tip over and short into the next trace(s). But shrinking the height reduces the cross-section, which means higher resistance, and therefore lower speed and higher power. Some say that at 130 nm, 75% of the speed and 75% of the power are in the metal, not the silicon. Shrinking further just makes this worse.
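The resistance argument can be checked with the standard formula for a rectangular conductor, R = ρL / (W·H). The sketch below is only an illustration: the trace dimensions are hypothetical, and the resistivity is the approximate bulk value for copper rather than a value measured on real nanoscale interconnect.

```python
RHO_COPPER = 1.7e-8  # ohm-metres, approximate bulk copper resistivity


def wire_resistance(length_m, width_m, height_m, rho=RHO_COPPER):
    """Resistance of a rectangular trace: rho * L / (W * H)."""
    return rho * length_m / (width_m * height_m)


# Hypothetical 1 mm trace before and after a 0.7x shrink.
before = wire_resistance(1e-3, 100e-9, 200e-9)
width_only = wire_resistance(1e-3, 70e-9, 200e-9)   # 2D shrink: width only
width_and_height = wire_resistance(1e-3, 70e-9, 140e-9)  # 3D shrink

print(f"width-only shrink:       {width_only / before:.2f}x resistance")
print(f"width-and-height shrink: {width_and_height / before:.2f}x resistance")
```

With the length held constant, shrinking only the width raises resistance by about 1.4x, while shrinking width and height together roughly doubles it, which is the penalty the comment attributes to the 3D shrink.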

It was a great run. I thoroughly enjoyed it.

Reiner says:

November 6, 2013 at 1:36 am

Dear Kevin,

Congratulations on your excellent commentary “The Hard Ceiling”. This also supports my very strong critical statement at http://hartenstein.de/EIS2/#mistake

Also the Reconfigurable Computing Paradox (see http://xputer.de/RCpx/) proves that you are right, and that going von Neumann in the first place was the biggest mistake in the history of computing.

The first electrical computer ready for mass production (in 1884) was data-stream-based, like Reconfigurable Computing resources when running. See http://hartenstein.de/EIS2/

The massive inefficiency was introduced by the paradigm shift from data-stream-based to instruction-stream-based computing. Read about “Nathan’s Law of Software” (“software is a gas”) and the criticism from many other celebrated figures.

Best regards,
Reiner

CharlieM says:

November 8, 2013 at 9:54 pm

Kevin Morris
EE Journal
9 Nov. 2013

Dear Kevin,

There is an alternative to computation and to the limits of Moore’s Law. Please read on.

In the academic world, the ideal steps for scientific and technical progress are: theory, invention, and practice.
In the world of industry, the problems we face tend to be solved in a different order: invention, practice, and theory. Just so, in my case with Natural Machine Logic:

First, there were no temporal logic elements, nor theory at the same level as simple Boolean logic. There was a need, so I invented a couple of temporal logic elements to perform specific control tasks in the time domain. They were used in practical applications (practice). After ensuring the workability of the new logic elements, I constructed a theory to explain and rationalize them against the background of computation, which I found to be “spatial,” or static (Boolean) logic in frames, taken at rates of (now up to) billions of operations per second—but static transforms nevertheless. My new logic elements, on the other hand, performed dynamic transforms, which recognize change.

Professionals usually hold computation in high esteem and maintain a defense against change that is difficult to penetrate. I realize that such barriers serve to maintain the integrity, stability, and strength of our various sciences, but after more than 60 years of a love-hate relationship with software, its problems persist, and it is time for a change. (It is an odd twist, but in my view it is exactly the acceptance and management of change, via temporal logic in the time domain, that is the next step in the progress of control science.) It will not be easy to effect that change in thinking after the long history of following the static logic ideal of ancient philosophers and logicians.

Background
In the early 1970s, as an automation engineer, I solved a couple of very fundamental problems in machine control. The solutions entailed the invention of temporal logic elements. One was the real-time (as the events happened) determination of the temporal order of first occurrence of a pair of independent signals, for which I was granted a patent. The other was the determination of failure in a pair of switches important to a process controller, whether FO (failed open) or FC (failed closed).

After some time spent playing with these new temporal logic concepts, I had identified a number of temporal operators and their functions and devised corresponding hardware logic elements or architectural arrangements as means of implementation. The new temporal operators together with the two existing spatial operators (AND & NOT) made a cohesive set for describing dynamic processes. The combined set of operators provided 56 simple functions of the relationships among two events or conditions in the space, time, and (joint) space-time domains, where before existed only the 16 Boolean functions (among two operands) in the space-domain and STORE (memory operator and time-to-space translator). The new logic thus created is much more expressive in describing discrete process control than the Boolean-sequential logic that is the basis of computation.

Along with the operators and logic elements, I devised an algebraic method for concisely specifying discrete dynamic processes and a methodology for creating process-control circuits and system architecture that can be derived directly from the process specification. Together, these tools and practice, now called Natural Machine Logic (NML), provide real-time, parallel-concurrent hardware alternatives to the shared resource, linear-sequential hardware-software mix in present use. This master system of logic, NML, doesn’t replace the existing Turing paradigm, but retains its fundamental operators as a part of a greater, more complete and comprehensive whole.

By 1986, I had produced a manual of operation for the system, then called Natural Logic (NL). The manual was useful during an invited visit to Arthur D. Little Enterprises in Cambridge, MA, who were mildly interested in NL, but did not realize its great potential (nor did I, at that time). Since then, I have learned where in the spectrum of control systems this new technology can best be applied. As they say, persistence is important, but it is just one of the necessary factors for successfully instituting a marked change (milestone?) in applied science or technology.

Computation isn’t going away soon, nor should it. It has been and continues to be apt for symbol management (data-processing). Supervision of live processes, on the other hand, is a different story. (Boolean logic and computers and software are suitable only for the space-domain, therefore can’t deal with temporal issues very well.) Automation, at its most useful, monitors and controls a physical process. It is mechanical decision-making, assessing and acting on such questions as: does the process need an adjustment for optimum function, should a maintenance person be alerted, or should the process be halted to ensure the safety of personnel or equipment?

Beyond Computation
In order to progress beyond software, we must move beyond computation, which was theorized by Alan Turing to manage symbols and strings of symbols for the purposes of encrypting and decrypting encoded messages. Computation enables dead, lifeless data to have a pseudo-life in which those data can be the effect of causes and can also be made to cause things to happen. Process control difficulties stem from using linear-sequential computation in which all temporal issues must be first translated to the space-domain, then solved with spatial logic (Boolean).

Software composers, living in natural space and time, create control systems for some processes that also ‘live’ in natural space and time—but they must build those systems in the artificially flat and lifeless storage spaces of memory and by means of timeless logic that operates to the drumbeat of step-by-step instructions. The double-think (normal space and time vs. computational space and “time”) required of programmers is error-prone and the process is highly inefficient. As a result, as much time is spent fixing software as on creating it in the first place, and the end user risks encountering bugs that, despite great effort, are left embedded in the product. Most of the problem issues are caused by the inability (of space-domain logic) to deal effectively with time-domain issues.

At present, the same Turing-type machine (TM) that is so suitable for doing the common work of data-processing has been applied toward management of physical processes. But linear-sequential data-processing is not an appropriate tool for live process management, which is best performed in an asynchronous, parallel-concurrent way, treating process faults and anomalies in immediate fashion as they arise, rather than on a fixed schedule. The data-processing solution, when applied to process management, becomes complex and always occurs after-the-fact, due to polling or sampling and instruction-fetch and -execution (time-sharing and software) delays. Using data-processing for process management is akin to making the lowest-rated factory assembly worker (who must be guided by explicit instructions at each step) suspend his/her labor and act as manager (via explicit instructions at each step) when supervisory duties are required. It is time we had a physical process management method that is true real time, parallel-concurrent, safe and immediate, instead of a method that suffers all of the Impediments of Computation, which add up to a high cost of ownership.

Physical processes, whether natural or man-made, are self-motivated to a greater or lesser degree. They have ‘lives’ naturally, thus do not have to be artificially animated by computation. Yet billions of microprocessors are being used in applications in which events and conditions from live processes must first be converted to lifeless data so they can be managed via computation. It is such a waste of human and other resources.

Remedy
There is a remedy: Natural Machine Logic (NML). For physical process control applications, NML is much better than computation for performing safety-, time-, or mission-critical management tasks. NML can perform high-level managerial duties better because it is a simpler overseer and control method than computing, and because it is a parallel-concurrent means of describing dynamic processes and implementing their supervisory control functions, yet its language is very familiar to the average person. NML can monitor and control data-processing better than data-processing can control itself, because it has a much better command of the time-domain.

The ideal placement for NML would be creating IP (new hardware designs) in integrated circuit design houses for the internal management and verification of correct microprocessor and computer operation and for the management of microprocessor-less small systems.

The goals and direction of most logic-device programs trend toward smaller, faster, and denser electronics to support ever more complex computational systems for data-processing. The technology I have devised can leverage those advances in electronics toward simple, direct control of the applications that cannot now be served economically due to the many impediments of computation. NML can unburden the microprocessor from ill-suited temporal process management to gain efficiency and safety in both data processing and process management tasks.

I have asked the question, “After software, what’s next?” See the ensuing discussion on the forum at http://www.control.com/thread/1327707041……

I need help to confirm (and perhaps formalize), package, and market NML and its products and I am willing to share its benefits. Would you be interested in partnering with me in this endeavor? If not, could you recommend a person, organization, or institution who may be interested in this opportunity?

Best regards,

Charles R. Moeller, Senior Engineer (electromechanical systems)
c.moeller@ieee.org
cmoel888@aol.com

kevin says:

November 12, 2013 at 11:43 am

Charles,

Several thousand people will read this article and the comments – let’s see what they have to say. Well, folks – thoughts?

CharlieM says:

November 21, 2013 at 5:29 pm

Kevin,

Can I assume the verbal picture I drew of the current situation is close to the truth, if not spot-on? Perhaps no one else thought of it in quite the same way as I did, nor the possibility of a remedy.

The next steps are:

1. For small systems, NML is the simpler and safer alternative to microprocessors and software. Design & build straightforward NML controller chips for those fuel injectors, AC units, toasters, irons, etc. that aren’t required to dial up your Aunt Jane, or to be programmed while you’re on vacation in Yellowstone or the Smoky Mountains.

2. Reduce complications and faults in microprocessor-based systems with real-time NML internally managing and verifying their functions.

Best regards,
CharlieM

CharlieM says:

November 25, 2013 at 3:51 pm

Kevin,

I claim that digital technology has not gone far enough because its basis in logic fails to accommodate time in its own dimension.

In process management, having to deal with temporal effects is difficult when the computational tools available can only manage items in space (via the Turing paradigm). The obligatory translation of all things temporal to the space-domain before processing (and back to the time-domain for output) sure does make it tough for software engineers.

It is hard to believe that of your many thousands of readers, there is no one able to either: a) question my thesis and put up substantive arguments, or b) agree with me and perhaps share some anecdotal experience as further evidence. It seems that too many are willing to accept the status quo and believe that the “right” software will cure all the ills of the present technology. (That hasn’t happened in over 60 years of trying, or in the 40+ years since Edsger Dijkstra raised the alarm.)

Best regards,
CharlieM

kevin says:

November 26, 2013 at 8:42 am

Charlie,

I have to confess I am not able to understand what you’re proposing well enough to either agree or disagree. For starters, I don’t know what problem you are solving. From my perspective, the current solutions are doing a pretty spectacular job and improving at an exponential rate. If Moore’s Law does indeed end, it doesn’t mean our current solutions stop working – only that exponential progress ceases.

For me, at least, microcontrollers and the ecosystem surrounding them do a pretty nice job of managing process control in most situations. Of course, there are the human-induced issues like Toyota’s spectacular failure, but I believe humans will make mistakes in any paradigm we establish.

In my three-decade career, I’ve run down many a rabbit hole chasing alternatives to sequential programming. The arguments are generally the same – the world is asynchronous and parallel, and trying to map it to a clocked, sequential model is obviously a fundamentally flawed approach. However, no matter how many alternative programming models we offer, the masses still seem to understand how to describe what they want to happen in sequential terms. Perhaps once we learn to read left-to-right and top-to-bottom, we are doomed – brains programmed for a sequential existence in a parallel world.

CharlieM says:

November 30, 2013 at 6:56 pm

Kevin,

Computation can only manage and perform the acquisition, movement, and transformation of symbols in the space-domain. Such limitations hold for the theoretical Turing Machine (TM) and all computers, microprocessors and microcontrollers, as they, each and every one, are the same kind of machine. These operations, narrowly restricted as they are, nevertheless fulfilled Turing’s goal in theorizing a method of automating the symbol manipulations that humans performed while engaged in decrypting substitution-encoded messages. TMs have the ability to convert any string of symbols into any other, given appropriate instructions. The end products of the symbol manipulations (decoding) still had to be examined by humans for sense and meaning.

The tasks performed in modern times by computational devices are no longer confined to translation or transformation of one set of static symbols to another, yet at bottom, that is all these machines are able to do. The designers of computational devices, operating systems, application software, and process control systems, therefore, must determine how to make symbol manipulation act like a controller that: a) keeps the process on track, and b) prevents the process from harming personnel or equipment. Bridging the gap between symbol twiddling and physical process control is inefficient, it is a lot of work, it is subject to much error, and is the primary reason for the comparatively large amount of hardware and software necessary to manage and enact a given set of tasks, when performed exclusively via computation.

There is clearly a need for an alternative method of digital or discrete process control. It should be one that can work compatibly in the same environment as computation, but have a non-computational basis. The main facilities and characteristics of computation produce a number of problems, limitations, or impediments that are detrimental to physical process control, but which are solved (or bypassed) by Natural Machine Logic (NML).
A list follows:

1. First and foremost, Boolean logic and software exist and operate exclusively in the space-domain (Turing’s endless tape; now virtually endless solid-state memories). Complexities arise when these computational resources must deal with temporal issues and characteristics. For instance: in order to use computational devices, all things temporal must first be translated to the space-domain (as static values) along with time-stamps or other numbers so they can be operated upon by static Boolean AND & NOT (and their combinations) and those operators performing arithmetic functions.

Physical processes (e.g., punch-press operations) are expressly dependent upon the order of events and the chain of cause and effect, yet there is no fundamental time-domain logic, event logic, or cause-effect logic available to the process control community. Such temporal attributes instead must be inferred from results given by static logic operations. It would be much simpler, and more fundamentally correct, if time-domain characteristics and issues could be recognized and solved directly in the time domain instead of having to translate them into the space-domain first, then obtain solutions by static logic means, then finally translate back to the time-domain for output. To remedy these shortcomings of conventional methods, NML provides two systems of real-time temporal logic that fulfill those time-domain needs and functions. The NML temporal operations, which can make sense of dynamic occurrences *as they happen*, and their corresponding hardware logic elements are also compatible with the existing spatial Boolean and Boolean-sequential logic operators and elements, and can work alongside them.
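For readers who want to picture the conventional route described in item 1, here is a minimal sketch of deciding which of two signals occurred first by sampling them, recording each event as a stored index (a time-stamp), and comparing the stored numbers afterwards. This is not NML and is not taken from the patent mentioned earlier; the function names and sample data are hypothetical illustrations of the polled, space-domain style under discussion.

```python
def first_rising_edge(samples):
    """Return the sample index of the first 0 -> 1 transition, or None."""
    for i in range(1, len(samples)):
        if samples[i - 1] == 0 and samples[i] == 1:
            return i
    return None


def order_of_first_occurrence(sig_a, sig_b):
    """Compare stored edge indices to decide which signal rose first."""
    t_a = first_rising_edge(sig_a)
    t_b = first_rising_edge(sig_b)
    if t_a is None and t_b is None:
        return "neither"
    if t_b is None or (t_a is not None and t_a < t_b):
        return "A first"
    if t_a is None or t_b < t_a:
        return "B first"
    # Both edges landed in the same sample; the true order is unknowable here.
    return "simultaneous (within one sample)"


print(order_of_first_occurrence([0, 0, 1, 1], [0, 0, 0, 1]))
```

Note that the answer is only available after the samples have been stored and scanned, and that two events falling inside one sampling interval cannot be ordered at all. Both are consequences of translating time into stored values before reasoning about it.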

2. The number of descriptive words (fundamental operators) …

To be continued …

Best regards,
Charles Moeller

CharlieM says:

December 1, 2013 at 2:13 pm

Kevin,

Responding to your note of 11-26-13:
It is not human brains that have a problem with parallel-concurrent input. We do it all the time. Even while reading a linearly laid out book, we hold themes, acts, and environments in mind while taking new input from each line read. The instant activity is overlaid (in parallel, if you will) upon the scenes and activities already established (as read). It is linear-sequential machines that can’t accommodate parallel-concurrent input (of course). L-S machines can only hold one or two operands and one operator “in mind” at a time. If we insist and persist in relying upon the Turing paradigm for ALL automated activity, we will never achieve a truly parallel-concurrent goal.

In contrast, NML is a discrete parallel-concurrent language and electronic modeling tool that can specify, implement, and (let the hardware) act in a parallel-concurrent manner. This mode especially benefits time-, safety-, and mission-critical tasks, such as *management* of linear-sequential tools. The L-S mode by the way is just perfect for data processing, but not so good at critical real-time tasks. In my plan for the future, we will let the L-S parts do the data-processing and let the NML parts do the real-time management.

Best regards,
CharlieM

CharlieM says:

December 17, 2013 at 9:28 am

2. The number of descriptive words (fundamental operators) is small in the current technology. There are no verbs, dynamic operators, or temporal logic in the fundamental computer logic that has now completely pervaded our daily lives.
There are only two primitive logic operations that are necessary and sufficient, in combination, to perform the 16 possible static Boolean operations between two operands: AND, a conjunction, and NOT, the operator of negation, an adverb. These can be used or combined to perform logical AND, NOT, NAND, OR, NOR, XOR, XNOR, as well as equate to certainty (1), and null (0). The set can also perform binary arithmetic. All of these operations are conjunctive, or coincident, in both space and time. “A AND B” is true if both are present at one and the same time. When performed by physical logic elements, the operations are considered to be executed in a null-time zone, as the evaluations are ready at the next live moment (usually at the next clock pulse or instruction), which is designed to occur after any contributing settling times or gate-delays have run to completion. Boolean logic used in such a manner is static, is unobservant of change, and can be said to inhabit the space domain. The time domain is an untapped resource.
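The claim that AND and NOT together suffice for all 16 two-operand Boolean functions is easy to check by brute force. In the sketch below, each function is represented by its 4-bit truth table over the input pairs (A, B) = 00, 01, 10, 11; that encoding and the function name are illustrative choices, not part of NML or of any standard library.

```python
def closure_under_and_not():
    """Collect every truth table reachable from A and B using AND and NOT."""
    # Bit i of a table is the output for input pair i = 2*A + B.
    a_table, b_table = 0b1100, 0b1010
    known = {a_table, b_table}
    while True:
        expanded = set(known)
        for x in known:
            expanded.add(~x & 0xF)      # NOT x, masked to four bits
            for y in known:
                expanded.add(x & y)     # x AND y
        if expanded == known:           # nothing new: closure reached
            return known
        known = expanded


functions = closure_under_and_not()
print(len(functions))  # 16
```

The search reaches all 16 possible truth tables, including constant 0 (A AND NOT A), constant 1, OR, XOR, and XNOR, each built from nothing more than repeated AND and NOT.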

Another operation useful (and necessary) to computing is STORE, the memory operator. STORE is a transitive verb, but it is not supported by any formal logic, which is all static, not dynamic.

All higher-level computer languages (i.e., in software) are ultimately decomposable to, hence built up from, sequences and combinations of the Boolean operations and STORE. In machine language, those operations are used to determine explicitly: a) the locations from which to acquire the numerical or conditional operands, b) what Boolean operations to perform, c) where to put the results, and d) the next step in the program. Every step is, and must be, predetermined.

Being limited to the combinations and repetitions of only three words certainly puts a strain on the creative talents of software composers. Imagine writing a paper or a book while (fundamentally) limited to the variations and repetitions of only three words (AND, NOT, and STORE), combined with the four activities (a-d, above). Yet that is what programmers confront while writing process descriptions. They must devise structures in space (physical memory) that can be stepped through to achieve controller goals.

Natural Machine Logic will make life easier for control system designers: more words to work with including verbs, and temporal concepts in operators and their corresponding logic elements.

Best regards,
Charles Moeller
