For years now, FPGA companies have been proclaiming that you can use their devices to create a “System on a Chip.” We’ve seen “SoC,” “Programmable System-on-Chip,” “SoPC,” “Platform FPGA,” and numerous other marketing-oriented, pseudo-jargonic phraseologies.
Suppose that’s true, and we want to put a “system” onto a chip. What exactly is a “system”?
Wikipedia tells us a “system” is a “set of interacting or interdependent entities, real or abstract, forming an integrated whole.” An “embedded system” is widely accepted as a computer that is integrated or embedded into some other device. For our purposes today, though, I’d like to go with the definition offered by a former colleague of mine:
“A ‘System’ is the thing one level of complexity higher than what you can understand.”
Most of the electronic systems that we all design certainly fit that definition today. While we can usually get a good grasp on the basic interactions of all the major components, as we dive deeper, we will inevitably bump into a layer that we have to accept on faith. We have a pretty good idea of the external behavior of these components, but we could not create them ourselves. Moving up the tree of abstraction, we usually hit a different sort of wall. While we may understand the function of our device at a unit level, when we put that device in the context of a much larger system, we may not be able to predict or fully account for all its behaviors.
In the software part of our products, this effect is even more pronounced. Software is pure, distilled, refined “essence of complexity.” While the complexity of the hardware portions of our systems is somewhat bounded by Moore’s Law, the complexity of a software system can be seemingly infinite and practically incomprehensible. We build our software applications upon layers and layers of encapsulated complexity including middleware, operating system calls, runtime libraries, compilers, and so forth. Attempting to maintain awareness of the inner workings of all those layers at once would be a futile task indeed.
When people find themselves in the middle of this sea of overwhelming complexity, they start exhibiting behaviors similar to religious and superstitious practices. We don’t understand the actual cause-and-effect mechanisms, so we assign arbitrary ones ourselves based on cursory observations…
“One time, I used the -lr option and my intermittent bug seemed to go away, so I always use -lr now. I don’t know what it does, though.”
Over time, these superstitious behaviors can ingrain themselves into the policies and procedures of an entire organization. Often, the original reasons for a particular design practice have been long forgotten, but the practice dogmatically plods ahead. Many engineering teams are laden with legacy lore and burdened by word-of-mouth and anecdotal wisdom. Often, nobody on the current-generation engineering team was even present when this litany of bogus best practices began. In the same manner as inane grade-school playground games, they are cycled down from generation to generation almost in suspended animation, never seeing the light of outside reason or sanity checking.
I was once managing a software development team of about 10 engineers, and one of the engineers came into my office quite frustrated. “There’s going to be a delay on the register retiming part of the project. I just can’t get it to work right with this stupid error handling mechanism they want us to use.”
I was a bit puzzled by the comment, so I asked further. “What error handling mechanism is that?”
The engineer went on to describe a pretty badly designed error handler, which sounded nothing like what the rest of the team was doing, and I felt compelled to inquire a bit further.
“Who said you had to use that?”
The engineer was immediately frustrated and agitated. “It’s the same one we’ve always used. I don’t know why they want us to use it, it’s outdated and a mess, and it slows down development a lot!”
I had been the manager of this engineering team since its inception. I had never mandated any particular error-handling method, as I assumed the team would arrive at some convention on their own, agree on it, and deploy it. This particular engineer, however, had apparently seen a snippet of code we had borrowed from some other team, and that team had used this poorly designed error handler. Subconsciously, he had decided that this must be company or team policy, and he had proceeded to implement it in every module he had written. Now, he was angry at “them” for “mandating” that we use this procedure.
It was difficult to convince this engineer that he was marching to phantom orders. Furthermore, he had infected several other members of the team with the same mythology, and about a third of the team was now unproductive and angry at “them.”
Several years later, I ran into the same syndrome, only with a twist. Early in the design and specification phase, we were choosing a graphics API for our project. One engineer made a particularly compelling pitch for a library from one company, and we decided to go with that recommendation. About 18 months later, that same engineer was behind schedule on a part of the project, and came into a staff meeting complaining about the terrible, buggy graphics API that “they” were making us use. “This one is just outdated technology,” he ranted. “If they’d let us use something more modern, we’d be done by now.” He had completely (and conveniently) forgotten that he was the one who had practically insisted we use this API in the first place.
When we are swimming in a sea of complexity, our left brain sometimes lets our right brain take over. We act and react with emotion instead of engineering reason. We can’t understand why a particular part of our system isn’t working as expected, and we fall back on primitive and defensive behaviors. We vilify an imaginary “them” that is apparently behind the scenes pulling strings and making our engineering life more difficult. We assign arbitrary and capricious behaviors and personalities to pieces of technology that we use without comprehension, and we also fall back on one of the engineer’s staple defense mechanisms: NIH, or “Not Invented Here” syndrome.
We all understand intellectually that our productivity goes up exponentially when we design at a higher level of abstraction. Grab a USB block, a piece of processor IP, some pre-designed peripherals, and a base of embedded software IP, stick them together in a few minutes and you’ve got yourself a working embedded system running on an FPGA. Of course, you didn’t really “design” any of it. Most of your mad engineering skills sat on the sidelines while you grappled with the vagaries of the drag-and-drop interface in the “platform creation” tool supplied by your FPGA vendor.
Once the time comes to move away from the demo design, however, and customize the system to do exactly what we need, NIH can rear its ugly head. If you designed the USB interface at your last company, you may find yourself yearning to re-design this one instead of using the vendor’s IP. After all, they couldn’t possibly be giving away one as robust as yours. At the same time, you’re protected from having to understand where the real bugs are coming from – in a part of the system that you don’t yet understand.
Your emotions take over the design process and you spend a couple of months re-engineering something that was already working. It would be a rare product that differentiated itself by the fabulous engineering elegance of its USB core. Nonetheless, countless subsystems are needlessly re-engineered every day just based on emotional reactions of engineers lost in the complexity quagmire.
In my experience managing software teams, the danger phrase was always “spaghetti code.” Any time an engineer would come into my office and begin the conversation with a reference to pasta-like properties of a piece of software, I knew that NIH was about to kick in, and the engineer in question was about to volunteer to re-write a perfectly functional piece of software, resetting the “bugs discovered” counter to zero once again.
As our systems become more converged and more complex, the possibility of understanding them in their entirety grows ever more remote. We all have to fight the basic engineering instinct that wants us to control and understand every aspect of our design. In its place, we need a more mature mentality in which we trust, verify, and prioritize our understanding, so that we deliver the best product we can and make the best use of our own experience and expertise.