posted by Bryon Moyer
Multicore systems can be a b…east to verify code on, depending on how you have things constructed. Left to, say, an OS scheduler, code execution on your average computer is not deterministic because of the possibility of interruption by other programs or external interrupts. So it becomes nigh unto impossible to prove behavior for safety-critical systems.
Lesson #1 from this fact is, “Don’t do that.” Critical code for multicore must be carefully designed to guarantee provably deterministic performance. But lesson #2 is, when tools claim to analyze multicore code, you have to ask some questions to figure out exactly what that means.
Which is what I did when LDRA announced new multicore code coverage analysis. This kind of analysis invariably involves instrumentation of source code, which, by definition, exacerbates concerns about determinism. So what does this mean in LDRA’s case?
I got to spend a few minutes with one of their FAEs, Jay Thomas (yes, they were actually trusting enough – of both of us, frankly – to let an FAE talk to press) to get a better understanding of what’s going on.
First of all, the scope of the analysis is coverage – determining whether or not a particular piece of code got executed. This is conceptually done by adding a bit of code to (i.e., instrumenting) each “basic block.”
A basic block is a straight-line set of code statements without any branches. Because there are no branches, if you enter the block, you know that every line in that block got executed. I suppose, thinking out loud here, that if you put the extra instrumented code at the start of the block, then an interrupt or an unscheduled stop might invalidate the proof; if you placed the instrumentation at the end of the basic block (in blue in the figure), then, by reaching it, you can reasonably assert that you had to have executed the prior instructions to get there.
The coverage is tracked in a scoreboard-like matrix, and so “checking off” a block involves setting a value in a position of the matrix that corresponds to the block just executed.
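As a rough illustration of the check-off idea (and not LDRA's actual implementation), the sketch below assumes a hypothetical scoreboard with one byte per basic block and a made-up `COVER` macro placed at the end of each block. The function under test, `clamp_to_zero`, is likewise invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 3u                   /* hypothetical: instrumented blocks in the program */

static uint8_t scoreboard[NUM_BLOCKS];  /* one cell per basic block; 0 = not yet seen */

/* Hypothetical instrumentation: check off block `id` once it has run. */
#define COVER(id) (scoreboard[(id)] = 1u)

/* Example function under test with three instrumented blocks. */
int clamp_to_zero(int x)
{
    int result;
    if (x < 0) {
        result = 0;
        COVER(0);   /* end of block 0: the negative branch */
    } else {
        result = x;
        COVER(1);   /* end of block 1: the non-negative branch */
    }
    COVER(2);       /* end of block 2: the common exit path */
    return result;
}

/* Report whether a block has been checked off. */
int is_covered(unsigned id)
{
    assert(id < NUM_BLOCKS);
    return scoreboard[id];
}
```

Calling `clamp_to_zero(5)` checks off blocks 1 and 2 but leaves block 0 clear, which is exactly the kind of gap a coverage report is meant to expose.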
The challenge here is performance. A straightforward “index into a matrix” operation involves calculation of target addresses each time. This may sound trivial, but apparently it adds up. And multicore makes it worse, not only because you might expect such new programs to be bigger, but because now you have the possibility of collisions. We’ll talk about collisions in a second, but let’s first address performance.
In order to reduce this computational overhead, LDRA implements code that pre-calculates destination addresses at compile time. I haven’t seen exactly how that works, but the effect is analogous to changing an indirect store to a direct store operation. This apparently saves lots of time during program execution.
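Since the mechanism itself isn't described, the following is only one guess at what the "indirect versus direct store" contrast could look like in C. It assumes a hypothetical two-dimensional scoreboard indexed by function and block. `mark_indexed` and `MARK_DIRECT` are invented names, not anything from LDRA's tools.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_FUNCS  4u   /* hypothetical matrix dimensions */
#define MAX_BLOCKS 8u

static uint8_t scoreboard[NUM_FUNCS][MAX_BLOCKS];

/* Run-time version: row and column arrive as variables, so every call has
   to compute base + func * MAX_BLOCKS + block before it can store. */
void mark_indexed(unsigned func, unsigned block)
{
    assert(func < NUM_FUNCS && block < MAX_BLOCKS);
    scoreboard[func][block] = 1u;
}

/* Compile-time version: each instrumentation site passes literal constants,
   so the compiler can typically fold the address into a store to one
   fixed location. */
#define MARK_DIRECT(func, block) (scoreboard[(func)][(block)] = 1u)

void example_site(void)
{
    MARK_DIRECT(2, 5);  /* hypothetical site: function 2, block 5 */
}
```

Both forms end up setting the same kind of cell; the difference is only in how much address arithmetic is left to do while the program under test is running.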
That aside, let’s return to the collision question. There’s one big scoreboard for the entire program, not for each core. So two cores might try to write at the same time – an impossible situation for a single-core system. There’s some nuance to this, since you might think that memory controllers should hide the fact that two memory requests are made at the same time.
There are lots of ways to design a scoreboard, but for compactness, LDRA packs bits. The memory controller can manage separate words or bytes (or whatever its granularity is), but it can’t manage bit-packing. So if two cores attempt to set bits that happen to be packed into the same word, then there’s an unresolvable collision. And performance means that you don’t want one to be waiting around until the other finishes. (And I can’t imagine what the ugly performance impact would be if you naively tried to spawn separate non-blocking terminal threads for each of those writes to unblock the testing of the code…)
The way LDRA deals with such collisions is to abandon an attempt to check a bit in a word that’s already in use by some other check-off. First come, first served. In fact, first come, only served.
This means that, even though the instrumentation says to “check off the block,” it may not actually happen if you collide with a different core checking off a different block. For this specific instance, you could consider this a “false positive.” In other words, if you immediately used the resulting bit values to determine whether or not the block got covered, it would say that it didn’t get covered, when in fact it did – it’s just that the logging operation failed.
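To make the "abandon on collision" behavior concrete, here is a minimal sketch using C11 atomics. It assumes a per-word busy flag guarding each packed word; that flag, and the names `try_mark`, `word_busy`, and `is_covered`, are assumptions made for illustration, since the actual mechanism LDRA uses isn't spelled out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_BLOCKS    64u   /* hypothetical program size */
#define BITS_PER_WORD 32u
#define NUM_WORDS     (NUM_BLOCKS / BITS_PER_WORD)

static uint32_t    scoreboard[NUM_WORDS]; /* packed coverage bits */
static atomic_uint word_busy[NUM_WORDS];  /* 1 while some core is updating that word */

/* Try to check off a block. Returns false if the word was already in use
   and the check-off was abandoned (the bit stays clear). */
bool try_mark(unsigned block_id)
{
    assert(block_id < NUM_BLOCKS);
    unsigned word = block_id / BITS_PER_WORD;
    uint32_t mask = UINT32_C(1) << (block_id % BITS_PER_WORD);

    /* Claim the word; if someone else already holds it, give up, don't wait. */
    if (atomic_exchange_explicit(&word_busy[word], 1u, memory_order_acquire) != 0u) {
        return false;
    }
    scoreboard[word] |= mask;
    atomic_store_explicit(&word_busy[word], 0u, memory_order_release);
    return true;
}

/* Read back one coverage bit (intended for use after the run has finished). */
bool is_covered(unsigned block_id)
{
    assert(block_id < NUM_BLOCKS);
    return (scoreboard[block_id / BITS_PER_WORD] >> (block_id % BITS_PER_WORD)) & 1u;
}
```

The losing core never waits: it returns immediately and the block simply looks uncovered until some later attempt to check it off succeeds.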
This is conservative behavior: critically for mission-critical software, it won’t create a false negative. Said differently, coverage tracked in such a way might be better than indicated; it won’t be worse. That’s important to know.
But still, false positives aren’t fun. No one wants to go through a list of “fails” only to find that they weren’t, in fact, fails. It takes a long time to do the analysis, and you end up with this long exception list that just feels… messy, especially when you’re trying to build confidence in the code.
There are two solutions to this issue. The first is to do nothing – literally. Embedded programs love loops, so you may fail to check off a block during one loop iteration; no problem, you’ll probably hit it the next time. For this reason, even though an individual write might indicate a false positive, by the time you’re done executing the entire program, most of those will likely have disappeared.
But there still could be some stragglers remaining. In order to deal with that, LDRA provides control over how many bits get packed into a word. If you make each word sparser, then there are fewer possible collisions. The limit is to have one word per matrix cell. At that level, the memory controller can manage the collisions, and you’re good to go. The cost, of course, is the size of the matrix.
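The trade-off between packing density and matrix size is simple arithmetic, sketched below. The `bits_per_word` parameter stands in for whatever control LDRA actually exposes, and `locate` and `words_needed` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Where a block's coverage flag lives for a given packing density. */
typedef struct {
    size_t   word;  /* index of the scoreboard word */
    unsigned bit;   /* bit position inside that word */
} cell_t;

/* Map a block number to its word and bit for the chosen density. */
cell_t locate(size_t block_id, unsigned bits_per_word)
{
    assert(bits_per_word >= 1u);
    cell_t c = { block_id / bits_per_word, (unsigned)(block_id % bits_per_word) };
    return c;
}

/* Number of words the scoreboard needs, rounded up. */
size_t words_needed(size_t num_blocks, unsigned bits_per_word)
{
    assert(bits_per_word >= 1u);
    return (num_blocks + bits_per_word - 1u) / bits_per_word;
}
```

For a program with 1,000 basic blocks, packing 32 bits per word needs 32 words, and blocks 0 through 31 all contend for the same word. At one bit per word, no two blocks share a word, but the scoreboard grows to 1,000 words.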
You can find more in LDRA’s announcement.
posted by Bryon Moyer
While the Internet of Things (IoT) is full of promise, there’s one word that summarizes all that people fear about it: security.
We got to hear a bit about that at a session dedicated to the topic at the recent Internet of Things Engineering Summit co-conference at EE Live. Presented by consultant George Neville-Neil, it wasn’t about technology per se; it was about our state of mind.
Most of us believe it’s important to keep intruders out. His main takeaway: assume they will get in. Because, eventually, they will. Building sturdy walls is good and important, but planning for what happens next is also important.
What caught my ear in particular is one of the less-obvious possible consequences of not minding the store properly: a “consent decree.” I’ve heard the term in a generic sense, but it’s not obvious what the implications are if you’ve never had one (which I haven’t, which is why I asked). Apparently, if you’ve been careless with security, a consent decree allows the Federal Trade Commission (FTC) to become your overseer, getting all up in your business and stepping in when they want. Most of all, the documentation required during the term of the decree sounds particularly onerous. So… avoid this.
That aside, the following is my attempt to summarize his supporting recommendations (“attempt” because I was writing furiously to keep up):
- Shrink the “attack surface” (i.e., expose less). Meaning, drivers, daemons, features, debug access, web servers, data loggers, etc.
- Separate out “concerns.” I.e., no processes with root access or super-control; restrict access to data. Nothing gets access to anything irrelevant.
- “Defense in Depth” – rings of security. What happens when the first wall is breached?
- Provide only those features really needed. (OK, marketing will have a fun time with this. You know the drill:
  - Marketing: Here are the features we need in the next release.
  - Engineering: You can’t have them all; which ones do you really need?
  - Marketing: We need them all. We didn’t bother asking for the nice-to-haves.
  - Engineering: Well, which of these do you need least?

  In other words, marketing probably already thinks they’re getting less than the really-needed features.)
- Be conservative in what data you accept and send.
- Review your code.
- Review other people’s code – especially when incorporating someone else’s code or IP. Do an internet search for the package along with words like “crash” or swear words to find red flags.
- Use “sandboxing” to provide isolation.
- Use automation to test and analyze your code. Oh, and don’t forget to look at the results.
- And, the bottom line, “Plan for Compromise.”
And sleep with one eye open. Because They’re coming, you know…
posted by Bryon Moyer
I was talking to Atmel the other day – they had announced the release of their ATPL230 power line communication (PLC) chip, which filled in one of the squares in the strategy that we reported on some months ago. PLC is one of the ways in which smart meters can communicate back with the utility. But when you look at Atmel’s overall communication strategy for smart energy devices, there are other options, including Zigbee, but notably not including WiFi or Bluetooth.
This may look simply like yet another battle in the wireless world, but there’s more to the story than that. First, the inclusion of Zigbee has less to do with technology than you might think. In fact, it’s partly a money story – and it almost sounds like a strategy determined by tactical dollars. As Atmel describes it, some years ago, stimulus dollars were available. Without going into the details, putting Zigbee into smart meters was a “future-proofing” step that made those stimulus funds available. Now the Department of Energy recommends (although it doesn’t require, since it’s not a safety issue) Zigbee for “smart energy” home use.
But the other thing that occurred to me is that “smart energy” and “smart homes,” which would appear to be versions of the same thing, have more nuance in them as well. “Smart” tends to mean “connected,” and the smart home has lots of connected items in it. Thermostats are frequently cited as examples, but so are refrigerators and dryers and door locks.
But there are two things going on here. “Smart energy” tends to refer to energy-related devices that communicate on the utility’s network. And they do so via protocols like PLC and Zigbee. The kinds of devices that qualify as “smart energy” obviously include smart meters and other equipment dedicated to the efficient delivery of electrical energy.
But utilities also want to be able to reach into homes and factories and tinker with usage to optimize energy consumption when supplies are tight. That clearly means turning down the thermostat, but it could also mean communication with appliances that consume lots of energy, and whose use involves options, like your clothes dryer.
Would the utility actually try to reach in and prevent you from drying your clothes when energy use peaks? Perhaps not. Might a dryer manufacturer elect to include a feature that allows the utility to display current electricity pricing in an era of demand-based pricing so that you can decide whether to dry now or later? Possibly.
But there’s another aspect of the smart home, and that’s the ability to connect items to the cloud and to smartphones or computers. Yup, the Internet of Things (IoT). This is a completely separate network from the one the utilities use for smart energy. And they tend to use WiFi or Bluetooth Smart because that’s what’s in phones and computers.
So, in theory, a thermostat following the DoE-recommended approach could communicate with the utility via Zigbee and with the IoT via WiFi. The dryer could communicate via Zigbee to receive electric pricing – or it’s possible – even likely – that the utilities would also place that pricing information in the Cloud, accessed using WiFi.
According to Atmel, much of the smart-meter Zigbee capability out there now is unused within the home. It’s clearly available for connecting outwards towards the utility, but to access nodes inside the home, the meter could also use WiFi or Bluetooth. You could even argue that it would be much more efficient to do it that way, since the smart meter would be the single transition point between the utility network and the home/IoT network. The alternative would be to require numerous devices in the house – thermostats, dryers, anything that may need to talk to the utility – to have both radios so that they can talk both to the utilities and the IoT.
All of this is, of course, still in play, so there’s no one “right way” to implement this. And yes, I keep coming back to this wireless question, not so much because I have a preferred “winner,” but because it seems to be a confusing space, and I look for those occasional refreshing moments of clarity.