posted by Bryon Moyer
We recently looked at levels of data communication in the Internet of Things (IoT) and established three levels:
- Formal communications protocol level (e.g., TCP/IP)
- Generic data level (e.g., Xively)
- Business objects
At the recent Internet of Things Engineering Summit, I talked with another company that illustrates some of how this works. They’re called Econais. (I keep seeing this as looking French, and I want to pronounce it “eh-koh-NAY” – but that’s wrong: it’s a Greek company, and it’s pronounced “ee-KOH-ness.”)
Econais recently announced a new module for connecting Things to WiFi. And the focus is on making integration easy: with 20 lines of code, you can connect to a local WiFi network. Assuming your Thing doesn’t have a screen (and, like a motion detector, might even be mounted someplace inconvenient), your phone acts as the keyboard, launching Thing code that gets connection information from the access point. This is part of their ProbMe (“probe me” – named after its pinging capability) in-situ management system.
Because Econais implements standards like WiFi and TCP, with no further abstraction, it occupies the comms protocol level (i.e., the first of the three above). But they also partner with Xively, which layers on top of the protocol level. In fact, for a programmer, both APIs are then available: you can write at the detailed Econais level or at the more abstracted Xively level.
The overall idea here is that you can get onto the network easily with Econais, but you can then manipulate data more easily at the Xively (or whoever lies above this) level. Of course, the WiFi only goes as far as the access point; to get to the cloud, you then transition to the various other wired (or even wireless) comms protocols that make up the Internet.
Econais actually has two families of WiFi module: the 19D01, which doesn’t have an MCU in it (so presumably you attach it to your Thing that already has an MCU), and the recently announced 19W01, which includes an MCU as well as integrated FLASH and an antenna. It’s all a bit confusing since, at the time of this writing, these distinctions aren’t clear on the website or some of the graphics. But size is an important selling factor for them: the MCU-less version is an 8-mm square module; the W01 is 14 mm x 12 mm.
And, just as I was preparing to post this, notice came in of a new Lantronix WiFi module for Arduino boards. So it slides into the same category. It is larger, at 24 mm x 16.5 mm.
Update (5/14/14): I have some more clarification on the Econais integration story.
- There's an EC32L module that has an MCU separate from the WiFi chip.
- The EC19W products integrate the MCU in with the WiFi chip, although the MCU is still available for developer programs. Some of the other hardware interfaces (A/D, GPIO, etc.) are reduced vs. the EC32L.
- Both of these include FLASH and an antenna, so they're certified by the various international organizations.
- The EC19D excludes the FLASH and antenna. It's therefore not certified (but presumably a system including it would need to be).
posted by Bryon Moyer
Multicore systems can be a b…east to verify code on, depending on how you have things constructed. Left to, say, an OS scheduler, code execution on your average computer is not deterministic because of the possibility of interruption by other programs or external interrupts. So it becomes nigh unto impossible to prove behavior for safety-critical systems.
Lesson #1 from this fact is, “Don’t do that.” Critical code for multicore must be carefully designed to guarantee provably deterministic performance. But lesson #2 is, when tools claim to analyze multicore code, you have to ask some questions to figure out exactly what that means.
Which is what I did when LDRA announced new multicore code coverage analysis. This kind of analysis invariably involves instrumentation of source code, which, by definition, exacerbates concerns about determinism. So what does this mean in LDRA’s case?
I got to spend a few minutes with one of their FAEs, Jay Thomas (yes, they were actually trusting enough – of both of us, frankly – to let an FAE talk to press) to get a better understanding of what’s going on.
First of all, the scope of the analysis is coverage – determining whether or not a particular piece of code got executed. This is conceptually done by adding a bit of code to (i.e., instrumenting) each “basic block.”
A basic block is a straight-line set of code statements without any branches. Because there are no branches, if you enter the block, you know that every line in that block got executed. I suppose, thinking out loud here, that if you put the extra instrumented code at the start of the block, then an interrupt or an unscheduled stop might invalidate the proof; if you placed the instrumentation at the end of the basic block (in blue in the figure), then, by reaching it, you can reasonably assert that you had to have executed the prior instructions to get there.
The coverage is tracked in a scoreboard-like matrix, and so “checking off” a block involves setting a value in a position of the matrix that corresponds to the block just executed.
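As a rough illustration of the idea (not LDRA's actual instrumentation, which isn't shown here), the sketch below hand-instruments a small function. The block numbering, the `scoreboard` list, the `check_off` helper, and the `clamp_and_scale` function are all hypothetical. Each probe sits at the end of its basic block, so reaching it implies that the block's earlier statements ran.

```python
# Hypothetical sketch: one scoreboard entry per basic block.
NUM_BLOCKS = 4
scoreboard = [False] * NUM_BLOCKS


def check_off(block_id):
    """Record that the basic block with this id ran to its end."""
    scoreboard[block_id] = True


def clamp_and_scale(value, limit):
    # Block 0: straight-line code up to the first branch.
    scaled = value * 2
    check_off(0)  # probe at the end of block 0
    if scaled > limit:
        # Block 1: runs only when the scaled value exceeds the limit.
        scaled = limit
        check_off(1)
    else:
        # Block 2: runs otherwise.
        scaled = scaled + 1
        check_off(2)
    # Block 3: code after the two branches rejoin.
    result = scaled
    check_off(3)
    return result


clamp_and_scale(1, 10)  # exercises blocks 0, 2, and 3 only
uncovered = [i for i, hit in enumerate(scoreboard) if not hit]
print("uncovered blocks:", uncovered)
```

A single call leaves block 1 unchecked; a second call that takes the other branch would complete the scoreboard.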
The challenge here is performance. A straightforward “index into a matrix” operation involves calculation of target addresses each time. This may sound trivial, but, apparently, it adds up. And multicore makes it worse, not only because you might expect such new programs to be bigger, but because now you have the possibility of collisions. We’ll talk about collisions in a second, but let’s first address performance.
In order to reduce this computational overhead, LDRA implements code that pre-calculates destination addresses at compile time. I haven’t seen exactly how that works, but the effect is analogous to changing an indirect store to a direct store operation. This apparently saves lots of time during program execution.
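The following is only an analogy, in Python, for what "pre-calculating" could mean; it makes no claim about how LDRA's tools do it. The first probe works out the word index and bit mask on every call. The second is generated once per block with those values already baked in, so the run-time work is just the store. `BITS_PER_WORD`, `probe_computed`, and `make_probe` are made-up names.

```python
BITS_PER_WORD = 32
NUM_BLOCKS = 100
# Bit-packed scoreboard: one bit per basic block.
words = [0] * ((NUM_BLOCKS + BITS_PER_WORD - 1) // BITS_PER_WORD)


def probe_computed(block_id):
    """Compute the target word and bit on every single check-off."""
    word_index, bit = divmod(block_id, BITS_PER_WORD)
    words[word_index] |= 1 << bit


def make_probe(block_id):
    """Do the index arithmetic once, ahead of time, and return a probe
    that only performs the store."""
    word_index, bit = divmod(block_id, BITS_PER_WORD)
    mask = 1 << bit

    def probe():
        words[word_index] |= mask

    return probe


# "Instrumentation time": build one specialized probe per block.
probes = [make_probe(b) for b in range(NUM_BLOCKS)]

probe_computed(5)  # index arithmetic happens at run time
probes[37]()       # index arithmetic was done earlier; only the store runs
```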
That aside, let’s return to the collision question. There’s one big scoreboard for the entire program, not for each core. So two cores might try to write at the same time – an impossible situation for a single-core system. There’s some nuance to this, since you might think that memory controllers should hide the fact that two memory requests are made at the same time.
There are lots of ways to design a scoreboard, but for compactness, LDRA packs bits. The memory controller can manage separate words or bytes (or whatever its granularity is), but it can’t manage bit-packing. So if two cores attempt to set bits that happen to be packed into the same word, then there’s an unresolvable collision. And performance means that you don’t want one to be waiting around until the other finishes. (And I can’t imagine what the ugly performance impact would be if you naively tried to spawn separate non-blocking terminal threads for each of those writes to unblock the testing of the code…)
The way LDRA deals with such collisions is to abandon an attempt to check a bit in a word that’s already in use by some other check-off. First come, first served. In fact, first come, only served.
This means that, even though the instrumentation says to “check off the block,” it may not actually happen if you collide with a different core checking off a different block. For this specific instance, you could consider this a “false positive.” In other words, if you immediately used the resulting bit values to determine whether or not the block got covered, it would say that it didn’t get covered, when in fact it did – it’s just that the logging operation failed.
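Here is a minimal sketch of that drop-on-collision policy. It uses a non-blocking lock per word as a stand-in for whatever mechanism LDRA really uses, which isn't described here. The `PackedScoreboard` class and its method names are hypothetical.

```python
import threading


class PackedScoreboard:
    """Bit-packed coverage scoreboard that drops colliding check-offs."""

    def __init__(self, num_blocks, bits_per_word=32):
        self.bits_per_word = bits_per_word
        num_words = -(-num_blocks // bits_per_word)  # ceiling division
        self.words = [0] * num_words
        # One try-lock per word models "this word is in use".
        self.busy = [threading.Lock() for _ in range(num_words)]

    def check_off(self, block_id):
        """Try to mark a block; return False if the check-off was dropped."""
        word_index, bit = divmod(block_id, self.bits_per_word)
        if not self.busy[word_index].acquire(blocking=False):
            return False  # another check-off owns this word: give up
        try:
            self.words[word_index] |= 1 << bit
            return True
        finally:
            self.busy[word_index].release()

    def is_covered(self, block_id):
        word_index, bit = divmod(block_id, self.bits_per_word)
        return bool((self.words[word_index] >> bit) & 1)


board = PackedScoreboard(num_blocks=64)

# Simulate another core being in the middle of a check-off in word 0.
board.busy[0].acquire()
dropped = not board.check_off(3)   # block 3 lives in word 0: dropped
other_word = board.check_off(40)   # block 40 lives in word 1: succeeds
board.busy[0].release()

retried = board.check_off(3)       # no collision this time, so it sticks
```

Between the dropped attempt and the retry, the scoreboard under-reports block 3; it never over-reports.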
This is conservative behavior: crucially for mission-critical software, it won’t create a false negative (a block reported as covered that never actually ran). Said differently, coverage tracked in such a way might be better than indicated; it won’t be worse. That’s important to know.
But still, false positives aren’t fun. No one wants to go through a list of “fails” only to find that they weren’t, in fact, fails. It takes a long time to do the analysis, and you end up with this long exception list that just feels… messy, especially when you’re trying to build confidence in the code.
There are two solutions to this issue. The first is to do nothing – literally. Embedded programs love loops, so you may fail to check off a block during one loop iteration; no problem, you’ll probably hit it the next time. For this reason, even though an individual write might indicate a false positive, by the time you’re done executing the entire program, most of those will likely have disappeared.
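A back-of-the-envelope way to see why loops help, assuming (and it is only an assumption) that each check-off attempt is dropped independently with the same probability: the chance that a block executed n times never gets recorded is that probability raised to the nth power. The function name and the 1% drop rate are illustrative, not measured.

```python
def chance_never_recorded(drop_probability, executions):
    """Probability that every check-off of one block is dropped,
    assuming independent drops with a fixed per-attempt probability."""
    return drop_probability ** executions


for executions in (1, 2, 5, 10):
    p = chance_never_recorded(0.01, executions)
    print(f"{executions:>2} executions: {p:.0e}")
```

Real collisions are unlikely to be independent (two cores running in lockstep could collide repeatedly), so this is an optimistic picture rather than a bound.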
But there still could be some stragglers remaining. In order to deal with that, LDRA provides control over how many bits get packed into a word. If you make each word sparser, then there are fewer possible collisions. The limit is to have one word per matrix cell. At that level, the memory controller can manage the collisions, and you’re good to go. The cost, of course, is the size of the matrix.
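To put rough numbers on that trade-off, the sketch below computes the scoreboard size for different packing densities. The 10,000-block program and the 4-byte word are made-up figures, and `scoreboard_size` is a hypothetical helper, not an LDRA setting.

```python
def scoreboard_size(num_blocks, bits_per_word, word_bytes=4):
    """Return (words needed, bytes needed) when packing `bits_per_word`
    block flags into each word."""
    num_words = -(-num_blocks // bits_per_word)  # ceiling division
    return num_words, num_words * word_bytes


NUM_BLOCKS = 10_000  # made-up program size
for packing in (32, 8, 1):
    num_words, num_bytes = scoreboard_size(NUM_BLOCKS, packing)
    print(f"{packing:>2} flags/word: {num_words:>6} words, {num_bytes:>6} bytes")
```

With these numbers, going from 32 flags per word to one flag per word grows the scoreboard from 1,252 bytes to 40,000 bytes, about 32 times larger, in exchange for having no two blocks share a word.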
You can find more in LDRA’s announcement.
posted by Bryon Moyer
While the Internet of Things (IoT) is full of promise, there’s one word that summarizes all that people fear about it: security.
We got to hear a bit about that at a session dedicated to the topic at the recent Internet of Things Engineering Summit co-conference at EE Live. Presented by consultant George Neville-Neil, it wasn’t about technology per se; it was about our state of mind.
Most of us believe it’s important to keep intruders out. His main takeaway: assume they will get in. Because, eventually, they will. Building sturdy walls is good and important, but planning for what happens next is also important.
What caught my ear in particular is one of the less-obvious possible consequences of not minding the store properly: a “consent decree.” I’ve heard the term in a generic sense, but it’s not obvious what the implications are if you’ve never had one (which I haven’t, which is why I asked). Apparently, if you’ve been careless with security, a consent decree allows the Federal Trade Commission (FTC) to become your overseer, getting all up in your business and stepping in when they want. Most of all, the documentation required during the term of the decree sounds particularly onerous. So… avoid this.
That aside, the following is my attempt to summarize his supporting recommendations (“attempt” because I was writing furiously to keep up):
- Shrink the “attack surface” (i.e., expose less). That means cutting back on drivers, daemons, features, debug access, web servers, data loggers, etc.
- Separate out “concerns.” I.e., no processes with root access or super-control; restrict access to data. Nothing gets access to anything irrelevant.
- “Defense in Depth” – rings of security. What happens when the first wall is breached?
- Provide only those features really needed. (OK, marketing will have a fun time with this. You know the drill:
  - Marketing: Here are the features we need in the next release.
  - Engineering: You can’t have them all; which ones do you really need?
  - Marketing: We need them all. We didn’t bother asking for the nice-to-haves.
  - Engineering: Well, which of these do you need least?
  In other words, marketing probably already thinks they’re getting less than the really-needed features.)
- Be conservative in what data you accept and send.
- Review your code.
- Review other people’s code – especially when incorporating someone else’s code or IP. Do an internet search for the package along with words like “crash” or swear words to find red flags.
- Use “sandboxing” to provide isolation.
- Use automation to test and analyze your code. Oh, and don’t forget to look at the results.
- And, the bottom line, “Plan for Compromise.”
And sleep with one eye open. Because They’re coming, you know…