posted by Bryon Moyer
We looked at MQTT, along with various other messaging protocols, not too long ago. Included in the discussion was a brief mention of MQTT having quality-of-service (QoS) features – and one of those is debatable.
MQTT’s optional QoS levels deliver a message at most once (in which case you may miss a message), at least once (in which case you might get a duplicate), or exactly once. The last of these is the subject of debate: various discussion threads argue that it’s impossible in practice.
This can be discussed on a number of levels. Within the protocol itself, these QoS levels are implemented through acknowledgment sequences.
For at-least-once service, the sender has to keep a copy of the message until it receives a PUBACK from the receiver. In the absence of a PUBACK, the message must be resent. There are two possible causes for a resend: the original message may indeed not have been received, or the PUBACK may not have been received. In the latter case, the original was received and sent on for processing, but the sender doesn’t know that. So it sends a duplicate, but the receiver will have no record that this is a duplicate message; to it this is simply a new message. So it will process it just as it did the first one.
Figure 1. At-least-once delivery. Note that in the third scenario, the message is sent for processing (the dashed arrow) twice.
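To make the duplicate scenario concrete, here is a minimal Python sketch of the at-least-once exchange. It is a toy in-process simulation, not a real MQTT client: the names Receiver, send_at_least_once, and puback_lost_on are my own inventions for illustration, and the "network" is just a function call whose acknowledgment can be dropped on demand.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Toy QoS 1 receiver: processes every PUBLISH it sees, keeps no history."""
    processed: list = field(default_factory=list)

    def on_publish(self, packet_id, payload):
        self.processed.append(payload)  # handed on for processing right away
        return ("PUBACK", packet_id)


def send_at_least_once(receiver, packet_id, payload, puback_lost_on=(), max_tries=5):
    """Resend until a PUBACK arrives; return the number of attempts made.

    puback_lost_on lists the attempt numbers (1-based) whose PUBACK the
    simulated network drops on the way back to the sender.
    """
    for attempt in range(1, max_tries + 1):
        ack = receiver.on_publish(packet_id, payload)
        if attempt in puback_lost_on:
            continue  # sender saw no PUBACK, so it keeps its copy and resends
        if ack == ("PUBACK", packet_id):
            return attempt  # sender may now discard its copy
    raise TimeoutError("no PUBACK received")


receiver = Receiver()
attempts = send_at_least_once(receiver, 1, "reduce by 10 degrees", puback_lost_on={1})
print(attempts, receiver.processed)
```

With the first PUBACK dropped, the sender needs two attempts and the receiver ends up processing the same payload twice, which is the third scenario in the figure.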
Exactly-once is slightly more involved. The receiver has a couple of choices in how it handles the message once received, and I don’t want to get lost in that, so I’ll abstract the receiver behavior (the drawing picks one option). As above, the sender sends the message and keeps a copy. When the receiver gets the message, it sends a PUBREC back to the sender and keeps a record of the receipt for the moment. Once the sender gets the PUBREC, it can delete its copy of the message, but it keeps a record of the PUBREC for the moment and sends a PUBREL to the receiver, letting the receiver know that the sender knows the message arrived. Once the receiver gets the PUBREL, it sends back a PUBCOMP and discards its record of the message; when the sender gets the PUBCOMP, it discards its record of the PUBREC, and the exchange terminates.
Figure 2. Exactly-once delivery. One message processing option shown. Note that in no scenario is the message sent for processing (dashed arrow) more than once.
The basic idea is that both sender and receiver keep a record of the state until they’re both convinced that the message was delivered. If, during the process, one of the ACKs gets lost and a duplicate message is sent, the receiver is still tracking that message, so it can ignore the duplicate.
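The same kind of toy simulation shows why the extra bookkeeping helps. This sketch assumes the receiver processes the message on first receipt (one of the handling options mentioned above) and tracks packet ids in a set until the PUBREL arrives. Qos2Receiver, send_exactly_once, and pubrec_lost_on are hypothetical names, not part of any MQTT library.

```python
class Qos2Receiver:
    """Toy receiver that remembers packet ids between PUBLISH and PUBREL."""

    def __init__(self):
        self.pending = set()   # packet ids received but not yet released
        self.processed = []    # payloads handed on for processing

    def on_publish(self, packet_id, payload):
        if packet_id not in self.pending:
            self.pending.add(packet_id)
            self.processed.append(payload)  # process on first receipt only
        return ("PUBREC", packet_id)        # a duplicate is acknowledged again

    def on_pubrel(self, packet_id):
        self.pending.discard(packet_id)     # safe to forget the message now
        return ("PUBCOMP", packet_id)


def send_exactly_once(receiver, packet_id, payload, pubrec_lost_on=(), max_tries=5):
    """Run the four-step handshake; return True once PUBCOMP is seen."""
    # Phase 1: resend PUBLISH until a PUBREC gets through.
    for attempt in range(1, max_tries + 1):
        reply = receiver.on_publish(packet_id, payload)
        if attempt not in pubrec_lost_on and reply == ("PUBREC", packet_id):
            break
    else:
        raise TimeoutError("no PUBREC received")
    # Phase 2: the message copy can go; only the packet id matters until PUBCOMP.
    reply = receiver.on_pubrel(packet_id)
    return reply == ("PUBCOMP", packet_id)


receiver = Qos2Receiver()
done = send_exactly_once(receiver, 7, "reduce by 10 degrees", pubrec_lost_on={1})
print(done, receiver.processed, receiver.pending)
```

Even though the PUBLISH goes out twice here, the payload shows up in processed only once, and the receiver's record is empty again when the exchange completes.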
That’s all well and good from the limited standpoint of message delivery, but messages have a purpose; presumably their intent is that something happen as a result of the message. These are quaintly referred to as “side effects” in the discussions, which they are in a computer science sense, but they may really be the “main effect” desired as a result of the message.
If the message instructs an oven to change its temperature, then, from a message-sending standpoint, the fact that the temperature changed after the message is a side effect. But from a system standpoint, it’s the main effect; that’s the purpose of the message. So there’s a question of liability here: if the message doesn’t arrive, then clearly there’s a messaging problem. But if the message arrives, but then something goes wrong in the process of changing the temperature, who’s responsible?
MQTT washes its hands of the issue as soon as the receiving end confirms that it got the message. If the process of sending the message further on in the receiving system breaks down, or if some other system element fails, then, in reality, you still need a message resend. And a resend carries its own risk: if a “reduce by 10 degrees” message is received and processed, but something in an internal acknowledgment loop fails, then the message might be processed twice, with the net result of reducing the temperature by 20 degrees.
The basic problem is that there are many steps along the way where failure can occur. Even in cases of redundancy or failover, there are typically delays either in detecting that there’s a problem or in performing a switch, and things can go wrong in those timing gaps.
The solution here comes back to “idempotence”: to quote Wikipedia, this “is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application.” You have to do some work on the receiving side to make this happen.
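One way to do that work, sketched here under my own assumptions, is for the receiving application to tag each command with an id and refuse to apply the same id twice. The Oven class and the "cmd-42" id are made up for illustration; a real system would also have to persist the id record together with the state change, which this sketch ignores.

```python
class Oven:
    """Hypothetical device that applies each command id at most once."""

    def __init__(self, temperature):
        self.temperature = temperature
        self.applied_ids = set()  # receiver-owned record of handled commands

    def handle(self, command_id, delta):
        if command_id in self.applied_ids:
            return self.temperature  # already applied: the repeat has no effect
        self.temperature += delta
        self.applied_ids.add(command_id)
        return self.temperature


oven = Oven(temperature=350)
oven.handle("cmd-42", -10)
oven.handle("cmd-42", -10)  # redelivered after an internal failure
print(oven.temperature)  # 340, not 330
```

The "reduce by 10 degrees" command is not idempotent on its own; the id check is what makes handling it idempotent.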
Essentially, the receiving side has to take full ownership of state; it can’t count on the sender to share in storage of the state in case state changes get lost between sender and receiver. This is the principle behind the REST architecture, and it may be familiar in how the HTTP protocol works.
You could also argue that, instead of sending the instruction, “reduce by 10 degrees,” the message should say, “Set to this new temperature” (which happens to be 10 degrees lower). But that shifts state responsibility to the sender. If the receiver’s true state changed and the sender didn’t find out, then the state is effectively undefined.
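A few lines of Python show both halves of that trade-off. The function names and the temperatures are purely illustrative.

```python
def apply_relative(temperature, delta):
    """'Reduce by N degrees': the result depends on how often it is applied."""
    return temperature + delta


def apply_absolute(temperature, target):
    """'Set to this temperature': applying it twice is the same as once."""
    return target


start = 350
twice_relative = apply_relative(apply_relative(start, -10), -10)
twice_absolute = apply_absolute(apply_absolute(start, 340), 340)

# The sender computed 340 from a stale reading of 350, but the oven had
# already moved to 400, so the "10 degrees lower" command drops it by 60.
actual_before = 400
actual_drop = actual_before - apply_absolute(actual_before, 340)

print(twice_relative, twice_absolute, actual_drop)  # 330 340 60
```

The absolute command survives duplication, but only as long as the sender’s picture of the receiver’s state is correct.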
Bringing things back to MQTT, the problem is that the protocol considers itself successful simply when it delivers its message exactly once. That’s a parochial viewpoint. If reliability is important enough to use that QoS level, then you can’t count on the QoS alone, because it gives only message guarantees, not system guarantees.
(At least, that's my take on the whole debate...)
posted by Bryon Moyer
What does it take to be successful as a maker of an Internet-of-Things (IoT) edge-node product? This question doesn’t have an obvious answer, partly because there are many contributors to success. But in a discussion with Open Silicon at the recent IoT DevCon, it became clear to me that taking their view into account makes for two opposing forces – forces they’re trying to unify.
On the one side, there are numerous companies trying to jump into the IoT fray. The IoT involves many technologies, and many of these companies don’t come to the party as experts in all of them. So platforms are a common way to get designers up and running as quickly as possible.
Such platforms pre-package many of the technology bits – that’s the point. The good news is that you don’t have to futz with those bits to get things working, and you can focus on domain-specific functionality and get to market quickly. The bad news is that, if everyone uses such platforms, then there is little room left for differentiation, since most of the underlying plumbing is the same.
That might not be an issue in the early days. If you’re the first guy to do a rattlesnake early-alert app for hikers, then that’s your differentiation; no one else does it. Simple. But once you have competition, if you’re all using similar platforms, then your only real differentiation is your app. Anyone ever notice the user interfaces on phone apps? Yeah, they’re dead dumb simple because fat thumbs can’t poke small buttons and over-40 eyes can’t read tiny print. (Hint: Google – or any – maps… little help?) So there’s not much room to differentiate in the app itself.
Open Silicon says that, even if you can use lower-level software to differentiate, the economics don’t work out. (I’m taking that at face value; I haven’t seen their math, nor have I gone through the process myself.) Their assertion is that you need to customize in hardware to differentiate in a way that will pay dividends.
They’re trying to ease IoT SoC development by providing the building blocks for an IoT edge node, from the sensor interface (not the actual sensor) through to the wireless radio, and then work with their customers to establish customization, implemented in hardware, that differentiates the end product. The idea is that, based on the IP already available to Open Silicon (either their own or via IP subs), they can spin something up very quickly. While not typical, they had one tier-1 customer that finished a design from spec to tapeout in six months. And, because most of this can be done on older process nodes (in the 65- to 180-nm range), mask costs aren’t as astronomical as might be feared on an aggressive node.
Their focus tends to be more industrial, so their radio preferences are LoRa for longer range (based largely on a relationship they have with Semtech) and WirelessHART (industrial-strength features over the 802.15.4 radio standard that’s familiar via Zigbee).
Differentiating via hardware can be a bold move, especially in a new market, where change is rapid and hardware change is not rapid. This probably isn’t a strategy for a small company with moderate volumes, and I assume that they’re not stating that only companies large enough to afford custom (or customized) SoCs will ever make money. In fact, a large company might have the capacity to do its designs in-house rather than using an outside partner like Open Silicon, so there’s this middle profile of customer that could command large volumes but require outside services.
In addition, the IoT is very much an agile play: get something out there, see what works, fix what doesn’t. And do it with quick iterations. Accomplishing that with a hardware differentiation angle could be tough. But if you can manage it, there may be some money to be made if Open Silicon’s assertion is correct.
posted by Bryon Moyer
Cadence is proposing a new way to approach debug. It’s almost an obvious way, except that this isn’t how most debug has traditionally been done. The real reason this hasn’t been done before is simple: data. We’ll come back to that in a sec.
Their point is that, for most debug today, you have to anticipate where problems are likely to crop up and then manually instrument your code with “printf” statements (or the equivalent) so that you get some visibility into what’s going on with your program.
That works OK for your first simulation run – up to the point when something goes wrong without an accompanying printf to provide clues. So you go back and add more printfs and – and this is the key – you resimulate.
By Cadence’s estimation, 50% of verification effort is debugging, and 25% is running tests. Together, they’re ¾ of the pie. Each resimulation adds test time, and because the debug effort resembles successive approximation as you try to zero in on the cause, it’s inefficient. Their big idea is to make debug more directed and – this is the big part – make it 100% doable after only one verification run.
The result is Indago (no, it doesn’t sound like “indigo”; it’s “in-DAH-go,” apparently Latin for hunting or tracking). There are a few key pieces to this approach.
The main one is the fact that all artifacts – data, logs, code execution, etc. – are captured. In other words, instead of having to decide ahead which data to expose via printf, you simply get everything. That means that debug efforts have all the data they need – no subsequent runs to capture new data are needed.
From there, they have what they call “root cause analysis” that helps point you in the direction of a bug. When a signal is identified by the testbench as being incorrect, the tool can identify a short list of possible causes, and you can drill in from there (even crossing into third-party IP as long as it’s not encrypted).
Finally, they have three apps that they layer above this fundamental technology. One is their Debug Analyzer, which allows multi-language (SystemVerilog, e, and SystemC) code debug. The second is Embedded Software Debug, which helps debug co-verified software and hardware (and is optimized for their Palladium emulator and Incisive simulator). The third, Protocol Debug, provides abstraction when debugging protocols so that you can observe what’s happening at a higher level.
These three apps can be run together at the same time. To some extent, they provide alternative views of the same information, and they stay synchronized. You can move back and forth between them, say, highlighting something in one and then viewing in another.
Indago isn’t tied to Cadence’s verification tools; it can also be used with other engines mixed and matched from different EDA providers.
Finally, a quick word on a buzzphrase that featured prominently in the announcement: Big Data. When you hear that, you might think Hadoop or Lambda Architecture or datamarts or NoSQL searches or any number of mysterious acronyms and algorithms and incantations. Anything up to the point of Deep Learning, which is yet another buzzphrase.
I tried to drill in to see what “Big Data” meant in this context. And, in fact, it’s mostly none of that prior stuff. It’s “big data” in the most general sense, the highest-level big-data concept. And that is, “Grab everything you can, up to and including your mother-in-law, and stash it away cuz you might need it someday.” Indago embraces that aspect – it’s key to eliminating subsequent verification iterations while debugging.
To my earlier point, it’s only recently that memory has become cheap and big enough (and that we can dump data to it fast enough) for us to afford to be this “wasteful” – after all, an enormous percentage of that stored data will never, ever be used. Unlike in the past, that’s no longer an unacceptable cost. Accelerating debug is worth more than the extra storage.