Keeping it All in Memory

When you think “database,” a number of images might come to mind.

Giant servers with multiple IT experts required to make even the most minute of changes without breaking the entire thing and shutting the company down for weeks while it’s all rebuilt from scratch.
The outsized egos of self-styled demi-gods braving Tasmanian cyclones, breaking nary a bead of sweat.
Monster teams of consultants required to implement what you want in an app overlaid on a monster database on a monster schedule. (Which will slip.)
Or, on a more modest scale, you may have nightmares about being chased by the ugly, multi-appendaged DoCmd object, where you can’t really tell head from tail and aren’t really sure if you’re being eaten (but you’re pretty sure you’re being corrupted – and not in a good way).

Most of us have interacted with a database at some point. In fact, anyone who uses the internet will have, unless he or she exclusively visits old, static sites. But we’ve used more databases in more places than we might expect.

That call you made a few minutes ago on your smartphone? When you looked up the contact? Database. On the phone. Oh, and the call details while you’re in the middle of the call to be sure you’re billed “accurately”? Also a database – not a big, lumbering one, but one small and agile enough to maintain lots of in-process calls (and likely with an extremely efficient custom-crafted round-up function) before sending the info to a big, lumbering database when the call ends.

This is a special category, the in-memory database, that’s finding increasing use in embedded systems. Including very small embedded systems.

As the name implies, these databases aren’t stored on huge, redundant servers: they’re kept in live memory. This means that they’re optimized completely differently: disk I/O is no longer an issue.

It also means they’re not persistent. They’re like the opposite end of the spectrum from a RAID server: if the power goes out, so do they. Actually, that’s not necessarily true, since your contacts don’t disappear if you turn off your phone – but that’s because the “memory” in that case is flash.

Often these form “working” databases for use when you don’t need a persistent record of all of the working details of whatever the database is supporting. And they can typically be rebuilt if something goes wrong, just as routing tables are rebuilt if a router goes down.

And they’re most useful when database transactions and lookups need to be fast: when setting up a call, things can’t take forever. (Even though it may seem that they do. In the old landline days when Bell ruled the phones, a call had to be set up in a specified time, a time far shorter than what cell phones do today. But… alas… that was back in the days when a voice sounded like, well, a voice…)

Not all databases for embedded systems are purely in-memory: there are also hybrid databases, where some tables are in memory and others are persistent.

There are actually a lot of players in this space – Wikipedia lists 33 companies, for what that’s worth. At first glance, one company I spoke to that claims to have a faster and smaller implementation than the others – McObject – would appear to be missing from the list. Then I saw that extremeDB – the product name, not the company name – is on the list. But… it’s red-linked – its page has been deleted.

(I even read through the debate on why it was deleted. Someone didn’t like the fact that company people wrote it – I’ve gotten in trouble for this myself, although not deleted – and the fact that there were links to EE Times and Dr. Dobbs articles, which were described as “spam spam spam.” Someone describing himself as a customer offered to rewrite it given specific examples that would pass muster… that was not provided, and the page was deleted. Ah… democracy in action… “No, I’m not going to tell you what I want, I’m simply going to tell you I don’t like what I have…” Budding CEO perhaps?)

I had a chance to talk with McObject’s Steve Graves at ESC. As a case in point, they claim a code footprint of 150K or less (I know, for some of you anything over 256 bytes is bloat…) and claim read/write accesses of a few microseconds. They attribute their speed to a couple things (other than memory-residence): they work directly with data stored in the form needed by the application, so there’s no futzing around with formats; and they have a native query language that’s faster than SQL. They have an SQL implementation, but it’s slower because the queries have to be parsed and interpreted.

At ESC, they were touting a new clustering capability. This is targeted at telecom guys, where numerous systems might be working off of the same database. Each gets a local copy to work from, and replication across the cluster is managed transparently by the runtime system. This also works with their MVCC capability, which eliminates locks. Since everyone’s working on a copy, there’s no need for locked doors. The only issue comes when more than one person makes a change to the same record. One will get in; the other(s) will be rejected. So coders need to use a loop on each transaction to retry if it doesn’t work the first time.

Their ACID claims (a kind of acid test for databases) may be a bit off the mark: they can presumably handle atomicity, consistency, and isolation (A, C, and I), but their very in-memory nature would suggest that they’re not durable (the D).

They also claim flexibility for those building a database. The tables don’t have to be normalized, reducing design time; they can also include structures in the tables. They support a variety of indexing schemes for different applications, including “R-trees,” useful for geographic information; “Patricia” trees (or tries), used for string searches – in particular, IP addresses; and k-d trees, used for multi-dimensional searches.

That variety of indexing for different applications highlights the range of uses that these kinds of databases are addressing. While a lot of this stuff might just seem like stupid, old-fashioned table look-up, in fact a table look-up is really a degenerate database query. (No, I’m not talking about Megan’s law databases…) If databases are big, huge, monstrous things to manage, then a table look-up seems like a nice way to go.

On the other hand, if databases are easy, small, and fast, then you can simplify your world by making everything look like a database.

As long as it’s not the kind that visits you by night…