We are now in the parallel era. More and more of the machines we use do more than one thing at a time.
Okay, some of them are capable of doing more than one thing at a time, even if the software that runs on them doesn’t know how to take advantage of that.
Problem is, there are many ways to create systems that run things in parallel. Most are very specific and have specific names associated with them. But there are characteristics that all of these techniques share – and yet there doesn’t seem to be a good way to refer to the bigger picture in a clear, concise, and unambiguous way.
Let’s start by reviewing all the ways you can run things concurrently, examining their ins and outs. We can then try to abstract up a level to see what we should call that. OK, truth in advertising – [spoiler alert] actually, I’m not going to have a suggestion; I’m going to punt and ask for help.
1. Multicore – SMP
The term “multicore” typically refers to a hardware architecture where a processor has more than one execution engine. But there are actually other ways people use “multicore”, so let’s start with the simplest of them all, “symmetric multiprocessing,” or SMP. Of all the parallel techniques, this is probably the easiest because someone else (in the guise of an operating system) does the work for you.
With SMP, all cores are the same and their environments all look the same. In other words, an operating system can’t tell them apart. The good thing about that is that the OS can simply schedule tasks on whichever cores are available. It doesn’t have to know anything about individual idiosyncrasies that a core might have – because none of them have any.
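To make the “whichever core is available” idea concrete, here is a minimal sketch. It assumes a standard Python thread pool as a stand-in for the work an OS scheduler does; the names `task` and `run_tasks` are mine, purely for illustration. Notice that nothing in the code ever names a core.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # The task never asks for a particular core; it only reports
    # which worker thread happened to run it.
    return n * n, threading.current_thread().name

def run_tasks(values):
    # One worker per reported core. os.cpu_count() can return None,
    # so fall back to a single worker in that case.
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands tasks to whichever worker is free and returns
        # the results in input order.
        return list(pool.map(task, values))

results = run_tasks(range(8))
squares = [square for square, _ in results]
```

One caveat: standard CPython threads share an interpreter lock, so this shows the placement idea rather than a real speedup. The point is only that the program describes the work and leaves the choice of core to someone else.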
Your PC is probably the best example of SMP. Wait… maybe “best” isn’t appropriate. How about, “most obvious” or “simplest.”
2. Multi-processor
OK, I tossed this one in here because it’s not much different from SMP in some of its interpretations. The difference here is that, for some people, “multi-processor” means multiple chips with processors in them. In fact, you could have multi-processor and multicore in the same system: dual quad-core is exactly such a situation.
Does it matter whether it’s a chip or a core that’s multi? Well, there are some technical reasons – the chips are more loosely coupled than the cores. But as long as the cores all look alike, an OS doesn’t care; this can also operate under the SMP moniker.
Of course, some people use the term – and, in particular, the gerund “multi-processing” – as a more generic term, but it’s too easily conflated with the more narrow “multi-processor” meaning.
3. Multi-threading
This isn’t an architecture; it’s the concept of taking a process and breaking it into sub-processes called threads. If no one had used this term for this specific thing, it would be a great metaphoric generic term. Alas, too late.
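For anyone who wants to see what “breaking a process into threads” looks like, here is a small sketch using Python’s standard `threading` module. The function names and the choice of summing a list are my own assumptions for illustration; any divisible job would do.

```python
import threading

def sum_chunk(data, start, end, results, index):
    # Each thread sums its own slice and writes to its own slot,
    # so the threads never touch the same element of results.
    results[index] = sum(data[start:end])

def threaded_sum(data, num_threads=4):
    results = [0] * num_threads
    # Ceiling division so every element lands in some chunk.
    chunk = (len(data) + num_threads - 1) // num_threads
    threads = []
    for i in range(num_threads):
        start = i * chunk
        end = min(start + chunk, len(data))
        t = threading.Thread(target=sum_chunk,
                             args=(data, start, end, results, i))
        threads.append(t)
        t.start()
    # Wait for every thread before combining the partial sums.
    for t in threads:
        t.join()
    return sum(results)

total = threaded_sum(list(range(1000)))
```

All of the threads live inside one process and share its memory, which is why they can write into the same `results` list.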
4. Multicore – AMP
From a hardware standpoint, this can be the same as SMP above. The difference is in the OS. Instead of a single OS managing a bundle of processors or cores identically, different cores may have different OSes. Or different instances of the same OS. Or no OS. One simple configuration is to have one core that runs Linux, for example, and manages the other cores by assigning them processes that run “on bare metal.”
One implication of this is the nature of what is running on a core. With SMP, the OS takes a process and schedules threads on the cores. With AMP, anything assigned to a core with its own OS is a process (or, if no OS, is simply a program).
You can also mix SMP and AMP; for example, with eight cores, four could be managed as an SMP group that assigns tasks to the other four, which run on bare metal. Of course, as a system designer, you’ve got to figure out how to make all of that work.
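Here is a rough sketch of the “one manager, several bare-metal workers” arrangement. It is only a simulation under my own assumptions: threads stand in for cores, queues stand in for whatever mailbox the hardware provides, and `bare_metal_core` and `manager` are hypothetical names. The thing to notice is that the manager decides explicitly which core gets which job, which is exactly what an SMP scheduler would have done for you.

```python
import queue
import threading

def bare_metal_core(core_id, inbox, outbox):
    # Hypothetical worker core: no scheduler of its own, it simply
    # runs whatever job lands in its inbox.
    while True:
        job = inbox.get()
        if job is None:  # sentinel from the manager: stop
            break
        outbox.put((core_id, job, job * 2))

def manager(jobs, num_workers=3):
    inboxes = [queue.Queue() for _ in range(num_workers)]
    outbox = queue.Queue()
    cores = [
        threading.Thread(target=bare_metal_core,
                         args=(i, inboxes[i], outbox))
        for i in range(num_workers)
    ]
    for core in cores:
        core.start()
    # Explicit placement: the manager picks the core, round-robin.
    for n, job in enumerate(jobs):
        inboxes[n % num_workers].put(job)
    for inbox in inboxes:
        inbox.put(None)
    for core in cores:
        core.join()
    # Every worker has finished, so the outbox is safe to drain.
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return sorted(results)

assignments = manager([10, 11, 12, 13])
```

Each result tuple records the core that ran the job, the job itself, and its output, so you can see where the manager sent everything.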
5. Heterogeneous (multicore)
Here the hardware is different for different cores. It could be as simple as having the same core but different memory or peripherals, but, more commonly, the cores themselves are different. Cell phone SoCs are a good example of this, containing many different cores for many different tasks, with each core chosen to optimize performance on its assigned task.
Other common configurations are host microprocessors working in tandem with worker DSP chips or, increasingly, GPUs.
This also pretty much means you have to look like AMP from an OS standpoint since you don’t have uniform resources.
6. Grid computing
All of the configurations we’ve looked at so far tend to suggest a single computer or even a single chip. Communication between cores can be by lighter-weight transport schemes, or, at least, ones designed to stay in the same county. PCI-Express is an obvious example, but even just an AMBA bus might suffice in an SMP setup with shared memory. For managing the inter-process communication on something more complex than SMP, MCAPI can be used.
But with grid computing, the implication is of different boxes that might lie on different continents. The transport between them is Ethernet or the internet.
Of course, each box in this case runs its own OS; it’s AMP by definition. Inter-process communication is managed by a heavier-weight protocol, MPI, which can handle some of the more complex implications of computers that may come and go from the grid.
7. Many-core or massively parallel computing
“Many-core” is a nuance intended to distinguish the vast majority of simple multicore chips, which will usually have at most four or so cores, from ones that go much further than that, like Tilera’s 64-core chip. It’s harder to deal with many-core chips because of the challenges of managing communication, I/O, and memory effectively, so things that work with simple SMP might well fail on a many-core chip.
“Massively parallel” is a term that, more or less, refers to the same thing. But it goes back much further, having referred to such boxes as the Thinking Machines beast of yore. So it has something of a dated feel to it, even though marketers like to pull it out if they think they can wow you with it. Just cuz it sounds cooler.
8. Parallel programming
I include this because “parallel” is an obvious candidate term for all the stuff we’re talking about. But it’s too focused on the “programming” part, suggesting some of the more arcane techniques being dreamt up in universities to make creation of parallel programs easier. Maybe a new language, maybe a new paradigm, maybe new tools.
9. Hardware acceleration
Here we step out of the realm of more obvious instances of parallel computing. And, in fact, sometimes this isn’t parallel at all.
The idea with hardware acceleration is that some task takes too long to do in software, so you assign it to faster hardware. There are two ways to do this: an easy way and a harder way.
The easy way is to have the processor encounter the need for an accelerator in a program and to fire off the accelerator and wait for the response. This is a so-called “blocking” configuration because the program that the processor is running has to stop and wait for the accelerator to return its result – progress is blocked. Yeah, it seems wasteful to have the processor sit around, but it’s still faster than doing the computation in software. And, if there’s more than one process or thread that needs to be handled, the OS can swap contexts while the accelerator is running, so, except for context-switching overhead, it doesn’t have to be too wasteful.
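A small sketch of the blocking case, with the usual caveats: `accelerator_checksum` is a made-up stand-in for a hardware block, and the sleep merely models device latency. One thread blocks on the accelerator while a second thread keeps doing useful work, which is the context-swapping benefit described above.

```python
import threading
import time

def accelerator_checksum(data):
    # Hypothetical stand-in for a hardware block; the sleep models
    # the time the device takes to respond.
    time.sleep(0.05)
    return sum(data) % 256

def caller(data, out):
    # Blocking call: this thread makes no progress until the
    # "accelerator" returns.
    out["checksum"] = accelerator_checksum(data)

def other_work(out):
    # An unrelated thread that can run while the caller is blocked.
    out["other"] = sum(i * i for i in range(1000))

out = {}
blocked = threading.Thread(target=caller, args=(bytes([1, 2, 3]), out))
ready = threading.Thread(target=other_work, args=(out,))
blocked.start()
ready.start()
blocked.join()
ready.join()
```

The calling thread is still stuck for the full duration of the accelerator call; it is only the system as a whole that stays busy.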
The harder way is to invoke an accelerator in a non-blocking manner. Now, if the accelerator doesn’t have to return anything to the processor (say, it just packages up a TCP packet, sends it along, and then comes back looking for more work), then this is trivial. The hard part is when you need the result of the accelerator, but you want the processor to keep going while the accelerator works. This is more or less a fork/join configuration where execution forks, one part continuing on the processor, the other going to the accelerator, with a subsequent join where the results of the two threads come together.
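The fork/join shape is easier to see in code. This is a sketch under my own assumptions: a one-worker thread pool plays the accelerator, `accelerator_transform` is a hypothetical function, and the sleep models device latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def accelerator_transform(block):
    # Hypothetical accelerator job; the sleep models device latency.
    time.sleep(0.05)
    return [x * 2 for x in block]

def process(block):
    with ThreadPoolExecutor(max_workers=1) as accel:
        # Fork: hand the block to the "accelerator" and return at once.
        pending = accel.submit(accelerator_transform, block)
        # The "processor" keeps going with its own part of the work.
        local = sum(block)
        # Join: wait here only when the accelerator's result is needed.
        transformed = pending.result()
    return local, transformed

local, transformed = process([1, 2, 3])
```

Everything between `submit` and `result` runs concurrently with the accelerator, so that stretch of code has to be safe against whatever the accelerator is touching. That is the same homework you would have to do for two cores.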
Even if you have only a single core doing the computing, this is parallel execution. It doesn’t qualify as “multicore” because there’s only one core. But when an accelerator is operating on a non-blocking basis, all of the considerations that go into deciding how to create threads for two cores still apply. It’s just that no one thinks of it as multicore.
In fact, this whole search for an uber-term might not be so hard if it weren’t for this one stickler. Say “multicore” and people won’t think of hardware acceleration. Say “hardware acceleration” and people won’t think of the kinds of things you have to consider for multicore. And yet, in the abstract, they’re really two flavors of the same thing.
So here I’ve identified nine different ways of referring to various aspects of the act of doing more than one thing at a time. Each term is either specific or overloaded (and, hence, ambiguous).
So your assignment is to come up with a term that can be used to refer to any of the above generically. We’re rising up a level of abstraction here, trying to find a new meta-term. Perhaps you think one of the above terms does the trick. (If someone else doesn’t agree, then, by definition, you’re wrong.) More likely, we have to resort to something new. Which means that someone would have to launch it and drive it home.
So… what’s it going to be? Multi-tasking? (Hmmm… that suggests the rough single-parent life or the life-on-the-edge of the commuter with a cell phone in one hand, an iPad in the other hand, and a cup of coffee in the other hand.) Concurrent computing? (Some people don’t think of hardware execution as “computing” because it’s not done by a “computer.”) Concurrent execution? (Sounds like something out of a Sergio Leone movie.)
I’m stuck. You gotta help me out on this one.