We looked at many-core processors recently, and one of the big issues with scaling up the processor count is memory access: if all of those cores need access to the same memory, then memory bandwidth becomes the bottleneck. That makes SMP with many cores very difficult without shared distributed memory structures.
Netlogic has just announced their XLP II family, following on the heels of their XLP processors that have been around for a while. XLP devices go up to 32 CPUs; XLP II goes up to 80 per device, clusterable to 640. And they explicitly claim SMP capability.
So I followed up with them to see how they manage to talk to memory fast enough to feed so many cores. It bears saying that, assuming each device manages its own memory, it’s “only” 80 cores that have to vie for the attention of a single memory manager. Their response was that they have plenty of headroom on their current 32-CPU devices and that the memory manager runs much faster on the new devices, so they believe that memory access will not get in the way.
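To put rough numbers on that contention argument, here is a back-of-the-envelope sketch in Python. The total bandwidth figure is a made-up placeholder, not a vendor spec, and the function names are purely illustrative; the only real inputs are the 32 and 80 core counts mentioned above. It assumes the crudest possible model, an even split of one memory manager's bandwidth across all cores.

```python
def per_core_bandwidth(total_gb_per_s: float, cores: int) -> float:
    """Even share of one memory manager's bandwidth across `cores` cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_gb_per_s / cores


def required_speedup(old_cores: int, new_cores: int) -> float:
    """Factor by which memory must speed up to keep the per-core share constant."""
    if old_cores <= 0 or new_cores <= 0:
        raise ValueError("core counts must be positive")
    return new_cores / old_cores


# HYPOTHETICAL total bandwidth, chosen only to make the arithmetic visible.
ASSUMED_TOTAL_GB_PER_S = 40.0

if __name__ == "__main__":
    for cores in (32, 80):
        share = per_core_bandwidth(ASSUMED_TOTAL_GB_PER_S, cores)
        print(f"{cores} cores: {share:.2f} GB/s each")

    factor = required_speedup(32, 80)
    print(f"speedup needed going from 32 to 80 cores: {factor:.1f}x")
```

Under this simple model, going from 32 to 80 cores calls for 2.5 times the memory throughput to hold the per-core share steady. Any headroom already present on the 32-CPU devices reduces that requirement, which is essentially the argument they are making.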
As to running all 640 together, they have an inter-chip coherency interface to keep all processors and caches in sync. They have a tri-level caching system, although details aren’t available yet.
They are also claiming a “third-generation” inter-process messaging system to speed up the conversations that the CPUs will need to have with each other.
Beyond the many-core aspects, they are also integrating a host of communication-related functions, including their NETL7 knowledge-based processor, which we discussed recently.
Most technical details haven’t been made public, but there is more info in their press release.