
Accelerating Innovation in Cloud & Mobile Computing

Part 2

In part one of this article series, I suggested that datacenter architectures could benefit from revisiting the parallel computing innovations of the 1980s, and I waxed lyrical about the Transputer, which struck a chord with a surprising number of readers, including one who wrote “we built a fabulous Transputer board … didn’t sell many of them … I haven’t thought about it for decades.”  It was a heartfelt email, though I am not clear whether that was because of his fond memory of the Transputer or the commercial failure of his product.  In any event, I firmly stand by my belief that much could be learned and leveraged from revolutionary parallel computing architectures.

In mobile computing, I observed that quad+ core CPUs are vastly underutilized in the majority of real-world applications (the notable exception being gaming).  Virtually all of these apps are built on a simple client-server model, taking fractional advantage of mobile CPU horsepower.  I hinted that I would propose a “grand alliance” of cloud computing and billions of powerful, waiting mobile CPUs in a Universal Computing Surface:  computation seamlessly farmed out across datacenters and multi-core mobile devices, with the hardware and operating system optimally assigning/balancing workloads based on CPU horsepower, memory capacity/bandwidth and communication latency/bandwidth.

But I’ve since decided not to go there.  Should someone implement this splendid vision and call it the “Universal Computing Surface,” I ask only for attribution for the name in a footnote.  Yes, I could dive into the applicability of massively parallel architectures, be they message-passing like the Inmos Transputer or shared memory such as Stanford’s DASH (the first non-uniform memory architecture machine).  I am hopeful that we will see these or other innovative ideas advance the state of massive cloud datacenter computing, but, given the need to build said innovations into the hardware and operating systems … well … over the past week I decided to dial back part two of this column to something interesting but implementable in the near term. 

I am thinking here in terms of a “division of computing labor” based on a more conventional API model.  Still parallel computing, mind you, with an application simultaneously taking advantage of real computational horsepower in both the cloud datacenter and the multi-core mobile CPU but implementable on contemporary datacenters and mobile devices.

Consider a real application: an augmented reality “digital assistant” to help navigate and make decisions in an unfamiliar city.  Let’s make it a large foreign city to maximize unfamiliarity and increase the value of the digital assistant.  You are looking across the street at a number of shops and restaurants and would like some translation and guidance.  This requires real computation in real time.

First is the simple matter of your location: GPS is fabulous, but all those skyscrapers are not your friends when you are after a fast and very accurate location fix (no clear line-of-sight to enough satellites).  For the augmented reality assistant to provide useful information (displaying identities and descriptions hovering above the correct shops and restaurants) it needs a very accurate fix on your location.  Your mobile’s camera—with a clear view of the street—has tons of useful information to help make such a very accurate fix.

There are serious challenges here: your mobile has a high-resolution photo of your surroundings, but not the required computational horsepower nor massive database needed for comparison; the cloud datacenter has ample horsepower and petabytes of data, but shipping the high-res photo from your phone requires a lot of expensive bandwidth and far too much time for a responsive and interactive user experience … we’re looking for one second or less for the entire process outlined below.  The solution is to divide and conquer:

  • Leave the high-res photo on your mobile device and put its multi-core CPU to work on an edge detection algorithm.  For good measure, have your mobile identify signage and perform character recognition.  This workload is well-suited to the multi-core, GPU-accelerated CPU in your mobile AND it doesn’t require a lot of DRAM (refer to my lament in part one of this column).  The result of the edge detection and character recognition is a dataset that is a tiny fraction of the size of the original high-res photo.  Send it to the cloud, along with the orientation readings from the magnetic compass and gyroscope.
  • The cloud datacenter now has an edge-only perspective of your surroundings, plus the signage information including street and shop/restaurant names.  Your perspective will not be a perfect match to anything in the database, but a perspective shifting and matching algorithm is a heavy-lifting workload wonderfully suited to powerful nodes with real amounts of memory.  The signage matching doesn’t require tons of computation, but it does require searching tons of data, ideally suited to an industrial-strength database.
  • The cloud datacenter determines your location with great precision and provides names/descriptions for every shop and restaurant on the street.
  • Your mobile displays this information, superimposing the labels above each establishment.  You tap the name of a restaurant.
  • The cloud datacenter searches a database of restaurant reviews and provides the information.
  • You step across the street and look back at your previous location, repeating the above process in a second.
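The mobile-side half of the steps above can be sketched in a few lines of Python.  Everything here is illustrative: the toy edge detector, the payload fields, and the function names are my assumptions, not a real API — the point is simply that the device ships a compact derived dataset rather than the high-res photo.

```python
import json
import zlib

def edge_detect(image):
    """Toy edge detector: keep pixels whose right-hand neighbor differs
    sharply in luminance.  `image` is a list of rows of 0-255 values; the
    result is a sparse list of (row, col) edge coordinates -- a tiny
    fraction of the original pixel data, which is the whole point of the
    division of labor."""
    edges = []
    for r, row in enumerate(image):
        for c in range(len(row) - 1):
            if abs(row[c] - row[c + 1]) > 64:  # crude gradient threshold
                edges.append((r, c))
    return edges

def build_cloud_payload(image, compass_deg, gyro):
    """Bundle the edge map with orientation readings and compress it,
    ready to send to the (hypothetical) cloud matching service."""
    payload = {
        "edges": edge_detect(image),
        "compass_deg": compass_deg,
        "gyro": gyro,
    }
    return zlib.compress(json.dumps(payload).encode())

# A 4x4 toy "photo" with one vertical edge between columns 1 and 2.
photo = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
blob = build_cloud_payload(photo, compass_deg=212.5, gyro=(0.0, 1.2, 0.1))
print(f"{len(edge_detect(photo))} edge points extracted")
```

In a real implementation the edge detection would run on the GPU (e.g., a Canny-style filter), but the shape of the exchange — dense pixels stay local, a sparse geometric summary goes over the air — is the same.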

Purists might argue that the above model is still client-server.  I hold that it is parallel computing: the multi-core CPU in the mobile operating on local data that would be prohibitively large (long latency plus large expense) to ship to the cloud; the cloud datacenter operating on heavy compute workloads and massive database queries.  All of this can be accomplished with a well-architected API optimizing the division of workload based on computation speed, memory size and locality, communications latency, and bandwidth and power consumption.
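That workload-division decision can be made concrete with a crude cost model: run the task locally, or pay the upload cost and run it in the cloud.  The function and its parameters are a sketch of my own, not any existing API; a real scheduler would also weigh memory footprint, battery, and monetary bandwidth cost.

```python
def place_workload(data_bytes, local_flops, cloud_flops, work_flops,
                   uplink_bytes_per_s, rtt_s):
    """Choose where to run a task by comparing estimated local compute
    time against network round-trip + upload + cloud compute time."""
    local_s = work_flops / local_flops
    cloud_s = rtt_s + data_bytes / uplink_bytes_per_s + work_flops / cloud_flops
    return "local" if local_s <= cloud_s else "cloud"

# Perspective matching: ~1 TFLOP of work on a 4 MB photo.  The 100x
# faster cloud node wins despite the upload cost.
print(place_workload(4e6, 1e10, 1e13, 1e12, 1e6, 0.05))  # cloud

# Edge detection: ~1 GFLOP on the same photo.  Shipping the photo costs
# far more than computing locally, so the mobile CPU wins.
print(place_workload(4e6, 1e10, 1e13, 1e9, 1e6, 0.05))   # local
```

With these (made-up but plausible) numbers, the model lands exactly where the article does: the data-heavy, compute-light front end stays on the device, and the compute-heavy matching goes to the datacenter.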

Video processing presents an even greater opportunity, though with its own significant challenges.  Let’s start with an observation: today’s mobile devices can CAPTURE reasonably good quality video, but we’ve not yet seen many mobile applications that PROCESS video.  Again, insufficient DRAM in the mobile device is one limiting factor.  Leveraging the “divide and conquer” mobile-plus-cloud computing model, what applications might become possible?

  • Real-time facial recognition.  Granted, this is a hypersensitive area, one that I neither endorse nor condemn.  In any case, we can put the CPU and GPU horsepower to excellent use identifying “facial vertices” and sending this tractable dataset to the cloud for matching.  I will leave the degree of innocuousness (identifying people you’ve met before but whose names you cannot place) and 1984-ness (flagging released felons) to your imagination.
  • Driver assistance for motorcycles and bicycles.  This application must be near-time and wants to be real-time, which presents a unique set of challenges given communication latency.  But here again, the mobile device on its own is incapable of tasks such as lane departure warning, pedestrian detection, and collision avoidance warning.  And, here again, the mobile CPU plus GPU can perform front-end processing that feeds the heavy compute horsepower available in the cloud.
  • Sports mechanics and coaching.  This is a cinch for VC funding, given what I’ve seen of Sand Hill Road’s love of golf, running, and cycling.  Identify and capture the key anatomical points (hands, feet, joints) using the mobile processor, and transmit the modest “stick figure” dataset to the cloud.  Process and analyze the swing, posture, and motion using the cloud’s heavy-duty compute power to provide feedback in real-time.  Real-time feedback is proven to be the most effective form of coaching and produces the quickest improvements.
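A quick back-of-the-envelope calculation shows why the “stick figure” reduction in the coaching example makes the mobile-plus-cloud split viable.  The joint list and the bytes-per-joint figure below are my illustrative assumptions, not measurements.

```python
# Assumed set of anatomical keypoints the mobile device extracts per frame.
JOINTS = ["head", "l_hand", "r_hand", "l_foot", "r_foot",
          "l_knee", "r_knee", "l_elbow", "r_elbow"]

def stick_figure_bytes(fps=30, bytes_per_joint=16):
    """Bandwidth (bytes/s) to stream keypoint coordinates to the cloud."""
    return len(JOINTS) * bytes_per_joint * fps

def raw_frame_bytes(width=1920, height=1080, fps=30, bytes_per_pixel=1.5):
    """Bandwidth (bytes/s) for uncompressed 1080p 4:2:0 video."""
    return int(width * height * bytes_per_pixel * fps)

reduction = raw_frame_bytes() / stick_figure_bytes()
print(f"stick figure is {reduction:,.0f}x smaller than raw video")
```

Even against compressed video the reduction is several orders of magnitude, which is what turns an impossible uplink problem into a comfortably real-time one.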

Turning philosophical for just a moment: truly elegant and massively scalable solutions can be designed and built around a ‘proper’ parallel processing architecture, as touched on earlier.  Yet, we do not need to construct the Universal Computing Surface to perform true parallel processing between mobile devices and cloud datacenters.  We need only to re-think how best to leverage the computational resources at our disposal and put all those multi-core mobile CPUs to work.

Part one of this column explored the datacenter level and opined that parallel-computing concepts from the 1980s could be applied to create more scalable and more flexible cloud computing datacenters.  Here in part two of the column, I moved further up the stack to the application software level and proposed that cloud and mobile computing could converge and deliver a far more powerful user experience than today’s client-server model.

Part one started, however, at the bare-metal computer architecture level with the assertion that the pace of real innovation has slowed dramatically.  Yet, I employed contemporary computer architectures in concocting both the datacenter and converged mobile-cloud schemes.  Am I contradicting myself?  My answer is an emphatic NO.

We are at the early stages of both cloud and mobile computing, and, as is often the case, we are relying on available technology and last-generation tools such as the client-server programming model.  Quite soon, developers will exploit the tremendous power of the latest mobile CPUs and deliver novel user experiences.  Once the envelope is pushed, there is no doubt that the ‘legacy’ foundations—starting with the bare-metal computer architecture level—will be stressed to the point they become limiting factors.

Now is the time to step back and re-examine “how we got here” vis-à-vis computer architectures, infrastructure elements, and programming models.  We will find that many of these constituent technologies were designed to solve other problems; for example, blistering integer and floating-point compute rather than massive data-movement.  Now is the time to identify potential bottlenecks and limiting factors and re-think the “requirements document” for modern cloud and mobile computing.  There are extraordinary opportunities for breakthrough improvements that will create a positive feedback loop and accelerate the growth that is driving technology industries.


About the Author: 

Bruce Kleinman is a senior technology/business executive and principal at FSVadvisors and blogs on fromsiliconvalley.com
