Accelerating Innovation in Cloud & Mobile Computing

Part 2

In part one of this article series, I suggested that datacenter architectures could benefit from revisiting the parallel computing innovations of the 1980s, and I waxed lyrical about the Transputer, which struck a chord with a surprising number of readers, including one who wrote “we built a fabulous Transputer board … didn’t sell many of them … I haven’t thought about it for decades.”  It was a heartfelt email, though I am not sure whether that was because of his fond memory of the Transputer or the commercial failure of his product.  In any event, I firmly stand by my belief that much could be learned and leveraged from revolutionary parallel computing architectures.

In mobile computing, I observed that quad+ core CPUs are vastly underutilized in the majority of real-world applications (the notable exception being gaming).  Virtually all of these apps are built on a simple client-server model, taking fractional advantage of mobile CPU horsepower.  I hinted that I would propose a “grand alliance” of cloud computing and billions of powerful, waiting mobile CPUs in a Universal Computing Surface:  computation seamlessly farmed out across datacenters and multi-core mobile devices, with the hardware and operating system optimally assigning/balancing workloads based on CPU horsepower, memory capacity/bandwidth and communication latency/bandwidth.

But I’ve since decided not to go there.  Should someone implement this splendid vision and call it the “Universal Computing Surface,” I ask only for attribution for the name in a footnote.  Yes, I could dive into the applicability of massively parallel architectures, be they message-passing like the Inmos Transputer or shared memory such as Stanford’s DASH (an early scalable, cache-coherent non-uniform memory architecture machine).  I am hopeful that we will see these or other innovative ideas advance the state of massive cloud datacenter computing, but, given the need to build said innovations into the hardware and operating systems … well … over the past week I decided to dial back part two of this column to something interesting but implementable in the near term.

I am thinking here in terms of a “division of computing labor” based on a more conventional API model.  Still parallel computing, mind you, with an application simultaneously taking advantage of real computational horsepower in both the cloud datacenter and the multi-core mobile CPU but implementable on contemporary datacenters and mobile devices.

Consider a real application: an augmented reality “digital assistant” to help navigate and make decisions in an unfamiliar city.  Let’s make it a large foreign city to maximize unfamiliarity and increase the value of the digital assistant.  You are looking across the street at a number of shops and restaurants and would like some translation and guidance.  This requires real computation in real time.

First is the simple matter of your location: GPS is fabulous, but all those skyscrapers are not your friends when you are after a fast and very accurate location fix (no clear line-of-sight to enough satellites).  For the augmented reality assistant to provide useful information (displaying identities and descriptions hovering above the correct shops and restaurants) it needs a very accurate fix on your location.  Your mobile’s camera—with a clear view of the street—has tons of useful information to help make such a very accurate fix.

There are serious challenges here: your mobile has a high-resolution photo of your surroundings, but not the required computational horsepower nor massive database needed for comparison; the cloud datacenter has ample horsepower and petabytes of data, but shipping the high-res photo from your phone requires a lot of expensive bandwidth and far too much time for a responsive and interactive user experience … we’re looking for one second or less for the entire process outlined below.  The solution is to divide and conquer:

  • Leave the high-res photo on your mobile device and put its multi-core CPU to work on an edge detection algorithm.  For added good measure, have your mobile identify signage and perform character recognition.  This workload is well-suited to the multi-core, GPU-accelerated CPU in your mobile AND it doesn’t require a lot of DRAM (refer to my lament in part one of this column).  The result of the edge detection and character recognition is a dataset that is a tiny fraction of the size of the original high-res photo.  Send it to the cloud, along with the orientation readings from the magnetic compass and gyroscope.
  • The cloud datacenter now has an edge-only perspective of your surroundings, plus the signage information including street and shop/restaurant names.  Your perspective will not be a perfect match to anything in the database, but a perspective shifting and matching algorithm is a heavy-lifting workload wonderfully suited to powerful nodes with real amounts of memory.  The signage matching doesn’t require tons of computation, but it does require searching tons of data, ideally suited to an industrial-strength database.
  • The cloud datacenter determines your location with great precision and provides names/descriptions for every shop and restaurant on the street.
  • Your mobile displays this information, superimposing the labels above each establishment.  You tap the name of a restaurant.
  • The cloud datacenter searches a database of restaurant reviews and provides the information.
  • You step across the street and look back at your previous location, repeating the above process in a second.
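
The mobile-side step in the list above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: it uses a plain Sobel filter as a stand-in for whatever edge detector a real app would choose, the character recognition is stubbed out (the recognized sign text is simply passed in), and the names `sobel_edges` and `build_payload` and the payload field names are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # grayscale pixels, 0-255, row-major


def sobel_edges(img: Image, threshold: int = 128) -> List[Tuple[int, int]]:
    """Return (row, col) of interior pixels whose Sobel gradient exceeds threshold."""
    h, w = len(img), len(img[0])
    edges = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            if abs(gx) + abs(gy) > threshold:
                edges.append((r, c))
    return edges


def build_payload(img: Image, heading_deg: float, sign_text: List[str]) -> Dict:
    """Bundle what gets uploaded: edge coordinates, compass heading, sign text.

    sign_text stands in for on-device character recognition (not implemented here).
    """
    return {"edges": sobel_edges(img), "heading_deg": heading_deg, "signs": sign_text}


if __name__ == "__main__":
    # Toy 8x8 "photo": dark on the left, bright on the right (one vertical edge).
    photo = [[0] * 4 + [255] * 4 for _ in range(8)]
    payload = build_payload(photo, heading_deg=72.5, sign_text=["CAFE"])
    total_pixels = len(photo) * len(photo[0])
    print(len(payload["edges"]), "edge pixels out of", total_pixels)
```

On the toy image only the pixels along the single vertical boundary survive, which is the point of the exercise: the photo stays on the phone and only the sparse edge set, the heading, and the sign text travel to the cloud.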

Purists might argue that the above model is still client-server.  I hold that it is parallel computing: the multi-core CPU in the mobile operating on local data that would be prohibitively large (long latency plus large expense) to ship to the cloud; the cloud datacenter operating on heavy compute workloads and massive database queries.  All of this can be accomplished with a well-architected API optimizing the division of workload based on computation speed, memory size and locality, communication latency, bandwidth, and power consumption.
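
To make the “division of workload” idea concrete, here is a minimal sketch of a placement rule that compares estimated completion time on the device against upload time plus cloud compute time.  Everything in it is an assumption for illustration: the `Task` and `Link` types, the `place` function, and all of the numbers are hypothetical, and the model deliberately considers only time, ignoring memory size and power consumption.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    ops: float        # estimated operations needed (hypothetical figure)
    input_bytes: int  # bytes that must be uploaded if the task runs in the cloud


@dataclass
class Link:
    uplink_bps: float  # uplink bandwidth, bits per second
    rtt_s: float       # round-trip latency, seconds


def device_seconds(task: Task, device_ops_per_s: float) -> float:
    """Time to run the task locally; no data leaves the device."""
    return task.ops / device_ops_per_s


def cloud_seconds(task: Task, link: Link, cloud_ops_per_s: float) -> float:
    """Round trip + time to upload the input + time to compute in the cloud."""
    upload = task.input_bytes * 8 / link.uplink_bps
    return link.rtt_s + upload + task.ops / cloud_ops_per_s


def place(task: Task, link: Link, device_ops_per_s: float, cloud_ops_per_s: float) -> str:
    """Pick whichever side is estimated to finish sooner."""
    local = device_seconds(task, device_ops_per_s)
    remote = cloud_seconds(task, link, cloud_ops_per_s)
    return "device" if local <= remote else "cloud"


if __name__ == "__main__":
    link = Link(uplink_bps=5e6, rtt_s=0.1)  # assumed 5 Mbit/s uplink, 100 ms RTT
    tasks = [
        Task("edge detection on raw photo", ops=1e9, input_bytes=8_000_000),
        Task("perspective matching on edge data", ops=1e12, input_bytes=50_000),
    ]
    for t in tasks:
        print(t.name, "->", place(t, link, device_ops_per_s=1e10, cloud_ops_per_s=1e13))
```

With these made-up figures the photo-sized input keeps edge detection on the device (the upload alone would take many seconds), while the compact edge dataset makes it worthwhile to send the heavy matching job to the cloud.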

Video processing presents an even greater opportunity, though with its own significant challenges.  Let’s start with an observation: today’s mobile devices can CAPTURE reasonably good quality video, but we’ve not yet seen many mobile applications that PROCESS video.  Again, insufficient DRAM in the mobile device is one limiting factor.  Leveraging the “divide and conquer” mobile-plus-cloud computing model, what applications might become possible?

  • Real-time facial recognition.  Granted this is a hypersensitive area, one that I neither endorse nor condemn.  In any case, we can put the CPU and GPU horsepower to excellent use identifying “facial vertices” and sending this tractable dataset to the cloud for matching.  I will leave the degree of innocuousness (identifying people you have met before but whose names you cannot place) and 1984-ness (flagging released felons) to your imagination.
  • Driver assistance for motorcycles and bicycles.  This application must be near-real-time and wants to be real-time, which presents a unique set of challenges, given communication latency.  But here again, the mobile device on its own is incapable of tasks such as lane departure warning, pedestrian detection, and collision avoidance warning.  And, here again, the mobile CPU plus GPU can perform front-end processing that feeds the needed compute horsepower available in the cloud.
  • Sports mechanics and coaching.  This is a cinch for VC funding, given what I’ve seen of Sand Hill Road’s love of golf, running, and cycling.  Identify and capture the key anatomical points (hands, feet, joints) using the mobile processor, and transmit the modest “stick figure” dataset to the cloud.  Process and analyze the swing, posture, and motion using the cloud’s heavy-duty compute power to provide feedback in real time.  Real-time feedback is widely regarded as one of the most effective forms of coaching, producing quick improvements.
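
How modest is the “stick figure” dataset?  The sketch below packs one frame's worth of keypoints into bytes and compares that with an uncompressed video frame.  The figures are assumptions chosen for illustration: 17 keypoints per frame, 16-bit pixel coordinates, and a 1920×1080 frame at 3 bytes per pixel; the function names are hypothetical.  Real video would be compressed before upload, so the raw-frame comparison overstates the gap, though the keypoint data remains far smaller.

```python
import struct
from typing import List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates, each in 0..65535


def pack_keypoints(points: List[Point]) -> bytes:
    """Pack (x, y) pairs as little-endian unsigned 16-bit integers."""
    flat = [v for p in points for v in p]
    return struct.pack(f"<{len(flat)}H", *flat)


def unpack_keypoints(data: bytes) -> List[Point]:
    """Inverse of pack_keypoints: recover the list of (x, y) pairs."""
    flat = struct.unpack(f"<{len(data) // 2}H", data)
    return list(zip(flat[0::2], flat[1::2]))


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    # 17 made-up keypoints standing in for hands, feet, and joints.
    stick_figure = [(100 + 10 * i, 200 + 5 * i) for i in range(17)]
    packed = pack_keypoints(stick_figure)
    print("keypoints per frame:", len(packed), "bytes")
    print("raw 1080p frame:    ", raw_frame_bytes(1920, 1080), "bytes")
```

Under these assumptions a frame of keypoints is 68 bytes against roughly 6.2 million bytes for the raw frame, which is why the front-end extraction belongs on the phone.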

Turning philosophical for just a moment: truly elegant and massively scalable solutions can be designed and built around a ‘proper’ parallel processing architecture, as touched on earlier.  Yet, we do not need to construct the Universal Computing Surface to perform true parallel processing between mobile devices and cloud datacenters.  We need only to re-think how best to leverage the computational resources at our disposal and put all those multi-core mobile CPUs to work.

Part one of this column explored the datacenter level and opined that parallel-computing concepts from the 1980s could be applied to create more scalable and more flexible cloud computing datacenters.  Here in part two of the column, I moved further up the stack to the application software level and proposed that cloud and mobile computing could converge and deliver a far more powerful user experience than today’s client-server model.

Part one started, however, at the bare-metal computer architecture level with the assertion that the pace of real innovation has slowed dramatically.  Yet, I employed contemporary computer architectures in concocting both the datacenter and converged mobile-cloud schemes.  Am I contradicting myself?  My answer is an emphatic NO.

We are at the early stages of both cloud and mobile computing, and, as is often the case, we are relying on available technology and last-generation tools such as the client-server programming model.  Quite soon, developers will exploit the tremendous power of the latest mobile CPUs and deliver novel user experiences.  Once the envelope is pushed, there is no doubt that the ‘legacy’ foundations—starting with the bare-metal computer architecture level—will be stressed to the point they become limiting factors.

Now is the time to step back and re-examine “how we got here” vis-à-vis computer architectures, infrastructure elements, and programming models.  We will find that many of these constituent technologies were designed to solve other problems; for example, blistering integer and floating-point compute rather than massive data-movement.  Now is the time to identify potential bottlenecks and limiting factors and re-think the “requirements document” for modern cloud and mobile computing.  There are extraordinary opportunities for breakthrough improvements that will create a positive feedback loop and accelerate the growth that is driving technology industries.


About the Author: 

Bruce Kleinman is a senior technology/business executive and principal at FSVadvisors and blogs on fromsiliconvalley.com
