
Loading Software on the Fly: AppStreamer

Purdue University Solves First-World Problem

“Those who cannot remember the past are condemned to repeat it.” – George Santayana

I swear, people are trying their hardest to resurrect the 1970s. I thought the era of bell-bottom pants, 8-inch floppy disks, and bicycles with banana seats was nostalgia-proof. Yet here we are. First there was cloud computing, a throwback to the days of dumb terminals and remote time-sharing computers. Now we’ve got college-educated kids – good ones, at that – devising ways to revitalize the idea of broadcast TV and video arcades. 

Maybe that’s a bit harsh, but a research paper coming out of Purdue University looks to me like a solution in search of a problem. Or, at least, a solution to a different problem not addressed in the paper. Or maybe I’m just getting old. 

Here’s the short version: They’ve found a way to stream application code to a smartphone in real-time, as it executes. That’s a pretty slick trick. The goal of said streaming is not to improve performance or enhance security, but to reduce the storage requirements on the phone. In other words, when you’ve Instagrammed too many pictures of your food, you’ll have to start streaming your apps instead of installing them. 

Or… you could just delete some of the photos. Or, I dunno, maybe upload some of them to the cloud and free up local storage? Or – radical idea, I admit – not take so many photos of your Frappuccino? Posterity will forgive you. 

The proposed technique swaps storage for bandwidth: streaming code instead of installing it. The research behind it is impressive, I admit. It all just seems a bit… misguided.

The paper starts out with the observation that processors don’t actually execute an entire program all at once. At any given moment, your CPU is working on only a small handful of operations, perhaps three or four, depending on the length of its pipeline. You could, theoretically, make the rest of the program disappear and just feed in instructions as needed. Sort of like laying railroad tracks directly in front of a speeding locomotive and pulling them up from behind.  

That’s effectively what the Purdue team is suggesting. To save space, just deliver the bits of the program (see what I did there?) that the processor needs right now, keep the rest in the cloud, and download it just in time.

That’s a fun idea, and it would be easy to accomplish if computer programs ran in a straight line. For straight-line code, you’d just tee up the next instruction, and the next, and the next, until you got to the end. But real programs don’t work that way. They’re messy, they jump around, they loop unpredictably, and so on. How do you know what parts will be needed, and when? 

By watching and observing, that’s how. AppStreamer is essentially an elaborate cache-prediction algorithm. Just as your x86 processor constantly strives to predict what instructions and/or data it’ll need in the next few nanoseconds so it can preload them into the relevant caches, AppStreamer tries to guess what chunks of code you’ll be executing next. Microprocessors do this in hardware, with virtually no historical data to guide them (branch-prediction caches notwithstanding). AppStreamer doesn’t have that restriction. It’s implemented in software and has access to as much historical data as it needs. 

The first step in getting AppStreamer to work is training it, which the researchers do by running the app in question over and over, preferably with different people to give it some variety. Then, they map out the app’s execution path. Does it loop over this section of code several times before jumping over to this section here? That’s good to know; write that down. Over several runs, AppStreamer builds a model of typical execution for that app. 
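To make the "write that down" step concrete, here is a minimal sketch of how recorded runs could be turned into a simple model of which chunk tends to follow which. It is my own illustration, not the researchers' code: the trace format (a list of chunk names per play session), the chunk names, and the function `build_transition_model` are all assumptions.

```python
from collections import Counter, defaultdict

def build_transition_model(traces):
    """Count chunk-to-chunk transitions across runs and normalize to probabilities."""
    counts = defaultdict(Counter)
    for trace in traces:
        # Pair each chunk with the chunk that was accessed right after it.
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    model = {}
    for current, nexts in counts.items():
        total = sum(nexts.values())
        model[current] = {chunk: n / total for chunk, n in nexts.items()}
    return model

# Hypothetical traces from three play sessions (chunk names are made up).
traces = [
    ["menu", "level1", "level2"],
    ["menu", "level1", "shop", "level2"],
    ["menu", "tutorial", "level1", "level2"],
]
model = build_transition_model(traces)
print(model["menu"])    # how often each chunk followed "menu"
print(model["level1"])  # how often each chunk followed "level1"
```

More runs from more players simply add more counts, which is why variety in the training sessions helps.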

None of this training requires access to the source code of the app, or even any understanding of how it works. All profiling data is collected empirically, with no foreknowledge of the program’s structure, programming language, or size. It works much like a debugger or code-coverage analyzer.

Based on that execution profile, AppStreamer can make educated guesses about what chunks of code you’ll need when you first start the app, what chunks come next, and what you’re not likely to access at all. It also has a good idea about when you’ll need those sections. Most users might need Part B about 45 seconds after Part A, with Part C coming 12 minutes after that. 

To their credit, the researchers took on the toughest apps of all: games. Mobile games can be sensitive to very small delays, on the order of tens of milliseconds. Something like a spreadsheet or an email app (or even Instagram) would’ve been relatively insensitive to delays and a lot easier to stream. Kudos.

AppStreamer is implemented in Android’s file-system layer. It doesn’t modify the apps at all. It intercepts file accesses, including requests for code from the smartphone’s flash file system. Since Android’s file system uses 4KB blocks, this gives AppStreamer quite a bit of granularity into the execution path. 

Too much granularity, as it happens. The Purdue team reports that most mobile games are somewhere between 1GB and 10GB in size, and tracing that much code with 4KB resolution results in an unworkable amount of data. Moreover, it isn’t necessary. They experimented with different granularities and found that, for the games they tested, something in the range of tens of megabytes gave the right balance between granularity, space efficiency, search depth, and response time. Different apps might have different chunking requirements, so the number isn’t hard-coded but is instead calculated during the training/profiling process. 
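The coarsening step is easy to picture in code. The sketch below is an illustration under my own assumptions, not the paper's implementation: it treats chunks as fixed-size contiguous ranges of 4KB blocks and hard-codes a hypothetical 16MB chunk size, whereas the article says the real size is calculated during profiling.

```python
BLOCK_SIZE = 4 * 1024  # 4KB file-system block, per the article

def blocks_to_chunks(block_indices, chunk_size_bytes):
    """Map a trace of 4KB block indices to coarser chunk IDs, dropping consecutive repeats."""
    blocks_per_chunk = chunk_size_bytes // BLOCK_SIZE
    chunk_trace = []
    for block in block_indices:
        chunk = block // blocks_per_chunk
        # Record a chunk only when execution moves to a different one.
        if not chunk_trace or chunk_trace[-1] != chunk:
            chunk_trace.append(chunk)
    return chunk_trace

CHUNK_SIZE = 16 * 1024 * 1024  # hypothetical 16MB chunk, chosen for illustration
block_trace = [0, 1, 2, 4095, 4096, 4097, 10, 8192]  # made-up block accesses
print(blocks_to_chunks(block_trace, CHUNK_SIZE))  # [0, 1, 0, 2]
```

Eight block-level events collapse to four chunk-level events here, and on a multi-gigabyte game the reduction would be far larger.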

As the financial analysts say, past performance is no guarantee of future returns. Even detailed code traces are merely records of how someone else played the game, not necessarily how it’ll play out the next time. Armed with the captured code-execution profiles, AppStreamer applies a continuous-time Markov chain (CTMC) to weight the probability that any given code chunk will be needed. 
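For readers who haven't met a CTMC before, here is a rough sketch of the idea, assuming traces that record when each chunk was first entered. Everything in it is hypothetical (the trace format, the chunk names, the functions `fit_ctmc` and `prob_needed_within`), and it only looks one jump ahead, so treat it as a textbook illustration rather than the paper's model.

```python
import math
from collections import Counter, defaultdict

def fit_ctmc(timed_traces):
    """Estimate per-chunk exit rates and jump probabilities from (chunk, enter_time_s) traces."""
    counts = defaultdict(Counter)
    dwell = defaultdict(float)
    for trace in timed_traces:
        for (chunk, t_in), (next_chunk, t_next) in zip(trace, trace[1:]):
            counts[chunk][next_chunk] += 1
            dwell[chunk] += t_next - t_in
    # Exit rate = observed exits / total time spent (exponential holding-time estimate).
    rates = {c: sum(counts[c].values()) / dwell[c] for c in counts if dwell[c] > 0}
    jumps = {c: {n: k / sum(nexts.values()) for n, k in nexts.items()}
             for c, nexts in counts.items()}
    return rates, jumps

def prob_needed_within(rates, jumps, current, target, horizon_s):
    """Probability that the next jump from `current` goes to `target` within horizon_s seconds."""
    if current not in rates:
        return 0.0
    p_jump = jumps[current].get(target, 0.0)
    return p_jump * (1.0 - math.exp(-rates[current] * horizon_s))

# Made-up sessions: chunk B tends to follow chunk A after roughly 40 seconds.
timed_traces = [
    [("A", 0), ("B", 45), ("C", 765)],
    [("A", 0), ("B", 35), ("C", 700)],
    [("A", 0), ("C", 40)],
]
rates, jumps = fit_ctmc(timed_traces)
p = prob_needed_within(rates, jumps, "A", "B", horizon_s=30)
print(f"P(B needed within 30 s of entering A) = {p:.2f}")
```

A prefetcher could rank candidate chunks by this probability and download the likeliest ones first.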

Finally, AppStreamer factors in player speed. Experienced players blaze through the game levels faster than n00bs, and AppStreamer takes that into account by fetching not-yet-downloaded content sooner. (Beginners are also unlikely to need the boss level, ever.)

Now that AppStreamer knows (more or less) what code you’ll want and when you’ll want it, it can start to preemptively download the next chunk. Given that most LTE connections deliver bandwidth in the 10–20 Mbit/sec range, and that code chunks are about 10 MB in size, AppStreamer needs to look far ahead – about 30 seconds ahead, in their experience. Otherwise, the code won’t arrive in time and the poor gamer will endure unwelcome delays. Game over for AppStreamer. 
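The arithmetic behind that lookahead is worth a quick back-of-the-envelope check. This sketch uses only the figures quoted above and assumes decimal units (1 MB = 8 Mbit) and a link that actually sustains its rated speed; the function name is mine.

```python
def transfer_seconds(chunk_megabytes, link_mbit_per_s):
    """Ideal time to download one chunk, ignoring latency and protocol overhead."""
    return chunk_megabytes * 8 / link_mbit_per_s

CHUNK_MB = 10    # chunk size quoted in the article
LOOKAHEAD_S = 30 # lookahead quoted in the article

for mbps in (10, 20):
    seconds = transfer_seconds(CHUNK_MB, mbps)
    chunks_in_window = LOOKAHEAD_S / seconds
    print(f"{mbps} Mbit/s: {seconds:.1f} s per chunk, "
          f"{chunks_in_window:.2f} chunks per {LOOKAHEAD_S} s window")
```

A single chunk takes 4 to 8 seconds under ideal conditions, so a 30-second window leaves room for a few wrong guesses or a slow patch of network before the player notices.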

The results look good. In their testing of two mobile games, most test subjects reported little or no noticeable delay in their games. AppStreamer wasn’t entirely invisible, but it was close. Whether it delivered any real benefit to those testers is a different question. 

On one hand, AppStreamer seems like a natural evolution from locally stored content to streaming. Time was, we used to download MP3s and wait to play them after the download completed. Now, we can stream audio in real time without the wait. A step-function improvement came with the advent of video streaming. Rather than wait hours for a movie to download, we could start streaming it in seconds. Netflix, Hulu, Spotify, and countless other content providers have built their entire business on this underlying technology. 

So why not stream programs, too? One reason is that storage is cheap. And some storage is cheaper, and less time-sensitive, than others. We’ve always been able to fill up the hard drives, SSDs, RAM, or flash that our devices provide. No matter how much storage we get, we’ll find a way to exceed it. But when that overflow happens, there’s a natural triage that goes with it. Blurry and out-of-focus photos get deleted first, then old emails, then obsolete documents, and so on. Cloud storage is cheap. Google provides free photo storage with every Pixel smartphone (in return for seeing all your photos). There’s no reason to let your precious device storage overflow. 

Apps, on the other hand, benefit from local storage and local execution. They’re faster, safer, and ours. It’s bad enough when access to our data is mediated by a third-party ISP; now programs are being ransomed, too? “Gee, nice collection of applications you’ve got there. Be a shame if something happened to ’em.” Pay up or lose the apps you already bought. Or just lose them every time you’re out of wireless range, exceed your data quota, or when the power fails. No, thank you. 

The researchers’ efforts are laudable, and they may have even developed some novel methods for modeling and predicting programmatic behavior. But it doesn’t look that way to me. All I see are some well-known algorithms used for branch prediction, weighted probability, and hooking operating system calls, all applied to a dystopian usage scenario. It’s a nice party trick, but one that may not lead directly to widespread use of the applications described. 

One thought on “Loading Software on the Fly: AppStreamer”

  1. I’ve not yet read the paper, but I suspect it’s not code that’s being streamed here, but textures. The bulk of any game’s memory footprint is textures, and then perhaps geometry. This is why 4KB granularity is too small, and 1-10MB works better. Texture assets are around that size. Code just comes along for the ride.
