Reaching for the Cloud

For once, those clouds on the horizon aren’t harbingers of doom. At least, we don’t think so.

In fact, they’re almost tantalizing. Everyone is looking at them, fantasizing that that’s where they want to be. At least, we think so.

If you’ve ever watched a squirrel come and take something from your hand, you’ve seen that skittish, cautious approach, ready to bolt at any second, then snatching the food and running. Well, that’s kind of the feel you sometimes get about companies approaching the cloud. Everyone wants in, but, well, there are problems, and no one is really ready, and customers aren’t quite there yet, and EDA is harder, and, well, it’s going to happen, just not now.

Well, it’s happening.

Now.

In EDA.

As usual, newcomers tend to glide in ahead of the old guard. They can start from scratch in the cloud, meaning they don’t have to change models and transition usage and code and all of that. Tabula has such a setup: their tools are cloud-based. Slightly outside EDA, embedded C parallelization company Vector Fabrics* also has an entirely cloud-based approach. And Altium actually bought a cloud-computing company, partly for internal use, but also for product delivery.

These guys have created an entire environment, GUIs and all, that operates via a browser. There are no tools to install, no upgrades to manage.

Sounds simple enough.

But, in fact, there can be a number of issues, especially for tools manipulating heavyweight SoC designs using complex flows. How you solve them is also complicated by how you approach the cloud. Many of us think of things like Amazon as cloud computing, and it is. But that’s public cloud computing. One can also build private clouds.

Do it in private

That’s, in fact, the approach that Cadence has taken. Their observation is that the public facilities sound promising, but, for now, the setups they provide aren’t well-suited to EDA: they’re for business applications. Totally different.

So Cadence has built their own cloud internally, configuring things the way they wanted.

This has to start with airtight security. “No one will trust their IP to the cloud!” is the received wisdom. Yet that’s increasingly not true. A carefully provisioned cloud – private or public – can actually be more secure than your average in-house server farm.

Cadence has focused on implementing full flows in their private cloud. You can’t pick and choose, jump in and jump out. You get it all or you get nothing in the cloud, with the exception of bursty verification tools. This is consistent with Cadence’s focus on all-Cadence flows.

And, lest you think this is a recent development for Cadence, they actually announced this facility back in 2008.

Or… in public

Synopsys has taken a different approach by using a public cloud. They decided the only way to figure this out was to try it. So they waded in carefully and quietly with a couple of lead customers, one of which was Qualcomm, the other remaining anonymous. Synopsys has actually been looking into this for more than two years; customers have been involved for around a year. They discussed their experiences at their recent SNUG event.

Their goal is to provide transparent access to the cloud as a resource management tool for bursty verification activity. Ideally, as a designer, you’d start a VCS simulation job, and you wouldn’t really know whether it was happening on your own farm or in the cloud. But that’s not how it works yet.

The first thing they found was that, as Cadence had noted, Amazon – which is who Synopsys chose – wasn’t well set up for EDA. They had to apply a fair bit of pressure – they say their size helped – to get Amazon to work with them to provide a suitable architecture.

The result is the ability to provision a cluster with one master node and additional slave nodes, using up to 20 servers (an Amazon limit). For this setup, all the computers are of the same type: you can’t request different grades (even though they exist). While, in principle, you could configure anything you desired, Synopsys wanted to wrap the environment so that your average designer didn’t have to worry about that. Limiting flexibility by using a standard configuration helps to achieve that goal.

Amazon provides a number of very low-level APIs that aren’t particularly user-friendly to a designer; Synopsys wrapped that environment in their own CloudConnection setup, which provides a higher-level API that’s EDA-appropriate. They also created a number of other CloudUtils scripts to help with various management tasks.

They are very conscious of the fact that there are two user classes: the designers and the EDA managers or IT guys. The latter tend to control access because, well, that’s what IT guys do: control access. The good news is that all access issues should be transparent to the designer.

Is there a lawyer in the house?

Once you can get to the cloud, the next question is, should you? And that gets to legal policies (not to mention, for some companies, government policies) regarding what can go outside your firewall. And, in fact, many companies have no policy at all – it’s simply not an issue, other than the obvious fact that, as an employee, you’re not supposed to steal stuff. Provisions acknowledging that there might be a legitimate reason for IP going beyond the old boundaries haven’t been worked into most policies.

This actually slowed down the Synopsys project with their trial runs. No one could figure out if it was OK to ship some of the critical code into the cloud, or even how to figure out if it was OK. And it’s still not settled: the only way they made progress was by changing the trial project to an open source one, OpenSPARC, so they could move forward while the lawyers scratched their heads.

Once blessed, the design can be shipped onto the cloud, but this is where people get nervous (assuming it’s not always an open source project). Security is handled in a number of ways.

– As noted, fully-provisioned clouds (public or private) tend to have high levels of security, typically evidenced by various certifications.

– You can protect your data. The flow Synopsys used was to compile on the ground and then send binary files into the cloud. Even if snooped, they would be more or less meaningless. Sending encrypted source is another option, as is providing storage encryption in the cloud.

– With Amazon, you can direct your files to servers located in specific regions if requirements so dictate. Synopsys’s initial work has been in North America, but Amazon has cloud facilities in the US (3 of them), Ireland, Singapore, and – newly announced – Japan. It may be tougher if you specifically have to target, say, France.

Dollars and sense

The economics of cloud-based tools vary, but, in general, the software-as-a-service (SaaS) model can be more expensive. Cadence agrees that this is the case, but they point to the full range of tools available that an individual company (especially a small one) may not have. Any questions of higher cost are immediately redirected to a discussion of higher value.

Synopsys goes somewhat further: they’re not building out full flows. They say the cost is about an order of magnitude too high to go to an all-cloud flow, although the cost is dropping continuously. They see the main value, for now, as being short-term bursty availability of computing resources, and, as such, they’re making only VCS available for the time being.

Synopsys uses an hourly rate. The more hours you sign up for, the lower the rate. They wrap into that rate all the charges that they incur through Amazon. And Amazon charges for everything. Upload a file: it costs I/O and storage. Run a tool: it costs compute time. Send data from one computer to another: <kaching>.

By wrapping all those minute charges into a single hourly rate, no matter what you do, Synopsys is taking on the risk that someone could, for instance, insist on downloading full debug dumps rather than isolating a failure region and just downloading that. It doesn’t make a difference to the user (other than download time – and, to be sure, the upload/download limiter will often be the customer’s outgoing/incoming pipe), but it makes a big difference to Synopsys.

For this reason, Synopsys also doesn’t believe it’s currently cost-effective to do distributed computing in the cloud, even though you’ve got all those computers there. The inter-process communication gets really expensive.

This may sound strange, since high-performance computing (HPC) setups do just this sort of thing. But Amazon doesn’t have an HPC setup – their machines and networks are slower; they didn’t really build for that. So, for now, the value is in running separate regression tests on separate machines so that they run completely independently.
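To make the "separate tests on separate machines" idea concrete, here is a minimal sketch in Python. It is not Synopsys’s CloudConnection API, which isn’t described here; the function name and test names are hypothetical. It simply deals a regression list out into independent batches, one per worker node, so that no batch ever needs to talk to another.

```python
# Hypothetical sketch: split a regression suite into independent batches.
# The 20-server cap comes from the Amazon limit mentioned earlier; one of
# those servers is assumed to be the master, leaving 19 workers.
MAX_SERVERS = 20
MAX_WORKERS = MAX_SERVERS - 1


def partition_tests(tests, worker_count):
    """Deal tests round-robin into one independent batch per worker node."""
    if not 1 <= worker_count <= MAX_WORKERS:
        raise ValueError(f"worker_count must be between 1 and {MAX_WORKERS}")
    batches = [[] for _ in range(worker_count)]
    for index, test in enumerate(tests):
        # Each test lands on exactly one node; nodes never exchange data.
        batches[index % worker_count].append(test)
    return batches


if __name__ == "__main__":
    suite = ["t1", "t2", "t3", "t4", "t5"]  # made-up test names
    for node, batch in enumerate(partition_tests(suite, 2)):
        print(f"worker {node}: {batch}")
```

Because each batch is self-contained, the only traffic is the upload of inputs and the download of results, which is exactly the inter-machine chatter you want to avoid paying for.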

Economic and time considerations also play into the various ways you can manage your access. There are three steps involved in running a job from scratch, plus a teardown at the end.

– First, a cluster is allocated and brought up. This is a full boot procedure, with lots of software being loaded up and data being uploaded, so it can take a while.

– Next is the configuration of the cluster – deciding how many slave nodes to use.

– Finally, the run can proceed.

– The reverse happens at the end.

If you’re doing only weekly regression tests, then it makes sense to set everything up each time, tearing back down when finished. The cost of an idle cloud for so long is too high.

If you’re doing nightly regression tests, then it may make sense to keep the cluster intact all the time, leaving the data up there. When not used, you can deallocate nodes and keep the master idle, adding more nodes when you’re ready to run again.

If you’re doing continuous validation, then you’d keep everything up and running all the time. This would be a good model for a small company that is relying on the cloud as the only access to tools (something Synopsys isn’t focusing on now, but which is clearly an upcoming model).
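The trade-off among those three patterns can be roughed out with a little arithmetic. The sketch below is a toy model, not Synopsys’s or Amazon’s pricing: the rate, setup time, and node count are invented, every node is assumed to cost the same per hour, and setup is assumed to be billed on all nodes.

```python
from dataclasses import dataclass

HOURS_PER_WEEK = 168


@dataclass
class ClusterCosts:
    node_rate: float    # hypothetical cost per node-hour
    setup_hours: float  # assumed time to boot the cluster and upload data
    nodes: int          # master plus slave nodes used during a run


def weekly_cost_teardown(c, runs_per_week, run_hours):
    # Pay for setup plus the run on every node, each time.
    return runs_per_week * (c.setup_hours + run_hours) * c.nodes * c.node_rate


def weekly_cost_idle_master(c, runs_per_week, run_hours):
    # The master stays up all week; slaves are paid for only while running.
    master = HOURS_PER_WEEK * c.node_rate
    slaves = runs_per_week * run_hours * (c.nodes - 1) * c.node_rate
    return master + slaves


def weekly_cost_always_on(c):
    # Every node stays up around the clock.
    return HOURS_PER_WEEK * c.nodes * c.node_rate


def cheapest(c, runs_per_week, run_hours):
    """Return the name of the lowest-cost strategy under this toy model."""
    options = {
        "tear down each time": weekly_cost_teardown(c, runs_per_week, run_hours),
        "keep master idle": weekly_cost_idle_master(c, runs_per_week, run_hours),
        "always on": weekly_cost_always_on(c),
    }
    return min(options, key=options.get)


if __name__ == "__main__":
    costs = ClusterCosts(node_rate=1.0, setup_hours=2.0, nodes=20)  # made up
    print("weekly regression:", cheapest(costs, 1, 8.0))
    print("nightly regression:", cheapest(costs, 7, 8.0))
```

With these made-up numbers, a single weekly run favors tearing everything down, while seven nightly runs favor keeping the master idle. Real rates and setup times would move the crossover point.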

Synopsys also sees the economics favoring batch runs. Get the data up there, run like hell, and get the results back out (transferring as little data as possible). Keeping machines running to handle a GUI for interactive work is too expensive, in their view. On the other hand, this is exactly what Tabula and Vector Fabrics do. The latter uses inexpensive machines to manage the browser session, bringing more powerful nodes into play for the more compute-intensive jobs.

Lessons learned

Overall, Synopsys identified a number of recommendations and observations when determining whether or not companies are “cloud-ready”:

– At present, cloud computing doesn’t work well for heavily-customized workflows. Cadence says the same thing with respect to their private cloud. Stick to a standard flow.

– Each piece of code that might end up on the cloud should be checked to make sure it’s okay to send it outside your firewall. If you don’t have policies, you need them. It’s not too early to start thinking about this.

– We’re a long way away from a multi-vendor flow. Cadence isn’t focused in that direction anyway, but, while Synopsys says they’re open to future collaboration, there are no joint projects going – each company is busy figuring out their own bit (if even that).

– You should know your compute workload before embarking.

– You should put together an internal “mirror” network so that you can pipe-clean your setup. Amazon is an expensive place to debug your configuration.

– IT should have a staging environment and manage the access to the cloud. Users shouldn’t be allowed to submit jobs willy-nilly.

– C-based simulations work particularly well, since no licenses are required.

That last item reflects the complications of intricate flows involving IP of varying provenance to be worked on by various collaborating entities using licenses from different companies. And here you can run into issues that are of a business, not technical, nature.

Using today’s standard way of working (outside the cloud), if a company, say Drudge, is a subcontractor to another company, say Overlord, then Drudge will have to buy their own tools even if Overlord has them. When done, Drudge will ship their completed IP to Overlord, who will integrate it.

Now, assume that both companies are designing in the cloud. This is the perfect opportunity for collaboration, since the various files and computers could literally be in the same room. But if Overlord simply sequestered some code and let Drudge into the environment, then Drudge would be using Overlord’s licenses. There would no longer be two companies buying their own sets of tools: they’d be sharing.

Which Synopsys doesn’t like. So even if the companies are side by side, Drudge will have to download their completed design results and ship them over to Overlord by some other means; Overlord can then upload them back into the cloud. A trip likely going hundreds of miles or more, only to wind up a few feet from where it started. I think it’s called, “protecting the shareholders.”

So what is the status today of cloud computing for EDA?

– Cadence is up and running with their private cloud. They refer to it as a “hosted solution.”

– Synopsys is basically out there, but they’re adding customers cautiously since everyone has a different internal setup and different needs.

– Magma is working on implementing FineSim in the cloud, for release sometime in the third quarter of this year.

– Mentor is, at present, mum (either because there’s nothing going on or because they don’t want to talk about it… not sure which).

– Altium is also making some moves into the cloud, although not the entire suite. For now.

It’s a slow start, no doubt. But, at the very least, now that we’ve proven that designers will let the family jewels out of the safe and into the clouds, we can get on with solving the real problems and making it work.

More info – well, in some cases, links to the companies (for those that have no cloud computing info on their websites):

Altium

Cadence

Magma

Synopsys

Tabula

Vector Fabrics

*Full disclosure: I was formerly part of Vector Fabrics management.