The Clouds Converge

We’ve been watching the cloud computing space, and Synopsys has been playing a visible role in exploring the public cloud for compute resource elasticity during simulation. We took a brief look at an attempt they made to demo the cloud at DAC, which was sabotaged by one of their own hard drives.

So yesterday I got a make-up session from Alex Seibulescu, a Synopsys senior staff engineer. It was also useful in that we weren’t under the same time pressure as a quick DAC presentation might have, so, in addition to showing the simulation run, he was able to delve a bit more into the management of the cloud resources.

The demo consisted of a script executing on a Synopsys computer. In fact, one of their security features involves establishing an IP address from which the cloud can be manipulated. This was Synopsys’s external IP address, so any attempts to run the script from some other IP address would be rejected.

The script is, for the most part, just like any other script. It involves secure shell (SSH) and secure copy (SCP) commands as well as Synopsys’s SC2 command.

The SC2 command is the catch-all command that wraps all of the underlying activities required to manage Amazon. It’s very high-level, making all of the Amazon details opaque. There are four options you can use with the SC2 command: create; query; modify; and delete. These provide log results that can be piped into a file; that file can then be searched for information. For instance, when “create” is used to build a new cluster, the resulting log file can be grepped to extract the cluster ID number, which is required for further commands for that cluster.

His script created a cluster, got the regression suite going, and then did some dynamic resource balancing on the fly. The script checked the progress at a time 50% through when the suite needed to be complete. If it was less than 50% (which, of course, it was), then more cores were added.

The concept of a “cluster” is a master node, a license server, and then zero to some large number (hundreds; theoretically, limited only by what Amazon owns) of worker nodes. Each core on a worker node is considered an instance (a CCI); when you reserve cores, you reserve an entire machine (the machines are not “multi-tenanted” so you’re not sharing with anyone else). The machines have 8 cores, so you have to allocate CCIs in multiples of 8.

For the example, he started with 8 CCIs and then added 32 part way through to finish faster. That’s a simplistic approach: because this can all be done programmatically, you could, for example, figure out how far behind schedule you were to figure out the right number of CCIs to add (rather than just adding a fixed 32).

Each computer is a Linux SMP box, with jobs allocated by a load-sharing program. Synopsys provides sg (Sun GridEngine), which is free, or, if you have LSF licenses, you can use LSF as well.

Once the job completed, the script tarred up the results and downloaded them; the cluster was then destroyed. If the cluster were going to be used again soon, it could be left up, with all worker nodes de-allocated. That would allow all of the configuration, design, and results data to remain in the cluster. Once the cluster is destroyed, all vestiges of the session disappear (meaning it’s critical to make sure you’ve downloaded your results before destroying the cluster).

As the job was running, Alex was able to go to a website to check the progress of the run. There was lots of information – in fact, more than really needed by a typical user (which is useful at this stage for any debug needs).

We also discussed versioning. There’s a version of VCS that a user will get by default; if a different version is needed, the customer can work with Synopsys to make that version available. Synopsys also doesn’t upgrade versions on the fly: any upgrades would typically be initiated by the customer, most likely after that customer has upgraded their own installations to a new version. So there’s no chance that a version would change in the middle of a project.

The Clouds Converge

Related

Leave a Reply Cancel reply

featured video

How NV5, NVIDIA, and Cadence Collaboration Optimizes Data Center Efficiency, Performance, and Reliability

featured chalk talk