Almost Instant Replay

It’s 4th and goal, 0:15 to go in the last quarter. The ball is snapped, the quarterback steps back, finds his receiver, and throws. Seeing the play develop, the defender runs to cover the receiver. They both jump in an aerial pas de deux; the ball dances elusively into the air, spins tantalizingly near outstretched fingertips, and falls harmlessly to the ground. While the defender gyrates around in a rather improbable new display of exultation that he hopes will sweep the nation, the receiver cries interference and looks to the referees for justice. The referees call upstairs for a replay so they can judge what happened. To their amazement, they’re told, “Um… we weren’t filming. We can’t see what happened.”

“So, what are we supposed to do?? How are we going to resolve this?”

“Well, I know this is going to sound strange, but the teams are going to have to completely replay the second half, exactly as it happened the first time, so that we can watch that pass more closely.”

It’s one thing to be able to run a live debug session on an actual executing processor, where you have access to detailed information like symbol tables and event traces, whether abstract or low-level. You can interactively work with your debugger to step around code, play with memory, and alter actual execution in an attempt to locate and fix a problem. It’s quite another to debug problems uncovered in validating a new SoC using processor-driven tests.

Processor-driven tests take advantage of the fact that the chip being tested has a processor inside. So rather than relying solely on an external tester with limited access to internal signals to provide all of the testing, you can make use of the internal processor to provide more thorough testing. You write your tests in C (breaking up the code as necessary to fit the code store of the SoC) and have the tester load and execute a chunk of code at a time until the tests are complete. Of course… that’s assuming the processor itself is working properly – the old test-the-tester problem. In reality, if you’ve done a reasonable job of getting the processor right, then processor-driven tests may uncover problems in the rest of the chip or in the processor itself; you just have to be careful not to make the assumption that the processor is always right (an assumption you might make with a tester).
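As a rough illustration of "breaking up the code as necessary to fit the code store," here is a minimal sketch that groups tests into chunks the tester could load one at a time. The test names, sizes, and code-store capacity are invented for the example, and the simple in-order greedy packing is an assumption, not something the article or any particular tester prescribes.

```python
def pack_tests(tests, code_store_bytes):
    """Group (name, size_in_bytes) tests, in order, into chunks that fit the code store."""
    chunks, current, used = [], [], 0
    for name, size in tests:
        if size > code_store_bytes:
            # A single test larger than the code store must be split by hand.
            raise ValueError(f"test {name!r} does not fit in the code store")
        if used + size > code_store_bytes:
            # Current chunk is full: close it and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        chunks.append(current)
    return chunks


# Hypothetical tests and a hypothetical 6000-byte code store.
tests = [("uart", 3000), ("dma", 2500), ("timer", 1000), ("spi", 4000)]
for i, chunk in enumerate(pack_tests(tests, 6000)):
    print(f"chunk {i}: {chunk}")
```

With these made-up numbers the tester would load two chunks: the first holding the uart and dma tests, the second holding timer and spi.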

A major consideration is the fact that when you’re signing off an SoC, you don’t have a real processor; you only have a model of a processor. And for sign-off, that model is very accurate – it’s very detailed, operates at the RTL or gate level, and can take a long time to run, on the order of 20 to 50 instructions per second. And if you’ve done most of the detailed checkout on various parts of the chip, you’re not going to be sitting there with eyes glued to the CRT while the tests run. You’re going to run an automated suite of regression tests (that’s “reg” – hard g – test to those of us who are cool enough to be worthy of jargon) overnight or over the weekend. And you’re going to show up in the morning bright and chipper with your steaming hot cuppa joe, expecting to see “Passed” on all the tests and to receive that well-earned pat on the back.
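To put that simulation rate in perspective, a quick back-of-the-envelope calculation helps. The 20 and 50 instructions-per-second figures come from the text; the 16-hour session length is an assumed overnight run, chosen only for illustration.

```python
SECONDS_PER_HOUR = 3600


def instructions_executed(hours, instr_per_second):
    """Instructions simulated in a run of the given length at the given rate."""
    return int(hours * SECONDS_PER_HOUR * instr_per_second)


# 16 hours is an assumed session length, not a measured one.
low = instructions_executed(16, 20)
high = instructions_executed(16, 50)
print(f"{low:,} to {high:,} instructions")  # 1,152,000 to 2,880,000 instructions
```

In other words, a whole night of sign-off simulation covers only a few million instructions, which is why rerunning it just to look at one failure is so painful.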

Which is when the soaring strains of the symphony sink into more sorrowful sonorities as you see that there were a number of reg tests that failed halfway through the session. And all you have is a log file that, hopefully, gives you some indication of where things went awry. Now what do you do? Redo the tests from scratch and wait up all night to see what happened? That’s like playing the second half all over again (albeit more deterministically, and without having to relive that abominable victory dance). Even if you can step to the trouble spot and tap some internal signals, there are only so many signals to which you have access; it doesn’t give you the visibility you really need. You do have symbol tables and other static text files from the compiler, but you have to manually (or mentally) apply them to the tests yourself.

Mentor has proposed a new approach to processor test debug with a tool they call Codelink, supporting ARM and MIPS processors (and adding others as demand warrants). It looks something like an instant replay methodology that combines data from a number of places in an attempt to provide much more information for debugging. The intent is to combine results from the RTL- or gate-level sign-off model with an environment reconstruction from a fast-acting ISS to fill in the missing details. The ISS can model what’s happening in the processor, but it needs to be able to follow the actual trajectory of the code that failed. That’s where the results of the detailed simulation run come in.

Codelink starts during the actual reg test run, creating a log of data changes in a number of key structures to which they have access – primarily the general-purpose registers. This provides better visibility into what actually happened from a data standpoint, but it doesn’t say much about what was going on when the failure occurred. The ISS is used for that. After the tests, the ISS replays the tests using the data recorded during the session, creating a model with the kind of debug information you would ordinarily have access to, including variables, the stack, and memory. The ISS simulation is fast by comparison to the RTL-level simulation: Mentor says that a 16-hour test session can be replayed in about 3 seconds.
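The record-then-reconstruct idea can be sketched in a few lines. This is not Codelink's implementation or file format; the class, method names, and register names below are hypothetical. It only shows how a log of timestamped register changes is enough to recover the register state at any point in the session without rerunning the slow simulation.

```python
from bisect import bisect_right


class RegisterLog:
    """Per-register history of (time, value) changes, recorded in time order."""

    def __init__(self):
        self._times = {}   # register name -> list of change times
        self._values = {}  # register name -> list of values, parallel to _times

    def record(self, time, reg, value):
        """Append a change; callers are expected to record in increasing time."""
        self._times.setdefault(reg, []).append(time)
        self._values.setdefault(reg, []).append(value)

    def value_at(self, reg, time):
        """Return the register's latest value at or before `time`, or None."""
        times = self._times.get(reg, [])
        i = bisect_right(times, time)  # number of changes at or before `time`
        return self._values[reg][i - 1] if i else None

    def snapshot(self, time):
        """Reconstruct all logged registers as they stood at `time`."""
        return {reg: self.value_at(reg, time) for reg in self._times}


# Hypothetical changes captured during the slow sign-off run.
log = RegisterLog()
log.record(10, "r0", 1)
log.record(25, "r1", 0xFF)
log.record(40, "r0", 7)
print(log.snapshot(30))
```

Because each lookup is a binary search over recorded changes rather than a re-simulation, jumping to any moment is cheap. That is the same spirit as the speedup Mentor quotes: 16 hours replayed in about 3 seconds is a ratio of roughly 19,200 to 1.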

They’ve integrated this into a graphical debugging environment that allows you to poke around through your code and the results to isolate problems. There is one window for viewing waveforms of all the general-purpose registers; for example, you can zoom out to see the entire session in one pane, and that might allow you to zero in on a problem area. Your C code is in another pane. You can move forwards and backwards through the code. Holding the mouse over a variable shows the variable value, or you can drag the variable to another pane where variables can be tracked and viewed (viewing variables as waveforms isn’t currently supported, but is a planned enhancement). As you step around in your source code, the waveforms and variable values will track accordingly. You can also step through the assembly code for those situations where you’ve done in-line assembly (not so useful for C code unless you have some way of changing the way your compiler generates assembly…). And you can poke around in memory, looking, for example, at the heap, although the tool won’t try to interpret the heap contents.
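Moving forwards and backwards through a finished run is straightforward once the history exists, because stepping is just moving an index over recorded events. The sketch below assumes a hypothetical trace of (source line, variable values) pairs; the class name and trace layout are illustrative only and say nothing about how the tool stores its data.

```python
class ReplayCursor:
    """Move forwards and backwards over a recorded, finished execution trace."""

    def __init__(self, trace):
        if not trace:
            raise ValueError("trace must contain at least one event")
        self._trace = tuple(trace)  # fixed sequence: events cannot be added or dropped
        self._pos = 0

    @property
    def current(self):
        """The (source_line, variables) event under the cursor."""
        return self._trace[self._pos]

    def step(self, n=1):
        """Move n events forward (negative n moves back), clamped to the trace."""
        self._pos = max(0, min(len(self._trace) - 1, self._pos + n))
        return self.current

    def back(self, n=1):
        """Move n events backward."""
        return self.step(-n)


# Hypothetical trace: source line number plus variable values at that point.
trace = [
    (12, {"i": 0}),
    (13, {"i": 0, "total": 0}),
    (14, {"i": 1, "total": 5}),
]
cursor = ReplayCursor(trace)
print(cursor.step())  # forward one event
print(cursor.back())  # and back again
```

The cursor offers no way to write a value or insert an event, only to look at what was recorded, which mirrors the nature of a replay.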

There is one thing you can’t do: change data or assert values. This is, after all, a replay – nothing is actually executing during the debug session. You can’t change history. The idea is simply to give a more detailed view of what happened during the test to make it easier to isolate a problem. Trying out a fix or playing with values will require a new simulation, but presumably that will be over a more limited set of code such that you can do it in hours rather than days.
