
Benefits of Early Random Regressions

Over the weekend I've been having a discussion with some colleagues regarding the usefulness of being able to run large batches of random regressions early in a project.  The simplified version of the original question is:

If you had the ability to increase the size of your server farm as needed using grid computing services from IBM or Sun, would it benefit your verification effort?

There are several immediate questions that come to mind regarding the logistics of such a system.  However, I'll leave those for another post.  What I'd like to comment on here is why I think it is extremely valuable to have a large server farm available early in the verification cycle.

  1. Being able to run several thousand iterations of a test demonstrates a certain level of stability of the design features being stressed in the test (a sketch of how such a run might be launched follows this list).
  2. There are bugs that can only be found by running a random test several hundred to several hundred thousand times or more, regardless of test grading or what your coverage metrics are showing you.  This is true at both the module level and the full-chip level, and you don't have to wait until the end of the project to find these types of bugs.  Test grading doesn't help here: a test may add absolutely nothing to the functional coverage collected and still find bugs.  Test grading is still useful; it helps put together a regression of tests to run against a build and helps you modify tests to get the most bang for the buck.  However, if your functional coverage metrics are not complete (and they never will be), you can never be sure simply by looking at functional coverage whether you've completely verified a design.
  3. I want to get the maximum benefit out of each test I write.  If I can write a test that keeps one or more designers busy for a week or two fixing all the bugs that drop out of random simulations, that gives me time to implement new testbench features, ensuring that I am NOT the bottleneck.
  4. When a verification effort is staffed and scheduled properly, running large random regressions early in the verification cycle produces results (based on my experience on previous projects).  I've been able to condition designers to run their own random regressions and leave me alone to develop the test environment in peace.  Many of the engineers I've worked with have been very excited about the number of corner cases they've found using this method.
  5. Having the ability to quickly run a large regression gives the verification and design engineers near-instant feedback on the quality of their work.  It speeds up the find-fix-retest cycle, potentially by several orders of magnitude (assuming individual simulations themselves can be run in a reasonable amount of time).

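To make the mechanics of item 1 concrete, here is a minimal sketch of how thousands of seeded runs of a single test might be dispatched to a farm.  The `run_sim` wrapper and `submit_job` command are assumptions for illustration only, stand-ins for whatever simulator flow and LSF/SGE-style scheduler a real farm would use; the point is simply that each job is the same test with a unique random seed.

```python
#!/usr/bin/env python3
"""Minimal sketch of a seeded random-regression launcher.

Assumed (not from the post): `run_sim` is a wrapper that runs one
simulation of a named test with a given seed, and `submit_job` hands a
command off to the farm's job scheduler.
"""
import random
import subprocess

TEST = "fifo_stress_test"   # hypothetical test name
NUM_JOBS = 5000             # thousands of iterations, one unique seed each


def launch_regression(test: str, num_jobs: int) -> None:
    for _ in range(num_jobs):
        seed = random.getrandbits(32)
        # Each job is the same test with a different seed, so every run
        # explores a different corner of the constrained-random space.
        subprocess.run(
            ["submit_job", "run_sim", "--test", test, "--seed", str(seed)],
            check=True,
        )


if __name__ == "__main__":
    launch_regression(TEST, NUM_JOBS)
```

In a real flow the launcher would also record each seed alongside the pass/fail result so that any failing run can be reproduced exactly.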
As strongly as I feel that random simulations should be used early and often, there are probably some situations where it isn't appropriate or necessary.  There is no "one size fits all" verification methodology.  Given the licensing and hardware costs of putting together the server farm required to quickly run thousands of tests, I can see why many people would object to making this a part of their methodology.  If it were possible to outsource the server farm construction and maintenance, and if it were secure and convenient for companies to load up a simulation environment on somebody else's farm, grid computing could one day become an equalizing force allowing startups to compete with the Intels, TIs, and Freescales of the world. This post from Sun describes some of the more esoteric possibilities.
