Local DV Managers Share Thoughts On Verification Metrics

Today I attended the fourth Austin DV Club luncheon. Speakers from ARM, IBM, and Intel gave short talks on the types of verification metrics they've used on recent projects.  Afterward, they participated in a panel discussion and took questions from the audience.  The speakers were Dave Williamson from ARM, Sanjay Gupta from IBM, and Shahram Salamian from Intel. 

Dave's talk (slides) focused on the metrics he's used as a verification manager for ARM in Austin.  He discussed two different types of metrics - "testplan" and "health of design".  For example, testplan metrics would include things like:

  1. % of completed tests.
  2. Number of assertions written.
  3. Amount of random testing completed.

Health of design metrics include things like:

  1. Simulation pass rates.
  2. Bug rates.
  3. Code stability.

According to Dave, testplan metrics provide a best-case look at how things are going (i.e. you're only likely to discover you need to write more tests or assertions, not fewer), while health of design metrics can give you a false sense of security.  (Note: All of the speakers agreed that metrics of any sort can be severely abused by management.  Thus, the verification team needs to stay in constant communication with management to help them understand how to interpret the statistics.)  Dave also commented that it is human nature for people to focus on the metrics, potentially at the expense of creating quality code.  For example, if one of the metrics is the number of tests written per week, people might be tempted to create large numbers of slightly different tests where a few robust random tests would have been better suited to the job.

One thing I found interesting was that there were around 120,000 coverage points in ARM's latest design.  Though many of those are cross coverage points, it shows that it is possible to spend a lot of effort getting functional coverage right.
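
To see how cross coverage can push the count that high, here is a minimal Python sketch. The coverpoint names and bin counts are invented for illustration and are not taken from Dave's slides; the only point is that a full cross has as many bins as the product of the bin counts of the coverpoints being crossed.

```python
from math import prod

# Hypothetical coverpoints and bin counts (illustrative only, not ARM's data).
coverpoints = {
    "opcode": 64,
    "addressing_mode": 8,
    "privilege_level": 4,
}

def cross_bins(names, bins=coverpoints):
    """Number of bins in a full cross of the named coverpoints."""
    return prod(bins[name] for name in names)

# Bins if each coverpoint is tracked on its own.
individual = sum(coverpoints.values())

# Bins if all three coverpoints are crossed with each other.
full_cross = cross_bins(["opcode", "addressing_mode", "privilege_level"])

print(f"individual bins: {individual}")
print(f"cross bins:      {full_cross}")
```

Three small coverpoints already produce a couple of thousand cross bins, so a design with many such crosses could plausibly reach six figures.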

Next up was Sanjay Gupta, Cell Verification Lead for the IBM STI Center (slides).  Sanjay talked about the planning that went into the verification effort, from the top-down specification to the bottom-up implementation. Some interesting points from Sanjay's talk:

  1. No tracking was done until the team reached its first major milestone, the completion of the "golden model".
  2. The effort was divided into unit, island, partition, chip, and system environments.  The unit and island environments were responsible for finding about 95% of the bugs. The full chip environment was responsible for 3.5% of the bugs.  Sanjay didn't discuss how much time was spent in each phase in relation to the number of bugs found, but I wouldn't be surprised if many of the full chip bugs were extremely difficult to find and fix.
  3. The major metrics used by the STI team included effective pass rate, effective coverage, number of checkers, completion of reviews, and bug rate.

Last up was Shahram Salamian, CPU Verification Manager in the Mobility Group at Intel (slides). Processor verification at Intel is divided into several types:

  1. Architectural Verification (AV) - Does the chip meet x86 specifications?
  2. Microarchitectural Verification - Clusters, Full Chip, etc.
  3. Power Management
  4. Formal Verification (where it makes sense)
  5. System Level Verification
  6. Design for Debug (DFD) and Design for Test (DFT)

I wanted to ask Shahram if DFD was primarily focused on debuggability in the lab or if features were also added to the design to make it easier to debug during pre-silicon verification, but I didn't get the opportunity.  Shahram said a few things that caught my attention:

  1. Intel has 25 years' worth of legacy tests that must be run before a chip is taped out.  It typically takes 3-4 quarters to get 90-99% legacy test coverage.
  2. A few bad coverage monitors can skew functional coverage results.  Additional metrics are required to look for coverage holes to attempt to counterbalance any misleading results.
  3. The number of changed RTL lines is used as a metric to see how stable a design really is.
  4. An empirical formula is used to calculate "Health of Model" metrics using functional coverage, number of new bugs, unresolved bugs, progress of verification team, etc.
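
Shahram didn't share the formula itself, so the Python sketch below is purely hypothetical. The inputs mirror the ones he listed, but the weights, the normalization of bug counts, the "tolerable" thresholds, and all of the names are assumptions made up for illustration. It only shows the general shape such a score could take: a weighted average of normalized inputs.

```python
from dataclasses import dataclass

@dataclass
class ModelSnapshot:
    functional_coverage: float  # fraction of coverage points hit, 0.0-1.0
    new_bugs: int               # bugs filed in the current period
    unresolved_bugs: int        # bugs still open
    testplan_progress: float    # fraction of planned work completed, 0.0-1.0

# Hypothetical weights (they sum to 1.0); not Intel's actual formula.
WEIGHTS = {"coverage": 0.4, "new_bugs": 0.2, "unresolved": 0.2, "progress": 0.2}

def bug_score(count: int, tolerable: int) -> float:
    """Map a bug count to 0.0-1.0: zero bugs scores 1.0,
    `tolerable` or more bugs scores 0.0."""
    return max(0.0, 1.0 - count / tolerable)

def health_of_model(s: ModelSnapshot,
                    tolerable_new: int = 20,
                    tolerable_open: int = 50) -> float:
    """Weighted average of the four inputs, scaled to 0-100."""
    score = (
        WEIGHTS["coverage"] * s.functional_coverage
        + WEIGHTS["new_bugs"] * bug_score(s.new_bugs, tolerable_new)
        + WEIGHTS["unresolved"] * bug_score(s.unresolved_bugs, tolerable_open)
        + WEIGHTS["progress"] * s.testplan_progress
    )
    return 100.0 * score

# Two made-up snapshots: early in the project and close to tapeout.
early = ModelSnapshot(0.40, 18, 45, 0.30)
late = ModelSnapshot(0.95, 2, 5, 0.90)
print(f"early: {health_of_model(early):.1f}")
print(f"late:  {health_of_model(late):.1f}")
```

Note that a score like this inherits the weaknesses of its inputs. If a few bad coverage monitors inflate `functional_coverage`, the overall number is inflated too, which is one reason to cross-check it against other metrics.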

Another comment that was common to all the presentations was that you need to have some historical data to compare your current results to for many of these metrics to have meaning.  At the same time, each project is different, and metrics aren't always collected in the same way.  That means that all metrics and historical comparisons need to be taken with a grain of salt.  If your gut tells you something isn't right, go with that and make sure management knows more work needs to be done.

Overall, the presenters shared a lot of interesting information about the verification metrics used by their respective companies.  I'll post my thoughts on the panel discussion and the luncheon in general later in the week.