Namcook Analytics LLC Software Assessment and Benchmark Model – (C) Capers Jones

Namcook Analytics LLC provides estimating, benchmark, and assessment services for corporate and government clients. The benchmark data is collected using a unique and proprietary method developed by Capers Jones to eliminate “leakage.” Most civilian software projects do not collect complete data. The most common omissions are unpaid overtime, project management, and the work of specialists such as technical writers and quality assurance.

SOFTWARE RISK MASTER
The Namcook benchmark method uses the proprietary Software Risk Master (SRM) tool to show clients a complete prediction for the application to be benchmarked, including unpaid overtime, specialists, and other relevant topics such as the numbers and sizes of documents (requirements, design, user manuals, etc.). The SRM tool can also provide application size data in cases where clients do not know application size in terms of function points, source code, or other quantified metrics.

The SRM questionnaire is used for assessment data collection, benchmark data collection, and software estimating. When the initial estimate is complete, clients can examine it and either accept the predictions or enter alternate values, assuming they are confident that their own data is accurate. Since civilian data averages only about 37% complete, based on several thousand projects, it usually cannot be trusted at face value.

IFPUG FUNCTION POINTS
The standard metrics used for normalization of data include function points as defined by the International Function Point Users Group (IFPUG) and also logical code statements. However, the SRM tool is metric-neutral and predicts size in a total of 15 metrics, including COSMIC, FISMA, NESMA, and other function point variants, SNAP metrics, story points, use-case points, and others.

Quality data is predicted and collected for requirements defects, design defects, code defects, document defects, and “bad fixes,” or secondary defects accidentally injected during defect repairs. Quality data is normalized in terms of defect potentials, defect removal efficiency (DRE), and delivered defects. (Defects per function point allow the inclusion of requirements and design defects, which are invisible when measured with lines of code.)
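
As a concrete illustration of how these normalized measures relate, the short Python sketch below combines a defect potential with a DRE percentage to compute delivered defects. The defect potential of 4.0 defects per function point and the 1,000 function point size are hypothetical values chosen for illustration; only the 85% and 99% DRE levels come from the discussion that follows.

    # Illustration only: the defect potential and size below are hypothetical,
    # not Namcook benchmark values.

    def delivered_defects(defect_potential, dre):
        """Delivered defects = defect potential * (1 - defect removal efficiency)."""
        return defect_potential * (1.0 - dre)

    # A 1,000 function point application with a defect potential of
    # 4.0 defects per function point has 4,000 total potential defects.
    size_fp = 1_000
    total_potential = 4.0 * size_fp   # requirements, design, code, document, and bad-fix defects

    for dre in (0.85, 0.99):          # roughly the U.S. average DRE vs. best in class
        remaining = delivered_defects(total_potential, dre)
        print(f"DRE {dre:.0%}: about {remaining:.0f} delivered defects "
              f"({remaining / size_fp:.2f} per function point)")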

Requirements and design defects are normally more numerous than code defects and harder to remove.  The current U.S. average for overall defect removal efficiency (DRE) is only about 85%.  Best in class results top 99%, but that is uncommon.

Most quality methods are weak at removing requirements defects: average requirements DRE is only about 77%. However, requirements modeling and requirements static analysis can raise requirements DRE to about 99%. Although SRM predicts and can measure “cost per defect,” that metric violates standard economic assumptions and is not valid for analyzing the economics of quality.

After collection of client data, the assessment and benchmark data is compared via pattern matching against data from similar projects.  The factors used to determine the “pattern” include country, industry, state or province, city, application nature (new or enhancement), application scope (size in function points), application class, application type, and three forms of complexity: problem, code, and data.
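
A minimal sketch of how such taxonomy-based pattern matching could be organized is shown below. The field names follow the factors just listed (state/province and city are omitted for brevity); the matching rule, the size tolerance, and the sample records are illustrative assumptions, not the actual SRM implementation.

    # Sketch of taxonomy-based pattern matching for "apples to apples" benchmark
    # comparisons. The matching rule and sample records are assumptions for
    # illustration, not SRM internals.
    from dataclasses import dataclass, asdict

    @dataclass(frozen=True)
    class ProjectTaxonomy:
        country: str
        industry: str
        nature: str              # new development or enhancement
        scope_fp: int            # size in function points
        app_class: str           # e.g., internal, commercial, military
        app_type: str            # e.g., information system, embedded, web
        problem_complexity: str
        code_complexity: str
        data_complexity: str

    def matches(candidate, target):
        """Same taxonomy in every field except size, which must be within
        a factor of two (an assumed tolerance) of the target project."""
        a, b = asdict(candidate), asdict(target)
        size_ratio = a.pop("scope_fp") / b.pop("scope_fp")
        return 0.5 <= size_ratio <= 2.0 and a == b

    target = ProjectTaxonomy("USA", "banking", "new", 1_000, "internal",
                             "information system", "average", "average", "average")
    benchmark = ProjectTaxonomy("USA", "banking", "new", 1_200, "internal",
                                "information system", "average", "average", "average")
    print(matches(benchmark, target))   # True: a comparable project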

DIFFERENCES BETWEEN SYSTEMS
There are major differences between commercial software, military software, web software, information systems, embedded applications, etc.  It is necessary to use a formal taxonomy and formal pattern matching to ensure “apples to apples” comparisons of benchmarks.

For example, one of the faults of the Agile method is an insistence on comparing Agile projects only to waterfall projects, while ignoring 32 other development methods such as RUP, TSP, PRINCE2, and the like.

The same fault is found in the “pair programming” literature, where the comparisons exclude single-person projects that use inspections and static analysis. That combination usually delivers better quality than pairs at lower cost.

Other variables used for both estimation and benchmarks include CMMI level, client and team experience, and any of 34 different methodologies such as Agile, XP, RUP, and TSP. Also included are programming languages or combinations of languages, such as Java and HTML.

Any of the roughly 2,500 known programming languages, or any combination of them, can be measured and predicted. Some applications use more than a dozen languages concurrently. About 50 languages are very common, including Java, several C dialects such as C# and Objective-C, SQL, HTML, and newer languages such as Ruby, Ruby on Rails, and MySQL.
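
Since SRM also expresses size in logical code statements, the sketch below shows one way a multi-language application's function point size might be converted to code size. The statements-per-function-point ratios are placeholder assumptions for illustration, not Namcook's published conversion values.

    # Converting function points to logical code statements for a mixed-language
    # application. The ratios below are illustrative placeholders only.
    STATEMENTS_PER_FP = {"Java": 53, "C#": 55, "SQL": 13, "HTML": 15}

    def logical_statements(size_fp, language_mix):
        """language_mix maps language name -> fraction of the application (summing to 1.0)."""
        return sum(size_fp * share * STATEMENTS_PER_FP[lang]
                   for lang, share in language_mix.items())

    # Example: a 1,000 function point application written 60% in Java,
    # 20% in SQL, and 20% in HTML.
    mix = {"Java": 0.6, "SQL": 0.2, "HTML": 0.2}
    print(f"About {logical_statements(1_000, mix):,.0f} logical code statements")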

SOFTWARE ASSESSMENT VALUE
The value of software assessments and accurate benchmarks, and also of early sizing, estimating, and risk analysis, is directly proportional to the overall size of the application measured in function points, as shown in Table 1:


Table 1:  Normal Software Results based on Application Size
  Note: Costs are based on $10,000 per staff month

Size in      Schedule     Total      Productivity   Cost in           Odds of    Odds of
Function     in Calendar  Staffing   in Function    U.S. Dollars      Project    Outsource
Points       Months                  Points per                       Failure    Litigation
                                     Staff Month

       1         0.02         1         50.00                $200      0.10%       0.00%
      10         0.40         1         25.00              $4,000      1.00%       0.01%
     100         3.50         2         14.29             $70,000      2.50%       0.25%
   1,000        15.00         6         11.11            $900,000     11.00%       1.20%
  10,000        35.00        50          5.71         $17,500,000     31.00%       7.50%
 100,000        60.00       575          2.90        $345,000,000     47.50%      23.00%

As can be seen from Table 1, small software projects are usually successful, while large software systems seldom are. It is an interesting phenomenon that in the lawsuits over project failures where the author has worked as an expert witness, every case except one involved an application larger than 10,000 function points.
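
For readers who want to work with Table 1 directly, the sketch below simply encodes its rows and shows how the cost column follows from size, productivity, and the $10,000 per staff month assumption. The log-scale interpolation between rows is an assumption added for illustration, not a Namcook formula.

    # The rows of Table 1, encoded for lookup. The values are copied from the
    # table above; the interpolation rule is an illustrative assumption.
    import math

    # (size_fp, schedule_months, staffing, fp_per_staff_month,
    #  cost_usd, odds_of_failure, odds_of_litigation)
    TABLE_1 = [
        (1,       0.02,  1,   50.00,         200, 0.0010, 0.0000),
        (10,      0.40,  1,   25.00,       4_000, 0.0100, 0.0001),
        (100,     3.50,  2,   14.29,      70_000, 0.0250, 0.0025),
        (1_000,   15.00, 6,   11.11,     900_000, 0.1100, 0.0120),
        (10_000,  35.00, 50,   5.71,  17_500_000, 0.3100, 0.0750),
        (100_000, 60.00, 575,  2.90, 345_000_000, 0.4750, 0.2300),
    ]

    MONTHLY_COST_USD = 10_000   # burdened cost per staff month, from the Table 1 note

    def cost_from_productivity(size_fp, fp_per_staff_month):
        """Cost = effort in staff months * monthly cost; effort = size / productivity."""
        return size_fp / fp_per_staff_month * MONTHLY_COST_USD

    def failure_odds(size_fp):
        """Interpolate the 'odds of project failure' column on a log size scale."""
        if size_fp <= TABLE_1[0][0]:
            return TABLE_1[0][5]
        for lo, hi in zip(TABLE_1, TABLE_1[1:]):
            if lo[0] <= size_fp <= hi[0]:
                t = ((math.log10(size_fp) - math.log10(lo[0]))
                     / (math.log10(hi[0]) - math.log10(lo[0])))
                return lo[5] + t * (hi[5] - lo[5])
        return TABLE_1[-1][5]

    print(f"${cost_from_productivity(1_000, 11.11):,.0f}")   # about $900,000 (matches the 1,000 FP row within rounding)
    print(f"{failure_odds(5_000):.1%}")                      # about 25%, between the 1,000 and 10,000 FP rows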

Many kinds of research and development projects experience high failure rates, and that is expected. But after more than 60 years of software engineering, large software projects should be buildable with much lower costs, shorter schedules, and lower risks than actually occur. Therefore early sizing and early risk analysis are critical when software applications rise above 1,000 function points into the danger zone of large systems.

NAMCOOK VALUE PROPOSITION
The Namcook sizing and estimating method is very quick.  Applications are sized in less than 90 seconds.  Application cost and schedule estimates take about 5 minutes.  Collecting software benchmark data remotely via Skype or conference calls takes between 30 minutes and an hour.  Formal software assessments for major projects in the 10,000 function point size range can take up to three hours to collect the assessment data, but these large projects are not common.

(Self-reported data from clients without any validation is usually inaccurate due to “leakage” from both cost data and quality data. For example, only a few companies have data on unpaid overtime. Even fewer have data on unit-test defect removal efficiency. Many companies have no defect data until testing starts, and some have no quality data until the software is delivered.)

Accurate benchmarks and assessments are valuable because they can help to avoid endemic problems such as poor quality, schedule delays, cost overruns, and outright cancellation. The major cost drivers for U.S. software projects, shown in rank order in Table 2, are:

Table 2:  U.S. Software Development Costs in Rank Order

  1. Finding and fixing bugs
  2. Producing paper documents such as requirements and design
  3. Code development
  4. Meetings and communications
  5. Project management
  6. Handling requirements changes or “creep,” which typically approaches 2% per calendar month (see the sketch after this list).
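
To make the requirements creep figure in item 6 concrete, the sketch below compounds 2% growth per calendar month over a hypothetical 15-month schedule. The starting size and schedule are illustrative, chosen to match the 1,000 function point row of Table 1, and the assumption that creep compounds monthly is also illustrative.

    # Requirements creep at 2% per calendar month, compounded over the schedule.
    # The 1,000 function point starting size and 15-month schedule are hypothetical.
    initial_size_fp = 1_000
    creep_rate = 0.02            # 2% growth per calendar month
    schedule_months = 15

    final_size_fp = initial_size_fp * (1 + creep_rate) ** schedule_months
    print(f"Requirements grow from {initial_size_fp:,} to about {final_size_fp:,.0f} "
          f"function points, roughly 35% more than the original scope.")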

(Stepping back to look at the entire industry rather than just new development, maintenance, support, and enhancement of legacy applications is the #1 software industry cost driver in the modern era. Another important cost element is the cumulative cost of canceled projects that never get delivered. Yet another, and growing, cost driver is the expense of recovering from cyber attacks.)

The Namcook software assessment and benchmark methods, and the SRM software estimates, include all six of the development cost drivers to ensure that no major cost elements are left out by accident.

The SRM predictions for future projects also include five years of maintenance and enhancements, the odds of project cancellation, the odds of litigation for outsource contracts, the probable cost of that litigation, and the odds and costs of cyber attacks. However, because multiple years of data collection are needed to benchmark these topics, clients seldom ask for benchmark studies of litigation, canceled projects, cyber attacks, or multiple years of maintenance.

Accurate estimates are derived from accurate benchmarks combined with valid assessments of team capabilities, methods, experience, tools, and other ancillary topics. Very accurate estimates are possible for capable teams using effective methodologies; such projects can often be estimated to within 3%. A synergy exists between estimating, benchmarks, and effective methods. Competence is predictable with high precision. On the other hand, incompetence usually produces erratic results and often leads to canceled projects, litigation, or at least major cost and schedule overruns.

The Namcook methods for estimates, benchmarks, and assessments are aimed at helping clients achieve state-of-the-art results. Early sizing and early estimation, performed before full requirements are complete, leave time to deploy effective solutions. The earlier projects can be sized and given quality, risk, and productivity estimates, the greater the chance of a successful project.

