Pattern matching is a predictive methodology that uses a formal taxonomy to compare results of historical software projects against the possible outcomes of new software projects that are about to start development.
Pattern matching for software starts with a questionnaire that uses multiple-choice questions. These questions elicit information about a new project, such as its nature, scope, class, type, and complexity.
The answers to the questions form a “pattern” that is used to extract data from historical projects that have the same pattern, or a pattern that is very close. Mathematical algorithms have been developed to handle partial matches to historical patterns.
Mathematical approximations are necessary because the total number of patterns formed by the proprietary taxonomy totals 214,200,000. Most of these patterns have never occurred and never will occur. The nucleus of common patterns that occur many times for software is closer to 20,000.
In today’s world pattern matching is a good choice for software sizing and estimating because almost 95% of software applications are not “new” in the sense of never being done before. The majority today are either legacy replacements or minor variations to existing software.
Pattern matching and formal taxonomies have been widely used in science and business, but are comparatively new for software.
Software pattern matching as described here is based on a proprietary taxonomy developed by the author, Capers Jones. The taxonomy uses multiple-choice questions to identify the key attributes of software projects. The taxonomy is used to collect historical benchmark data and also as a basis for estimating future projects. The taxonomy is also used for sizing applications.
For sizing, the taxonomy includes project nature, scope, class, type, problem complexity, code complexity, and data complexity. For estimating, additional parameters such as CMMI level, methodology, and team experience are also used.
The pattern matching methodology for software sizing is patent pending and the inventor is Capers Jones. The utility patent application is U.S. Patent Application No. 13/352,434, filed January 18, 2012, titled "Early and Rapid Sizing for Software Applications."
The pattern matching approach for software sizing is a standard feature of the Software Risk Master ™ tool (SRM). For example, the 2013 SRM taxonomy list for "project scope" includes these 34 entries:
1. Algorithm
2. Maintenance: defect repair
3. Subroutine
4. Module
5. Reusable module
6. Enhancement to a program
7. Small enhancement to a system
8. Disposable prototype or 7% of application
9. Large enhancement to a program
10. Evolutionary prototype or 12% of application
11. Average enhancement to a system
12. Subprogram
13. Standalone program: Smartphone
14. Standalone program: tablet
15. Standalone program: PC
16. Large enhancement to a system
17. Standalone program: Web
18. Standalone program: Cloud
19. Standalone program: embedded
20. Standalone program: mainframe
21. Multi-component program
22. Component of a departmental system
23. Release of a system (base plus)
24. Component of a corporate system
25. Component of an enterprise system
26. New social network system
27. New departmental system
28. Component of a national system
29. New corporate system
30. Component of a global system
31. Massively multiplayer game application
32. New enterprise system
33. New national system
34. New global system
The entries in the SRM taxonomy for "project type" include these 25 forms of software:
1. Nonprocedural (generated, query, spreadsheet)
2. Batch application
3. Interactive application
4. Batch database application
5. Interactive GUI application
6. Interactive database application
7. Web application
8. Client/server application
9. Data warehouse application
10. Big data application
11. Computer game
12. Scientific or mathematical program
13. System support or middleware application
14. Service oriented architecture (SOA)
15. Expert system
16. Communications or telecommunications
17. Process control applications
18. Trusted systems
19. Embedded or real-time applications
20. Graphics, animation, or image processing applications
21. Multimedia applications
22. Robotics or mechanical automation applications
23. AI applications
24. Neural net applications
25. Hybrid: multiple types
The total numbers of discrete elements in the full software sizing taxonomy are:
| Taxonomy Element | Discrete Values |
| --- | --- |
| Project Nature | 12 |
| Project Scope | 34 |
| Project Class | 21 |
| Project Type | 25 |
| Problem Complexity | 10 |
| Code Complexity | 10 |
| Data Complexity | 10 |
| Sum | 122 |
| Permutations | 214,200,000 |
With 122 total elements spread across seven categories, the permutations of the full taxonomy total 214,200,000 possible patterns. Needless to say, more than half of these patterns have never occurred and will never occur.
For the software industry in 2013 the total number of patterns that occur with relatively high frequency is much smaller: about 20,000.
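The 214,200,000 figure follows directly from multiplying the number of discrete values in each taxonomy category. A minimal sketch of the arithmetic, using the category sizes from the table above:

```python
from math import prod

# Number of discrete values in each sizing category (from the table above)
category_sizes = {
    "Project Nature": 12,
    "Project Scope": 34,
    "Project Class": 21,
    "Project Type": 25,
    "Problem Complexity": 10,
    "Code Complexity": 10,
    "Data Complexity": 10,
}

total_patterns = prod(category_sizes.values())
print(f"{total_patterns:,}")  # 214,200,000 possible taxonomy patterns
```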
Using Pattern Matching for Sizing Software Applications
To use pattern matching for software sizing, the clients provide answers to the multiple-choice taxonomy questions. The answers to these questions form a distinct “pattern.”
The client’s pattern for a project is then compared against the Software Risk Master ™ knowledge base. Projects with the same or nearly the same patterns are selected.
Due to the large numbers of projects examined and measured over the years, mathematical algorithms have been developed that are based on thousands of projects. These algorithms are quick and also enable matches of patterns that are close but not identical to a client’s taxonomy.
Rather than an actual scan for identical patterns, the SRM algorithms condense the original data and speed up the calculations to a few seconds.
For example if a client were interested in a PBX switching system perhaps a dozen similar projects with the same pattern could be found. These historical PBX switching projects would range from about 1,200 to perhaps 1,700 function points in size, and average about 1,500. The data from the PBX results would be aggregated and presented to the client with the average size being the primary data point for sizing.
However the SRM algorithms are already set for PBX switching systems so merely specifying that type of application will generate a size of around 1,500 function points without needing to scan for specific PBX projects.
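The sketch below illustrates the general idea of selecting historical projects whose taxonomy pattern matches a new project exactly or nearly, and then averaging their sizes. It is a simplified illustration only: the project records and sizes are hypothetical, and the real (patent-pending) SRM algorithms work from condensed data rather than scanning individual projects, as noted above.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class HistoricalProject:
    pattern: tuple  # (nature, scope, class, type, problem, code, data)
    size_fp: int    # measured size in IFPUG function points

def pattern_distance(a: tuple, b: tuple) -> int:
    """Count the taxonomy attributes that differ between two patterns."""
    return sum(1 for x, y in zip(a, b) if x != y)

def estimate_size(new_pattern: tuple, history: list, max_distance: int = 1) -> float:
    """Average the sizes of historical projects whose pattern matches the
    new project exactly or within max_distance attributes."""
    matches = [p.size_fp for p in history
               if pattern_distance(new_pattern, p.pattern) <= max_distance]
    if not matches:
        raise ValueError("No sufficiently close historical pattern found")
    return mean(matches)

# Hypothetical PBX-like historical projects (sizes roughly 1,200-1,700 FP)
history = [
    HistoricalProject((1, 21, 5, 14, 5, 4, 6), 1_250),
    HistoricalProject((1, 21, 5, 14, 5, 4, 6), 1_475),
    HistoricalProject((1, 21, 5, 14, 6, 4, 6), 1_600),
    HistoricalProject((1, 21, 5, 14, 5, 5, 6), 1_700),
]

print(estimate_size((1, 21, 5, 14, 5, 4, 6), history))  # ~1,506 function points
```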
Some additional geographic information is also part of the taxonomy, but has no impact on application size. The full set of topics in the SRM sizing taxonomy would look like the table shown below.
When used with the SRM software tool developed for the invention, four additional factors from public sources are part of the taxonomy (country, region, industry, and city):
Software Risk Master ™ Full Sizing Taxonomy
Country code = 1 (United States)
Region code = 06 (California)
City Code = 408 (San Jose)
Industry code = 1569 (Telecommunications)
Project Nature = 1 (New project)
Project Scope = 21 (New components; new application)
Project Class = 5 (External, bundled with hardware)
Project Type = 14 (Communications or telecommunications)
Problem Complexity = 5 (Average complexity)
Code Complexity = 4 (Below average complexity)
Data Complexity = 6 (Above average complexity)
Primary Size metric = 1 (IFPUG function points with SNAP)
Secondary size metric = 8 (Logical code statements)
Programming language(s) = 14 (CHILL)
Programming language level = 3
Certified reuse percent = 15% (default)
By using numeric codes the taxonomy allows sophisticated statistical analysis. Data can be analyzed by country, by industry, by application type, by application size, by programming language, by metric, by complexity, or by any combination of factors.
The first four items in the full taxonomy use public data. For example the “industry code” is the North American Industry Classification (NAIC) code published by the U.S. Department of Commerce. The country code is taken from the international telephone calling codes. The city code is the telephone area code. The region code for the United States is taken from an alphabetical list of the 50 states published on several web sites and readily available.
The taxonomy is the key to software pattern matching, and indeed a critical topic for many kinds of scientific and statistical analysis.
For sizing, pattern matching is not counting function points. The function points have already been counted for the historical projects. Pattern matching is an effective method for using historical data to show clients the probable size and effort for similar future projects.
Pattern matching does not require any knowledge of the inner structure of the application. It happens that software projects that share the same patterns of external attributes are also about the same size and often have similar schedules, staff sizes, effort, and costs (when adjusted for pay scales, countries, and industries).
Pattern matching provides an early, quick, and accurate method for sizing and estimating software projects based on historical projects with similar patterns and attributes. The taxonomy attributes of nature, scope, class, type, and complexity are key predictors of software application size. One reason for the accuracy of pattern matching is the precision of the proprietary taxonomy.
In a sense pattern matching works like a GPS system. By comparing signals from several satellites a GPS receiver can show position within a few yards. With software pattern matching comparing the “signals” from the software taxonomy can provide precise information about software projects.
Pattern matching can produce sizes for software projects in about 90 seconds using the Software Risk Master™ tool. Full development, schedule, staffing, effort, cost, quality, and risk estimates take less than 5 minutes.
Shown below in table 1 are 40 samples sized using the SRM pattern-matching approach. The length of time needed to create these 40 size examples was about 75 minutes or 1.88 minutes per application.
Table 1: Examples of Software Size via Pattern Matching
Using Software Risk Master ™
Application Size in IFPUG Function Points
- Oracle 229,434
- Windows 7 (all features) 202,150
- Microsoft Windows XP 66,238
- Google docs 47,668
- Microsoft Office 2003 33,736
- F15 avionics/weapons 23,109
- VA medical records 19,819
- Apple iPhone 19,366
- IBM IMS database 18,558
- Google search engine 18,640
- Linux 17,505
- ITT System 12 switching 17,002
- Denver Airport luggage (original) 16,661
- Child Support Payments (state) 12,546
- Facebook 8,404
- MapQuest 3,793
- Microsoft Project 1,963
- Android OS (original version) 1,858
- Microsoft Excel 1,578
- Garmin GPS navigation (hand held) 1,518
- Microsoft Word 1,431
- Mozilla Firefox 1,342
- Laser printer driver (HP) 1,248
- Sun Java compiler 1,185
- Wikipedia 1,142
- Cochlear implant (embedded) 1,041
- Microsoft DOS circa 1998 1,022
- Nintendo Gameboy DS 1,002
- Casio atomic watch 933
- Computer BIOS 857
- KnowledgePlan 883
- Function Point Workbench 714
- Norton anti-virus 700
- SPQR/20 699
- Golf handicap analysis 662
- Google Gmail 590
- Twitter (original circa 2009) 541
- Freecell computer solitaire 102
- Software Risk Master™ prototype 38
- ILOVEYOU computer worm 22
It should be noted that manual function point analysis proceeds at a rate of perhaps 500 function points counted per day. To count function points manually for the first example, Oracle, at 229,434 function points would require roughly 459 working days of manual function point analysis. Software Risk Master ™ sized Oracle in 1.8 minutes via pattern matching. (Slow manual counting speed is one of the reasons why function points have been used primarily on small to mid-sized applications when counted manually.)
One issue with sizing by pattern matching is that the function points for a majority of large applications were derived from "backfiring," or mathematical conversion from logical code statements. This method is not reliable. However, when a number of applications are aggregated, the average probably compensates for that issue.
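For readers unfamiliar with backfiring, the sketch below shows the basic conversion. The statements-per-function-point ratios are illustrative assumptions (the Java ratio of roughly 53 is consistent with the 1,000 function point / 53,000 statement example used later in this paper); real ratios vary by language, by dialect, and by data source.

```python
# Illustrative backfiring: convert logical code statements to function points.
# Ratios are approximate assumptions and vary by language and data source.
STATEMENTS_PER_FP = {
    "Java": 53,       # consistent with the 1,000 FP = 53,000 statement example below
    "C": 128,         # illustrative value only
    "Smalltalk": 21,  # illustrative value only
}

def backfire_to_function_points(logical_statements: int, language: str) -> float:
    """Rough size in function points derived from logical code statements."""
    return logical_statements / STATEMENTS_PER_FP[language]

print(round(backfire_to_function_points(53_000, "Java")))  # ~1,000 function points
```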
Another issue is that none of the older historical projects use the new SNAP metric which was just released in 2012. This will require additional mathematical adjustments when there is sufficient SNAP data to derive rules and algorithms for assessing the SNAP portions of legacy applications.
Pattern Matching for Productivity and Quality Analysis
Additional variables such as CMMI levels, team experience, programming languages, and work hours per month can be used to perform full project estimates, but are not needed for sizing. However if clients want to know size in logical source code statements, they need to select the programming language(s) from the SRM pull-down table of languages. Multiple languages in the same application are also supported such as Java and HTML or COBOL and SQL.
To measure or estimate software development productivity rates some additional SRM input variables need to be provided by clients. Here too most of the information is in the form of multiple-choice questions. However if a client wants accurate cost estimates they must provide their own local cost structures rather than accepting default values for costs. The SRM productivity factors are shown below:
Software Risk Master ™ Development Estimating Adjustment Factors
Development compensation = $10,000 per month (default)
Maintenance compensation = $8,000 per month (default)
User compensation = $10,000 per month (default)
Additional project costs = $0 (default)
Project financial value (if known) = $0 (default)
Project goals = 3 (Average staffing; average schedule)
Work hours per month = 132 hours per month (default)
Monthly unpaid overtime hours = 0 (default)
Monthly paid overtime hours = 0 (default)
Project CMMI level = 3 (default)
Project Methodology = 8 Agile/Scrum (default)
Methodology experience = 2 (Above average: majority of team are experts)
Client experience level = 4 (Below average: inexperienced with project type)
Project management experience = 2 (Above average: managed many similar projects)
Development team experience = 3 (Average)
Test team experience = 1 (Well above average: all certified test personnel)
Quality assurance experience = 3 (Average)
Customer support experience = 5 (Very inexperienced: totally new to project type)
Maintenance team experience = 3 (Average)
Here too, the use of numeric coding for the variables that impact the project's schedules, effort, staffing, and cost makes statistical analysis fairly straightforward.
The experience questions all are based on a 5-point scale which makes statistical analysis of results comparatively easy:
DEVELOPMENT TEAM EXPERIENCE: _______
1. All experts
2. Majority of experts
3. Even mix of experts and novices
4. Majority of novices
5. All novices
As can be seen the central value of 3 represents average results or the center point of a bell-shaped curve.
One common use for pattern matching is to compare the results of various programming methodologies. To make this form of comparison, users merely select the methodology they plan to use from the SRM multiple-choice list of 34 software development methods:
1. Mashup
2. Hybrid
3. IntegraNova
4. TSP/PSP
5. Microsoft Solutions Framework
6. RUP
7. XP
8. Agile/Scrum
9. Data state design
10. T-VEC
11. Information engineering (IE)
12. Object Oriented
13. EVO
14. RAD
15. Jackson
16. SADT
17. Spiral
18. SSADM
19. Open-source
20. Flow based
21. Iterative
22. Crystal development
23. V-Model
24. Prince2
25. Merise
26. DSDM
27. Clean room
28. ISO/IEC
29. Waterfall
30. Pair programming
31. DoD 2167
32. Proofs of correctness
33. Cowboy
34. None
Because Agile with Scrum is widely used in 2013, this choice is the default method. But it is easy to try any of the others in the SRM taxonomy methodology list.
If the client also wants quality predictions or maintenance and enhancement predictions some additional inputs are needed for these estimates in addition to the ones already shown. For example maintenance costs are strongly correlated to numbers of users and numbers of installations where the software is installed. Quality is strongly correlated to the combination of defect prevention methods, pre-test removal such as inspections, and the set of testing stages used.
As with the variables shown above, most of the SRM inputs are based on multiple-choice questions. Multiple-choice questions are easy to understand and easy for clients to select.
It happens that pattern matching is metric neutral and can produce size data in a variety of metrics simultaneously. The metrics supported include IFPUG function points, COSMIC function points, NESMA function points, FISMA function points, use case points, story points, RICE objects, and several additional metrics.
If an application has a size of an even 1,000 function points using IFPUG version 4.2, here are the approximate sizes predicted for 15 alternate metrics. In the prototype SRM version the other metrics are merely displayed as shown below. In a commercial version of SRM users could select which metric they want to use for normalization of output data elements. The 15 metrics currently supported include:
| Alternate Metrics | Size | % of IFPUG |
| --- | --- | --- |
| 1. Backfired function points | 1,000 | 100.00% |
| 2. COSMIC function points | 1,143 | 114.29% |
| 3. Fast function points | 970 | 97.00% |
| 4. Feature points | 1,000 | 100.00% |
| 5. FISMA function points | 1,020 | 102.00% |
| 6. Full function points | 1,170 | 117.00% |
| 7. Function points light | 965 | 96.50% |
| 8. Mark II function points | 1,060 | 106.00% |
| 9. NESMA function points | 1,040 | 104.00% |
| 10. RICE objects | 4,714 | 471.43% |
| 11. SCCQI "function points" | 3,029 | 302.86% |
| 12. SNAP non-functional metrics | 235 | 23.53% |
| 13. Story points | 556 | 55.56% |
| 14. Unadjusted function points | 890 | 89.00% |
| 15. Use case points | 333 | 33.33% |
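A minimal sketch of how such ratio-based conversions can be applied, using percentages taken from the table above. The ratios are approximations rather than exact conversion rules, and the metrics shown are just a subset for illustration.

```python
# Approximate ratios relative to IFPUG 4.2, taken from the table above.
RATIO_VS_IFPUG = {
    "COSMIC function points": 1.1429,
    "NESMA function points": 1.04,
    "Story points": 0.5556,
    "Use case points": 0.3333,
}

def convert_from_ifpug(ifpug_size: float, target_metric: str) -> float:
    """Approximate size in another metric from an IFPUG function point size."""
    return ifpug_size * RATIO_VS_IFPUG[target_metric]

for metric in RATIO_VS_IFPUG:
    print(f"{metric}: {convert_from_ifpug(1_000, metric):,.0f}")
# COSMIC ~1,143; NESMA ~1,040; story points ~556; use case points ~333
```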
Additional metrics can be added if they have formal definitions. The default size metrics used by Software Risk Master ™ are IFPUG function points and logical code statements. These two metrics are the most common in the United States. SNAP non-functional size metrics were added to the SRM prototype in 2013 but were not present in the original 2011 version since SNAP had not been published at that time.
As this is being written data on the new SNAP metric is just becoming available, so it is probable that the SNAP predictions will be changing fairly soon.
Sizing Accuracy: The State of the Art as of 2013
Sizing accuracy using function points is a disputatious topic in 2013. There are ongoing debates among users of COSMIC, IFPUG, and other forms of function points such as NESMA and FISMA as to which method is most accurate.
Because function point counts are performed by human beings using fairly complex sets of rules, there are variances among certified counters when they count the same application. There is no “cesium atom” or absolute standard against which function point accuracy can be measured.
Consider the PBX application cited in this article. If it were counted by 10 certified IFPUG counters and 10 certified COSMIC counters the results would probably be in the following range: IFPUG counters would range between about 1,400 and 1,600 function points and average about 1,500. COSMIC counters would range between about 1,500 and 1,700 function points and average about 1,550. In general COSMIC counts are larger than IFPUG counts. (Coincidentally the differences between COSMIC and IFPUG are close to the differences between Imperial gallons and U.S. gallons.)
(If the new SNAP metrics were included on the IFPUG side, there would be an additional size component. However SNAP is a new concept and is not to be found in historical data for legacy applications. All of the PBX examples in this paper are much older than SNAP.)
An advantage of sizing using Software Risk Master ™ is that if 10 users answered the input questions the same way, the 10 results would be identical.
In 2013 the Object Management Group (OMG) announced a new standard for automated function point counting, using IFPUG as the basis. The OMG standard did not include any discussion of how the counts would compare to normal IFPUG counts. In fact the text of the standard said there would be variances, but did not explain their magnitude.
Another issue with the OMG standard is that it requires analysis of source code. There are more than 2,500 programming languages as of 2013, and the OMG standard did not identify which languages were supported and which were not.
In the context of the PBX switch discussed in this paper, it is unlikely that the OMG standard would be able to count switches coded in CHILL, Electronic Switching PL/I (ES/PLI), Objective C, or CORAL, all of which were used in the telecommunications sector. As this paper is written the accuracy of the OMG method is unknown, or at least unpublished.
One of the theoretical advantages of automated sizing should be speed. Manual function point counts average around 500 function points counted per day, with about a 20% range based on experience and application complexity. There is nothing in the OMG standard about counting speed, yet the amount of preparatory work before the OMG method can be used seems significant. The OMG standard should publish comparative results between manual counts and OMG counts.
In the interest of full disclosure, the Software Risk Master ™ sizing speed averages 1.88 minutes per application regardless of the nominal size of the application. In other words, SRM sizes 100,000 function point applications at the same speed that it sizes 10 function point applications. (SRM does not “count.” It uses pattern matching to show sizes of historical projects with the same patterns as the new application being sized.).
Pattern Matching in Other Business and Scientific Fields
The pattern matching method is new and novel for software, but widely used outside of software by other business sectors.
If you want to buy a house in another community or in your own town the web site of Zillow.com will give you the prices of houses all over the United States via pattern matching. Zillow allows users to specify square feet, style, and various amenities such as swimming pools.
For example if you want to buy a 3,500 square foot home with an in-ground swimming pool on at least five acres in Taos, New Mexico Zillow can find available properties in a few minutes.
If you want to buy a used automobile either Autotrader or the Kelly Blue Book can provide automobile prices using pattern matching. These two sources show automobile prices by geographic area, by manufacturer, by model, by mileage, and by feature such as having satellite radio or a GPS navigation package.
For example if you were interested in the price of a used 2012 Lexus RX350 with all-wheel drive, satellite radio, a GPS, and a premium sound system within 20 miles of Sarasota, Florida that could easily be done using Autotrader.com in a few seconds.
Of course you would still have to negotiate a final price with the seller. Having the average and range of costs for identical or very similar cars is a good starting point for the negotiations. If you decide to omit the satellite radio the price might be a few hundred dollars lower. If you decide you want a car with less than 10,000 miles that will raise the price. The point is that pattern matching is an excellent starting place for decision making.
Pattern matching is also a normal part of medical diagnosis. When a patient visits a general practitioner and presents the classic symptoms of whooping cough, for example, the condition will immediately be diagnosed by the physician because the patterns of millions of whooping cough symptoms have been analyzed for more than 200 years. Of course various lab samples and blood tests will be taken to confirm the diagnosis, but these are used more to ensure against potential malpractice claims than to confirm the diagnosis.
When new and unusual conditions appear, such as Lyme disease, they will often be misdiagnosed until sufficient numbers of patients have been examined to understand the patterns of symptoms. This was true for Lyme disease: dozens of patients were misdiagnosed as having childhood arthritis because some of the Lyme disease symptoms are ambiguous.
Lyme disease was not even recognized as a new illness until a physician did a statistical analysis of the patterns of diagnoses of childhood arthritis centering on the town of Old Lyme, Connecticut. There were far too many cases of childhood arthritis for that to be the true condition, so additional research detected the presence of the Lyme disease bacteria. Still more research eventually found that the vectors of Lyme disease were white-footed mice and common white-tailed deer. Ticks that moved from the deer to mice were the Lyme disease vectors.
(Of course a tick bite mark surrounded by a red circle is a strong indicator of Lyme disease, but this is not always present. Further, it may have been present but in a spot invisible to the patient so it was not noticed until it had faded and other symptoms occurred.)
If you are interested in the size, schedule, and effort to develop a bank ATM processing application in San Francisco, California then pattern matching can provide size and cost information in a few seconds based on dozens of similar projects.
Scientists also use pattern matching to place a newly discovered fish or insect into standard biological categories based on genera, type, class, order, and species.
In all cases the effectiveness of pattern matching is based on the existence of a stable and reliable taxonomy. Pattern matching is new for software, but well understood by many other sciences, engineering fields, and business sectors.
In order for pattern matching to work for software, historical data is needed that encompasses at least 15,000 software projects. Additional mathematical algorithms are needed to process applications that do not have a perfect match to any pattern.
A final advantage of pattern matching for software sizing is that it can be used before software requirements are fully known. This is because the basic taxonomy pattern of an application can be identified very early, and indeed will lead to the requirements that are eventually defined. This is because software projects with the same taxonomy usually have very similar requirements.
Early sizing prior to full requirements makes early risk analysis possible. Many risks are directly proportional to application size, so the sooner size is ascertained the quicker potential risks can be evaluated.
Consider the patterns for risks for these six size plateaus of software applications:
1 function point: Close to zero risk.
10 function points: Close to zero risk.
100 function points: Low risk; more than 95% success; minor delays and cost overruns.
1,000 function points: Risks increase; schedule and cost overruns > 10%.
10,000 function points: Major risks; canceled projects occur; overruns > 35%.
100,000 function points: > 50% of projects canceled; overruns > 55% for survivors.
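A small sketch of turning these plateaus into an early risk screen is shown below. The thresholds and wording come from the list above; the lookup itself is illustrative and is not the SRM risk model.

```python
import math

# Risk notes keyed by size plateau (order of magnitude in function points),
# taken from the list above.
RISK_BY_PLATEAU = {
    0: "Close to zero risk.",
    1: "Close to zero risk.",
    2: "Low risk; more than 95% success; minor delays and cost overruns.",
    3: "Risks increase; schedule and cost overruns > 10%.",
    4: "Major risks; canceled projects occur; overruns > 35%.",
    5: "> 50% of projects canceled; overruns > 55% for survivors.",
}

def early_risk_note(size_fp: float) -> str:
    """Return the risk note for the nearest size plateau."""
    plateau = min(5, max(0, round(math.log10(size_fp))))
    return RISK_BY_PLATEAU[plateau]

print(early_risk_note(12_000))  # "Major risks; canceled projects occur; overruns > 35%"
```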
If a company or government group is planning to build a software application that is likely to be larger than 1,000 function points in size, early sizing and early risk analyses are urgent needs. The earlier size and risks can be evaluated, the more time there will be to deploy effective risk solutions.
Pattern Matching and Early Risk Analysis Prior to Requirements Completion
Software projects are subject to more than 225 risks in all, including security risks, quality risks, knowledge risks, financial risks, ethical risks, and many others. No risk tool can identify all 225, but Software Risk Master ™ can identify about 25 major risks before projects are started and funded. This gives time to deploy risk solutions.
Here is a small sample of risks for a major application of 10,000 function points or about 533,000 Java statements developed by an average team:
| Predicted Risks | Probability |
| --- | --- |
| Cancellation | 25.77% |
| Negative ROI | 32.65% |
| Cost overrun | 28.35% |
| Schedule slip | 34.36% |
| Unhappy customers | 36.00% |
| Litigation | 11.34% |
| Average Risks | 28.08% |
| Financial Risks | 47.58% |
Additional risks are specific to various deliverables such as requirements:
Requirements size (pages) = 2,126
Requirements completeness = 73.79%
Amount one person understands = 12.08%
Days required to read requirements = 48.09
Requirements creep or growth = 1,599 function points
Missing requirements = 216
Toxic requirements = 27
Requirements defects = 1,146
Test cases for requirements = 5,472
Because the patented early sizing method of SRM can be used prior to requirements, SRM is the only parametric tool that can predict the size, completeness, and quality of the requirements themselves before projects start. This early prediction allows time to introduce better requirements methods such as joint application design (JAD), quality function deployment (QFD), Rational Doors, T-VEC, IntegraNova, requirements modeling, text static analysis, the FOG readability index, and other recent solutions to chronic requirements problems.
Without multiplying examples “Software Risk Master ™” is aptly named since it predicts risks earlier than other common parametric estimation tools, and it predicts many risks that are not handled by other tools.
Software Document and Paperwork Sizing
The patented sizing method used in Software Risk Master ™ generates size data not only in terms of function points and logical code statements, but the SRM prototype also produces size estimates for 13 document types. The full commercial version will be able to produce document sizes for more than 100 document types including special documents needed for FDA and FAA certification.
Document sizing is an important topic for large software projects and especially for military and defense software, since “producing paper documents” is the top cost driver for defense applications.
Some defense applications produce more than 200 documents with a total of more than 400 English words for every source code statement. The words cost more than twice as much as the source code. Defense software averages almost three times the document volumes of civilian projects that share the same patterns except for being defense applications.
While web applications and small internal projects may produce few (or no) documents and while Agile projects have very few documents, the fact remains that large systems software and large military software projects have major costs associated with the production of requirements, design, plans, status reports, users manuals, help text and dozens of other paper documents.
While a commercial version of SRM will be able to size more than 100 kinds of documents including those needed for FAA and FDA certification, the current prototype sizes 13 as a proof of concept.
The document sizes shown below are samples for a defense application of 25,000 function points. It is easily seen why document sizing is needed in parametric software estimation tools.
| Document | Pages | Words | Percent Complete |
| --- | --- | --- | --- |
| 1. Requirements | 4,936 | 1,974,490 | 61.16% |
| 2. Architecture | 748 | 299,110 | 70.32% |
| 3. Initial design | 6,183 | 2,473,272 | 55.19% |
| 4. Detail design | 12,418 | 4,967,182 | 65.18% |
| 5. Test plans | 2,762 | 1,104,937 | 55.37% |
| 6. Development plans | 1,375 | 550,000 | 68.32% |
| 7. Cost estimates | 748 | 299,110 | 71.32% |
| 8. User manuals | 4,942 | 1,976,783 | 80.37% |
| 9. HELP text | 4,965 | 1,986,151 | 81.37% |
| 10. Course materials | 3,625 | 1,450,000 | 79.85% |
| 11. Status reports | 3,007 | 1,202,721 | 70.32% |
| 12. Change requests | 5,336 | 2,134,284 | 66.16% |
| 13. Bug reports | 29,807 | 11,922,934 | 76.22% |
| TOTAL | 80,852 | 32,340,974 | 69.32% |
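As a rough illustration, the sample above implies a pages-per-function-point ratio for each document type, at about 400 words per page. The sketch below applies a few of those implied ratios linearly; real paperwork volumes grow faster than application size (as noted below), so the ratios should be treated as illustrative assumptions only.

```python
# Pages per function point implied by the 25,000 FP sample above (illustrative only).
PAGES_PER_FP = {
    "Requirements": 4_936 / 25_000,
    "Detail design": 12_418 / 25_000,
    "User manuals": 4_942 / 25_000,
}
WORDS_PER_PAGE = 400  # consistent with the sample table above

def document_size(size_fp: float, document_type: str) -> tuple:
    """Rough (pages, words) estimate for one document type."""
    pages = size_fp * PAGES_PER_FP[document_type]
    return round(pages), round(pages * WORDS_PER_PAGE)

print(document_size(10_000, "Requirements"))  # ~(1,974 pages, 789,760 words)
```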
Predicting document sizes and completeness before requirements is a standard SRM feature. This feature becomes progressively more important as application size increases, because paperwork volumes go up faster than function point size does. It is particularly important for defense applications because the main cost drivers for military software are:
Military software cost drivers:
1) The cost of producing English words
2) The cost of finding and fixing bugs
3) The cost of cancelled projects
4) The cost of avoiding security flaws
5) The cost of meetings and communications
6) The cost of programming or coding
7) The cost of project management
The function point communities have concentrated primarily on sizing in terms of function points and the more recent SNAP metrics. Function points and SNAP are certainly important, but to understand software costs and schedules the sizes of all deliverables need to be predicted too.
SRM predicts size in terms of IFPUG function points and logical code statements, and it also predicts size for document numbers and volumes, and for numbers of test cases needed for each form of test. It also predicts “defect potentials” or probable number of software bugs that might be found in requirements, design, code, user manuals, and “bad fixes” or secondary defects.
SRM also sizes requirements creep and the growth of applications over time. Typically requirements creep is close to 2% per calendar month. An application sized at 10,000 function points at the end of requirements could easily grow to 12,000 function points by the time of delivery.
(The author has been an expert witness in litigation where requirements creep doubled the initial size at requirements; from 10,000 to 20,000 function points over a four-year contract. The litigation was because the client did not want to pay the vendor for the changes, even though the contract specified payments for out-of-scope changes. The court decided in favor of the vendor, because function points are based on user requirements.)
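A minimal sketch of the 2% per calendar month creep assumption follows. Whether creep compounds or accrues linearly is not specified here, so the sketch uses simple (non-compounded) monthly growth on the original size; with an assumed ten months between the end of requirements and delivery, it reproduces the 10,000 to 12,000 function point example.

```python
def size_with_creep(size_at_requirements_fp: float,
                    months_after_requirements: int,
                    monthly_creep_rate: float = 0.02) -> float:
    """Application size after simple (non-compounded) requirements creep."""
    return size_at_requirements_fp * (1 + monthly_creep_rate * months_after_requirements)

# 10,000 FP at the end of requirements, ~10 months to delivery (assumed for illustration)
print(size_with_creep(10_000, 10))  # 12,000.0 function points
```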
As of 2013 the patented SRM sizing method predicts the sizes of more software deliverables than any other tool or method, and also does so about six months earlier than any other method. SRM is also the only tool to predict requirements growth throughout development and for five years of post-release usage. Post-release growth averages about 8% per year, with occasional "mid-life kickers" where many new features are added to keep up with competitive applications.
Pattern Matching and Software Benchmark Statistical Analysis
When both sections of the taxonomy are joined together the result is a very powerful tool for pattern analysis or statistical research on software productivity, quality, successes, and failures. The taxonomy also is a good condensation of benchmark data.
Note that the consolidated version includes confidential information that would not be used for published statistical studies. These confidential topics include the name of the company and the name of the project. However if the method is used privately inside of companies such as Microsoft or IBM, they would want to record the proprietary information.
It should be noted that the projects studied by the author using the SRM taxonomy were all studied under non-disclosure agreements. This makes it legally impossible to identify specific companies. Therefore the company and project identification information is concealed and encrypted and not open to public scrutiny.
Software Risk Master ™ Full Benchmark and Estimating Taxonomy
Security Level: Company Confidential
Company Name: XYZ Telecommunications
Business unit: San Jose Development Laboratory
Project Name: Sample PBX Switching system
Project Manager: J. Doe
Data provided by: Capers Jones
Team members interviewed: A. Doe, B. Doe, C. Doe, J.Doe (manager)
Interview method: On-site meeting
Interview clock hours: 3.0
Interview team hours: 12.0
Date of data collection: 03/04/2013
Project start date: 03/09/2013
Desired completion date: 03/09/2014
Actual completion date: Unknown
Country code = 1 (United States)
Region code = 06 (California)
City Code = 408 (San Jose)
Industry code = 1569 (Telecommunications)
Project Nature = 1 (New project)
Project Scope = 21 (New components; new application)
Project Class = 5 (External, bundled with hardware)
Project Type = 14 (Communications or telecommunications)
Problem Complexity = 5 (Average complexity)
Code Complexity = 4 (Below average complexity)
Data Complexity = 6 (Above average complexity)
Primary Size metric = 1 (IFPUG function points with SNAP)
Secondary size metric = 8 (Logical code statements)
Programming language(s) = 14 (CHILL)
Programming language level = 3
Certified reuse percent = 15% (default – can be adjusted by users)
Development compensation = $10,000 per month (default)
Maintenance compensation = $8,000 per month (default)
User compensation = $10,000 per month (default)
Additional project costs = $0 (default)
Project financial value (if known) = $0 (default)
Project goals = 3 (Average staffing; average schedule)
Work hours per month = 132 hours per month (default)
Monthly unpaid overtime hours = 0 (default)
Monthly paid overtime hours = 0 (default)
Project CMMI level = 3 (default)
Project Methodology = 8 Agile/Scrum (default)
Methodology experience = 2 (Above average: majority of team are experts)
Client experience level = 4 (Below average: inexperienced with project type)
Project management experience = 2 (Above average: managed many similar projects)
Development team experience = 3 (Average)
Test team experience = 1 (Well above average: all certified test personnel)
Quality assurance experience = 3 (Average)
Customer support experience = 5 (Very inexperienced: totally new to project type)
Maintenance team experience = 3 (Average)
Note that the taxonomy captures in a concise fashion all of the major factors that influence the results of software projects for better or for worse. A good taxonomy is a working tool for many scientific fields, and software engineering is no exception.
By converting all of the critical variable information into numeric form, statistical benchmark studies are easy to carry out.
The automated prototype SRM tool uses a short version of the author’s full assessment and benchmark questionnaire. A full commercial version would include additional topics that will collect and predict the results of:
- Any combination of ISO standards used for the application.
- The presence or absence of certified project personnel such as by the Project Management Institute (PMI) or various test and quality assurance professional associations, or by Microsoft, IBM, and other corporations that offer certifications.
- Specific tool suites used for the application such as the Mercury test tool suite, the Coverity or CAST static analysis tools, or the CAI automated project work bench (APO).
The full version of the SRM questionnaire is annotated like a Michelin Guide. Questions are annotated with a star system. The four-star “****” questions are the most important.
The original idea for SRM was to capture every factor that influences software projects by as much as 1%. However this turned out to be impossible for legal and policy reasons. A number of influential factors cannot be measured or studied. Topics where law or policy prohibits measurements include the appraisal scores of team members, their academic grade averages, their age, and their membership in trade unions.
As to the latter factor, trade unions, in many organizations where software personnel are unionized it is not permitted to collect benchmark data or measure team performance at all because these violate union rules.
Software Risk Master ™ Benchmarks and Estimating Output Information
The input taxonomy data discussed here feeds into the Software Risk Master ™ tool. The outputs from the tool include, but are not limited to, the following set of 45 factors:
Software Risk Master ™ Outputs
- Size in IFPUG function points
- Size in logical code statements
- Probable size of requirements creep
- Probable size of deferred functions
- Size in 12 other metrics (story points, use-case points, COSMIC, NESMA, etc.)
- Size and completeness of software documents
- Numbers of test cases needed for all test stages
- Development staffing by activity
- Development staffing by occupation (analysts, coders, testers, etc.)
- Development schedules by activity and net schedule
- Probability of achieving desired target schedule
- Development costs by activity and total cost
- Productivity in work hours per function point
- Productivity in function points per staff month
- Development costs per activity and total costs
- Development costs per function point by activity and in total
- Defect potentials by origin (requirements, design, code, documents, bad fixes)
- Defect prevention effectiveness (JAD, Quality Function Deployment, etc.)
- Pre-test defect removal efficiency for inspections and static analysis
- Testing defect removal efficiency for all major forms of testing
- Delivered defects by severity level
- Cost of quality (COQ) for the application
- Technical Debt (TD) for the application
- Total Cost of Ownership (TCO)
- Probable number of “error prone modules” if any
- Reliability in mean time to failure (MTTF)
- Stabilization period after delivery
- Security vulnerabilities present at delivery
- Installation and user training
- Maintenance (defect repairs) for five years after delivery
- Enhancements (new features) for five years after delivery
- Customer support for five years after delivery
- Project management for five years after delivery
- Odds of litigation for breach of contract for outsource projects
- Cost of litigation for plaintiff and defendant if case goes through trial
- Venture capital investment for start-up software companies
- Dilution of ownership due to multiple rounds of venture capital
- Risk of project cancelation
- Risk of major schedule delays
- Risk of major cost overruns
- Risk of litigation for poor quality
- Risk of poor customer satisfaction
- Risk of executive dissatisfaction
- Risk of poor team morale
- Risk of post-release security attacks
The taxonomy and Software Risk Master ™ are designed for ease of use and achieving rapid results. SRM can size any application in about 90 seconds. The full set of input questions can be entered in less than five minutes for experienced users and no more than 10 minutes for first-time users.
Once the inputs are complete, SRM produces estimates in just a few seconds. The speed is so fast that SRM works well as a teaching tool because students don’t have to wait or spend time carrying out model calibration.
Another benefit of high-speed data entry and quick predictions is that it makes it very interesting and even enjoyable to try alternate scenarios. For example SRM can predict the results of Waterfall, Agile, XP, RUP, and TSP in less than 15 minutes. About five minutes are needed for the initial inputs, and then only about 30 seconds to change assumptions to switch from one method to another.
Table 2 shows a sample development prediction from Software Risk Master ™ for a generic systems software application of 1,000 function points or 53,000 Java statements:
Table 2: Example of Activity Software Estimating Equations

Application Class: External Systems Software
Programming Language(s): Java
Application Size in Function Points: 1,000
Application Size in Lines of Code: 53,000
Work Hours per Month: 132
Average Monthly Salary: $10,000

| Activity | Ascope (Func. Pt.) | Prate (FP per Month) | Whours per Func. Pt. | Staff | Effort (Months) | Schedule (Months) | Cost | Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Requirements | 500 | 75.00 | 1.76 | 2.00 | 13.33 | 6.67 | $133,333 | 10.00% |
| Prototyping | 500 | 175.00 | 0.75 | 2.00 | 5.71 | 2.86 | $57,143 | 4.29% |
| Design | 400 | 75.00 | 1.76 | 2.50 | 13.33 | 5.33 | $133,333 | 10.00% |
| Design Reviews | 250 | 175.00 | 0.75 | 4.00 | 5.71 | 1.43 | $57,143 | 4.29% |
| Coding | 200 | 30.00 | 4.40 | 5.00 | 33.33 | 6.67 | $333,333 | 25.01% |
| Code Inspections | 125 | 160.00 | 0.83 | 8.00 | 6.25 | 0.78 | $62,500 | 4.69% |
| Testing | 150 | 35.00 | 3.77 | 6.67 | 28.57 | 4.29 | $285,714 | 21.44% |
| Quality Assurance | 1,000 | 175.00 | 0.75 | 1.00 | 5.71 | 5.71 | $57,143 | 4.29% |
| Documentation | 1,000 | 215.00 | 0.61 | 1.00 | 4.65 | 4.65 | $46,512 | 3.49% |
| Management | 1,000 | 60.00 | 2.20 | 1.00 | 16.67 | 16.67 | $166,667 | 12.50% |
| TOTAL | 147 | 7.50 | 17.59 | 6.80 | 133.28 | 16.67 | $1,332,821 | 100.00% |
Note that some abbreviations were needed to fit the table on the page in portrait mode.
The column labeled “Ascope” stands for “Assignment Scope” which is the number of function points one person can be responsible for.
The column labeled “Prate” stands for “Production Rate” and is the amount of functionality that one person can finish in one calendar month with 132 work hours. Raising or lowering the number of work hours per month has an impact on this variable.
The column labeled “Whours” stands for “Work hours per function point.” This is essentially the reciprocal of function points per staff month. The two are easily converted back and forth. Here too raising or lowering the number of work hours would change the result.
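The relationships among these columns can be expressed in a few lines. The sketch below shows the arithmetic implied by Table 2 (staff from assignment scope, effort from production rate, schedule from effort and staff, cost from effort and salary); it is a simplified illustration, not the full SRM estimating model.

```python
def activity_estimate(size_fp: float, ascope_fp: float, prate_fp_per_month: float,
                      work_hours_per_month: float = 132.0,
                      monthly_salary: float = 10_000.0) -> dict:
    """Derive the Table 2 columns for one activity from Ascope and Prate."""
    staff = size_fp / ascope_fp                    # people needed for the activity
    effort_months = size_fp / prate_fp_per_month   # person-months of effort
    schedule_months = effort_months / staff        # calendar time for the activity
    work_hours_per_fp = work_hours_per_month / prate_fp_per_month
    cost = effort_months * monthly_salary
    return {"staff": staff, "effort": effort_months, "schedule": schedule_months,
            "hours_per_fp": work_hours_per_fp, "cost": cost}

# Requirements row of Table 2: 1,000 FP, Ascope = 500, Prate = 75
print(activity_estimate(1_000, 500, 75))
# -> staff 2.0, effort 13.33 months, schedule 6.67 months, 1.76 hours per FP, cost ~$133,333
```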
Unpaid overtime would shorten schedules and lower costs, since the work is being done for free. Paid overtime, on the other hand, would shorten schedules but would raise costs due to the normal premium pay of 150% for paid overtime. In some cases special overtime such as work on public holidays may have a higher premium of 200%.
The default metrics for showing productivity rates are work hours per function point and function points per work month. It is planned in later versions to allow users to select any time unit that matches local conventions, such as hours, days, weeks, months, or years. Smaller projects below 250 function points normally use hours. Larger systems above 10,000 function points normally use months.
The sample above uses only 10 activities. In a commercial version of SRM the number of activities can be expanded to 50 if the users want a more detailed prediction. In normal use, which is prior to the completion of requirements, the details of 50 activities are a distraction. Ten activities are all that are needed to show clients the likely outcome of a project before its requirements are fully known.
SRM has a utility feature that makes side-by-side comparison easy. The utility can convert application sizes to any desired round number. For example, if three PBX applications were 1,250, 1,475, and 1,600 function points in size, SRM can convert all of them to an even 1,500 for side-by-side comparisons. This is a special feature rather than true estimation, because the original technology stack is locked. However the size adjustments do match the empirical result that as sizes get bigger, paperwork and defect volumes grow faster than size in function points or logical code statements.
Some of the samples in this report used the size conversion feature, such as the examples of the 10 PBX switching applications shown below.
Because changing assumptions is easy to do, it is possible to explore many different options for a future project. Since PBX switches were discussed earlier, table 3 illustrates the possible results for doing the same PBX switch using 10 different programming languages:
Table 3: Productivity Rates for 10 Versions of the Same Software Project
(A PBX Switching System of 1,500 Function Points in Size)

| Language | Effort (Months) | Funct. Pt. per Staff Month | Work Hrs. per Funct. Pt. | LOC per Staff Month | LOC per Staff Hour |
| --- | --- | --- | --- | --- | --- |
| Assembly | 781.91 | 1.92 | 68.81 | 480 | 3.38 |
| C | 460.69 | 3.26 | 40.54 | 414 | 3.13 |
| CHILL | 392.69 | 3.82 | 34.56 | 401 | 3.04 |
| PASCAL | 357.53 | 4.20 | 31.46 | 382 | 2.89 |
| PL/I | 329.91 | 4.55 | 29.03 | 364 | 2.76 |
| Ada83 | 304.13 | 4.93 | 26.76 | 350 | 2.65 |
| C++ | 293.91 | 5.10 | 25.86 | 281 | 2.13 |
| Ada95 | 269.81 | 5.56 | 23.74 | 272 | 2.06 |
| Objective C | 216.12 | 6.94 | 19.02 | 201 | 1.52 |
| Smalltalk | 194.64 | 7.71 | 17.13 | 162 | 1.23 |
| Average | 360.13 | 4.17 | 31.69 | 366 | 2.77 |
In addition to productivity measures and predictions, SRM also carries out quality measures and predictions. Table 4 shows the possible quality results for the same PBX switch using 10 different programming languages:
Table 4: Delivered Defects for 10 Versions of the Same Software Project
(A PBX Switching System of 1,500 Function Points in Size)

| Language | Total Defects | Defect Removal Efficiency | Delivered Defects | Delivered Defects per Funct. Pt. | Delivered Defects per KLOC |
| --- | --- | --- | --- | --- | --- |
| Assembly | 12,835 | 91.00% | 1,155 | 0.77 | 3.08 |
| C | 8,813 | 92.00% | 705 | 0.47 | 3.70 |
| CHILL | 8,093 | 93.00% | 567 | 0.38 | 3.60 |
| PASCAL | 7,635 | 94.00% | 458 | 0.31 | 3.36 |
| PL/I | 7,276 | 94.00% | 437 | 0.29 | 3.64 |
| Ada83 | 6,981 | 95.00% | 349 | 0.23 | 3.28 |
| C++ | 6,622 | 93.00% | 464 | 0.31 | 5.62 |
| Ada95 | 6,426 | 96.00% | 257 | 0.17 | 3.50 |
| Objective C | 5,772 | 96.00% | 231 | 0.15 | 5.31 |
| Smalltalk | 5,510 | 96.00% | 220 | 0.15 | 7.00 |
| Average | 7,580 | 94.00% | 455 | 0.30 | 3.45 |
Software Risk Master ™ predicts size, productivity, and quality using both function points and logical code statements. However readers are cautioned that only function points produce correct economic results.
Lines of code metrics actually reverse true economic productivity results and make the lowest-level programming languages look better than modern high-level languages. Table 5 shows the productivity rankings of the 10 samples as measured using both function points and lines of code:
Table 5: Rankings of Productivity Levels Using Function Point Metrics and Lines of Code (LOC) Metrics

| Rank | Using Function Point Metrics | Using LOC Metrics |
| --- | --- | --- |
| 1 | Smalltalk | Assembly |
| 2 | Objective C | C |
| 3 | Ada95 | CHILL |
| 4 | C++ | PASCAL |
| 5 | Ada83 | PL/I |
| 6 | PL/I | Ada83 |
| 7 | PASCAL | C++ |
| 8 | CHILL | Ada95 |
| 9 | C | Objective C |
| 10 | Assembly | Smalltalk |
Because "lines of code" metrics violate standard economic assumptions and show incorrect, reversed productivity rates, LOC should be considered professional malpractice for economic studies that involve more than one programming language.
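A small sketch of why the reversal happens: for a fixed amount of functionality, a low-level language requires many more code statements, so dividing by LOC inflates its apparent productivity even though its total effort is far higher. The effort figures below come from Table 3; the statements-per-function-point ratios are illustrative assumptions.

```python
SIZE_FP = 1_500  # the PBX example

# Effort in staff months from Table 3; LOC-per-FP ratios are illustrative assumptions.
projects = {
    "Assembly":  {"effort_months": 781.91, "loc_per_fp": 250},
    "Smalltalk": {"effort_months": 194.64, "loc_per_fp": 21},
}

for language, p in projects.items():
    fp_per_month = SIZE_FP / p["effort_months"]
    loc_per_month = (SIZE_FP * p["loc_per_fp"]) / p["effort_months"]
    print(f"{language}: {fp_per_month:.2f} FP/month, {loc_per_month:.0f} LOC/month")

# Assembly:  ~1.92 FP/month but ~480 LOC/month (looks "best" by LOC)
# Smalltalk: ~7.71 FP/month but ~162 LOC/month (looks "worst" by LOC)
```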
Incidentally the venerable “cost per defect metric” also violates standard economic assumptions and does not show quality economics at all. Cost per defect penalizes quality and achieves its lowest values for the buggiest software applications!
SRM displays data using both LOC and cost per defect as well as function points. The reason for this is to show clients exactly what is wrong with LOC and cost per defect, because the errors of these metrics are not well understood.
Another use of pattern matching is to compare various software development methods. Table 6 illustrates the results for 10 common software development methods, applied not to a PBX switch but to a generic IT application of 1,000 function points:
Table 6: Software Schedules, Staff, Effort, Productivity

| Methodology | Schedule (Months) | Staff | Effort (Months) | FP per Month | Development Cost |
| --- | --- | --- | --- | --- | --- |
| 1. Extreme (XP) | 11.78 | 7 | 84 | 11.89 | $630,860 |
| 2. Agile/Scrum | 11.82 | 7 | 84 | 11.85 | $633,043 |
| 3. TSP | 12.02 | 7 | 86 | 11.64 | $644,070 |
| 4. CMMI 5/spiral | 12.45 | 7 | 83 | 12.05 | $622,257 |
| 5. OO | 12.78 | 8 | 107 | 9.31 | $805,156 |
| 6. RUP | 13.11 | 8 | 101 | 9.58 | $756,157 |
| 7. Pair/iterative | 13.15 | 12 | 155 | 9.21 | $1,160,492 |
| 8. CMMI 3/iterative | 13.34 | 8 | 107 | 9.37 | $800,113 |
| 9. Proofs/waterfall | 13.71 | 12 | 161 | 6.21 | $1,207,500 |
| 10. CMMI 1/waterfall | 15.85 | 10 | 158 | 6.51 | $1,188,870 |
| Average | 13.00 | 8.6 | 112.6 | 9.762 | $844,852 |
When used in estimating mode, Software Risk Master ™ could produce these 10 examples in roughly 12 minutes. It would take about 5 minutes for the first prediction and then changing methodologies takes less than 30 seconds each. Of course these 10 examples are all the same size. Sizing each one separately takes about 90 seconds per application with SRM.
Large software projects can have up to 116 different kinds of occupation groups. In today's world many specialists are needed. The current prototype of SRM predicts the staffing levels for 20 of these occupation groups.
Staffing predictions vary with project size as do the numbers of kinds of specialists that are likely to be deployed.
The following list of specialists and generalists is taken from a prediction for a 25,000 function point military application.
At this large size all 20 of the occupation groups are used and the organization structure will no doubt involve over a dozen organizational units such as a project office, several development groups, one or more test teams, an integration and configuration control group, software quality assurance, technical publications, and others. There will also be metrics specialists and function point counters, although function point counting is often carried out by contract personnel rather than by in-house employees.
Occupation Groups and Part-Time Specialists

| Occupation Group | Normal Staff | Peak Staff |
| --- | --- | --- |
| 1. Programmers | 94 | 141 |
| 2. Testers | 83 | 125 |
| 3. Designers | 37 | 61 |
| 4. Business analysts | 37 | 57 |
| 5. Technical writers | 16 | 23 |
| 6. Quality assurance | 14 | 22 |
| 7. 1st line managers | 15 | 21 |
| 8. Data base administration | 8 | 11 |
| 9. Project Office staff | 7 | 10 |
| 10. Administrative support | 8 | 11 |
| 11. Configuration control | 5 | 7 |
| 12. Project librarians | 4 | 6 |
| 13. 2nd line managers | 3 | 4 |
| 14. Estimating specialists | 3 | 4 |
| 15. Architects | 2 | 3 |
| 16. Security specialists | 1 | 2 |
| 17. Performance specialists | 1 | 2 |
| 18. Function point counters | 1 | 2 |
| 19. Human factors specialists | 1 | 2 |
| 20. 3rd line managers | 1 | 1 |
There are also predictions for organization structures. For example large systems above 10,000 function points in size normally have project offices. They also tend to have specialized test departments rather than having testing done by the developers themselves.
Correcting “Leakage” From Software Benchmark Data
A common benchmark problem with software projects developed under a cost-center model is that of "leakage." Historical data has gaps and omissions, and sometimes omits more than 60% of the actual effort and costs. The most common omissions are unpaid overtime, management, and the work of part-time specialists such as quality assurance, business analysts, function point counters, and project office personnel.
Projects that are built under time and materials contract or under a profit model tend to be more accurate, since they need high accuracy in order to bill clients the correct amounts.
Software Risk Master ™ has an effective method for correcting leakage that is based on pattern matching. Prior to collecting actual benchmark data the project is run through SRM in predictive estimating mode.
The SRM algorithms and knowledge base know the most common patterns of leakage and offer corrected values. If the clients agree with the SRM predictions, then the SRM estimate becomes the benchmark. If the client wants to add information or make adjustments, they can be made to the SRM outputs, which speeds up and simplifies benchmark data collection time. Following in table 7 are 25 software development activities with the ones that tend to “leak” being identified:
Table 7: Common Leakage Patterns from Software Historical Data
Activities Performed | Completeness of Historical Data
01 Requirements | Missing or Incomplete
02 Prototyping | Missing or Incomplete
03 Architecture | Missing or Incomplete
04 Project planning | Missing or Incomplete
05 Initial analysis and design | Missing or Incomplete
06 Detail design | Incomplete
07 Design reviews | Missing or Incomplete
08 Coding | Complete
09 Reusable code acquisition | Missing or Incomplete
10 Purchased package acquisition | Missing or Incomplete
11 Code inspections | Missing or Incomplete
12 Independent verification and validation | Complete
13 Configuration management | Missing or Incomplete
14 Integration | Missing or Incomplete
15 User documentation | Missing or Incomplete
16 Unit testing | Incomplete
17 Function testing | Incomplete
18 Integration testing | Incomplete
19 System testing | Incomplete
20 Field testing | Missing or Incomplete
21 Acceptance testing | Missing or Incomplete
22 Independent testing | Complete
23 Quality assurance | Missing or Incomplete
24 Installation and training | Missing or Incomplete
25 Project management | Missing or Incomplete
26 Total project resources, costs | Incomplete
On average, for projects developed under a cost-center model (which means that development costs are not charged back to users), historical data is only about 37% complete.
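The correction itself is straightforward arithmetic once completeness factors are known: the reported effort for each activity is grossed up by the fraction of that activity that typically shows up in cost-center records. The sketch below illustrates the idea with hypothetical completeness factors; the actual factors in the SRM knowledge base are proprietary.

```python
# Sketch of benchmark "leakage" correction, assuming hypothetical
# completeness factors; the factors in SRM's knowledge base are proprietary.

# Fraction of the true effort for each activity that typically shows up
# in cost-center records (illustrative values only).
ASSUMED_COMPLETENESS = {
    "Coding": 1.00,
    "Unit testing": 0.50,
    "Requirements": 0.30,
    "Design": 0.40,
    "Quality assurance": 0.20,
    "Project management": 0.35,
}


def correct_leakage(reported_hours: dict) -> dict:
    """Gross up reported hours per activity by its assumed completeness."""
    return {activity: hours / ASSUMED_COMPLETENESS.get(activity, 1.0)
            for activity, hours in reported_hours.items()}


reported = {
    "Coding": 12_000,
    "Unit testing": 2_000,
    "Requirements": 900,
    "Design": 1_600,
    "Quality assurance": 300,
    "Project management": 2_100,
}
corrected = correct_leakage(reported)
print(f"Reported total:  {sum(reported.values()):>9,.0f} hours")
print(f"Corrected total: {sum(corrected.values()):>9,.0f} hours")
```

Dividing by a completeness factor of 0.37, for example, roughly triples reported effort, which is why uncorrected cost-center benchmarks understate real costs so badly.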
Quality data also leaks, since many companies do not measure bugs or defects until after release. Only a few major companies such as IBM and AT&T start collecting defect data during requirements and continue through static analysis, inspections, all forms of testing, and out into the field.
IBM was so interested in complete quality data that they asked for volunteers to record bugs found via desk checking and unit testing, which are normally unmeasured private forms of defect removal. The volunteer data allowed IBM to calculate the defect removal efficiency levels of both desk checks and unit testing.
Because finding and fixing bugs is the #1 cost driver for major software projects, SRM is very thorough in both measuring and predicting the results of all known forms of defect removal: inspections, static analysis, and many kinds of testing.
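Defect removal efficiency (DRE) for a given stage compares the defects that the stage removed against the defects that were present when it ran, i.e., those it removed plus those found later, including after release. A minimal sketch of that calculation, with illustrative numbers rather than IBM's data, follows:

```python
# Sketch of the standard defect removal efficiency (DRE) calculation.
# The defect counts are illustrative, not taken from IBM's measurements.

def removal_efficiency(defects_removed: int, defects_found_later: int) -> float:
    """DRE of a stage = defects it removed / defects present when it ran."""
    total_present = defects_removed + defects_found_later
    return defects_removed / total_present if total_present else 0.0


# Example: unit testing removed 350 bugs, and 650 more of the bugs that
# were present at that point surfaced later, so unit-test DRE is about 35%.
print(f"Unit test DRE:  {removal_efficiency(350, 650):.0%}")

# Cumulative DRE compares all defects removed before release against the
# total including defects reported during early production use.
print(f"Cumulative DRE: {removal_efficiency(9_500, 500):.0%}")
```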
Table 8 shows approximate levels of defect removal efficiency for a full series of pre-test defect removal activities and test stages. Table 8 illustrates a major system of 10,000 function points or 533,000 Java statements.
Few real projects use so many different forms of defect removal, so table 8 is a hypothetical example of very advanced quality control:
Table 8: Pre-Test and Test Defect Removal Predictions from SRM
(Note: 10,000 function points or 533,000 Java statements)

Pre-Test Defect Removal Methods | Architect. | Require. | Design | Code | Document | TOTALS

Defect Potentials per FP | 0.25 | 0.95 | 1.15 | 1.35 | 0.55 | 4.25
Defect potentials | 3,408 | 12,950 | 15,676 | 18,403 | 7,497 | 57,935
Security flaw % | 1.50% | 0.75% | 2.00% | 3.00% | 0.00% | 7.25%

1 Requirement inspection | 5.00% | 87.00% | 10.00% | 5.00% | 8.50% | 25.14%
  Defects discovered | 170 | 11,267 | 1,568 | 920 | 637 | 14,562
  Bad-fix injection | 5 | 338 | 47 | 28 | 19 | 437
  Defects remaining | 3,232 | 1,346 | 14,062 | 17,455 | 6,841 | 42,936

2 Architecture inspection | 85.00% | 12.00% | 10.00% | 2.50% | 12.00% | 12.98%
  Defects discovered | 2,748 | 161 | 1,406 | 436 | 821 | 5,572
  Bad-fix injection | 82 | 5 | 42 | 13 | 25 | 167
  Defects remaining | 402 | 1,179 | 12,613 | 17,006 | 5,995 | 37,196

3 Design inspection | 10.00% | 14.00% | 87.00% | 7.00% | 26.00% | 37.45%
  Defects discovered | 40 | 165 | 10,974 | 1,190 | 1,559 | 13,928
  Bad-fix injection | 1 | 5 | 329 | 36 | 47 | 696
  Defects remaining | 361 | 1,009 | 1,311 | 15,779 | 4,390 | 22,850

4 Code inspection | 12.50% | 15.00% | 25.00% | 85.00% | 15.00% | 63.87%
  Defects discovered | 45 | 151 | 328 | 13,413 | 658 | 14,595
  Bad-fix injection | 1 | 5 | 10 | 402 | 20 | 438
  Defects remaining | 315 | 853 | 973 | 1,965 | 3,712 | 7,817

5 Static Analysis | 2.00% | 2.00% | 10.00% | 87.00% | 3.00% | 24.83%
  Defects discovered | 6 | 17 | 97 | 1,709 | 111 | 1,941
  Bad-fix injection | 0 | 1 | 3 | 51 | 3 | 58
  Defects remaining | 308 | 836 | 873 | 204 | 3,597 | 5,818

6 IV & V | 10.00% | 12.00% | 23.00% | 7.00% | 20.00% | 18.32%
  Defects discovered | 31 | 100 | 201 | 14 | 719 | 1,066
  Bad-fix injection | 1 | 3 | 6 | 0 | 22 | 32
  Defects remaining | 276 | 732 | 666 | 189 | 2,856 | 4,720

7 SQA review | 10.00% | 17.00% | 20.00% | 12.00% | 17.00% | 25.52%
  Defects discovered | 28 | 125 | 133 | 23 | 486 | 794
  Bad-fix injection | 1 | 4 | 4 | 1 | 15 | 40
  Defects remaining | 248 | 604 | 529 | 166 | 2,356 | 3,887

Pre-test defects removed | 3,160 | 12,346 | 15,148 | 18,237 | 5,142 | 54,032
Pre-test efficiency % | 92.73% | 95.33% | 96.63% | 99.10% | 68.58% | 93.26%

Test Defect Removal Stages | Architect. | Require. | Design | Code | Document | Total

1 Subroutine testing | 0.00% | 1.00% | 5.00% | 45.00% | 2.00% | 3.97%
  Defects discovered | 0 | 6 | 26 | 75 | 47 | 154
  Bad-fix injection | 0 | 0 | 1 | 2 | 1 | 5
  Defects remaining | 248 | 598 | 502 | 89 | 2,307 | 3,728

2 Unit testing | 2.50% | 4.00% | 7.00% | 35.00% | 10.00% | 8.42%
  Defects discovered | 6 | 24 | 35 | 31 | 231 | 327
  Bad-fix injection | 0 | 1 | 1 | 1 | 7 | 10
  Defects remaining | 241 | 573 | 465 | 57 | 2,070 | 3,391

3 Function testing | 7.50% | 5.00% | 22.00% | 37.50% | 25.00% | 20.29%
  Defects discovered | 18 | 29 | 102 | 21 | 517 | 688
  Bad-fix injection | 1 | 1 | 3 | 1 | 16 | 21
  Defects remaining | 223 | 544 | 360 | 35 | 1,537 | 2,682

4 Regression testing | 2.00% | 2.00% | 5.00% | 33.00% | 7.50% | 5.97%
  Defects discovered | 4 | 11 | 18 | 12 | 115 | 160
  Bad-fix injection | 0 | 0 | 1 | 0 | 3 | 5
  Defects remaining | 218 | 533 | 341 | 23 | 1,418 | 2,517

5 Integration testing | 6.00% | 20.00% | 27.00% | 33.00% | 22.00% | 21.11%
  Defects discovered | 13 | 107 | 92 | 8 | 312 | 531
  Bad-fix injection | 0 | 3 | 3 | 0 | 9 | 16
  Defects remaining | 205 | 423 | 246 | 15 | 1,097 | 1,970

6 Performance testing | 14.00% | 2.00% | 20.00% | 18.00% | 2.50% | 5.92%
  Defects discovered | 29 | 8 | 49 | 3 | 27 | 117
  Bad-fix injection | 1 | 0 | 1 | 0 | 1 | 3
  Defects remaining | 175 | 414 | 196 | 12 | 1,068 | 1,850

7 Security testing | 12.00% | 15.00% | 23.00% | 8.00% | 2.50% | 8.42%
  Defects discovered | 21 | 62 | 45 | 1 | 27 | 156
  Bad-fix injection | 1 | 2 | 1 | 0 | 1 | 5
  Defects remaining | 154 | 350 | 149 | 11 | 1,041 | 1,690

8 Usability testing | 12.00% | 17.00% | 15.00% | 5.00% | 55.00% | 39.86%
  Defects discovered | 18 | 60 | 22 | 1 | 573 | 673
  Bad-fix injection | 1 | 2 | 1 | 0 | 17 | 20
  Defects remaining | 135 | 289 | 126 | 11 | 451 | 996

9 System testing | 16.00% | 12.00% | 18.00% | 38.00% | 34.00% | 23.74%
  Defects discovered | 22 | 35 | 23 | 4 | 153 | 236
  Bad-fix injection | 1 | 1 | 1 | 0 | 5 | 7
  Defects remaining | 112 | 253 | 103 | 7 | 293 | 752

10 Cloud testing | 10.00% | 5.00% | 13.00% | 10.00% | 20.00% | 12.84%
  Defects discovered | 11 | 13 | 13 | 1 | 59 | 97
  Bad-fix injection | 0 | 0 | 0 | 0 | 2 | 3
  Defects remaining | 101 | 240 | 89 | 6 | 233 | 669

11 Independent testing | 12.00% | 10.00% | 11.00% | 10.00% | 23.00% | 14.96%
  Defects discovered | 12 | 24 | 10 | 1 | 54 | 100
  Bad-fix injection | 0 | 1 | 0 | 0 | 2 | 3
  Defects remaining | 88 | 215 | 79 | 5 | 178 | 566

12 Field (Beta) testing | 14.00% | 12.00% | 14.00% | 17.00% | 34.00% | 19.55%
  Defects discovered | 12 | 26 | 11 | 1 | 60 | 111
  Bad-fix injection | 0 | 1 | 0 | 0 | 2 | 3
  Defects remaining | 76 | 189 | 68 | 4 | 115 | 452

13 Acceptance testing | 13.00% | 14.00% | 15.00% | 12.00% | 24.00% | 19.43%
  Defects discovered | 11 | 22 | 9 | 1 | 46 | 89
  Bad-fix injection | 0 | 1 | 0 | 0 | 1 | 3
  Defects remaining | 65 | 166 | 58 | 4 | 68 | 360

Test Defects Removed | 183 | 438 | 471 | 162 | 2,288 | 3,527
Testing Efficiency % | 73.96% | 72.55% | 89.05% | 97.86% | 97.11% | 90.74%

Total Defects Removed | 3,343 | 12,784 | 15,618 | 18,399 | 7,429 | 57,559
Total Bad-fix injection | 100 | 384 | 469 | 552 | 223 | 1,727
Cumulative Removal % | 98.11% | 98.72% | 99.63% | 99.98% | 99.09% | 99.35%
Remaining Defects | 65 | 166 | 58 | 4 | 68 | 376
High-severity Defects | 10 | 28 | 11 | 1 | 9 | 56
Security flaws | 0 | 0 | 1 | 0 | 0 | 2
Remaining Defects per Function Point | 0.0047 | 0.0122 | 0.0042 | 0.0003 | 0.0050 | 0.0276
Remaining Defects per K Function Points | 4.73 | 12.17 | 4.25 | 0.26 | 5.00 | 27.58
Remaining Defects per KLOC | 0.12 | 0.31 | 0.11 | 0.01 | 0.13 | 0.70
Table 8 shows a total of seven pre-test removal activities and 13 test stages. Very few projects use this many forms of defect removal. An “average” U.S. software project would use static analysis and probably four kinds of testing: 1) unit test, 2) function test, 3) regression test, and 4) system test. Average U.S. defect removal efficiency (DRE) circa 2013 is below 90%. Only a few top companies such as IBM achieve DRE results higher than 99%.
Military and defense software, medical systems, and systems software for complex physical devices such as telephone switching systems and computer operating systems would use several kinds of inspections, static analysis, and at least six to eight forms of testing. For example, only military projects tend to use independent verification and validation (IV&V) and independent testing.
Table 8 is intended to show the full range of defect removal operations that can be measured and predicted using Software Risk Master ™. It also assumes that all defect removal personnel are “top guns” who are fully trained, and that test personnel are certified.
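The mechanics behind a table like table 8 can be approximated by a simple chain: start with the defect potential, apply each stage's removal efficiency to the defects still present, and add back a small percentage of new defects created by imperfect repairs (bad-fix injection, roughly 3% in the table). The sketch below uses illustrative stage efficiencies, not SRM's calibrated values:

```python
# Simplified sketch of the defect removal chain behind a table like table 8:
# apply each stage's removal efficiency to the defects still present and add
# back a small bad-fix injection (about 3%, as the table implies). The stage
# efficiencies here are illustrative, not SRM's calibrated values.

BAD_FIX_RATE = 0.03

STAGES = [  # (stage name, removal efficiency against defects still present)
    ("Design inspection", 0.87),
    ("Code inspection",   0.85),
    ("Static analysis",   0.55),
    ("Unit testing",      0.35),
    ("Function testing",  0.35),
    ("System testing",    0.30),
]


def run_removal_chain(defect_potential: float) -> float:
    """Run all stages in order and return the defects still latent at release."""
    remaining = defect_potential
    for name, efficiency in STAGES:
        found = remaining * efficiency
        bad_fixes = found * BAD_FIX_RATE      # new defects created by repairs
        remaining = remaining - found + bad_fixes
        print(f"{name:<18} found {found:>9,.0f}  remaining {remaining:>9,.0f}")
    return remaining


potential = 10_000 * 4.25                     # 10,000 FP at 4.25 defects per FP
remaining = run_removal_chain(potential)
print(f"Cumulative removal efficiency: {1 - remaining / potential:.2%}")
```

Even this toy chain shows why long series of removal stages are needed: each stage only removes a fraction of what is left, and bad fixes keep adding a trickle of new defects.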
Pattern matching is useful for measuring and predicting quality as well as for measuring and predicting software development productivity.
Summary and Conclusions about Software Pattern Matching
Pattern matching based on formal taxonomies has had a long history in science and has proven its value time and again. Pattern matching for business decisions such as real estate appraisals or automobile costs is more recent but no less effective and useful.
The Software Risk Master ™ tool uses pattern matching as the basis for sizing applications, process assessments, benchmark data collection, and predictive estimating of future software projects.
As of 2013 more than 95% of software applications are not “new” in the sense that they have never been designed or built before. The vast majority of modern software projects are either replacements for legacy applications or minor variations on existing software.
Whenever there are large numbers of similar projects that have been built before and have accurate historical data available, pattern matching is the most effective and efficient way of capturing and using historical results to predict future outcomes.
References and Readings on Software Pattern Matching
The primary citation for modern taxonomic analysis is:
Linnaeus, Carl; Systema Naturae; privately published in Sweden in 1735.
The American Society of Indexing has a special interest group on taxonomy creation and analysis: www.taxonomies-sig.org.
Note: All of the author’s books use various forms of taxonomy such as defect classifications, defect removal methods, and application classes and types.
Jones, Capers; “A Short History of Lines of Code Metrics”; Namcook Analytics Technical Report; Narragansett, RI; 2012.
This report provides a mathematical proof that “lines of code” metrics violate standard economic assumptions. LOC metrics make requirements and design invisible. Worse, LOC metrics penalize high-level languages. The report asserts that LOC should be deemed professional malpractice if used to compare results between different programming languages. There are other legitimate purposes for LOC, such as merely measuring coding speed.
Jones, Capers; “A Short History of the Cost Per Defect Metrics”; Namcook Analytics Technical Report; Narragansett, RI; 2012.
This report provides a mathematical proof that “cost per defect” penalizes quality and reaches its lowest values for the buggiest software applications. It also points out that the urban legend that “cost per defect after release is 100 times larger than early elimination” is not true. The apparent expansion of cost per defect for downstream defect repairs is caused by ignoring fixed costs. The cost per defect metric also ignores other economic topics, such as the fact that high quality leads to shorter schedules.
Jones, Capers; “Early Sizing and Early Risk Analysis”; Capers Jones & Associates LLC; Narragansett, RI; July 2011.
Jones, Capers and Bonsignour, Olivier; The Economics of Software Quality; Addison Wesley Longman, Boston, MA; ISBN-10: 0-13-258220-1; 2011; 585 pages.
Jones, Capers; Software Engineering Best Practices; McGraw Hill, New York, NY; ISBN 978-0-07-162161-8; 2010; 660 pages.
Jones, Capers; Applied Software Measurement; McGraw Hill, New York, NY; ISBN 978-0-07-150244-3; 2008; 662 pages.
Jones, Capers; Estimating Software Costs; McGraw Hill, New York, NY; ISBN-13: 978-0-07-148300-1; 2007.
Jones, Capers; Software Assessments, Benchmarks, and Best Practices; Addison Wesley Longman, Boston, MA; ISBN 0-201-48542-7; 2000; 657 pages.
Jones, Capers; Conflict and Litigation Between Software Clients and Developers; Software Productivity Research, Inc.; Burlington, MA; September 2007; 53 pages; (SPR technical report).