Pattern matching is a predictive methodology that uses a formal taxonomy to compare results of historical software projects against the possible outcomes of new software projects that are about to start development.
Pattern matching for software starts with a questionnaire that uses multiple-choice questions. These questions elicit information about a new project, such as its nature, scope, class, type, and complexity.
The answers to the questions form a “pattern” that is used to extract data from historical projects that have the same pattern, or a pattern that is very close. Mathematical algorithms have been developed to handle partial matches to historical patterns.
Mathematical approximations are necessary because the total number of patterns formed by the proprietary taxonomy totals 214,200,000. Most of these patterns have never occurred and never will occur. The nucleus of common patterns that occur many times for software is closer to 20,000.
In today’s world pattern matching is a good choice for software sizing and estimating because almost 95% of software applications are not “new” in the sense of never being done before. The majority today are either legacy replacements or minor variations to existing software.
Pattern matching and formal taxonomies have been widely used in science and business, but are comparatively new for software.
Software pattern matching as described here is based on a proprietary taxonomy developed by the author, Capers Jones. The taxonomy uses multiple-choice questions to identify the key attributes of software projects. The taxonomy is used to collect historical benchmark data and also as a basis for estimating future projects. The taxonomy is also used for sizing applications.
For sizing, the taxonomy includes project nature, scope, class, type, problem complexity, code complexity, and data complexity. For estimating, additional parameters such as CMMI level, methodology, and team experience are also used.
The pattern matching methodology for software sizing is patent pending and the inventor is Capers Jones. The utility patent application is U.S. Patent Application No. 13/352,434, filed January 18, 2012, titled "Early and Rapid Sizing for Software Applications."
The pattern matching approach for software sizing is a standard feature of the Software Risk Master ™ tool (SRM). For example, the 2013 SRM taxonomy list for "project scope" includes these 34 entries:
1. Algorithm
2. Maintenance: defect repair
3. Subroutine
4. Module
5. Reusable module
6. Enhancement to a program
7. Small enhancement to a system
8. Disposable prototype or 7% of application
9. Large enhancement to a program
10. Evolutionary prototype or 12% of application
11. Average enhancement to a system
12. Subprogram
13. Standalone program: Smartphone
14. Standalone program: tablet
15. Standalone program: PC
16. Large enhancement to a system
17. Standalone program: Web
18. Standalone program: Cloud
19. Standalone program: embedded
20. Standalone program: mainframe
21. Multi-component program
22. Component of a departmental system
23. Release of a system (base plus)
24. Component of a corporate system
25. Component of an enterprise system
26. New social network system
27. New departmental system
28. Component of a national system
29. New corporate system
30. Component of a global system
31. Massively multiplayer game application
32. New enterprise system
33. New national system
34. New global system
The entries in the SRM taxonomy for "project type" include these 25 forms of software:
1. Nonprocedural (generated, query, spreadsheet)
2. Batch application
3. Interactive application
4. Batch database application
5. Interactive GUI application
6. Interactive database application
7. Web application
8. Client/server application
9. Data warehouse application
10. Big data application
11. Computer game
12. Scientific or mathematical program
13. System support or middleware application
14. Service oriented architecture (SOA)
15. Expert system
16. Communications or telecommunications
17. Process control applications
18. Trusted systems
19. Embedded or real-time applications
20. Graphics, animation, or image processing applications
21. Multimedia applications
22. Robotics or mechanical automation applications
23. AI applications
24. Neural net applications
25. Hybrid: multiple types
The total numbers of discrete elements in the full software sizing taxonomy are:
| Taxonomy Element | Discrete Values |
| --- | --- |
| Project Nature | 12 |
| Project Scope | 34 |
| Project Class | 21 |
| Project Type | 25 |
| Problem Complexity | 10 |
| Code Complexity | 10 |
| Data Complexity | 10 |
| Sum | 122 |
| Permutations | 214,200,000 |
With 122 total elements spread across seven categories, the permutations of the full taxonomy total 214,200,000 possible patterns. Needless to say, more than half of these patterns have never occurred and will never occur.
For the software industry in 2013 the total number of patterns that occur with relatively high frequency is much smaller: about 20,000.
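The 214,200,000 figure follows directly from multiplying the number of discrete values in each taxonomy category. A minimal sketch of the arithmetic, using the category sizes from the table above:

```python
from math import prod

# Number of discrete values in each sizing category (from the table above)
category_sizes = {
    "Project Nature": 12,
    "Project Scope": 34,
    "Project Class": 21,
    "Project Type": 25,
    "Problem Complexity": 10,
    "Code Complexity": 10,
    "Data Complexity": 10,
}

total_patterns = prod(category_sizes.values())
print(f"{total_patterns:,}")  # 214,200,000 possible taxonomy patterns
```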
Using Pattern Matching for Sizing Software Applications
To use pattern matching for software sizing, the clients provide answers to the multiple-choice taxonomy questions. The answers to these questions form a distinct “pattern.”
The client’s pattern for a project is then compared against the Software Risk Master ™ knowledge base. Projects with the same or nearly the same patterns are selected.
Due to the large numbers of projects examined and measured over the years, mathematical algorithms have been developed that are based on thousands of projects. These algorithms are quick and also enable matches of patterns that are close but not identical to a client’s taxonomy.
Rather than an actual scan for identical patterns, the SRM algorithms condense the original data and speed up the calculations to a few seconds.
For example if a client were interested in a PBX switching system perhaps a dozen similar projects with the same pattern could be found. These historical PBX switching projects would range from about 1,200 to perhaps 1,700 function points in size, and average about 1,500. The data from the PBX results would be aggregated and presented to the client with the average size being the primary data point for sizing.
However the SRM algorithms are already set for PBX switching systems so merely specifying that type of application will generate a size of around 1,500 function points without needing to scan for specific PBX projects.
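The sketch below illustrates the general idea of selecting historical projects whose taxonomy pattern matches a new project exactly or nearly, and then averaging their sizes. It is a simplified illustration only: the project records and sizes are hypothetical, and the real (patent-pending) SRM algorithms work from condensed data rather than scanning individual projects, as noted above.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class HistoricalProject:
    pattern: tuple  # (nature, scope, class, type, problem, code, data)
    size_fp: int    # measured size in IFPUG function points

def pattern_distance(a: tuple, b: tuple) -> int:
    """Count the taxonomy attributes that differ between two patterns."""
    return sum(1 for x, y in zip(a, b) if x != y)

def estimate_size(new_pattern: tuple, history: list, max_distance: int = 1) -> float:
    """Average the sizes of historical projects whose pattern matches the
    new project exactly or within max_distance attributes."""
    matches = [p.size_fp for p in history
               if pattern_distance(new_pattern, p.pattern) <= max_distance]
    if not matches:
        raise ValueError("No sufficiently close historical pattern found")
    return mean(matches)

# Hypothetical PBX-like historical projects (sizes roughly 1,200-1,700 FP)
history = [
    HistoricalProject((1, 21, 5, 14, 5, 4, 6), 1_250),
    HistoricalProject((1, 21, 5, 14, 5, 4, 6), 1_475),
    HistoricalProject((1, 21, 5, 14, 6, 4, 6), 1_600),
    HistoricalProject((1, 21, 5, 14, 5, 5, 6), 1_700),
]

print(estimate_size((1, 21, 5, 14, 5, 4, 6), history))  # ~1,506 function points
```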
Some additional geographic information is also part of the taxonomy, but has no impact on application size. The full set of topics in the SRM sizing taxonomy would look like the table shown below.
When used with the SRM software tool developed for the invention, four additional factors from public sources are part of the taxonomy (country, region, industry, and city):
Software Risk Master ™ Full Sizing Taxonomy
Country code = 1 (United States)
Region code = 06 (California)
City Code = 408 (San Jose)
Industry code = 1569 (Telecommunications)
Project Nature = 1 (New project)
Project Scope = 21 (New components; new application)
Project Class = 5 (External, bundled with hardware)
Project Type = 14 (Communications or telecommunications)
Problem Complexity = 5 (Average complexity)
Code Complexity = 4 (Below average complexity)
Data Complexity = 6 (Above average complexity)
Primary Size metric = 1 (IFPUG function points with SNAP)
Secondary size metric = 8 (Logical code statements)
Programming language(s) = 14 (CHILL)
Programming language level = 3
Certified reuse percent = 15% (default)
By using numeric codes the taxonomy allows sophisticated statistical analysis. Data can be analyzed by country, by industry, by application type, by application size, by programming language, by metric, by complexity, or by any combination of factors.
The first four items in the full taxonomy use public data. For example the “industry code” is the North American Industry Classification (NAIC) code published by the U.S. Department of Commerce. The country code is taken from the international telephone calling codes. The city code is the telephone area code. The region code for the United States is taken from an alphabetical list of the 50 states published on several web sites and readily available.
The taxonomy is the key to software pattern matching, and indeed a critical topic for many kinds of scientific and statistical analysis.
For sizing, pattern matching is not counting function points. The function points have already been counted for the historical projects. Pattern matching is an effective method for using historical data to show clients the probable size and effort for similar future projects.
Pattern matching does not require any knowledge of the inner structure of the application. It happens that software projects that share the same patterns of external attributes are also about the same size and often have similar schedules, staff sizes, effort, and costs (when adjusted for pay scales, countries, and industries).
Pattern matching provides an early, quick, and accurate method for sizing and estimating software projects based on historical projects with similar patterns and attributes. The taxonomy attributes of nature, scope, class, type, and complexity are key predictors of software application size. One reason for the accuracy of pattern matching is the precision of the proprietary taxonomy.
In a sense pattern matching works like a GPS system. By comparing signals from several satellites a GPS receiver can show position within a few yards. With software pattern matching comparing the “signals” from the software taxonomy can provide precise information about software projects.
Pattern matching can produce sizes for software projects in about 90 seconds using the Software Risk Master™ tool. Full development, schedule, staffing, effort, cost, quality, and risk estimates take less than 5 minutes.
Shown below in table 1 are 40 samples sized using the SRM pattern-matching approach. The length of time needed to create these 40 size examples was about 75 minutes or 1.88 minutes per application.
Table 1: Examples of Software Size via Pattern Matching
Using Software Risk Master ™
Application Size in IFPUG Function Points
- Oracle 229,434
- Windows 7 (all features) 202,150
- Microsoft Windows XP 66,238
- Google docs 47,668
- Microsoft Office 2003 33,736
- F15 avionics/weapons 23,109
- VA medical records 19,819
- Apple iPhone 19,366
- IBM IMS database 18,558
- Google search engine 18,640
- Linux 17,505
- ITT System 12 switching 17,002
- Denver Airport luggage (original) 16,661
- Child Support Payments (state) 12,546
- Facebook 8,404
- MapQuest 3,793
- Microsoft Project 1,963
- Android OS (original version) 1,858
- Microsoft Excel 1,578
- Garmin GPS navigation (hand held) 1,518
- Microsoft Word 1,431
- Mozilla Firefox 1,342
- Laser printer driver (HP) 1,248
- Sun Java compiler 1,185
- Wikipedia 1,142
- Cochlear implant (embedded) 1,041
- Microsoft DOS circa 1998 1,022
- Nintendo Gameboy DS 1,002
- Casio atomic watch 933
- Computer BIOS 857
- KnowledgePlan 883
- Function Point Workbench 714
- Norton anti-virus 700
- SPQR/20 699
- Golf handicap analysis 662
- Google Gmail 590
- Twitter (original circa 2009) 541
- Freecell computer solitaire 102
- Software Risk Master™ prototype 38
- ILOVEYOU computer worm 22
It should be noted that manual function point analysis proceeds at a rate of perhaps 500 function points counted per day. To count function points manually for the first example, Oracle, at 229,434 function points would require roughly 459 working days of manual function point analysis. Software Risk Master ™ sized Oracle in 1.8 minutes via pattern matching. (Slow manual counting speed is one of the reasons why function points have been used primarily on small to mid-sized applications when counted manually.)
One issue with sizing by pattern matching is that the function points for a majority of large applications were derived from "backfiring," or mathematical conversion from logical code statements. This method is not reliable. However, when a number of applications are aggregated, the average probably compensates for that issue.
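For readers unfamiliar with backfiring, the sketch below shows the basic conversion. The statements-per-function-point ratios are illustrative assumptions (the Java ratio of roughly 53 is consistent with the 1,000 function point / 53,000 statement example used later in this paper); real ratios vary by language, by dialect, and by data source.

```python
# Illustrative backfiring: convert logical code statements to function points.
# Ratios are approximate assumptions and vary by language and data source.
STATEMENTS_PER_FP = {
    "Java": 53,       # consistent with the 1,000 FP = 53,000 statement example below
    "C": 128,         # illustrative value only
    "Smalltalk": 21,  # illustrative value only
}

def backfire_to_function_points(logical_statements: int, language: str) -> float:
    """Rough size in function points derived from logical code statements."""
    return logical_statements / STATEMENTS_PER_FP[language]

print(round(backfire_to_function_points(53_000, "Java")))  # ~1,000 function points
```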
Another issue is that none of the older historical projects use the new SNAP metric which was just released in 2012. This will require additional mathematical adjustments when there is sufficient SNAP data to derive rules and algorithms for assessing the SNAP portions of legacy applications.
Pattern Matching for Productivity and Quality Analysis
Additional variables such as CMMI levels, team experience, programming languages, and work hours per month can be used to perform full project estimates, but are not needed for sizing. However if clients want to know size in logical source code statements, they need to select the programming language(s) from the SRM pull-down table of languages. Multiple languages in the same application are also supported such as Java and HTML or COBOL and SQL.
To measure or estimate software development productivity rates some additional SRM input variables need to be provided by clients. Here too most of the information is in the form of multiple-choice questions. However if a client wants accurate cost estimates they must provide their own local cost structures rather than accepting default values for costs. The SRM productivity factors are shown below:
Software Risk Master ™ Development Estimating Adjustment Factors
Development compensation = $10,000 per month (default)
Maintenance compensation = $8,000 per month (default)
User compensation = $10,000 per month (default)
Additional project costs = $0 (default)
Project financial value (if known) = $0 (default)
Project goals = 3 (Average staffing; average schedule)
Work hours per month = 132 hours per month (default)
Monthly unpaid overtime hours = 0 (default)
Monthly paid overtime hours = 0 (default)
Project CMMI level = 3 (default)
Project Methodology = 8 Agile/Scrum (default)
Methodology experience = 2 (Above average: majority of team are experts)
Client experience level = 4 (Below average: inexperienced with project type)
Project management experience = 2 (Above average: managed many similar projects)
Development team experience = 3 (Average)
Test team experience = 1 (Well above average: all certified test personnel)
Quality assurance experience = 3 (Average)
Customer support experience = 5 (Very inexperienced: totally new to project type)
Maintenance team experience = 3 (Average)
Here too, the use of numeric coding for the variables that impact the project's schedules, effort, staffing, and cost makes statistical analysis fairly straightforward.
The experience questions all are based on a 5-point scale which makes statistical analysis of results comparatively easy:
DEVELOPMENT TEAM EXPERIENCE: _______
1. All experts
2. Majority of experts
3. Even mix of experts and novices
4. Majority of novices
5. All novices
As can be seen the central value of 3 represents average results or the center point of a bell-shaped curve.
One common use for pattern matching is to compare the results of various programming methodologies. To make this form of comparison, users merely select the methodology they plan to use from the SRM multiple-choice list of 34 software development methods:
1. Mashup
2. Hybrid
3. IntegraNova
4. TSP/PSP
5. Microsoft Solutions Framework
6. RUP
7. XP
8. Agile/Scrum
9. Data state design
10. T-VEC
11. Information engineering (IE)
12. Object Oriented
13. EVO
14. RAD
15. Jackson
16. SADT
17. Spiral
18. SSADM
19. Open-source
20. Flow based
21. Iterative
22. Crystal development
23. V-Model
24. Prince2
25. Merise
26. DSDM
27. Clean room
28. ISO/IEC
29. Waterfall
30. Pair programming
31. DoD 2167
32. Proofs of correctness
33. Cowboy
34. None
Because Agile with Scrum is widely used in 2013, this choice is the default method. But it is easy to try any of the others in the SRM taxonomy methodology list.
If the client also wants quality predictions or maintenance and enhancement predictions some additional inputs are needed for these estimates in addition to the ones already shown. For example maintenance costs are strongly correlated to numbers of users and numbers of installations where the software is installed. Quality is strongly correlated to the combination of defect prevention methods, pre-test removal such as inspections, and the set of testing stages used.
As with the variables shown above, most of the SRM inputs are based on multiple-choice questions. Multiple-choice questions are easy to understand and easy for clients to select.
It happens that pattern matching is metric neutral and can produce size data in a variety of metrics simultaneously. The metrics supported include IFPUG function points, COSMIC function points, NESMA function points, FISMA function points, use case points, story points, RICE objects, and several additional metrics.
If an application has a size of an even 1,000 function points using IFPUG version 4.2, here are the approximate sizes predicted for 15 alternate metrics. In the prototype SRM version the other metrics are merely displayed as shown below. In a commercial version of SRM users could select which metric they want to use for normalization of output data elements. The 15 metrics currently supported include:
| Alternate Metrics | Size | % of IFPUG |
| --- | --- | --- |
| 1. Backfired function points | 1,000 | 100.00% |
| 2. COSMIC function points | 1,143 | 114.29% |
| 3. Fast function points | 970 | 97.00% |
| 4. Feature points | 1,000 | 100.00% |
| 5. FISMA function points | 1,020 | 102.00% |
| 6. Full function points | 1,170 | 117.00% |
| 7. Function points light | 965 | 96.50% |
| 8. Mark II function points | 1,060 | 106.00% |
| 9. NESMA function points | 1,040 | 104.00% |
| 10. RICE objects | 4,714 | 471.43% |
| 11. SCCQI "function points" | 3,029 | 302.86% |
| 12. SNAP non-functional metrics | 235 | 23.53% |
| 13. Story points | 556 | 55.56% |
| 14. Unadjusted function points | 890 | 89.00% |
| 15. Use case points | 333 | 33.33% |
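A minimal sketch of how such ratio-based conversions can be applied, using percentages taken from the table above. The ratios are approximations rather than exact conversion rules, and the metrics shown are just a subset for illustration.

```python
# Approximate ratios relative to IFPUG 4.2, taken from the table above.
RATIO_VS_IFPUG = {
    "COSMIC function points": 1.1429,
    "NESMA function points": 1.04,
    "Story points": 0.5556,
    "Use case points": 0.3333,
}

def convert_from_ifpug(ifpug_size: float, target_metric: str) -> float:
    """Approximate size in another metric from an IFPUG function point size."""
    return ifpug_size * RATIO_VS_IFPUG[target_metric]

for metric in RATIO_VS_IFPUG:
    print(f"{metric}: {convert_from_ifpug(1_000, metric):,.0f}")
# COSMIC ~1,143; NESMA ~1,040; story points ~556; use case points ~333
```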
Additional metrics can be added if they have formal definitions. The default size metrics used by Software Risk Master ™ are IFPUG function points and logical code statements. These two metrics are the most common in the United States. SNAP non-functional size metrics were added to the SRM prototype in 2013 but were not present in the original 2011 version since SNAP had not been published at that time.
As this is being written data on the new SNAP metric is just becoming available, so it is probable that the SNAP predictions will be changing fairly soon.
Sizing Accuracy: The State of the Art as of 2013
Sizing accuracy using function points is a disputatious topic in 2013. There are ongoing debates among users of COSMIC, IFPUG, and other forms of function points such as NESMA and FISMA as to which method is most accurate.
Because function point counts are performed by human beings using fairly complex sets of rules, there are variances among certified counters when they count the same application. There is no “cesium atom” or absolute standard against which function point accuracy can be measured.
Consider the PBX application cited in this article. If it were counted by 10 certified IFPUG counters and 10 certified COSMIC counters the results would probably be in the following range: IFPUG counters would range between about 1,400 and 1,600 function points and average about 1,500. COSMIC counters would range between about 1,500 and 1,700 function points and average about 1,550. In general COSMIC counts are larger than IFPUG counts. (Coincidentally the differences between COSMIC and IFPUG are close to the differences between Imperial gallons and U.S. gallons.)
(If the new SNAP metrics were included on the IFPUG side, there would be an additional size component. However SNAP is a new concept and is not to be found in historical data for legacy applications. All of the PBX examples in this paper are much older than SNAP.)
An advantage of sizing using Software Risk Master ™ is that if 10 users answered the input questions the same way, the 10 results would be identical.
In 2013 the Object Management Group (OMG) announced a new standard for automated function point counting, using IFPUG as the basis. The OMG standard did not include any discussion of how the counts would compare to normal IFPUG counts. In fact the text of the standard said there would be variances, but did not explain their magnitude.
Another issue with the OMG standard is that it requires analysis of source code. There are more than 2,500 programming languages as of 2013, and the OMG standard did not identify which languages were supported and which were not.
In the context of the PBX switch discussed in this paper, it is unlikely that the OMG standard would be able to count switches coded in CHILL, Electronic Switching PL/I (ES/PLI), Objective C, or CORAL, all of which were used in the telecommunications sector. As this paper is written the accuracy of the OMG method is unknown, or at least unpublished.
One of the theoretical advantages of automated sizing should be speed. Manual function point counts average around 500 function points counted per day, with about a 20% range based on experience and application complexity. There is nothing in the OMG standard about counting speed, yet the amount of preparatory work before the OMG method can be used seems significant. The OMG standard should publish comparative results between manual counts and OMG counts.
In the interest of full disclosure, the Software Risk Master ™ sizing speed averages 1.88 minutes per application regardless of the nominal size of the application. In other words, SRM sizes 100,000 function point applications at the same speed that it sizes 10 function point applications. (SRM does not “count.” It uses pattern matching to show sizes of historical projects with the same patterns as the new application being sized.).
Pattern Matching in Other Business and Scientific Fields
The pattern matching method is new and novel for software, but widely used outside of software by other business sectors.
If you want to buy a house in another community or in your own town the web site of Zillow.com will give you the prices of houses all over the United States via pattern matching. Zillow allows users to specify square feet, style, and various amenities such as swimming pools.
For example if you want to buy a 3,500 square foot home with an in-ground swimming pool on at least five acres in Taos, New Mexico Zillow can find available properties in a few minutes.
If you want to buy a used automobile either Autotrader or the Kelly Blue Book can provide automobile prices using pattern matching. These two sources show automobile prices by geographic area, by manufacturer, by model, by mileage, and by feature such as having satellite radio or a GPS navigation package.
For example if you were interested in the price of a used 2012 Lexus RX350 with all-wheel drive, satellite radio, a GPS, and a premium sound system within 20 miles of Sarasota, Florida that could easily be done using Autotrader.com in a few seconds.
Of course you would still have to negotiate a final price with the seller. Having the average and range of costs for identical or very similar cars is a good starting point for the negotiations. If you decide to omit the satellite radio the price might be a few hundred dollars lower. If you decide you want a car with less than 10,000 miles that will raise the price. The point is that pattern matching is an excellent starting place for decision making.
Pattern matching is also a normal part of medical diagnosis. When a patient visits a general practitioner and presents the classic symptoms of whooping cough, for example, the condition will immediately be diagnosed by the physician because the patterns of millions of whooping cough symptoms have been analyzed for more than 200 years. Of course various lab samples and blood tests will be taken to confirm the diagnosis, but these are used more to ensure against potential malpractice claims than to confirm the diagnosis.
When new and unusual conditions appear, such as Lyme disease, they will often be misdiagnosed until sufficient numbers of patients have been examined to understand the patterns of symptoms. This was true for Lyme disease: dozens of patients were misdiagnosed as having childhood arthritis because some of the Lyme disease symptoms are ambiguous.
Lyme disease was not even recognized as a new illness until a physician did a statistical analysis of the patterns of diagnoses of childhood arthritis centering on the town of Old Lyme, Connecticut. There were far too many cases of childhood arthritis for that to be the true condition, so additional research detected the presence of the Lyme disease bacteria. Still more research eventually found that the vectors of Lyme disease were white-footed mice and common white-tailed deer. Ticks that moved from the deer to mice were the Lyme disease vectors.
(Of course a tick bite mark surrounded by a red circle is a strong indicator of Lyme disease, but this is not always present. Further, it may have been present but in a spot invisible to the patient so it was not noticed until it had faded and other symptoms occurred.)
If you are interested in the size, schedule, and effort to develop a bank ATM processing application in San Francisco, California then pattern matching can provide size and cost information in a few seconds based on dozens of similar projects.
Scientists also use pattern matching to place a newly discovered fish or insect into standard biological categories based on genera, type, class, order, and species.
In all cases the effectiveness of pattern matching is based on the existence of a stable and reliable taxonomy. Pattern matching is new for software, but well understood by many other sciences, engineering fields, and business sectors.
In order for pattern matching to work for software, historical data is needed that encompasses at least 15,000 software projects. Additional mathematical algorithms are needed to process applications that do not have a perfect match to any pattern.
A final advantage of pattern matching for software sizing is that it can be used before software requirements are fully known. This is because the basic taxonomy pattern of an application can be identified very early, and indeed will lead to the requirements that are eventually defined. This is because software projects with the same taxonomy usually have very similar requirements.
Early sizing prior to full requirements makes early risk analysis possible. Many risks are directly proportional to application size, so the sooner size is ascertained the quicker potential risks can be evaluated.
Consider the patterns for risks for these six size plateaus of software applications:
1 function point: Close to zero risk.
10 function points: Close to zero risk.
100 function points: Low risk; more than 95% success; minor delays and cost overruns.
1,000 function points: Risks increase; schedule and cost overruns > 10%.
10,000 function points: Major risks; canceled projects occur; overruns > 35%.
100,000 function points: > 50% of projects canceled; overruns > 55% for survivors.
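A small sketch of turning these plateaus into an early risk screen is shown below. The thresholds and wording come from the list above; the lookup itself is illustrative and is not the SRM risk model.

```python
import math

# Risk notes keyed by size plateau (order of magnitude in function points),
# taken from the list above.
RISK_BY_PLATEAU = {
    0: "Close to zero risk.",
    1: "Close to zero risk.",
    2: "Low risk; more than 95% success; minor delays and cost overruns.",
    3: "Risks increase; schedule and cost overruns > 10%.",
    4: "Major risks; canceled projects occur; overruns > 35%.",
    5: "> 50% of projects canceled; overruns > 55% for survivors.",
}

def early_risk_note(size_fp: float) -> str:
    """Return the risk note for the nearest size plateau."""
    plateau = min(5, max(0, round(math.log10(size_fp))))
    return RISK_BY_PLATEAU[plateau]

print(early_risk_note(12_000))  # "Major risks; canceled projects occur; overruns > 35%"
```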
If a company or government group is planning to build a software application that is likely to be larger than 1,000 function points in size, early sizing and early risk analyses are urgent needs. The earlier size and risks can be evaluated, the more time there will be to deploy effective risk solutions.
Pattern Matching and Early Risk Analysis Prior to Requirements Completion
Software projects are subject to more than 225 risks in all, including security risks, quality risks, knowledge risks, financial risks, ethical risks, and many others. No risk tool can identify all 225, but Software Risk Master ™ can identify about 25 major risks before projects are started and funded. This gives time to deploy risk solutions.
Here is a small sample of risks for a major application of 10,000 function points or about 533,000 Java statements developed by an average team:
| Predicted Risks | Probability |
| --- | --- |
| Cancellation | 25.77% |
| Negative ROI | 32.65% |
| Cost overrun | 28.35% |
| Schedule slip | 34.36% |
| Unhappy customers | 36.00% |
| Litigation | 11.34% |
| Average Risks | 28.08% |
| Financial Risks | 47.58% |
Additional risks are specific to various deliverables such as requirements:
Requirements size (pages) = 2,126
Requirements completeness = 73.79%
Amount one person understands = 12.08%
Days required to read requirements = 48.09
Requirements creep or growth = 1,599 function points
Missing requirements = 216
Toxic requirements = 27
Requirements defects = 1,146
Test cases for requirements = 5,472
Because the patented early sizing method of SRM can be used prior to requirements, SRM is the only parametric tool that can predict the size, completeness, and quality of the requirements themselves before projects start. This early prediction allows time to introduce better requirements methods such as joint application design (JAD), quality function deployment (QFD), Rational Doors, T-VEC, IntegraNova, requirements modeling, text static analysis, the FOG readability index, and other recent solutions to chronic requirements problems.
Without multiplying examples “Software Risk Master ™” is aptly named since it predicts risks earlier than other common parametric estimation tools, and it predicts many risks that are not handled by other tools.
Software Document and Paperwork Sizing
The patented sizing method used in Software Risk Master ™ generates size data not only in terms of function points and logical code statements, but the SRM prototype also produces size estimates for 13 document types. The full commercial version will be able to produce document sizes for more than 100 document types including special documents needed for FDA and FAA certification.
Document sizing is an important topic for large software projects and especially for military and defense software, since “producing paper documents” is the top cost driver for defense applications.
Some defense applications produce more than 200 documents with a total of more than 400 English words for every source code statement. The words cost more than twice as much as the source code. Defense software averages almost three times the document volumes of civilian projects that share the same patterns except for being defense applications.
While web applications and small internal projects may produce few (or no) documents and while Agile projects have very few documents, the fact remains that large systems software and large military software projects have major costs associated with the production of requirements, design, plans, status reports, users manuals, help text and dozens of other paper documents.
While a commercial version of SRM will be able to size more than 100 kinds of documents including those needed for FAA and FDA certification, the current prototype sizes 13 as a proof of concept.
The document sizes shown below are samples for a defense application of 25,000 function points. It is easily seen why document sizing is needed in parametric software estimation tools.
| Document | Pages | Words | Percent Complete |
| --- | --- | --- | --- |
| 1. Requirements | 4,936 | 1,974,490 | 61.16% |
| 2. Architecture | 748 | 299,110 | 70.32% |
| 3. Initial design | 6,183 | 2,473,272 | 55.19% |
| 4. Detail design | 12,418 | 4,967,182 | 65.18% |
| 5. Test plans | 2,762 | 1,104,937 | 55.37% |
| 6. Development plans | 1,375 | 550,000 | 68.32% |
| 7. Cost estimates | 748 | 299,110 | 71.32% |
| 8. User manuals | 4,942 | 1,976,783 | 80.37% |
| 9. HELP text | 4,965 | 1,986,151 | 81.37% |
| 10. Course materials | 3,625 | 1,450,000 | 79.85% |
| 11. Status reports | 3,007 | 1,202,721 | 70.32% |
| 12. Change requests | 5,336 | 2,134,284 | 66.16% |
| 13. Bug reports | 29,807 | 11,922,934 | 76.22% |
| TOTAL | 80,852 | 32,340,974 | 69.32% |
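As a rough illustration, the sample above implies a pages-per-function-point ratio for each document type, at about 400 words per page. The sketch below applies a few of those implied ratios linearly; real paperwork volumes grow faster than application size (as noted below), so the ratios should be treated as illustrative assumptions only.

```python
# Pages per function point implied by the 25,000 FP sample above (illustrative only).
PAGES_PER_FP = {
    "Requirements": 4_936 / 25_000,
    "Detail design": 12_418 / 25_000,
    "User manuals": 4_942 / 25_000,
}
WORDS_PER_PAGE = 400  # consistent with the sample table above

def document_size(size_fp: float, document_type: str) -> tuple:
    """Rough (pages, words) estimate for one document type."""
    pages = size_fp * PAGES_PER_FP[document_type]
    return round(pages), round(pages * WORDS_PER_PAGE)

print(document_size(10_000, "Requirements"))  # ~(1,974 pages, 789,760 words)
```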
Predicting document sizes and completeness before requirements is a standard SRM feature. This feature becomes progressively more important as application size increases, because paperwork volumes go up faster than function point size does. It is particularly important for defense applications because the main cost drivers for military software are:
Military software cost drivers:
1) The cost of producing English words
2) The cost of finding and fixing bugs
3) The cost of cancelled projects
4) The cost of avoiding security flaws
5) The cost of meetings and communications
6) The cost of programming or coding
7) The cost of project management
The function point communities have concentrated primarily on sizing in terms of function points and the more recent SNAP metrics. Function points and SNAP are certainly important, but to understand software costs and schedules the sizes of all deliverables need to be predicted too.
SRM predicts size in terms of IFPUG function points and logical code statements, and it also predicts size for document numbers and volumes, and for numbers of test cases needed for each form of test. It also predicts “defect potentials” or probable number of software bugs that might be found in requirements, design, code, user manuals, and “bad fixes” or secondary defects.
SRM also sizes requirements creep and the growth of applications over time. Typically requirements creep is close to 2% per calendar month. An application sized at 10,000 function points at the end of requirements could easily grow to 12,000 function points by the time of delivery.
(The author has been an expert witness in litigation where requirements creep doubled the initial size at requirements; from 10,000 to 20,000 function points over a four-year contract. The litigation was because the client did not want to pay the vendor for the changes, even though the contract specified payments for out-of-scope changes. The court decided in favor of the vendor, because function points are based on user requirements.)
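A minimal sketch of the 2% per calendar month creep assumption follows. Whether creep compounds or accrues linearly is not specified here, so the sketch uses simple (non-compounded) monthly growth on the original size; with an assumed ten months between the end of requirements and delivery, it reproduces the 10,000 to 12,000 function point example.

```python
def size_with_creep(size_at_requirements_fp: float,
                    months_after_requirements: int,
                    monthly_creep_rate: float = 0.02) -> float:
    """Application size after simple (non-compounded) requirements creep."""
    return size_at_requirements_fp * (1 + monthly_creep_rate * months_after_requirements)

# 10,000 FP at the end of requirements, ~10 months to delivery (assumed for illustration)
print(size_with_creep(10_000, 10))  # 12,000.0 function points
```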
As of 2013 the patented SRM sizing method predicts the sizes of more software deliverables than any other tool or method, and also does so about six months earlier than any other method. SRM is also the only tool to predict requirements growth throughout development and for five years of post-release usage. Post-release growth averages about 8% per year, with occasional "mid-life kickers" where many new features are added to keep up with competitive applications.
Pattern Matching and Software Benchmark Statistical Analysis
When both sections of the taxonomy are joined together the result is a very powerful tool for pattern analysis or statistical research on software productivity, quality, successes, and failures. The taxonomy also is a good condensation of benchmark data.
Note that the consolidated version includes confidential information that would not be used for published statistical studies. These confidential topics include the name of the company and the name of the project. However if the method is used privately inside of companies such as Microsoft or IBM, they would want to record the proprietary information.
It should be noted that the projects studied by the author using the SRM taxonomy were all studied under non-disclosure agreements. This makes it legally impossible to identify specific companies. Therefore the company and project identification information is concealed and encrypted and not open to public scrutiny.
Software Risk Master ™ Full Benchmark and Estimating Taxonomy
Security Level: Company Confidential
Company Name: XYZ Telecommunications
Business unit: San Jose Development Laboratory
Project Name: Sample PBX Switching system
Project Manager: J. Doe
Data provided by: Capers Jones
Team members interviewed: A. Doe, B. Doe, C. Doe, J.Doe (manager)
Interview method: On-site meeting
Interview clock hours: 3.0
Interview team hours: 12.0
Date of data collection: 03/04/2013
Project start date: 03/09/2013
Desired completion date: 03/09/2014
Actual completion date: Unknown
Country code = 1 (United States)
Region code = 06 (California)
City Code = 408 (San Jose)
Industry code = 1569 (Telecommunications)
Project Nature = 1 (New project)
Project Scope = 21 (New components; new application)
Project Class = 5 (External, bundled with hardware)
Project Type = 14 (Communications or telecommunications)
Problem Complexity = 5 (Average complexity)
Code Complexity = 4 (Below average complexity)
Data Complexity = 6 (Above average complexity)
Primary Size metric = 1 (IFPUG function points with SNAP)
Secondary size metric = 8 (Logical code statements)
Programming language(s) = 14 (CHILL)
Programming language level = 3
Certified reuse percent = 15% (default – can be adjusted by users)
Development compensation = $10,000 per month (default)
Maintenance compensation = $8,000 per month (default)
User compensation = $10,000 per month (default)
Additional project costs = $0 (default)
Project financial value (if known) = $0 (default)
Project goals = 3 (Average staffing; average schedule)
Work hours per month = 132 hours per month (default)
Monthly unpaid overtime hours = 0 (default)
Monthly paid overtime hours = 0 (default)
Project CMMI level = 3 (default)
Project Methodology = 8 Agile/Scrum (default)
Methodology experience = 2 (Above average: majority of team are experts)
Client experience level = 4 (Below average: inexperienced with project type)
Project management experience = 2 (Above average: managed many similar projects)
Development team experience = 3 (Average)
Test team experience = 1 (Well above average: all certified test personnel)
Quality assurance experience = 3 (Average)
Customer support experience = 5 (Very inexperienced: totally new to project type)
Maintenance team experience = 3 (Average)
Note that the taxonomy captures in a concise fashion all of the major factors that influence the results of software projects for better or for worse. A good taxonomy is a working tool for many scientific fields, and software engineering is no exception.
By converting all of the critical variable information into numeric form, statistical benchmark studies are easy to carry out.
The automated prototype SRM tool uses a short version of the author’s full assessment and benchmark questionnaire. A full commercial version would include additional topics that will collect and predict the results of:
- Any combination of ISO standards used for the application.
- The presence or absence of certified project personnel such as by the Project Management Institute (PMI) or various test and quality assurance professional associations, or by Microsoft, IBM, and other corporations that offer certifications.
- Specific tool suites used for the application such as the Mercury test tool suite, the Coverity or CAST static analysis tools, or the CAI automated project work bench (APO).
The full version of the SRM questionnaire is annotated like a Michelin Guide. Questions are annotated with a star system. The four-star “****” questions are the most important.
The original idea for SRM was to capture every factor that influences software projects by as much as 1%. However this turned out to be impossible for legal and policy reasons. A number of influential factors cannot be measured or studied. Topics where law or policy prohibits measurements include the appraisal scores of team members, their academic grade averages, their age, and their membership in trade unions.
As to the latter factor, trade unions, in many organizations where software personnel are unionized it is not permitted to collect benchmark data or measure team performance at all because these violate union rules.
Software Risk Master ™ Benchmarks and Estimating Output Information
The input taxonomy data discussed here feeds into the Software Risk Master ™ tool. The outputs from the tool include, but are not limited to, the following set of 45 factors:
Software Risk Master ™ Outputs
- Size in IFPUG function points
- Size in logical code statements
- Probable size of requirements creep
- Probable size of deferred functions
- Size in 12 other metrics (story points, use-case points, COSMIC, NESMA, etc.)
- Size and completeness of software documents
- Numbers of test cases needed for all test stages
- Development staffing by activity
- Development staffing by occupation (analysts, coders, testers, etc.)
- Development schedules by activity and net schedule
- Probability of achieving desired target schedule
- Development costs by activity and total cost
- Productivity in work hours per function point
- Productivity in function points per staff month
- Development costs per activity and total costs
- Development costs per function point by activity and in total
- Defect potentials by origin (requirements, design, code, documents, bad fixes)
- Defect prevention effectiveness (JAD, Quality Function Deployment, etc.)
- Pre-test defect removal efficiency for inspections and static analysis
- Testing defect removal efficiency for all major forms of testing
- Delivered defects by severity level
- Cost of quality (COQ) for the application
- Technical Debt (TD) for the application
- Total Cost of Ownership (TCO)
- Probable number of “error prone modules” if any
- Reliability in mean time to failure (MTTF)
- Stabilization period after delivery
- Security vulnerabilities present at delivery
- Installation and user training
- Maintenance (defect repairs) for five years after delivery
- Enhancements (new features) for five years after delivery
- Customer support for five years after delivery
- Project management for five years after delivery
- Odds of litigation for breach of contract for outsource projects
- Cost of litigation for plaintiff and defendant if case goes through trial
- Venture capital investment for start-up software companies
- Dilution of ownership due to multiple rounds of venture capital
- Risk of project cancelation
- Risk of major schedule delays
- Risk of major cost overruns
- Risk of litigation for poor quality
- Risk of poor customer satisfaction
- Risk of executive dissatisfaction
- Risk of poor team morale
- Risk of post-release security attacks
The taxonomy and Software Risk Master ™ are designed for ease of use and achieving rapid results. SRM can size any application in about 90 seconds. The full set of input questions can be entered in less than five minutes for experienced users and no more than 10 minutes for first-time users.
Once the inputs are complete, SRM produces estimates in just a few seconds. The speed is so fast that SRM works well as a teaching tool because students don’t have to wait or spend time carrying out model calibration.
Another benefit of high-speed data entry and quick predictions is that it makes it very interesting and even enjoyable to try alternate scenarios. For example SRM can predict the results of Waterfall, Agile, XP, RUP, and TSP in less than 15 minutes. About five minutes are needed for the initial inputs, and then only about 30 seconds to change assumptions to switch from one method to another.
Table 2 shows a sample development prediction from Software Risk Master ™ for a generic systems software application of 1,000 function points or 53,000 Java statements:
Table 2: Example of Activity Software Estimating Equations

Application Class: External Systems Software
Programming Language(s): Java
Application Size in Function Points: 1,000
Application Size in Lines of Code: 53,000
Work Hours per Month: 132
Average Monthly Salary: $10,000

| Activity | Ascope (Func. Pt.) | Prate (FP per Month) | Whours per Func. Pt. | Staff | Effort (Months) | Schedule (Months) | Cost | Percent |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Requirements | 500 | 75.00 | 1.76 | 2.00 | 13.33 | 6.67 | $133,333 | 10.00% |
| Prototyping | 500 | 175.00 | 0.75 | 2.00 | 5.71 | 2.86 | $57,143 | 4.29% |
| Design | 400 | 75.00 | 1.76 | 2.50 | 13.33 | 5.33 | $133,333 | 10.00% |
| Design Reviews | 250 | 175.00 | 0.75 | 4.00 | 5.71 | 1.43 | $57,143 | 4.29% |
| Coding | 200 | 30.00 | 4.40 | 5.00 | 33.33 | 6.67 | $333,333 | 25.01% |
| Code Inspections | 125 | 160.00 | 0.83 | 8.00 | 6.25 | 0.78 | $62,500 | 4.69% |
| Testing | 150 | 35.00 | 3.77 | 6.67 | 28.57 | 4.29 | $285,714 | 21.44% |
| Quality Assurance | 1,000 | 175.00 | 0.75 | 1.00 | 5.71 | 5.71 | $57,143 | 4.29% |
| Documentation | 1,000 | 215.00 | 0.61 | 1.00 | 4.65 | 4.65 | $46,512 | 3.49% |
| Management | 1,000 | 60.00 | 2.20 | 1.00 | 16.67 | 16.67 | $166,667 | 12.50% |
| TOTAL | 147 | 7.50 | 17.59 | 6.80 | 133.28 | 16.67 | $1,332,821 | 100.00% |
Note that some abbreviations were needed to fit the table on the page in portrait mode.
The column labeled “Ascope” stands for “Assignment Scope” which is the number of function points one person can be responsible for.
The column labeled “Prate” stands for “Production Rate” and is the amount of functionality that one person can finish in one calendar month with 132 work hours. Raising or lowering the number of work hours per month has an impact on this variable.
The column labeled “Whours” stands for “Work hours per function point.” This is essentially the reciprocal of function points per staff month. The two are easily converted back and forth. Here too raising or lowering the number of work hours would change the result.
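The relationships among these columns can be expressed in a few lines. The sketch below shows the arithmetic implied by Table 2 (staff from assignment scope, effort from production rate, schedule from effort and staff, cost from effort and salary); it is a simplified illustration, not the full SRM estimating model.

```python
def activity_estimate(size_fp: float, ascope_fp: float, prate_fp_per_month: float,
                      work_hours_per_month: float = 132.0,
                      monthly_salary: float = 10_000.0) -> dict:
    """Derive the Table 2 columns for one activity from Ascope and Prate."""
    staff = size_fp / ascope_fp                    # people needed for the activity
    effort_months = size_fp / prate_fp_per_month   # person-months of effort
    schedule_months = effort_months / staff        # calendar time for the activity
    work_hours_per_fp = work_hours_per_month / prate_fp_per_month
    cost = effort_months * monthly_salary
    return {"staff": staff, "effort": effort_months, "schedule": schedule_months,
            "hours_per_fp": work_hours_per_fp, "cost": cost}

# Requirements row of Table 2: 1,000 FP, Ascope = 500, Prate = 75
print(activity_estimate(1_000, 500, 75))
# -> staff 2.0, effort 13.33 months, schedule 6.67 months, 1.76 hours per FP, cost ~$133,333
```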
Unpaid overtime would shorten schedules and lower costs, since the work is being done for free. Paid overtime, on the other hand, would shorten schedules but would raise costs due to the normal premium pay of 150% for paid overtime. In some cases special overtime such as work on public holidays may have a higher premium of 200%.
The default metrics for showing productivity rates are work hours per function point and function points per work month. It is planned in later versions to allow users to select any time unit that matches local conventions, such as hours, days, weeks, months, or years. Smaller projects below 250 function points normally use hours. Larger systems above 10,000 function points normally use months.
The sample above uses only 10 activities. In a commercial version of SRM the number of activities can be expanded to 50 if the users want a more detailed prediction. In normal use, which is prior to the completion of requirements, the details of 50 activities are a distraction. Ten activities are all that are needed to show clients the likely outcome of a project before its requirements are fully known.
SRM has a utility feature that makes side-by-side comparison easy. The utility can convert application sizes to any desired round number. For example, if three PBX applications were 1,250, 1,475, and 1,600 function points in size, SRM can convert all of them to an even 1,500 for side-by-side comparisons. This is a special feature rather than true estimation, because the original technology stack is locked. However the size adjustments do match the empirical result that as sizes get bigger, paperwork and defect volumes grow faster than size in function points or logical code statements.
Some of the samples in this report used the size conversion feature, such as the examples of the 10 PBX switching applications shown below.
Because changing assumptions is easy to do, it is possible to explore many different options for a future project. Since PBX switches were discussed earlier, table 3 illustrates the possible results for doing the same PBX switch using 10 different programming languages:
Table 3: Productivity Rates for 10 Versions of the Same Software Project
(A PBX Switching System of 1,500 Function Points in Size)

| Language | Effort (Months) | Funct. Pt. per Staff Month | Work Hrs. per Funct. Pt. | LOC per Staff Month | LOC per Staff Hour |
| --- | --- | --- | --- | --- | --- |
| Assembly | 781.91 | 1.92 | 68.81 | 480 | 3.38 |
| C | 460.69 | 3.26 | 40.54 | 414 | 3.13 |
| CHILL | 392.69 | 3.82 | 34.56 | 401 | 3.04 |
| PASCAL | 357.53 | 4.20 | 31.46 | 382 | 2.89 |
| PL/I | 329.91 | 4.55 | 29.03 | 364 | 2.76 |
| Ada83 | 304.13 | 4.93 | 26.76 | 350 | 2.65 |
| C++ | 293.91 | 5.10 | 25.86 | 281 | 2.13 |
| Ada95 | 269.81 | 5.56 | 23.74 | 272 | 2.06 |
| Objective C | 216.12 | 6.94 | 19.02 | 201 | 1.52 |
| Smalltalk | 194.64 | 7.71 | 17.13 | 162 | 1.23 |
| Average | 360.13 | 4.17 | 31.69 | 366 | 2.77 |
In addition to productivity measures and predictions, SRM also carries out quality measures and predictions. Table 4 shows the possible quality results for the same PBX switch using 10 different programming languages:
Table 4: Delivered Defects for 10 Versions of the Same Software Project
(A PBX Switching System of 1,500 Function Points in Size)

| Language | Total Defects | Defect Removal Efficiency | Delivered Defects | Delivered Defects per Funct. Pt. | Delivered Defects per KLOC |
| --- | --- | --- | --- | --- | --- |
| Assembly | 12,835 | 91.00% | 1,155 | 0.77 | 3.08 |
| C | 8,813 | 92.00% | 705 | 0.47 | 3.70 |
| CHILL | 8,093 | 93.00% | 567 | 0.38 | 3.60 |
| PASCAL | 7,635 | 94.00% | 458 | 0.31 | 3.36 |
| PL/I | 7,276 | 94.00% | 437 | 0.29 | 3.64 |
| Ada83 | 6,981 | 95.00% | 349 | 0.23 | 3.28 |
| C++ | 6,622 | 93.00% | 464 | 0.31 | 5.62 |
| Ada95 | 6,426 | 96.00% | 257 | 0.17 | 3.50 |
| Objective C | 5,772 | 96.00% | 231 | 0.15 | 5.31 |
| Smalltalk | 5,510 | 96.00% | 220 | 0.15 | 7.00 |
| Average | 7,580 | 94.00% | 455 | 0.30 | 3.45 |
Software Risk Master ™ predicts size, productivity, and quality using both function points and logical code statements. However readers are cautioned that only function points produce correct economic results.
Lines of code metrics actually reverse true economic productivity results and make the lowest-level programming languages look better than modern high-level languages. Table 5 shows the productivity rankings of the 10 samples as measured using both function points and lines of code:
Table 5: Rankings of Productivity Levels Using Function Point Metrics and Lines of Code (LOC) Metrics

| Rank | Using Function Point Metrics | Using LOC Metrics |
| --- | --- | --- |
| 1 | Smalltalk | Assembly |
| 2 | Objective C | C |
| 3 | Ada95 | CHILL |
| 4 | C++ | PASCAL |
| 5 | Ada83 | PL/I |
| 6 | PL/I | Ada83 |
| 7 | PASCAL | C++ |
| 8 | CHILL | Ada95 |
| 9 | C | Objective C |
| 10 | Assembly | Smalltalk |
Because "lines of code" metrics violate standard economic assumptions and show incorrect, reversed productivity rates, LOC should be considered professional malpractice for economic studies that involve more than one programming language.
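A small sketch of why the reversal happens: for a fixed amount of functionality, a low-level language requires many more code statements, so dividing by LOC inflates its apparent productivity even though its total effort is far higher. The effort figures below come from Table 3; the statements-per-function-point ratios are illustrative assumptions.

```python
SIZE_FP = 1_500  # the PBX example

# Effort in staff months from Table 3; LOC-per-FP ratios are illustrative assumptions.
projects = {
    "Assembly":  {"effort_months": 781.91, "loc_per_fp": 250},
    "Smalltalk": {"effort_months": 194.64, "loc_per_fp": 21},
}

for language, p in projects.items():
    fp_per_month = SIZE_FP / p["effort_months"]
    loc_per_month = (SIZE_FP * p["loc_per_fp"]) / p["effort_months"]
    print(f"{language}: {fp_per_month:.2f} FP/month, {loc_per_month:.0f} LOC/month")

# Assembly:  ~1.92 FP/month but ~480 LOC/month (looks "best" by LOC)
# Smalltalk: ~7.71 FP/month but ~162 LOC/month (looks "worst" by LOC)
```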
Incidentally the venerable “cost per defect metric” also violates standard economic assumptions and does not show quality economics at all. Cost per defect penalizes quality and achieves its lowest values for the buggiest software applications!
SRM displays data using both LOC and cost per defect as well as function points. The reason for this is to show clients exactly what is wrong with LOC and cost per defect, because the errors of these metrics are not well understood.
Another use of pattern matching is to compare various software development methods. Table 6 illustrates the results for 10 common software development methods, applied not to a PBX switch but to a generic IT application of 1,000 function points:
Table 6: Software Schedules, Staff, Effort, Productivity

| Methodology | Schedule (Months) | Staff | Effort (Months) | FP per Month | Development Cost |
| --- | --- | --- | --- | --- | --- |
| 1. Extreme (XP) | 11.78 | 7 | 84 | 11.89 | $630,860 |
| 2. Agile/Scrum | 11.82 | 7 | 84 | 11.85 | $633,043 |
| 3. TSP | 12.02 | 7 | 86 | 11.64 | $644,070 |
| 4. CMMI 5/spiral | 12.45 | 7 | 83 | 12.05 | $622,257 |
| 5. OO | 12.78 | 8 | 107 | 9.31 | $805,156 |
| 6. RUP | 13.11 | 8 | 101 | 9.58 | $756,157 |
| 7. Pair/iterative | 13.15 | 12 | 155 | 9.21 | $1,160,492 |
| 8. CMMI 3/iterative | 13.34 | 8 | 107 | 9.37 | $800,113 |
| 9. Proofs/waterfall | 13.71 | 12 | 161 | 6.21 | $1,207,500 |
| 10. CMMI 1/waterfall | 15.85 | 10 | 158 | 6.51 | $1,188,870 |
| Average | 13.00 | 8.6 | 112.6 | 9.762 | $844,852 |
When used in estimating mode, Software Risk Master ™ could produce these 10 examples in roughly 12 minutes. It would take about 5 minutes for the first prediction and then changing methodologies takes less than 30 seconds each. Of course these 10 examples are all the same size. Sizing each one separately takes about 90 seconds per application with SRM.
Large software projects can have up to 116 different kinds of occupation groups. In today's world many specialists are needed. The current prototype of SRM predicts the staffing levels for 20 of these occupation groups.
Staffing predictions vary with project size as do the numbers of kinds of specialists that are likely to be deployed.
The following list of specialists and generalists is taken from a prediction for a 25,000 function point military application.
At this large size all 20 of the occupation groups are used and the organization structure will no doubt involve over a dozen organizational units such as a project office, several development groups, one or more test teams, an integration and configuration control group, software quality assurance, technical publications, and others. There will also be metrics specialists and function point counters, although function point counting is often carried out by contract personnel rather than by in-house employees.
Occupation Groups and Part-Time Specialists

| Occupation Group | Normal Staff | Peak Staff |
| --- | --- | --- |
| 1. Programmers | 94 | 141 |
| 2. Testers | 83 | 125 |
| 3. Designers | 37 | 61 |
| 4. Business analysts | 37 | 57 |
| 5. Technical writers | 16 | 23 |
| 6. Quality assurance | 14 | 22 |
| 7. 1st line managers | 15 | 21 |
| 8. Data base administration | 8 | 11 |
| 9. Project Office staff | 7 | 10 |
| 10. Administrative support | 8 | 11 |
| 11. Configuration control | 5 | 7 |
| 12. Project librarians | 4 | 6 |
| 13. 2nd line managers | 3 | 4 |
| 14. Estimating specialists | 3 | 4 |
| 15. Architects | 2 | 3 |
| 16. Security specialists | 1 | 2 |
| 17. Performance specialists | 1 | 2 |
| 18. Function point counters | 1 | 2 |
| 19. Human factors specialists | 1 | 2 |
| 20. 3rd line managers | 1 | 1 |
There are also predictions for organization structures. For example large systems above 10,000 function points in size normally have project offices. They also tend to have specialized test departments rather than having testing done by the developers themselves.
Correcting “Leakage” From Software Benchmark Data
A common benchmark problem with software projects developed under a cost-center model is that of "leakage." Historical data has gaps and omissions, and sometimes omits more than 60% of the actual effort and costs. The most common omissions are unpaid overtime, management, and the work of part-time specialists such as quality assurance, business analysts, function point counters, and project office personnel.
Projects that are built under time and materials contract or under a profit model tend to be more accurate, since they need high accuracy in order to bill clients the correct amounts.
Software Risk Master ™ has an effective method for correcting leakage that is based on pattern matching. Prior to collecting actual benchmark data the project is run through SRM in predictive estimating mode.
The SRM algorithms and knowledge base know the most common patterns of leakage and offer corrected values. If the clients agree with the SRM predictions, then the SRM estimate becomes the benchmark. If the client wants to add information or make adjustments, they can be made to the SRM outputs, which speeds up and simplifies benchmark data collection time. Following in table 7 are 25 software development activities with the ones that tend to “leak” being identified:
Table 7: Common Leakage Patterns from Software Historical Data
Activities Performed | Completeness of Historical Data
01 Requirements | Missing or Incomplete
02 Prototyping | Missing or Incomplete
03 Architecture | Missing or Incomplete
04 Project planning | Missing or Incomplete
05 Initial analysis and design | Missing or Incomplete
06 Detail design | Incomplete
07 Design reviews | Missing or Incomplete
08 Coding | Complete
09 Reusable code acquisition | Missing or Incomplete
10 Purchased package acquisition | Missing or Incomplete
11 Code inspections | Missing or Incomplete
12 Independent verification and validation | Complete
13 Configuration management | Missing or Incomplete
14 Integration | Missing or Incomplete
15 User documentation | Missing or Incomplete
16 Unit testing | Incomplete
17 Function testing | Incomplete
18 Integration testing | Incomplete
19 System testing | Incomplete
20 Field testing | Missing or Incomplete
21 Acceptance testing | Missing or Incomplete
22 Independent testing | Complete
23 Quality assurance | Missing or Incomplete
24 Installation and training | Missing or Incomplete
25 Project management | Missing or Incomplete
26 Total project resources, costs | Incomplete
On average, for projects developed under a cost-center model (which means that development costs are not charged back to users), historical data is only about 37% complete.
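The correction itself is straightforward arithmetic once completeness factors are known: the reported effort for each activity is grossed up by the fraction of that activity that typically shows up in cost-center records. The sketch below illustrates the idea with hypothetical completeness factors; the actual factors in the SRM knowledge base are proprietary.

```python
# Sketch of benchmark "leakage" correction, assuming hypothetical
# completeness factors; the factors in SRM's knowledge base are proprietary.

# Fraction of the true effort for each activity that typically shows up
# in cost-center records (illustrative values only).
ASSUMED_COMPLETENESS = {
    "Coding": 1.00,
    "Unit testing": 0.50,
    "Requirements": 0.30,
    "Design": 0.40,
    "Quality assurance": 0.20,
    "Project management": 0.35,
}


def correct_leakage(reported_hours: dict) -> dict:
    """Gross up reported hours per activity by its assumed completeness."""
    return {activity: hours / ASSUMED_COMPLETENESS.get(activity, 1.0)
            for activity, hours in reported_hours.items()}


reported = {
    "Coding": 12_000,
    "Unit testing": 2_000,
    "Requirements": 900,
    "Design": 1_600,
    "Quality assurance": 300,
    "Project management": 2_100,
}
corrected = correct_leakage(reported)
print(f"Reported total:  {sum(reported.values()):>9,.0f} hours")
print(f"Corrected total: {sum(corrected.values()):>9,.0f} hours")
```

Dividing by a completeness factor of 0.37, for example, roughly triples reported effort, which is why uncorrected cost-center benchmarks understate real costs so badly.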
Quality data also leaks, since many companies do not measure bugs or defects until after release. Only a few major companies such as IBM and AT&T start collecting defect data during requirements and continue through static analysis, inspections, all forms of testing, and out into the field.
IBM was so interested in complete quality data that they asked for volunteers to record bugs found via desk checking and unit testing, which are normally unmeasured private forms of defect removal. The volunteer data allowed IBM to calculate the defect removal efficiency levels of both desk checks and unit testing.
Because finding and fixing bugs is the #1 cost driver for major software projects, SRM is very thorough in both measuring and predicting the results of all known forms of defect removal: inspections, static analysis, and many kinds of testing.
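Defect removal efficiency (DRE) for a given stage compares the defects that the stage removed against the defects that were present when it ran, i.e., those it removed plus those found later, including after release. A minimal sketch of that calculation, with illustrative numbers rather than IBM's data, follows:

```python
# Sketch of the standard defect removal efficiency (DRE) calculation.
# The defect counts are illustrative, not taken from IBM's measurements.

def removal_efficiency(defects_removed: int, defects_found_later: int) -> float:
    """DRE of a stage = defects it removed / defects present when it ran."""
    total_present = defects_removed + defects_found_later
    return defects_removed / total_present if total_present else 0.0


# Example: unit testing removed 350 bugs, and 650 more of the bugs that
# were present at that point surfaced later, so unit-test DRE is about 35%.
print(f"Unit test DRE:  {removal_efficiency(350, 650):.0%}")

# Cumulative DRE compares all defects removed before release against the
# total including defects reported during early production use.
print(f"Cumulative DRE: {removal_efficiency(9_500, 500):.0%}")
```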
Table 8 shows approximate levels of defect removal efficiency for a full series of pre-test defect removal activities and test stages. Table 8 illustrates a major system of 10,000 function points or 533,000 Java statements.
Few real projects use so many different forms of defect removal, so table 8 is a hypothetical example of very advanced quality control:
Table 8: Pre-Test and Test Defect Removal Predictions from SRM
(Note: 10,000 function points or 533,000 Java statements)

Pre-Test Defect Removal Methods | Architect. | Require. | Design | Code | Document | TOTALS

Defect Potentials per FP | 0.25 | 0.95 | 1.15 | 1.35 | 0.55 | 4.25
Defect potentials | 3,408 | 12,950 | 15,676 | 18,403 | 7,497 | 57,935
Security flaw % | 1.50% | 0.75% | 2.00% | 3.00% | 0.00% | 7.25%

1 Requirement inspection | 5.00% | 87.00% | 10.00% | 5.00% | 8.50% | 25.14%
  Defects discovered | 170 | 11,267 | 1,568 | 920 | 637 | 14,562
  Bad-fix injection | 5 | 338 | 47 | 28 | 19 | 437
  Defects remaining | 3,232 | 1,346 | 14,062 | 17,455 | 6,841 | 42,936

2 Architecture inspection | 85.00% | 12.00% | 10.00% | 2.50% | 12.00% | 12.98%
  Defects discovered | 2,748 | 161 | 1,406 | 436 | 821 | 5,572
  Bad-fix injection | 82 | 5 | 42 | 13 | 25 | 167
  Defects remaining | 402 | 1,179 | 12,613 | 17,006 | 5,995 | 37,196

3 Design inspection | 10.00% | 14.00% | 87.00% | 7.00% | 26.00% | 37.45%
  Defects discovered | 40 | 165 | 10,974 | 1,190 | 1,559 | 13,928
  Bad-fix injection | 1 | 5 | 329 | 36 | 47 | 696
  Defects remaining | 361 | 1,009 | 1,311 | 15,779 | 4,390 | 22,850

4 Code inspection | 12.50% | 15.00% | 25.00% | 85.00% | 15.00% | 63.87%
  Defects discovered | 45 | 151 | 328 | 13,413 | 658 | 14,595
  Bad-fix injection | 1 | 5 | 10 | 402 | 20 | 438
  Defects remaining | 315 | 853 | 973 | 1,965 | 3,712 | 7,817

5 Static Analysis | 2.00% | 2.00% | 10.00% | 87.00% | 3.00% | 24.83%
  Defects discovered | 6 | 17 | 97 | 1,709 | 111 | 1,941
  Bad-fix injection | 0 | 1 | 3 | 51 | 3 | 58
  Defects remaining | 308 | 836 | 873 | 204 | 3,597 | 5,818

6 IV & V | 10.00% | 12.00% | 23.00% | 7.00% | 20.00% | 18.32%
  Defects discovered | 31 | 100 | 201 | 14 | 719 | 1,066
  Bad-fix injection | 1 | 3 | 6 | 0 | 22 | 32
  Defects remaining | 276 | 732 | 666 | 189 | 2,856 | 4,720

7 SQA review | 10.00% | 17.00% | 20.00% | 12.00% | 17.00% | 25.52%
  Defects discovered | 28 | 125 | 133 | 23 | 486 | 794
  Bad-fix injection | 1 | 4 | 4 | 1 | 15 | 40
  Defects remaining | 248 | 604 | 529 | 166 | 2,356 | 3,887

Pre-test defects removed | 3,160 | 12,346 | 15,148 | 18,237 | 5,142 | 54,032
Pre-test efficiency % | 92.73% | 95.33% | 96.63% | 99.10% | 68.58% | 93.26%

Test Defect Removal Stages | Architect. | Require. | Design | Code | Document | Total

1 Subroutine testing | 0.00% | 1.00% | 5.00% | 45.00% | 2.00% | 3.97%
  Defects discovered | 0 | 6 | 26 | 75 | 47 | 154
  Bad-fix injection | 0 | 0 | 1 | 2 | 1 | 5
  Defects remaining | 248 | 598 | 502 | 89 | 2,307 | 3,728

2 Unit testing | 2.50% | 4.00% | 7.00% | 35.00% | 10.00% | 8.42%
  Defects discovered | 6 | 24 | 35 | 31 | 231 | 327
  Bad-fix injection | 0 | 1 | 1 | 1 | 7 | 10
  Defects remaining | 241 | 573 | 465 | 57 | 2,070 | 3,391

3 Function testing | 7.50% | 5.00% | 22.00% | 37.50% | 25.00% | 20.29%
  Defects discovered | 18 | 29 | 102 | 21 | 517 | 688
  Bad-fix injection | 1 | 1 | 3 | 1 | 16 | 21
  Defects remaining | 223 | 544 | 360 | 35 | 1,537 | 2,682

4 Regression testing | 2.00% | 2.00% | 5.00% | 33.00% | 7.50% | 5.97%
  Defects discovered | 4 | 11 | 18 | 12 | 115 | 160
  Bad-fix injection | 0 | 0 | 1 | 0 | 3 | 5
  Defects remaining | 218 | 533 | 341 | 23 | 1,418 | 2,517

5 Integration testing | 6.00% | 20.00% | 27.00% | 33.00% | 22.00% | 21.11%
  Defects discovered | 13 | 107 | 92 | 8 | 312 | 531
  Bad-fix injection | 0 | 3 | 3 | 0 | 9 | 16
  Defects remaining | 205 | 423 | 246 | 15 | 1,097 | 1,970

6 Performance testing | 14.00% | 2.00% | 20.00% | 18.00% | 2.50% | 5.92%
  Defects discovered | 29 | 8 | 49 | 3 | 27 | 117
  Bad-fix injection | 1 | 0 | 1 | 0 | 1 | 3
  Defects remaining | 175 | 414 | 196 | 12 | 1,068 | 1,850

7 Security testing | 12.00% | 15.00% | 23.00% | 8.00% | 2.50% | 8.42%
  Defects discovered | 21 | 62 | 45 | 1 | 27 | 156
  Bad-fix injection | 1 | 2 | 1 | 0 | 1 | 5
  Defects remaining | 154 | 350 | 149 | 11 | 1,041 | 1,690

8 Usability testing | 12.00% | 17.00% | 15.00% | 5.00% | 55.00% | 39.86%
  Defects discovered | 18 | 60 | 22 | 1 | 573 | 673
  Bad-fix injection | 1 | 2 | 1 | 0 | 17 | 20
  Defects remaining | 135 | 289 | 126 | 11 | 451 | 996

9 System testing | 16.00% | 12.00% | 18.00% | 38.00% | 34.00% | 23.74%
  Defects discovered | 22 | 35 | 23 | 4 | 153 | 236
  Bad-fix injection | 1 | 1 | 1 | 0 | 5 | 7
  Defects remaining | 112 | 253 | 103 | 7 | 293 | 752

10 Cloud testing | 10.00% | 5.00% | 13.00% | 10.00% | 20.00% | 12.84%
  Defects discovered | 11 | 13 | 13 | 1 | 59 | 97
  Bad-fix injection | 0 | 0 | 0 | 0 | 2 | 3
  Defects remaining | 101 | 240 | 89 | 6 | 233 | 669

11 Independent testing | 12.00% | 10.00% | 11.00% | 10.00% | 23.00% | 14.96%
  Defects discovered | 12 | 24 | 10 | 1 | 54 | 100
  Bad-fix injection | 0 | 1 | 0 | 0 | 2 | 3
  Defects remaining | 88 | 215 | 79 | 5 | 178 | 566

12 Field (Beta) testing | 14.00% | 12.00% | 14.00% | 17.00% | 34.00% | 19.55%
  Defects discovered | 12 | 26 | 11 | 1 | 60 | 111
  Bad-fix injection | 0 | 1 | 0 | 0 | 2 | 3
  Defects remaining | 76 | 189 | 68 | 4 | 115 | 452

13 Acceptance testing | 13.00% | 14.00% | 15.00% | 12.00% | 24.00% | 19.43%
  Defects discovered | 11 | 22 | 9 | 1 | 46 | 89
  Bad-fix injection | 0 | 1 | 0 | 0 | 1 | 3
  Defects remaining | 65 | 166 | 58 | 4 | 68 | 360

Test Defects Removed | 183 | 438 | 471 | 162 | 2,288 | 3,527
Testing Efficiency % | 73.96% | 72.55% | 89.05% | 97.86% | 97.11% | 90.74%

Total Defects Removed | 3,343 | 12,784 | 15,618 | 18,399 | 7,429 | 57,559
Total Bad-fix injection | 100 | 384 | 469 | 552 | 223 | 1,727
Cumulative Removal % | 98.11% | 98.72% | 99.63% | 99.98% | 99.09% | 99.35%
Remaining Defects | 65 | 166 | 58 | 4 | 68 | 376
High-severity Defects | 10 | 28 | 11 | 1 | 9 | 56
Security flaws | 0 | 0 | 1 | 0 | 0 | 2
Remaining Defects per Function Point | 0.0047 | 0.0122 | 0.0042 | 0.0003 | 0.0050 | 0.0276
Remaining Defects per K Function Points | 4.73 | 12.17 | 4.25 | 0.26 | 5.00 | 27.58
Remaining Defects per KLOC | 0.12 | 0.31 | 0.11 | 0.01 | 0.13 | 0.70
Table 8 shows a total of seven pre-test removal activities and 13 test stages. Very few projects use this many forms of defect removal. An “average” U.S. software project would use static analysis and probably four kinds of testing: 1) unit test, 2) function test, 3) regression test, and 4) system test. Average U.S. defect removal efficiency (DRE) circa 2013 is below 90%. Only a few top companies such as IBM achieve DRE results higher than 99%.
Military and defense software, medical systems, and systems software for complex physical devices such as telephone switching systems and computer operating systems would use several kinds of inspections, static analysis, and at least six to eight forms of testing. For example, only military projects tend to use independent verification and validation (IV&V) and independent testing.
Table 8 is intended to show the full range of defect removal operations that can be measured and predicted using Software Risk Master ™. It also assumes that all defect removal personnel are “top guns” who are fully trained, and that test personnel are certified.
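The mechanics behind a table like table 8 can be approximated by a simple chain: start with the defect potential, apply each stage's removal efficiency to the defects still present, and add back a small percentage of new defects created by imperfect repairs (bad-fix injection, roughly 3% in the table). The sketch below uses illustrative stage efficiencies, not SRM's calibrated values:

```python
# Simplified sketch of the defect removal chain behind a table like table 8:
# apply each stage's removal efficiency to the defects still present and add
# back a small bad-fix injection (about 3%, as the table implies). The stage
# efficiencies here are illustrative, not SRM's calibrated values.

BAD_FIX_RATE = 0.03

STAGES = [  # (stage name, removal efficiency against defects still present)
    ("Design inspection", 0.87),
    ("Code inspection",   0.85),
    ("Static analysis",   0.55),
    ("Unit testing",      0.35),
    ("Function testing",  0.35),
    ("System testing",    0.30),
]


def run_removal_chain(defect_potential: float) -> float:
    """Run all stages in order and return the defects still latent at release."""
    remaining = defect_potential
    for name, efficiency in STAGES:
        found = remaining * efficiency
        bad_fixes = found * BAD_FIX_RATE      # new defects created by repairs
        remaining = remaining - found + bad_fixes
        print(f"{name:<18} found {found:>9,.0f}  remaining {remaining:>9,.0f}")
    return remaining


potential = 10_000 * 4.25                     # 10,000 FP at 4.25 defects per FP
remaining = run_removal_chain(potential)
print(f"Cumulative removal efficiency: {1 - remaining / potential:.2%}")
```

Even this toy chain shows why long series of removal stages are needed: each stage only removes a fraction of what is left, and bad fixes keep adding a trickle of new defects.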
Pattern matching is useful for measuring and predicting quality as well as for measuring and predicting software development productivity.
Summary and Conclusions about Software Pattern Matching
Pattern matching based on formal taxonomies has had a long history in science and has proven its value time and again. Pattern matching for business decisions such as real estate appraisals or automobile costs is more recent but no less effective and useful.
The Software Risk Master ™ tool uses pattern matching as the basis for sizing applications, process assessments, benchmark data collection, and predictive estimating of future software projects.
As of 2013 more than 95% of software applications are not “new” in the sense that they have never been designed or built before. The vast majority of modern software projects are either replacements for legacy applications or minor variations on existing software.
Whenever there are large numbers of similar projects that have been built before and have accurate historical data available, pattern matching is the most effective and efficient way of capturing and using historical results to predict future outcomes.
References and Readings on Software Pattern Matching
The primary citation for modern taxonomic analysis is:
Linnaeus, Carl; Systema Naturae; privately published in Sweden in 1735.
The American Society of Indexing has a special interest group on taxonomy creation and analysis: www.taxonomies-sig.org.
Note: All of the author’s books use various forms of taxonomy such as defect classifications, defect removal methods, and application classes and types.
Jones, Capers; “A Short History of Lines of Code Metrics”; Namcook Analytics Technical Report; Narragansett, RI; 2012.
This report provides a mathematical proof that “lines of code” metrics violate standard economic assumptions. LOC metrics make requirements and design invisible. Worse, LOC metrics penalize high-level languages. The report asserts that LOC should be deemed professional malpractice if used to compare results between different programming languages. There are other legitimate purposes for LOC, such as merely measuring coding speed.
Jones, Capers; “A Short History of the Cost Per Defect Metrics”; Namcook Analytics Technical Report; Narragansett, RI; 2012.
This report provides a mathematical proof that “cost per defect” penalizes quality and reaches its lowest values for the buggiest software applications. It also points out that the urban legend that “cost per defect after release is 100 times larger than early elimination” is not true. The apparent expansion of cost per defect for downstream defect repairs is caused by ignoring fixed costs. The cost per defect metric also ignores other economic topics, such as the fact that high quality leads to shorter schedules.
Jones, Capers; “Early Sizing and Early Risk Analysis”; Capers Jones & Associates LLC; Narragansett, RI; July 2011.
Jones, Capers and Bonsignour, Olivier; The Economics of Software Quality; Addison Wesley Longman, Boston, MA; ISBN-10: 0-13-258220-1; 2011; 585 pages.
Jones, Capers; Software Engineering Best Practices; McGraw Hill, New York, NY; ISBN 978-0-07-162161-8; 2010; 660 pages.
Jones, Capers; Applied Software Measurement; McGraw Hill, New York, NY; ISBN 978-0-07-150244-3; 2008; 662 pages.
Jones, Capers; Estimating Software Costs; McGraw Hill, New York, NY; ISBN-13: 978-0-07-148300-1; 2007.
Jones, Capers; Software Assessments, Benchmarks, and Best Practices; Addison Wesley Longman, Boston, MA; ISBN 0-201-48542-7; 2000; 657 pages.
Jones, Capers; Conflict and Litigation Between Software Clients and Developers; Software Productivity Research, Inc.; Burlington, MA; September 2007; 53 pages; (SPR technical report).