#11 – SOFTWARE PATTERN MATCHING – (C) CAPERS JONES

Pattern matching is a predictive methodology that uses a formal taxonomy to compare the results of historical software projects against the probable outcomes of new software projects that are about to start development.

Pattern matching for software starts with a questionnaire that uses multiple-choice questions.  These questions elicit information about a new project, such as its nature, scope, class, type, and complexity.

The answers to the questions form a “pattern” that is used to extract data from historical projects that have the same pattern, or a pattern that is very close.  Mathematical algorithms have been developed to handle partial matches to historical patterns.

Mathematical approximations are necessary because the proprietary taxonomy can form a total of 214,200,000 distinct patterns.  Most of these patterns have never occurred and never will occur.  The nucleus of common patterns that occur many times for software is closer to 20,000.

In today’s world pattern matching is a good choice for software sizing and estimating because almost 95% of software applications are not “new” in the sense of never being done before.  The majority today are either legacy replacements or minor variations to existing software.

Pattern matching and formal taxonomies have been widely used in science and business, but are comparatively new for software.

Software pattern matching as described here is based on a proprietary taxonomy developed by the author, Capers Jones.  The taxonomy uses multiple-choice questions to identify the key attributes of software projects.  The taxonomy is used to collect historical benchmark data and also as a basis for estimating future projects.  The taxonomy is also used for sizing applications.

For sizing, the taxonomy includes project nature, scope, class, type, problem complexity, code complexity, and data complexity. For estimating, additional parameters such as CMMI level, methodology, and team experience are also used.
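
As a rough illustration of how such a taxonomy answer set can be treated as a reusable pattern, the sketch below shows a minimal Python representation.  The field names and the tuple key are hypothetical illustrations, not the actual SRM schema.

```python
# Illustrative sketch only: a minimal representation of a sizing pattern
# built from the taxonomy answers described above.  The field names and
# the tuple "key" are hypothetical, not the actual SRM schema.
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class SizingPattern:
    nature: int              # e.g. 1 = new project
    scope: int               # e.g. 21 = new components; new application
    project_class: int       # "class" is a reserved word in Python
    project_type: int
    problem_complexity: int  # 1..10
    code_complexity: int     # 1..10
    data_complexity: int     # 1..10

    def key(self):
        """Return the answers as a tuple usable as a lookup key."""
        return astuple(self)

# Example: the PBX-style pattern used later in this article
pbx = SizingPattern(1, 21, 5, 14, 5, 4, 6)
print(pbx.key())   # (1, 21, 5, 14, 5, 4, 6)
```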

The pattern matching methodology for software sizing is patent pending, and the inventor is Capers Jones.  The utility patent application is U.S. Patent Application No. 13/352,434, filed January 18, 2012, titled "Early and Rapid Sizing for Software Applications."

The pattern matching approach for software sizing is a standard feature of the Software Risk Master ™ tool (SRM).  For example, the 2013 SRM taxonomy for "project scope" includes these 34 entries:

Project Scope

  1. Algorithm
  2. Maintenance: defect repair
  3. Subroutine
  4. Module
  5. Reusable module
  6. Enhancement to a program
  7. Small enhancement to a system
  8. Disposable prototype or 7% of application
  9. Large enhancement to a program
  10. Evolutionary prototype or 12% of application
  11. Average enhancement to a system
  12. Subprogram
  13. Standalone program: smartphone
  14. Standalone program: tablet
  15. Standalone program: PC
  16. Large enhancement to a system
  17. Standalone program: Web
  18. Standalone program: cloud
  19. Standalone program: embedded
  20. Standalone program: mainframe
  21. Multi-component program
  22. Component of a departmental system
  23. Release of a system (base plus)
  24. Component of a corporate system
  25. Component of an enterprise system
  26. New social network system
  27. New departmental system
  28. Component of a national system
  29. New corporate system
  30. Component of a global system
  31. Massively multiplayer game application
  32. New enterprise system
  33. New national system
  34. New global system

The entries in the SRM taxonomy for "project type" include these 25 forms of software:

Project Type

  1. Nonprocedural (generated, query, spreadsheet)
  2. Batch application
  3. Interactive application
  4. Batch database application
  5. Interactive GUI application
  6. Interactive database application
  7. Web application
  8. Client/server application
  9. Data warehouse application
  10. Big data application
  11. Computer game
  12. Scientific or mathematical program
  13. System support or middleware application
  14. Service oriented architecture (SOA)
  15. Expert system
  16. Communications or telecommunications
  17. Process control applications
  18. Trusted systems
  19. Embedded or real-time applications
  20. Graphics, animation, or image processing applications
  21. Multimedia applications
  22. Robotics or mechanical automation applications
  23. AI applications
  24. Neural net applications
  25. Hybrid: multiple types

The numbers of discrete elements in the full software sizing taxonomy are:

Project Nature               12
Project Scope                34
Project Class                21
Project Type                 25
Problem Complexity           10
Code Complexity              10
Data Complexity              10

Sum                         122
Permutations        214,200,000

With 122 total elements, the permutations of the full taxonomy total 214,200,000 possible patterns.  Needless to say, more than half of these patterns have never occurred and never will occur.
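
The 214,200,000 figure is simply the product of the seven element counts shown above, which a few lines of Python can confirm:

```python
# Arithmetic check of the pattern count quoted above: the product of the
# discrete choices in the seven sizing taxonomy elements.
elements = {
    "nature": 12, "scope": 34, "class": 21, "type": 25,
    "problem complexity": 10, "code complexity": 10, "data complexity": 10,
}
total = 1
for count in elements.values():
    total *= count
print(f"{total:,}")   # 214,200,000
```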

For the software industry in 2013 the total number of patterns that occur with relatively high frequency is much smaller:  about 20,000.

Using Pattern Matching for Sizing Software Applications

To use pattern matching for software sizing, the clients provide answers to the multiple-choice taxonomy questions.  The answers to these questions form a distinct “pattern.”

The client’s pattern for a project is then compared against the Software Risk Master ™ knowledge base.  Projects with the same or nearly the same patterns are selected.

Because thousands of projects have been examined and measured over the years, mathematical algorithms based on that data have been developed.  These algorithms are quick and also enable matches of patterns that are close but not identical to a client's taxonomy.

Rather than an actual scan for identical patterns, the SRM algorithms condense the original data and speed up the calculations to a few seconds.

For example if a client were interested in a PBX switching system perhaps a dozen similar projects with the same pattern could be found.  These historical PBX switching projects would range from about 1,200 to perhaps 1,700 function points in size, and average about 1,500.  The data from the PBX results would be aggregated and presented to the client with the average size being the primary data point for sizing.

However the SRM algorithms are already set for PBX switching systems so merely specifying that type of application will generate a size of around 1,500 function points without needing to scan for specific PBX projects.
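
The following sketch illustrates the general nearest-pattern idea in Python.  It is not the patented SRM algorithm: the sample history, the distance weighting, and the choice of averaging the three closest matches are all invented for illustration.

```python
# A minimal sketch of the pattern-matching idea described above, NOT the
# patented SRM algorithms.  Historical projects are stored with their
# taxonomy tuples; the closest patterns are selected and their sizes
# averaged.  The distance function and weights are illustrative guesses.
from statistics import mean

# (pattern tuple, size in IFPUG function points) -- invented sample data
history = [
    ((1, 21, 5, 14, 5, 4, 6), 1450),
    ((1, 21, 5, 14, 5, 5, 6), 1520),
    ((1, 21, 5, 14, 6, 4, 6), 1610),
    ((1, 15, 3, 7, 4, 4, 4),   320),   # unrelated PC application
]

def distance(a, b):
    """Count mismatched taxonomy answers; complexity answers count fractionally."""
    score = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i < 4:                       # nature, scope, class, type
            score += 0.0 if x == y else 1.0
        else:                           # complexity scales 1..10
            score += abs(x - y) / 10.0
    return score

def estimate_size(new_pattern, k=3):
    """Average the sizes of the k nearest historical patterns."""
    nearest = sorted(history, key=lambda rec: distance(new_pattern, rec[0]))[:k]
    return mean(size for _, size in nearest)

print(round(estimate_size((1, 21, 5, 14, 5, 4, 6))))   # ~1527 function points
```

In practice, as noted above, SRM condenses the knowledge base into algorithms rather than scanning individual project records.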

Some additional geographic information is also part of the taxonomy, but it has no impact on application size.  The full set of topics in the SRM sizing taxonomy would look like the table shown below.

When used with the SRM software tool developed for the invention, four additional factors from public sources are part of the taxonomy (country, region, industry, and city):

Software Risk Master ™ Full Sizing Taxonomy

Country code =                     1    (United States)
Region code =                     06    (California)
City code =                      408    (San Jose)
Industry code =                 1569    (Telecommunications)

Project Nature =                   1    (New project)
Project Scope =                   21    (New components; new application)
Project Class =                    5    (External, bundled with hardware)
Project Type =                    14    (Communications or telecommunications)
Problem Complexity =               5    (Average complexity)
Code Complexity =                  4    (Below average complexity)
Data Complexity =                  6    (Above average complexity)
Primary size metric =              1    (IFPUG function points with SNAP)
Secondary size metric =            8    (Logical code statements)
Programming language(s) =         14    (CHILL)
Programming language level =       3
Certified reuse percent =         15%   (default)

By using numeric codes the taxonomy allows sophisticated statistical analysis.  Data can be analyzed by country, by industry, by application type, by application size, by programming language, by metric, by complexity, or by any combination of factors.

The first four items in the full taxonomy use public data.  For example, the "industry code" is the North American Industry Classification System (NAICS) code published by the U.S. Department of Commerce.  The country code is taken from the international telephone calling codes.  The city code is the telephone area code.  The region code for the United States is taken from an alphabetical list of the 50 states that is published on several web sites and readily available.
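
Because every taxonomy answer is a numeric code, grouping and summarizing benchmark records is straightforward with ordinary tools, as this small illustration (using invented sample records and field names) suggests:

```python
# Illustrative only: numeric taxonomy codes make benchmark records easy to
# group and summarize.  The records and field names here are invented
# sample data, not real benchmarks.
from collections import defaultdict
from statistics import mean

benchmarks = [
    {"country": 1, "industry": 1569, "type": 14, "fp_per_month": 3.8},
    {"country": 1, "industry": 1569, "type": 14, "fp_per_month": 4.2},
    {"country": 1, "industry": 1569, "type": 7,  "fp_per_month": 9.1},
]

groups = defaultdict(list)
for rec in benchmarks:
    groups[(rec["country"], rec["industry"], rec["type"])].append(rec["fp_per_month"])

for key, rates in groups.items():
    print(key, round(mean(rates), 2))
# (1, 1569, 14) 4.0
# (1, 1569, 7) 9.1
```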

The taxonomy is the key to software pattern matching, and indeed a critical topic for many kinds of scientific and statistical analysis.

For sizing, pattern matching is not counting function points. The function points have already been counted for the historical projects.  Pattern matching is an effective method for using historical data to show clients the probable size and effort for similar future projects.

Pattern matching does not require any knowledge of the inner structure of the application.  It happens that software projects that share the same patterns of external attributes are also about the same size and often have similar schedules, staff sizes, effort, and costs (when adjusted for pay scales, countries, and industries).

Pattern matching provides an early, quick, and accurate method for sizing and estimating software projects based on historical projects with similar patterns and attributes.  The taxonomy elements of nature, scope, class, type, and complexity are key predictors of software application size.  One reason for the accuracy of pattern matching is the precision of the proprietary taxonomy.

In a sense pattern matching works like a GPS system.  By comparing signals from several satellites a GPS receiver can show position within a few yards.  With software pattern matching comparing the “signals” from the software taxonomy can provide precise information about software projects.

Pattern matching can produce sizes for software projects in about 90 seconds using the Software Risk Master™ tool.  Full development, schedule, staffing, effort, cost, quality, and risk estimates take less than 5 minutes.

Shown below in table 1 are 40 samples sized using the SRM pattern-matching approach.  The length of time needed to create these 40 size examples was about 75 minutes or 1.88 minutes per application.

Table 1:  Examples of Software Size via Pattern Matching

                Using Software Risk Master ™

Application Size in IFPUG Function Points

  1. Oracle 229,434
  2. Windows 7 (all features) 202,150
  3. Microsoft Windows XP   66,238
  4. Google docs   47,668
  5. Microsoft Office 2003   33,736
  6. F15 avionics/weapons   23,109
  7. VA medical records   19,819
  8. Apple iPhone   19,366
  9. IBM IMS data base   18,558
  10. Google search engine   18,640
  11. Linux   17,505
  12. ITT System 12 switching   17,002
  13. Denver Airport luggage (original)   16,661
  14. Child Support Payments (state)   12,546
  15. Facebook     8,404
  16. MapQuest     3,793
  17. Microsoft Project     1,963
  18. Android OS (original version)     1,858
  19. Microsoft Excel     1,578
  20. Garmin GPS navigation (hand held)     1,518
  21. Microsoft Word     1,431
  22. Mozilla Firefox     1,342
  23. Laser printer driver (HP)     1,248
  24. Sun Java compiler     1,185
  25. Wikipedia     1,142
  26. Cochlear implant (embedded)     1,041
  27. Microsoft DOS circa 1998     1,022
  28. Nintendo Gameboy DS     1,002
  29. Casio atomic watch        933
  30. Computer BIOS        857
  31. KnowledgePlan        883
  32. Function Point Workbench                   714
  33. Norton anti-virus        700
  34. SPQR/20        699
  35. Golf handicap analysis        662
  36. Google Gmail        590
  37. Twitter (original circa 2009)        541
  38. Freecell computer solitaire        102
  39. Software Risk Master™ prototype          38
  40. ILOVEYOU computer worm          22

It should be noted that manual function point analysis proceeds at a rate of perhaps 500 function points counted per day.  Counting the first example, Oracle, at 229,434 function points would require roughly 459 working days of manual function point analysis.  Software Risk Master ™ sized Oracle in 1.8 minutes via pattern matching.  (Slow manual counting speed is one of the reasons why function points have been used primarily on small to mid-sized applications when counted manually.)

One issue with sizing by pattern matching is that the function points for a majority of large applications were derived from "backfiring," or mathematical conversion from logical code statements.  This method is not reliable.  However, when data from a number of applications is aggregated, the average probably compensates for that issue.

Another issue is that none of the older historical projects use the new SNAP metric which was just released in 2012.  This will require additional mathematical adjustments when there is sufficient SNAP data to derive rules and algorithms for assessing the SNAP portions of legacy applications.

Pattern Matching for Productivity and Quality Analysis

Additional variables such as CMMI levels, team experience, programming languages, and work hours per month can be used to perform full project estimates, but are not needed for sizing.  However if clients want to know size in logical source code statements, they need to select the programming language(s) from the SRM pull-down table of languages.  Multiple languages in the same application are also supported such as Java and HTML or COBOL and SQL.

To measure or estimate software development productivity rates some additional SRM input variables need to be provided by clients.  Here too most of the information is in the form of multiple-choice questions.  However if a client wants accurate cost estimates they must provide their own local cost structures rather than accepting default values for costs.  The SRM productivity factors are shown below:

Software Risk Master ™ Development Estimating Adjustment Factors

Development compensation =              $10,000 per month (default)
Maintenance compensation =               $8,000 per month (default)
User compensation =                     $10,000 per month (default)
Additional project costs =                   $0 (default)
Project financial value (if known) =         $0 (default)

Project goals =                               3 (Average staffing; average schedule)
Work hours per month =                      132 hours per month (default)
Monthly unpaid overtime hours =               0 (default)
Monthly paid overtime hours =                 0 (default)

Project CMMI level =                          3 (default)
Project Methodology =                         8  Agile/Scrum (default)
Methodology experience =                      2 (Above average: majority of team are experts)
Client experience level =                     4 (Below average: inexperienced with project type)
Project management experience =               2 (Above average: managed many similar projects)
Development team experience =                 3 (Average)
Test team experience =                        1 (Well above average: all certified test personnel)
Quality assurance experience =                3 (Average)
Customer support experience =                 5 (Very inexperienced: totally new to project type)
Maintenance team experience =                 3 (Average)

Here too, the use of numeric coding for the variables that will impact the project's schedules, effort, staffing, and cost makes statistical analysis fairly straightforward.

The experience questions all are based on a 5-point scale which makes statistical analysis of results comparatively easy:

DEVELOPMENT TEAM EXPERIENCE:                                                _______

  1. All experts
  2. Majority of experts
  3. Even mix of experts and novices
  4. Majority of novices
  5. All novices

As can be seen the central value of 3 represents average results or the center point of a bell-shaped curve.

One common use for pattern matching is to compare the results of various programming methodologies.  To do this form of comparison, users merely select the methodology they plan to use from the SRM multiple-choice list of 34 software development methods:

Methods

  1. Mashup
  2. Hybrid
  3. IntegraNova
  4. TSP/PSP
  5. Microsoft Solutions Framework
  6. RUP
  7. XP
  8. Agile/Scrum
  9. Data state design
  10. T-VEC
  11. Information engineering (IE)
  12. Object Oriented
  13. EVO
  14. RAD
  15. Jackson
  16. SADT
  17. Spiral
  18. SSADM
  19. Open-source
  20. Flow based
  21. Iterative
  22. Crystal development
  23. V-Model
  24. Prince2
  25. Merise
  26. DSDM
  27. Clean room
  28. ISO/IEC
  29. Waterfall
  30. Pair programming
  31. DoD 2167
  32. Proofs of correctness
  33. Cowboy
  34. None

Because Agile with Scrum is widely used in 2013, this choice is the default method.  But it is easy to try any of the others in the SRM methodology list.

If the client also wants quality predictions or maintenance and enhancement predictions, some additional inputs are needed beyond the ones already shown.  For example, maintenance costs are strongly correlated with the number of users and the number of installations of the software.  Quality is strongly correlated with the combination of defect prevention methods, pre-test removal such as inspections, and the set of testing stages used.

As with the variables shown above, most of the SRM inputs are based on multiple-choice questions.  Multiple-choice questions are easy to understand and easy for clients to select.

It happens that pattern matching is metric neutral and can produce size data in a variety of metrics simultaneously.  The metrics supported include IFPUG function points, COSMIC function points, NESMA function points, FISMA function points, use case points, story points, RICE objects, and several additional metrics.

If you have an application size of an even 1,000 function points using IFPUG version 4.2, here are the approximate sizes predicted for the other 15 metrics.  In the prototype SRM version the other metrics are merely displayed as shown below.  In a commercial version of SRM users could select which metric they want to use for normalization of output data elements.  The 15 metrics currently supported include:

Alternate Metrics                      Size    % of IFPUG

 1  Backfired function points          1,000     100.00%
 2  COSMIC function points             1,143     114.29%
 3  Fast function points                 970      97.00%
 4  Feature points                     1,000     100.00%
 5  FISMA function points              1,020     102.00%
 6  Full function points               1,170     117.00%
 7  Function points light                965      96.50%
 8  Mark II function points            1,060     106.00%
 9  NESMA function points              1,040     104.00%
10  RICE objects                       4,714     471.43%
11  SCCQI "function points"            3,029     302.86%
12  SNAP non-functional metrics          235      23.53%
13  Story points                         556      55.56%
14  Unadjusted function points           890      89.00%
15  Use case points                      333      33.33%
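
A metric-neutral size report can be approximated by multiplying one IFPUG count by a few of the ratios in the table above.  The sketch below treats those ratios as fixed multipliers, which is a simplification of whatever SRM actually does internally:

```python
# A sketch of metric-neutral size reporting: one IFPUG size multiplied by
# approximate ratios taken from the table above.  Treating them as fixed
# multipliers is a simplification, not the actual SRM conversion logic.
CONVERSION = {
    "COSMIC function points":      1.1429,
    "FISMA function points":       1.02,
    "NESMA function points":       1.04,
    "Mark II function points":     1.06,
    "Story points":                0.5556,
    "Use case points":             0.3333,
    "RICE objects":                4.7143,
    "SNAP non-functional metrics": 0.2353,
}

def alternate_sizes(ifpug_size):
    return {metric: round(ifpug_size * ratio) for metric, ratio in CONVERSION.items()}

for metric, size in alternate_sizes(1000).items():
    print(f"{metric:32s}{size:>6,}")
```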

Additional metrics can be added if they have formal definitions.  The default size metrics used by Software Risk Master ™ are IFPUG function points and logical code statements.  These two metrics are the most common in the United States.  SNAP non-functional size metrics were added to the SRM prototype in 2013, but were not present in the original 2011 version since SNAP had not been published at that time.

As this is being written data on the new SNAP metric is just becoming available, so it is probable that the SNAP predictions will be changing fairly soon.

Sizing Accuracy: The State of the Art as of 2013

Sizing accuracy using function points is a disputatious topic in 2013.  There are ongoing debates among users of IFPUG function points, COSMIC function points, and other forms of function points such as NESMA and FISMA as to which method is most accurate.

Because function point counts are performed by human beings using fairly complex sets of rules, there are variances among certified counters when they count the same application.  There is no “cesium atom” or absolute standard against which function point accuracy can be measured.

Consider the PBX application cited in this article.  If it were counted by 10 certified IFPUG counters and 10 certified COSMIC counters the results would probably be in the following range:  IFPUG counters would range between about 1,400 and 1,600 function points and average about 1,500.  COSMIC counters would range between about 1,500 and 1,700 function points and average about 1,550.  In general COSMIC counts are larger than IFPUG counts.  (Coincidentally the differences between COSMIC and IFPUG are close to the differences between Imperial gallons and U.S. gallons.)

(If the new SNAP metrics were included on the IFPUG side, there would be an additional size component.  However SNAP is a new concept and is not to be found in historical data for legacy applications. All of the PBX examples in this paper are much older than SNAP.)

An advantage of sizing using Software Risk Master ™ is that if 10 users answered the input questions the same way, the 10 results would be identical.

In 2013 the Object Management Group (OMG) announced a new standard for automated function point counting, using IFPUG as the basis.  The OMG standard did not include any discussion of how the counts would compare to normal IFPUG counts.  In fact the text of the standard said there would be variances, but did not explain their magnitude.

Another issue with the OMG standard is that it requires analysis of source code.  There are more than 2,500 programming languages as of 2013, and the OMG standard did not identify which languages were supported and which were not.

In the context of the PBX switch discussed in this paper, it is unlikely that the OMG standard would be able to count switches coded in CHILL, Electronic Switching PL/I (ES/PLI), Objective C, or CORAL, all of which were used in the telecommunications sector.  As this paper is written, the accuracy of the OMG method is unknown or at least unpublished.

One of the theoretical advantages of automatic sizing should be the speed of achieving the size of applications.  Manual function point counts average around 500 function points counted per day with about a 20% range based on experience and application complexity.  There is nothing in the OMG standard about counting speed.  But the amount of preparatory work before the OMG method can be used seems significant.  The OMG standard should publish comparative results between manual counts and OMG counts.

In the interest of full disclosure, the Software Risk Master ™ sizing speed averages 1.88 minutes per application regardless of the nominal size of the application.  In other words, SRM sizes 100,000 function point applications at the same speed that it sizes 10 function point applications.  (SRM does not "count."  It uses pattern matching to show sizes of historical projects with the same patterns as the new application being sized.)

Pattern Matching in Other Business and Scientific Fields

The pattern matching method is new and novel for software, but widely used outside of software by other business sectors.

If you want to buy a house in another community or in your own town the web site of Zillow.com will give you the prices of houses all over the United States via pattern matching.  Zillow allows users to specify square feet, style, and various amenities such as swimming pools.

For example, if you want to buy a 3,500 square foot home with an in-ground swimming pool on at least five acres in Taos, New Mexico, Zillow can find available properties in a few minutes.

If you want to buy a used automobile, either Autotrader or the Kelley Blue Book can provide automobile prices using pattern matching.  These two sources show automobile prices by geographic area, by manufacturer, by model, by mileage, and by features such as satellite radio or a GPS navigation package.

For example, if you were interested in the price of a used 2012 Lexus RX350 with all-wheel drive, satellite radio, a GPS, and a premium sound system within 20 miles of Sarasota, Florida, that search could easily be done using Autotrader.com in a few seconds.

Of course you would still have to negotiate a final price with the seller.  Having the average and range of costs for identical or very similar cars is a good starting point for the negotiations.  If you decide to omit the satellite radio the price might be a few hundred dollars lower.  If you decide you want a car with less than 10,000 miles that will raise the price.  The point is that pattern matching is an excellent starting place for decision making.

Pattern matching is also a normal part of medical diagnosis.  When a patient visits a general practitioner and presents the classic symptoms of whooping cough, for example, the condition will immediately be diagnosed by the physician because the patterns of millions of whooping cough symptoms have been analyzed for more than 200 years.  Of course various lab samples and blood tests will be taken to confirm the diagnosis, but these are used more to ensure against potential malpractice claims than to confirm the diagnosis.

When new and unusual conditions appear, such as Lyme disease, they will often be misdiagnosed until sufficient numbers of patients have been examined to understand the patterns of symptoms.  This was actually the case for Lyme disease: dozens of patients were misdiagnosed as having childhood arthritis because some of the Lyme disease symptoms are ambiguous.

Lyme disease was not even recognized as a new illness until a physician did a statistical analysis of the patterns of diagnoses of childhood arthritis centering on the town of Old Lyme, Connecticut.  There were far too many cases of childhood arthritis for that to be the true condition, so additional research detected the presence of the Lyme disease bacteria.  Still more research eventually found that the hosts of Lyme disease were white-footed mice and common white-tailed deer; ticks that moved between the deer and the mice were the actual vectors.

(Of course a tick bite mark surrounded by a red circle is a strong indicator of Lyme disease, but this is not always present.  Further, it may have been present but in a spot invisible to the patient so it was not noticed until it had faded and other symptoms occurred.)

If you are interested in the size, schedule, and effort to develop a bank ATM processing application in San Francisco, California, then pattern matching can provide size and cost information in a few seconds, based on dozens of similar projects.

Scientists also use pattern matching to place a newly discovered fish or insect into standard biological categories based on class, order, family, genus, and species.

In all cases the effectiveness of pattern matching is based on the existence of a stable and reliable taxonomy.  Pattern matching is new for software, but well understood by many other sciences, engineering fields, and business sectors.

In order for pattern matching to work for software, historical data is needed that encompasses at least 15,000 software projects.  Additional mathematical algorithms are needed to process applications that do not have a perfect match to any pattern.

A final advantage of pattern matching for software sizing is that it can be used before software requirements are fully known.  The basic taxonomy pattern of an application can be identified very early, and indeed will lead to the requirements that are eventually defined, because software projects with the same taxonomy usually have very similar requirements.

Early sizing prior to full requirements makes early risk analysis possible.  Many risks are directly proportional to application size, so the sooner size is ascertained, the quicker potential risks can be evaluated.

Consider the patterns for risks for these six size plateaus of software applications:

1 function point:  Close to zero risk.

10 function points:  Close to zero risk.

100 function points:  Low risk; more than 95% success; minor delays and cost overruns.

1,000 function points:  Risks increase; schedule and cost overruns > 10%.

10,000 function points:  Major risks; cancelled projects occur; overruns > 35%.

100,000 function points:  > 50% of projects cancelled; overruns > 55% for survivors.

If a company or government group is planning to build a software application that is likely to be larger than 1,000 function points in size, early sizing and early risk analyses are urgent needs.  The earlier size and risks can be evaluated, the more time there will be to deploy effective risk solutions.
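
For illustration only, the risk plateaus listed above can be expressed as a simple lookup; the break points and descriptions come from the list, while the function itself is merely a convenience sketch:

```python
# Illustrative lookup of the qualitative risk plateaus listed above; the
# break points and wording come from the list, the function itself is
# just a convenience sketch.
import bisect

PLATEAUS = [1, 10, 100, 1_000, 10_000, 100_000]
RISK = [
    "close to zero risk",
    "close to zero risk",
    "low risk; >95% success; minor delays and cost overruns",
    "risks increase; schedule and cost overruns > 10%",
    "major risks; cancelled projects occur; overruns > 35%",
    "> 50% of projects cancelled; overruns > 55% for survivors",
]

def risk_band(function_points):
    """Return the risk description for the nearest plateau at or below the size."""
    idx = bisect.bisect_right(PLATEAUS, function_points) - 1
    return RISK[max(idx, 0)]

print(risk_band(2_500))   # risks increase; schedule and cost overruns > 10%
```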

Pattern Matching and Early Risk Analysis Prior to Requirements Completion

Software projects are subject to more than 225 risks in all, including security risks, quality risks, knowledge risks, financial risks, ethical risks, and many others.  No risk tool can identify all 225, but Software Risk Master ™ can identify about 25 major risks before projects start and are funded.  This gives time to deploy risk solutions.

Here is a small sample of risks for a major application of 10,000 function points or about 533,000 Java statements developed by an average team:

Predicted Risks

Cancellation               25.77%
Negative ROI               32.65%
Cost overrun               28.35%
Schedule slip              34.36%
Unhappy customers          36.00%
Litigation                 11.34%

Average Risks              28.08%
Financial Risks            47.58%

Additional risks are specific to various deliverables such as requirements:

Requirements size (pages) =                2,126
Requirements completeness =               73.79%
Amount one person understands =           12.08%
Days required to read requirements =       48.09
Requirements creep or growth =             1,599 function points
Missing requirements =                       216
Toxic requirements =                          27
Requirements defects =                     1,146
Test cases for requirements =              5,472

Because the patented early sizing method of SRM can be used prior to requirements, SRM is the only parametric tool that can predict the size, completeness, and quality of the requirements themselves before projects start.  This early prediction allows time to introduce better requirements methods such as joint application design (JAD), quality function deployment (QFD), Rational DOORS, T-VEC, IntegraNova, requirements modeling, text static analysis, the FOG readability index, and other recent solutions to chronic requirements problems.

Without multiplying examples, Software Risk Master ™ is aptly named, since it predicts risks earlier than other common parametric estimation tools and it predicts many risks that are not handled by other tools.

Software Document and Paperwork Sizing

The patented sizing method used in Software Risk Master ™ generates size data not only in terms of function points and logical code statements, but the SRM prototype also produces size estimates for 13 document types.  The full commercial version will be able to produce document sizes for more than 100 document types including special documents needed for FDA and FAA certification.

Document sizing is an important topic for large software projects and especially for military and defense software, since “producing paper documents” is the top cost driver for defense applications.

Some defense applications produce more than 200 documents, with a total of more than 400 English words for every source code statement.  The words cost more than twice as much as the source code.  Defense software averages almost three times the document volumes and sizes of civilian projects that have the same patterns except for being defense applications.

While web applications and small internal projects may produce few (or no) documents and while Agile projects have very few documents, the fact remains that large systems software and large military software projects have major costs associated with the production of requirements, design, plans, status reports, users manuals, help text and dozens of other paper documents.

While a commercial version of SRM will be able to size more than 100 kinds of documents including those needed for FAA and FDA certification, the current prototype sizes 13 as a proof of concept.

The document sizes shown below are samples for a defense application of 25,000 function points.  It is easily seen why document sizing is needed in parametric software estimation tools.

 

 

Document                 Pages        Words        Percent Complete

 1  Requirements          4,936     1,974,490         61.16%
 2  Architecture            748       299,110         70.32%
 3  Initial design        6,183     2,473,272         55.19%
 4  Detail design        12,418     4,967,182         65.18%
 5  Test plans            2,762     1,104,937         55.37%
 6  Development plans     1,375       550,000         68.32%
 7  Cost estimates          748       299,110         71.32%
 8  User manuals          4,942     1,976,783         80.37%
 9  HELP text             4,965     1,986,151         81.37%
10  Course materials      3,625     1,450,000         79.85%
11  Status reports        3,007     1,202,721         70.32%
12  Change requests       5,336     2,134,284         66.16%
13  Bug reports          29,807    11,922,934         76.22%

    TOTAL                80,852    32,340,974         69.32%

Predicting document sizes and completeness before requirements is a standard SRM feature.  This feature becomes progressively more important as application size increases, because paperwork volumes go up faster than function point size does.  It is particularly important for defense applications because the main cost drivers for military software are:

 

Military software cost drivers:
1) The cost of producing English words
2) The cost of finding and fixing bugs
3) The cost of cancelled projects
4) The cost of avoiding security flaws
5) The cost of meetings and communications
6) The cost of programming or coding
7) The cost of project management

 

The function point communities have concentrated primarily on sizing in terms of function points and the more recent SNAP metrics.  Function points and SNAP are certainly important, but to understand software costs and schedules the sizes of all deliverables need to be predicted too.

 

SRM predicts size in terms of IFPUG function points and logical code statements, and it also predicts size for document numbers and volumes, and for numbers of test cases needed for each form of test.  It also predicts “defect potentials” or probable number of software bugs that might be found in requirements, design, code, user manuals, and “bad fixes” or secondary defects.

 

SRM also sizes requirements creep and the growth of applications over time.  Typically requirements creep is close to 2% per calendar month.  An application sized at 10,000 function points at the end of requirements could easily grow to 12,000 function points by the time of delivery.
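
As a worked example of that creep figure, applying 2% per calendar month as simple compound growth (an assumption; the internal SRM model may differ) reproduces the 10,000-to-12,000 function point example:

```python
# Worked example of the ~2% per calendar month creep figure cited above,
# applied as simple compound growth (an assumption, not the SRM model).
size = 10_000          # function points at the end of requirements
monthly_creep = 0.02   # ~2% per calendar month

for month in range(1, 11):
    size *= 1 + monthly_creep
    print(month, round(size))
# After roughly 10 months of construction the size passes 12,000 function
# points, matching the example in the text.
```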

 

(The author has been an expert witness in litigation where requirements creep doubled the initial size at requirements; from 10,000 to 20,000 function points over a four-year contract.  The litigation was because the client did not want to pay the vendor for the changes, even though the contract specified payments for out-of-scope changes.  The court decided in favor of the vendor, because function points are based on user requirements.)

 

As of 2013 the patented SRM sizing method predicts the sizes of more software deliverables than any other tool or method, and it does so about six months earlier than any other method.  SRM is also the only tool to predict requirements growth throughout development and for five years of post-release usage.  Post-release growth averages about 8% per year, with occasional "mid-life kickers" where many new features are added to keep up with competitive applications.

 

 

Pattern Matching and Software Benchmark Statistical Analysis

When both sections of the taxonomy are joined together the result is a very powerful tool for pattern analysis or statistical research on software productivity, quality, successes, and failures.  The taxonomy also is a good condensation of benchmark data.

Note that the consolidated version includes confidential information that would not be used for published statistical studies.  These confidential topics include the name of the company and the name of the project.  However if the method is used privately inside of companies such as Microsoft or IBM, they would want to record the proprietary information.

It should be noted that the projects studied by the author using the SRM taxonomy were all studied under non-disclosure agreements.  This makes it legally impossible to identify specific companies.  Therefore the company and project identification information is concealed and encrypted and not open to public scrutiny.

Software Risk Master ™ Full Benchmark and Estimating Taxonomy

 

Security Level:               Company Confidential
Company Name:                 XYZ Telecommunications
Business unit:                San Jose Development Laboratory
Project Name:                 Sample PBX Switching System
Project Manager:              J. Doe
Data provided by:             Capers Jones
Team members interviewed:     A. Doe, B. Doe, C. Doe, J. Doe (manager)
Interview method:             On-site meeting
Interview clock hours:        3.0
Interview team hours:         12.0
Date of data collection:      03/04/2013
Project start date:           03/09/2013
Desired completion date:      03/09/2014
Actual completion date:       Unknown

Country code =                     1    (United States)
Region code =                     06    (California)
City code =                      408    (San Jose)
Industry code =                 1569    (Telecommunications)

Project Nature =                   1    (New project)
Project Scope =                   21    (New components; new application)
Project Class =                    5    (External, bundled with hardware)
Project Type =                    14    (Communications or telecommunications)
Problem Complexity =               5    (Average complexity)
Code Complexity =                  4    (Below average complexity)
Data Complexity =                  6    (Above average complexity)
Primary size metric =              1    (IFPUG function points with SNAP)
Secondary size metric =            8    (Logical code statements)
Programming language(s) =         14    (CHILL)
Programming language level =       3
Certified reuse percent =         15%   (default; can be adjusted by users)

Development compensation =              $10,000 per month (default)
Maintenance compensation =               $8,000 per month (default)
User compensation =                     $10,000 per month (default)
Additional project costs =                   $0 (default)
Project financial value (if known) =         $0 (default)

Project goals =                               3 (Average staffing; average schedule)
Work hours per month =                      132 hours per month (default)
Monthly unpaid overtime hours =               0 (default)
Monthly paid overtime hours =                 0 (default)

Project CMMI level =                          3 (default)
Project Methodology =                         8  Agile/Scrum (default)
Methodology experience =                      2 (Above average: majority of team are experts)
Client experience level =                     4 (Below average: inexperienced with project type)
Project management experience =               2 (Above average: managed many similar projects)
Development team experience =                 3 (Average)
Test team experience =                        1 (Well above average: all certified test personnel)
Quality assurance experience =                3 (Average)
Customer support experience =                 5 (Very inexperienced: totally new to project type)
Maintenance team experience =                 3 (Average)

 

Note that the taxonomy captures in a concise fashion all of the major factors that influence the results of software projects for better or for worse.  A good taxonomy is a working tool for many scientific fields, and software engineering is no exception.

By converting all of the critical variable information into numeric form statistical benchmark studies are easy to carry out.

The automated prototype SRM tool uses a short version of the author’s full assessment and benchmark questionnaire.  A full commercial version would include additional topics that will collect and predict the results of:

  • Any combination of ISO standards used for the application.
  • The presence or absence of certified project personnel such as by the Project Management Institute (PMI) or various test and quality assurance professional associations, or by Microsoft, IBM, and other corporations that offer certifications.
  • Specific tool suites used for the application such as the Mercury test tool suite, the Coverity or CAST static analysis tools, or the CAI automated project work bench (APO).

The full version of the SRM questionnaire is annotated with a star system, much like a Michelin Guide: the four-star "****" questions are the most important.

The original idea for SRM was to capture every factor that influences software projects by as much as 1%.  However this turned out to be impossible for legal and policy reasons.  A number of influential factors cannot be measured or studied. Topics where law or policy prohibits measurements include the appraisal scores of team members, their academic grade averages, their age, and their membership in trade unions.

As to the latter factor, trade unions: in many organizations where software personnel are unionized it is not permitted to collect benchmark data or measure team performance at all, because doing so would violate union rules.


 

Software Risk Master ™ Benchmarks and Estimating Output Information

The input taxonomy data discussed here feeds into the Software Risk Master ™ tool.  The outputs from the tool include, but are not limited to, the following set of 45 factors:

Software Risk Master ™ Outputs

 

  1. Size in IFPUG function points
  2. Size in logical code statements
  3. Probable size of requirements creep
  4. Probable size of deferred functions
  5. Size in 12 other metrics (story points, use-case points, COSMIC, NESMA, etc.)
  6. Size and completeness of software documents
  7. Numbers of test cases needed for all test stages
  8. Development staffing by activity
  9. Development staffing by occupation (analysts, coders, testers, etc.)
  10. Development schedules by activity and net schedule
  11. Probability of achieving desired target schedule
  12. Development costs by activity and total cost
  13. Productivity in work hours per function point
  14. Productivity in function points per staff month
  15. Development costs per activity and total costs
  16. Development costs per function point by activity and in total
  17. Defect potentials by origin (requirements, design, code, documents, bad fixes)
  18. Defect prevention effectiveness (JAD, Quality Function Deployment, etc.)
  19. Pre-test defect removal efficiency for inspections and static analysis
  20. Testing defect removal efficiency for all major forms of testing
  21. Delivered defects by severity level
  22. Cost of quality (COQ) for the application
  23. Technical Debt (TD) for the application
  24. Total Cost of Ownership (TCO)
  25. Probable number of “error prone modules” if any
  26. Reliability in mean time to failure (MTTF)
  27. Stabilization period after delivery
  28. Security vulnerabilities present at delivery
  29. Installation and user training
  30. Maintenance (defect repairs) for five years after delivery
  31. Enhancements (new features) for five years after delivery
  32. Customer support for five years after delivery
  33. Project management for five years after delivery
  34. Odds of litigation for breach of contract for outsource projects
  35. Cost of litigation for plaintiff and defendant if case goes through trial
  36. Venture capital investment for start-up software companies
  37. Dilution of ownership due to multiple rounds of venture capital
  38. Risk of project cancellation
  39. Risk of major schedule delays
  40. Risk of major cost overruns
  41. Risk of litigation for poor quality
  42. Risk of poor customer satisfaction
  43. Risk of executive dissatisfaction
  44. Risk of poor team morale
  45. Risk of post-release security attacks

 

The taxonomy and Software Risk Master ™ are designed for ease of use and achieving rapid results.  SRM can size any application in about 90 seconds.  The full set of input questions can be entered in less than five minutes for experienced users and no more than 10 minutes for first-time users.

 

Once the inputs are complete, SRM produces estimates in just a few seconds.  The speed is so fast that SRM works well as a teaching tool because students don’t have to wait or spend time carrying out model calibration.

 

Another benefit of high-speed data entry and quick predictions is that it makes it very interesting and even enjoyable to try alternate scenarios.  For example SRM can predict the results of Waterfall, Agile, XP, RUP, and TSP in less than 15 minutes.  About five minutes are needed for the initial inputs, and then only about 30 seconds to change assumptions to switch from one method to another.

 

Table 2 shows a sample development prediction from Software Risk Master ™ for a generic systems software application of 1,000 function points or 53,000 Java statements:

 

 

 

 

Table 2:  Example of Activity Software Estimating Equations

Application Class                       External Systems Software
Programming Language(s)                 Java
Application Size in Function Points     1,000
Application Size in Lines of Code       53,000
Work Hours per Month                    132
Average Monthly Salary                  $10,000

Activity            Ascope   Prate    Whours/     Staff   Effort   Schedule   Cost          Percent
                                      Func. Pt.           Months   Months

Requirements          500     75.00     1.76       2.00    13.33     6.67       $133,333     10.00%
Prototyping           500    175.00     0.75       2.00     5.71     2.86        $57,143      4.29%
Design                400     75.00     1.76       2.50    13.33     5.33       $133,333     10.00%
Design Reviews        250    175.00     0.75       4.00     5.71     1.43        $57,143      4.29%
Coding                200     30.00     4.40       5.00    33.33     6.67       $333,333     25.01%
Code Inspections      125    160.00     0.83       8.00     6.25     0.78        $62,500      4.69%
Testing               150     35.00     3.77       6.67    28.57     4.29       $285,714     21.44%
Quality Assurance    1000    175.00     0.75       1.00     5.71     5.71        $57,143      4.29%
Documentation        1000    215.00     0.61       1.00     4.65     4.65        $46,512      3.49%
Management           1000     60.00     2.20       1.00    16.67    16.67       $166,667     12.50%

TOTAL                 147      7.50    17.59       6.80   133.28    16.67     $1,332,821    100.00%

 

 

Note that some abbreviations were needed to fit the table on the page in portrait mode.

 

The column labeled “Ascope” stands for “Assignment Scope” which is the number of function points one person can be responsible for.

 

The column labeled “Prate” stands for “Production Rate” and is the amount of functionality that one person can finish in one calendar month with 132 work hours.  Raising or lowering the number of work hours per month has an impact on this variable.

 

The column labeled “Whours” stands for “Work hours per function point.”  This is essentially the reciprocal of function points per staff month.  The two are easily converted back and forth.  Here too raising or lowering the number of work hours would change the result.
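
The visible relationships among these columns can be reconstructed for one row of Table 2.  The sketch below reproduces the Requirements row from the assignment scope and production rate; it is a reconstruction of the arithmetic shown in the table, not the actual SRM equations:

```python
# Reproduces the arithmetic behind one row of Table 2 (Requirements), using
# the column definitions just described.  This is a reconstruction of the
# visible relationships, not the actual SRM equations.
size_fp         = 1_000     # application size in function points
ascope          = 500       # function points one person is responsible for
prate           = 75.0      # function points finished per person per month
hours_per_month = 132
salary          = 10_000    # average monthly salary

staff    = size_fp / ascope            # 2.00 people
effort   = size_fp / prate             # 13.33 staff months
schedule = effort / staff              # 6.67 calendar months
whours   = hours_per_month / prate     # 1.76 work hours per function point
cost     = effort * salary             # $133,333

print(f"staff={staff:.2f} effort={effort:.2f} schedule={schedule:.2f} "
      f"whours={whours:.2f} cost=${cost:,.0f}")
```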

 

Unpaid overtime would shorten schedules and lower costs, since the work is being done for free.  Paid overtime, on the other hand, would shorten schedules but would raise costs due to the normal premium pay of 150% for paid overtime.  In some cases special overtime such as work on public holidays may have a higher premium of 200%.

 

The default metrics for showing productivity rates are work hours per function point and function points per work month.  It is planned in later versions to allow users to select any time unit that matches local conventions, such as hours, days, weeks, months, or years.  Smaller projects below 250 function points normally use hours.  Larger systems above 10,000 function points normally use months.

 

The sample above uses only 10 activities.   In a commercial version of SRM the number of activities can be expanded to 50 if the users want a more detailed prediction.  In normal use, which is prior to the completion of requirements, the details of 50 activities are a distraction.  Ten activities are all that are needed to show clients the likely outcome of a project before its requirements are fully known.

 

SRM has a utility feature that makes side-by-side comparison easy.  The utility is able to convert applications to any desired round size.  For example, if three PBX applications were 1,250, 1,475, and 1,600 function points in size, SRM can convert all of them to an even 1,500 for side-by-side comparisons.  This is a special feature that is not true estimation, because the original technology stack is locked.  However, the size adjustments do match the empirical result that as sizes get bigger, paperwork and defect volumes grow faster than size in function points or logical code statements.

 

Some of the samples in this report used the size conversion feature, such as the examples of the 10 PBX switching applications shown below.

 

Because changing assumptions is easy to do, it is possible to explore many different options for a future project.  Since PBX switches were discussed earlier, table 3 illustrates the possible results for doing the same PBX switch using 10 different programming languages:

 

 

Table 3:  Productivity Rates for 10 Versions of the Same Software Project
(A PBX Switching System of 1,500 Function Points in Size)

Language       Effort     Funct. Pt.   Work Hrs.    LOC per   LOC per
               (Months)   per Staff    per Funct.   Staff     Staff
                          Month        Pt.          Month     Hour

Assembly        781.91      1.92        68.81        480       3.38
C               460.69      3.26        40.54        414       3.13
CHILL           392.69      3.82        34.56        401       3.04
PASCAL          357.53      4.20        31.46        382       2.89
PL/I            329.91      4.55        29.03        364       2.76
Ada83           304.13      4.93        26.76        350       2.65
C++             293.91      5.10        25.86        281       2.13
Ada95           269.81      5.56        23.74        272       2.06
Objective C     216.12      6.94        19.02        201       1.52
Smalltalk       194.64      7.71        17.13        162       1.23

Average         360.13      4.17        31.69        366       2.77

 

 

In addition to productivity measures and predictions, SRM also carries out quality measures and predictions.  Table 4 shows the possible quality results for the same PBX switch using 10 different programming languages:

Table 4:  Delivered Defects for 10 Versions of the Same Software Project
(A PBX Switching System of 1,500 Function Points in Size)

Language       Total     Defect       Delivered   Delivered     Delivered
               Defects   Removal      Defects     Defects per   Defects per
                         Efficiency               Funct. Pt.    KLOC

Assembly       12,835     91.00%       1,155        0.77          3.08
C               8,813     92.00%         705        0.47          3.70
CHILL           8,093     93.00%         567        0.38          3.60
PASCAL          7,635     94.00%         458        0.31          3.36
PL/I            7,276     94.00%         437        0.29          3.64
Ada83           6,981     95.00%         349        0.23          3.28
C++             6,622     93.00%         464        0.31          5.62
Ada95           6,426     96.00%         257        0.17          3.50
Objective C     5,772     96.00%         231        0.15          5.31
Smalltalk       5,510     96.00%         220        0.15          7.00

Average         7,580     94.00%         455        0.30          3.45

 

Software Risk Master ™ predicts size, productivity, and quality using both function points and logical code statements.  However readers are cautioned that only function points produce correct economic results.

Lines of code metrics actually reverse true economic productivity results and make the lowest-level programming languages look better than modern high-level languages.  Table 5 shows the productivity rankings of the 10 samples as measured using both function points and lines of code:

 

Table 5:  Rankings of Productivity Levels Using Function Point Metrics
and Lines of Code (LOC) Metrics

    Productivity Ranking                 Productivity Ranking
    Using Function Point Metrics         Using LOC Metrics

 1  Smalltalk                         1  Assembly
 2  Objective C                       2  C
 3  Ada95                             3  CHILL
 4  C++                               4  PASCAL
 5  Ada83                             5  PL/I
 6  PL/I                              6  Ada83
 7  PASCAL                            7  C++
 8  CHILL                             8  Ada95
 9  C                                 9  Objective C
10  Assembly                         10  Smalltalk

 

Because “lines of code” metrics violate standard economic assumptions and show incorrect reversed productivity rates, LOC should be considered to be professional malpractice for economic studies that involve more than one programming language.
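
The reversal is easy to demonstrate with the Assembly and Smalltalk rows of Table 3; the short sketch below assumes the same 1,500 function point PBX:

```python
# Demonstration of the LOC reversal using the Assembly and Smalltalk rows
# of Table 3.  The same 1,500 function point PBX is assumed.
projects = {
    #            effort (months), LOC per staff month
    "Assembly":  (781.91, 480),
    "Smalltalk": (194.64, 162),
}
size_fp = 1_500

for language, (effort_months, loc_per_month) in projects.items():
    fp_per_month = size_fp / effort_months
    total_loc    = effort_months * loc_per_month
    print(f"{language:10s} {fp_per_month:5.2f} FP/month   "
          f"{loc_per_month} LOC/month   ({total_loc:,.0f} LOC written)")

# Assembly delivers only ~1.92 FP/month yet shows 480 LOC/month -- LOC makes
# the slowest language look the most productive, because far more code must
# be written to deliver the same 1,500 function points.
```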

Incidentally the venerable “cost per defect metric” also violates standard economic assumptions and does not show quality economics at all.  Cost per defect penalizes quality and achieves its lowest values for the buggiest software applications!

SRM displays data using both LOC and cost per defect as well as function points.  The reason for this is to show clients exactly what is wrong with LOC and cost per defect, because the errors of these metrics are not well understood.

Another use of pattern matching is to compare various software development methods.  Table 6 illustrates the results for 10 common software development methods.  Unlike the earlier examples, Table 6 is not a PBX switch but a generic IT application of 1,000 function points:

Table 6:  Software Schedules, Staff, Effort, Productivity

     Methodologies         Schedule   Staff   Effort    FP per   Development
                           Months             Months    Month    Cost

  1  Extreme (XP)           11.78       7       84      11.89      $630,860
  2  Agile/scrum            11.82       7       84      11.85      $633,043
  3  TSP                    12.02       7       86      11.64      $644,070
  4  CMMI 5/spiral          12.45       7       83      12.05      $622,257
  5  OO                     12.78       8      107       9.31      $805,156
  6  RUP                    13.11       8      101       9.58      $756,157
  7  Pair/iterative         13.15      12      155       9.21    $1,160,492
  8  CMMI 3/iterative       13.34       8      107       9.37      $800,113
  9  Proofs/waterfall       13.71      12      161       6.21    $1,207,500
 10  CMMI 1/waterfall       15.85      10      158       6.51    $1,188,870

     Average                13.00      8.6    112.6      9.762     $844,852

 

When used in estimating mode, Software Risk Master ™ could produce these 10 examples in roughly 12 minutes.  It would take about 5 minutes for the first prediction and then changing methodologies takes less than 30 seconds each.  Of course these 10 examples are all the same size.  Sizing each one separately takes about 90 seconds per application with SRM.

Large software projects can have up to 116 different kinds of occupation groups.  In today's world many specialists are needed.  The current prototype of SRM predicts the staffing levels for 20 of these occupation groups.

Staffing predictions vary with project size as do the numbers of kinds of specialists that are likely to be deployed.

The following list of specialists and generalists is taken from a prediction for a 25,000 function point military application.

At this large size all 20 of the occupation groups are used and the organization structure will no doubt involve over a dozen organizational units such as a project office, several development groups, one or more test teams, an integration and configuration control group, software quality assurance, technical publications, and others.  There will also be metrics specialists and function point counters, although function point counting is often carried out by contract personnel rather than by in-house employees.

 

 

 

Occupation Groups and Part-Time Specialists

                                      Normal    Peak
                                       Staff    Staff
  1  Programmers                          94      141
  2  Testers                              83      125
  3  Designers                            37       61
  4  Business analysts                    37       57
  5  Technical writers                    16       23
  6  Quality assurance                    14       22
  7  1st line managers                    15       21
  8  Data base administration              8       11
  9  Project Office staff                  7       10
 10  Administrative support                8       11
 11  Configuration control                 5        7
 12  Project librarians                    4        6
 13  2nd line managers                     3        4
 14  Estimating specialists                3        4
 15  Architects                            2        3
 16  Security specialists                  1        2
 17  Performance specialists               1        2
 18  Function point counters               1        2
 19  Human factors specialists             1        2
 20  3rd line managers                     1        1

There are also predictions for organization structures.  For example large systems above 10,000 function points in size normally have project offices.  They also tend to have specialized test departments rather than having testing done by the developers themselves.

Correcting “Leakage” From Software Benchmark Data

A common benchmark problem with software projects developed under a cost-center model is “leakage.”  Historical data has gaps and omissions, and sometimes omits more than 60% of the actual effort and costs.  The most common omissions are unpaid overtime, management, and the work of part-time specialists such as quality assurance, business analysts, function point counters, and project office personnel.

Projects that are built under a time-and-materials contract or under a profit-center model tend to have more accurate data, since high accuracy is needed in order to bill clients the correct amounts.

Software Risk Master ™ has an effective method for correcting leakage that is based on pattern matching.  Prior to collecting actual benchmark data the project is run through SRM in predictive estimating mode.

The SRM algorithms and knowledge base know the most common patterns of leakage and offer corrected values.  If the clients agree with the SRM predictions, then the SRM estimate becomes the benchmark.  If the client wants to add information or make adjustments, these can be applied to the SRM outputs, which speeds up and simplifies benchmark data collection.  Table 7 below lists 25 software development activities and identifies the ones that tend to “leak”:

Table 7: Common Leakage Patterns from Software Historical Data

Activities Performed                            Completeness of Historical Data

01  Requirements                                Missing or Incomplete
02  Prototyping                                 Missing or Incomplete
03  Architecture                                Missing or Incomplete
04  Project planning                            Missing or Incomplete
05  Initial analysis and design                 Missing or Incomplete
06  Detail design                               Incomplete
07  Design reviews                              Missing or Incomplete
08  Coding                                      Complete
09  Reusable code acquisition                   Missing or Incomplete
10  Purchased package acquisition               Missing or Incomplete
11  Code inspections                            Missing or Incomplete
12  Independent verification and validation     Complete
13  Configuration management                    Missing or Incomplete
14  Integration                                 Missing or Incomplete
15  User documentation                          Missing or Incomplete
16  Unit testing                                Incomplete
17  Function testing                            Incomplete
18  Integration testing                         Incomplete
19  System testing                              Incomplete
20  Field testing                               Missing or Incomplete
21  Acceptance testing                          Missing or Incomplete
22  Independent testing                         Complete
23  Quality assurance                           Missing or Incomplete
24  Installation and training                   Missing or Incomplete
25  Project management                          Missing or Incomplete
26  Total project resources, costs              Incomplete

 

On average, for projects developed under a cost-center model (which means that users are not charged for development), historical data is only about 37% complete.
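
The correction itself can be pictured as dividing reported effort by an estimated completeness factor.  The sketch below uses hypothetical activity shares chosen for illustration; SRM’s actual leakage algorithms and knowledge base are proprietary and considerably more detailed.

# Minimal sketch of leakage correction with hypothetical percentages; SRM's
# actual algorithms and knowledge base are proprietary and far more detailed.

reported_effort_months = 500.0     # effort recorded in the cost-center history (assumed)

# Hypothetical shares of true total effort for activities that commonly "leak"
missing_shares = {
    "unpaid overtime":      0.15,
    "project management":   0.12,
    "business analysts":    0.06,
    "quality assurance":    0.04,
    "project office":       0.03,
}

completeness = 1.0 - sum(missing_shares.values())   # 0.60 in this sketch
corrected_effort = reported_effort_months / completeness

print(f"Reported effort:   {reported_effort_months:,.0f} staff months")
print(f"Completeness:      {completeness:.0%}")
print(f"Corrected effort:  {corrected_effort:,.0f} staff months")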

Quality data also leaks, since many companies do not measure bugs or defects until after release.  Only a few major companies such as IBM and AT&T start collecting defect data during requirements and continue through static analysis, inspections, all forms of testing, and out into the field.

IBM was so interested in complete quality data that they asked for volunteers to record bugs found via desk checking and unit testing, which are normally unmeasured private forms of defect removal.  The volunteer data allowed IBM to calculate the defect removal efficiency levels of both desk checks and unit testing.

Because finding and fixing bugs is the #1 cost driver for major software projects, SRM is very thorough in both measuring and predicting the results of all known forms of defect removal:  inspections, static analysis, and many kinds of testing.
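
Defect removal efficiency (DRE), as the term is generally used in the author’s work, is the percentage of total defects removed before release, where the total also includes defects reported by users (conventionally during the first 90 days of production use).  A one-function sketch, with illustrative values:

# Defect removal efficiency (DRE): pre-release defect removals divided by total
# defects, where the total adds defects reported by users (conventionally the
# first 90 days of production use).

def defect_removal_efficiency(removed_before_release, found_after_release):
    total = removed_before_release + found_after_release
    return removed_before_release / total if total else 1.0

# Illustrative values, not measured data:
print(f"DRE = {defect_removal_efficiency(9_500, 500):.2%}")    # 95.00%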

Table 8 shows approximate levels of defect removal efficiency for a full series of pre-test defect removal and test stages.  Table 8 illustrates a major system of 10,000 function points or 533,000 Java statements.

Few real projects use so many different forms of defect removal, so Table 8 is a hypothetical example of really advanced quality control:

 

Table 8:  Pre-Test and Test Defect Removal Predictions from SRM
          (Note: 10,000 function points or 533,000 Java statements)

Pre-Test Defect Removal       Architect.   Require.     Design       Code   Document     TOTALS

Defect potentials per FP            0.25       0.95       1.15       1.35       0.55       4.25
Defect potentials                  3,408     12,950     15,676     18,403      7,497     57,935
Security flaw %                    1.50%      0.75%      2.00%      3.00%      0.00%      7.25%

1  Requirement inspection          5.00%     87.00%     10.00%      5.00%      8.50%     25.14%
   Defects discovered                170     11,267      1,568        920        637     14,562
   Bad-fix injection                   5        338         47         28         19        437
   Defects remaining               3,232      1,346     14,062     17,455      6,841     42,936

2  Architecture inspection        85.00%     12.00%     10.00%      2.50%     12.00%     12.98%
   Defects discovered              2,748        161      1,406        436        821      5,572
   Bad-fix injection                  82          5         42         13         25        167
   Defects remaining                 402      1,179     12,613     17,006      5,995     37,196

3  Design inspection              10.00%     14.00%     87.00%      7.00%     26.00%     37.45%
   Defects discovered                 40        165     10,974      1,190      1,559     13,928
   Bad-fix injection                   1          5        329         36         47        696
   Defects remaining                 361      1,009      1,311     15,779      4,390     22,850

4  Code inspection                12.50%     15.00%     25.00%     85.00%     15.00%     63.87%
   Defects discovered                 45        151        328     13,413        658     14,595
   Bad-fix injection                   1          5         10        402         20        438
   Defects remaining                 315        853        973      1,965      3,712      7,817

5  Static analysis                 2.00%      2.00%     10.00%     87.00%      3.00%     24.83%
   Defects discovered                  6         17         97      1,709        111      1,941
   Bad-fix injection                   0          1          3         51          3         58
   Defects remaining                 308        836        873        204      3,597      5,818

6  IV & V                         10.00%     12.00%     23.00%      7.00%     20.00%     18.32%
   Defects discovered                 31        100        201         14        719      1,066
   Bad-fix injection                   1          3          6          0         22         32
   Defects remaining                 276        732        666        189      2,856      4,720

7  SQA review                     10.00%     17.00%     20.00%     12.00%     17.00%     25.52%
   Defects discovered                 28        125        133         23        486        794
   Bad-fix injection                   1          4          4          1         15         40
   Defects remaining                 248        604        529        166      2,356      3,887

Pre-test defects removed           3,160     12,346     15,148     18,237      5,142     54,032
Pre-test efficiency %             92.73%     95.33%     96.63%     99.10%     68.58%     93.26%

Test Defect Removal Stages    Architect.   Require.     Design       Code   Document      Total

1  Subroutine testing              0.00%      1.00%      5.00%     45.00%      2.00%      3.97%
   Defects discovered                  0          6         26         75         47        154
   Bad-fix injection                   0          0          1          2          1          5
   Defects remaining                 248        598        502         89      2,307      3,728

2  Unit testing                    2.50%      4.00%      7.00%     35.00%     10.00%      8.42%
   Defects discovered                  6         24         35         31        231        327
   Bad-fix injection                   0          1          1          1          7         10
   Defects remaining                 241        573        465         57      2,070      3,391

3  Function testing                7.50%      5.00%     22.00%     37.50%     25.00%     20.29%
   Defects discovered                 18         29        102         21        517        688
   Bad-fix injection                   1          1          3          1         16         21
   Defects remaining                 223        544        360         35      1,537      2,682

4  Regression testing              2.00%      2.00%      5.00%     33.00%      7.50%      5.97%
   Defects discovered                  4         11         18         12        115        160
   Bad-fix injection                   0          0          1          0          3          5
   Defects remaining                 218        533        341         23      1,418      2,517

5  Integration testing             6.00%     20.00%     27.00%     33.00%     22.00%     21.11%
   Defects discovered                 13        107         92          8        312        531
   Bad-fix injection                   0          3          3          0          9         16
   Defects remaining                 205        423        246         15      1,097      1,970

6  Performance testing            14.00%      2.00%     20.00%     18.00%      2.50%      5.92%
   Defects discovered                 29          8         49          3         27        117
   Bad-fix injection                   1          0          1          0          1          3
   Defects remaining                 175        414        196         12      1,068      1,850

7  Security testing               12.00%     15.00%     23.00%      8.00%      2.50%      8.42%
   Defects discovered                 21         62         45          1         27        156
   Bad-fix injection                   1          2          1          0          1          5
   Defects remaining                 154        350        149         11      1,041      1,690

8  Usability testing              12.00%     17.00%     15.00%      5.00%     55.00%     39.86%
   Defects discovered                 18         60         22          1        573        673
   Bad-fix injection                   1          2          1          0         17         20
   Defects remaining                 135        289        126         11        451        996

9  System testing                 16.00%     12.00%     18.00%     38.00%     34.00%     23.74%
   Defects discovered                 22         35         23          4        153        236
   Bad-fix injection                   1          1          1          0          5          7
   Defects remaining                 112        253        103          7        293        752

10 Cloud testing                  10.00%      5.00%     13.00%     10.00%     20.00%     12.84%
   Defects discovered                 11         13         13          1         59         97
   Bad-fix injection                   0          0          0          0          2          3
   Defects remaining                 101        240         89          6        233        669

11 Independent testing            12.00%     10.00%     11.00%     10.00%     23.00%     14.96%
   Defects discovered                 12         24         10          1         54        100
   Bad-fix injection                   0          1          0          0          2          3
   Defects remaining                  88        215         79          5        178        566

12 Field (Beta) testing           14.00%     12.00%     14.00%     17.00%     34.00%     19.55%
   Defects discovered                 12         26         11          1         60        111
   Bad-fix injection                   0          1          0          0          2          3
   Defects remaining                  76        189         68          4        115        452

13 Acceptance testing             13.00%     14.00%     15.00%     12.00%     24.00%     19.43%
   Defects discovered                 11         22          9          1         46         89
   Bad-fix injection                   0          1          0          0          1          3
   Defects remaining                  65        166         58          4         68        360

Test defects removed                 183        438        471        162      2,288      3,527
Testing efficiency %              73.96%     72.55%     89.05%     97.86%     97.11%     90.74%

Total defects removed              3,343     12,784     15,618     18,399      7,429     57,559
Total bad-fix injection              100        384        469        552        223      1,727
Cumulative removal %              98.11%     98.72%     99.63%     99.98%     99.09%     99.35%

Remaining defects                     65        166         58          4         68        376
High-severity defects                 10         28         11          1          9         56
Security flaws                         0          0          1          0          0          2

Remaining defects per FP          0.0047     0.0122     0.0042     0.0003     0.0050     0.0276
Remaining defects per
  K function points                 4.73      12.17       4.25       0.26       5.00      27.58
Remaining defects per KLOC          0.12       0.31       0.11       0.01       0.13       0.70

Table 8 shows a total of seven pre-test removal activities and 13 test stages.  Very few projects use this many forms of defect removal.  An “average” U.S. software project would use static analysis and probably four kinds of testing:  1) unit test, 2) function test, 3) regression test, and 4) system test.  Average U.S. defect removal efficiency circa 2013 is below 90%.  Only a few top companies such as IBM achieve DRE results higher than 99%.

Military and defense software, medical systems, and systems software for complex physical devices such as telephone switching systems and computer operating systems would use several kinds of inspections, static analysis, and at least six to eight forms of testing.  For example, only military projects tend to use independent verification and validation (IV&V) and independent testing.

Table 8 is intended to show the full range of defect removal operations that can be measured and predicted using Software Risk Master ™.  Table 8 also assumes that all defect removal personnel are top-guns, fully trained, and that test personnel are certified.
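
The stage-by-stage arithmetic behind a table like Table 8 can be sketched compactly: each removal stage finds a percentage of the defects still present, and the repairs themselves inject a few new “bad-fix” defects.  The sketch below collapses the five defect origins into a single total and uses illustrative efficiencies with a flat 3% bad-fix rate; it is not SRM’s calibrated model, and SRM’s per-origin bookkeeping of bad fixes differs in detail.

# Minimal sketch of the stage-by-stage arithmetic behind Table 8, collapsed to a
# single "total defects" column. Efficiencies and the flat 3% bad-fix rate are
# illustrative; SRM's calibrated per-origin model differs in detail.

def run_removal_stages(defect_potential, stages, bad_fix_rate=0.03):
    remaining = defect_potential
    for name, efficiency in stages:
        found = remaining * efficiency           # defects removed by this stage
        bad_fixes = found * bad_fix_rate         # new defects injected by imperfect repairs
        remaining = remaining - found + bad_fixes
        print(f"{name:24s} found {found:8,.0f}   remaining {remaining:8,.0f}")
    print(f"Cumulative removal efficiency: {1.0 - remaining / defect_potential:.2%}")

run_removal_stages(57_935, [
    ("Requirement inspection", 0.25),
    ("Design inspection",      0.37),
    ("Code inspection",        0.64),
    ("Static analysis",        0.25),
    ("All test stages",        0.90),
])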

Pattern matching is useful for measuring and predicting quality as well as for measuring and predicting software development productivity.

 Summary and Conclusions about Software Pattern Matching

Pattern matching based on formal taxonomies has a long history in science and has proven its value time and again.  Pattern matching for business decisions such as real estate appraisals or automobile costs is more recent but no less effective and useful.

The Software Risk Master ™ tool uses pattern matching as the basis for sizing applications, process assessments, benchmark data collection, and predictive estimating of future software projects.

As of 2013 more than 95% of software applications are not “new” in the sense that they have never been designed or built before.  The vast majority of modern software projects are either replacements for legacy applications or minor variations on existing software.

Whenever there are large numbers of similar projects that have been built before and have accurate historical data available, pattern matching is the most effective and efficient way of capturing and using historical results to predict future outcomes.

References and Readings on Software Pattern Matching

The primary citation for modern taxonomic analysis is:

Linnaeus, Carl; Systema Naturae; privately published in Sweden in 1735.

The American Society of Indexing has a special interest group on taxonomy creation and analysis: www.taxonomies-sig.org.

Note:  All of the author’s books use various forms of taxonomy such as defect classifications, defect removal methods, and application classes and types.

Jones, Capers; “A Short History of Lines of Code Metrics”; Namcook Analytics Technical Report; Narragansett, RI; 2012.

This report provides a mathematical proof that “lines of code” metrics violate standard economic assumptions.  LOC metrics make requirements and design invisible.  Worse, LOC metrics penalize high-level languages.  The report asserts that LOC should be deemed professional malpractice if used to compare results between different programming languages.  There are other legitimate purposes for LOC, such as merely measuring coding speed.

Jones, Capers; “A Short History of the Cost Per Defect Metrics”; Namcook Analytics Technical Report; Narragansett, RI 2012.

This report provides a mathematical proof that “cost per defect” penalizes quality and achieves its lowest values for the buggiest software applications.  It also points out that the urban legend that “cost per defect after release is 100 times larger than early elimination” is not true.  The expansion of cost per defect for downstream defect repairs is caused by ignoring fixed costs.  The cost per defect metric also ignores other economic topics, such as the fact that high quality leads to shorter schedules.

Jones, Capers; “Early Sizing and Early Risk Analysis”; Capers Jones & Associates LLC; Narragansett, RI; July 2011.

Jones, Capers and Bonsignour, Olivier; The Economics of Software Quality; Addison Wesley Longman, Boston, MA; ISBN-10: 0-13-258220-1; 2011; 585 pages.

Jones, Capers; Software Engineering Best Practices; McGraw Hill, New York, NY; ISBN 978-0-07-162161-8; 2010; 660 pages.

Jones, Capers; Applied Software Measurement; McGraw Hill, New York, NY; ISBN 978-0-07-150244-3; 2008; 662 pages.

Jones, Capers; Estimating Software Costs; McGraw Hill, New York, NY; 2007; ISBN-13: 978-0-07-148300-1.

 

Jones, Capers; Software Assessments, Benchmarks, and Best Practices;  Addison Wesley Longman, Boston, MA; ISBN 0-201-48542-7; 2000; 657 pages.

Jones, Capers;  Conflict and Litigation Between Software Clients and Developers; Software Productivity Research, Inc.; Burlington, MA; September 2007; 53 pages; (SPR technical report).

 
