There’s a lot we don’t yet know about this classic disaster, but it is not too early to examine several obvious flaws in the approach. Top software professionals knew at the outset of various measures that would have saved a great deal of expense and embarrassment, but those measures were not applied. It seems clear this project was “managed” by comparative amateurs who were not up to the challenge. This effort did not fail for technical reasons: virtually all of the issues that have arisen were foreseeable and preventable.
According to published reports, $630 million has been spent to develop this system, perhaps 50% of that to build the software and the other half for servers and networks. In this article I focus on the software development risks and the risk reduction strategies that could have been (but apparently were not) employed. I examine these more fully in my book Managing the Black Hole: The Executive’s Guide to Software Project Risk.
DIFFERENT PARTIES – DIFFERENT RISKS
The Kaplan and Mikes risk management framework[i] identifies three categories of risk: ‘Preventable,’ ‘Strategy,’ and ‘External.’ The authors propose different strategies (control models) to address the different categories of risk. In the case of Healthcare.gov we might apply this approach as summarized in the following table:
| Party | Risk Category | Control Model |
| --- | --- | --- |
| Obama Administration | Strategic | Independent experts |
| Software Development Contractors (CGI, et al.) | Internal/Preventable | Rules-based |
| Insurance Industry | External | Tail-risk stress tests |
STRATEGIC RISK CONTROL
The Administration could have, and should have, formed a panel of expert advisors to help identify significant risks and ensure those risks were consistently addressed with known best practices. Critical categories of expertise that were apparently missing include:
- Size / cost / schedule / quality estimating and forecasting. As far as is known, no independent estimates were developed to ‘sanity test’ contractor estimates, yet this is always necessary for large projects, and several firms with extensive expertise in this area were available. Buyers should NEVER dictate a schedule: unrealistically compressed schedules invariably lead to failure. Don’t deny the ‘laws of physics’; if no one else has delivered a project of the indicated size on the schedule you want, your team won’t either. When the deadline truly is fixed, as in this case, scope MUST be adjusted. (A minimal sanity-check sketch appears after this list.)
- Capacity / performance management. Surely the initially long response times and capacity limitations were foreseeable by the kind of experts that Amazon and Google have on their staffs. (A first-order capacity check is also sketched after this list.)
- Application domain expertise. Surely something useful could have been learned from prior experience in Massachusetts, and perhaps some components could have been reused. Coordination with California and other states could also have been fruitful.
- Systems engineering, systems integration, project and program management. Firms in the aerospace industry know how to organize and manage large projects; HHS staff clearly did not possess the necessary experience, and evidently did not even know what they did not know.
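To make the ‘sanity test’ idea concrete, here is a minimal sketch of the kind of independent check an expert panel could run. The power-law exponent and the project numbers are illustrative assumptions, not published benchmarks; the point is only that size-based schedule floors are easy to compute and compare against a promised date.

```python
# Illustrative schedule sanity check. The exponent below is a rule-of-thumb
# assumption relating size (in function points) to a minimum calendar schedule;
# substitute a value calibrated from real benchmark data before relying on it.

def minimum_schedule_months(function_points: float, exponent: float = 0.4) -> float:
    """Rough lower bound on calendar months for a project of the given size."""
    return function_points ** exponent

def sanity_check(function_points: float, promised_months: float) -> str:
    floor = minimum_schedule_months(function_points)
    if promised_months < floor:
        return (f"IMPLAUSIBLE: {promised_months:.0f} months promised, but ~{floor:.0f} "
                f"months is a realistic floor at {function_points:,.0f} function points. "
                "Cut scope or extend the schedule.")
    return f"Plausible: the promised schedule exceeds the ~{floor:.0f}-month floor."

if __name__ == "__main__":
    # Hypothetical numbers, not actual Healthcare.gov data.
    print(sanity_check(function_points=20_000, promised_months=30))
```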
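The same goes for capacity: a first-order check needs nothing more exotic than Little’s Law (average concurrency = arrival rate × average time in system). The traffic and per-server figures below are hypothetical, chosen only to show the arithmetic.

```python
import math

# First-order capacity estimate using Little's Law: L = lambda * W.
# All traffic and per-server numbers below are hypothetical.

def concurrent_sessions(arrivals_per_second: float, avg_session_seconds: float) -> float:
    """Average number of simultaneous sessions the system must sustain."""
    return arrivals_per_second * avg_session_seconds

def servers_needed(concurrent: float, sessions_per_server: float) -> int:
    """Whole servers required, assuming a fixed per-server session capacity."""
    return math.ceil(concurrent / sessions_per_server)

if __name__ == "__main__":
    # Assume 250,000 visitors per peak hour, each spending ~20 minutes on the site.
    arrivals = 250_000 / 3600                      # ~69 arrivals per second
    concurrent = concurrent_sessions(arrivals, avg_session_seconds=20 * 60)
    print(f"~{concurrent:,.0f} concurrent sessions at peak")
    print(f"~{servers_needed(concurrent, sessions_per_server=500)} servers needed "
          "(assuming 500 concurrent sessions per server)")
```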
INTERNAL RISK CONTROL
Surely CGI and the other contractors incurred substantial damage to their reputations; with an appropriate set of internal rules and standards they could have saved themselves a great deal of embarrassment. Among the most important are:
- Formal quantification of the ‘size’ of a project (using a generally accepted method such as ‘function points’[ii]) before budget or schedule commitments are made. This is a shared responsibility between the contractor and the buyer; neither should agree to put five pounds of sand into a three-pound bag. (A minimal sizing sketch follows this list.)
- A detailed quality plan that includes a forecast of the number of defects likely to be ‘inserted’ during each phase of the project. Industry benchmarks make it possible to estimate the number of requirements, design, code, and bad-fix defects likely to be ‘inserted’ per unit of size during development. The quality plan must specify a target for delivered quality (e.g., 99% of forecasted defects will be removed prior to delivery[iii]) and must include a credible estimate of the effort and methods that will be used to find and fix the expected defects; again, industry benchmarks provide a basis for evaluating the plausibility of the plan. Testing alone will NEVER be sufficient. Require periodic independent monitoring of forecast vs. actual. Typically 40-60% of total software cost is associated with finding and fixing defects; best-in-class groups may reduce that to 20-30%. Failure to plan this way leads to inadequate time to test at the end, which is exactly what happened in this case. (A defect-forecast sketch also follows this list.)
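For illustration, an unadjusted IFPUG-style function point count is simply a weighted sum of five component types. The weights below are the commonly cited ‘average complexity’ values; the component counts are invented for the example and do not describe Healthcare.gov.

```python
# Unadjusted function point count: weighted sum of the five IFPUG component types.
# Weights are the commonly cited "average complexity" values; counts are invented.

AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_logical_files": 10,
    "external_interface_files": 7,
}

def unadjusted_function_points(counts: dict) -> int:
    """Sum of (count * weight) over the five component types."""
    return sum(counts[kind] * weight for kind, weight in AVERAGE_WEIGHTS.items())

if __name__ == "__main__":
    # Hypothetical counts for a single enrollment subsystem.
    counts = {
        "external_inputs": 120,
        "external_outputs": 90,
        "external_inquiries": 60,
        "internal_logical_files": 40,
        "external_interface_files": 25,
    }
    print(f"Unadjusted size: {unadjusted_function_points(counts)} function points")
```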
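And to make the quality-plan idea concrete, here is a minimal sketch of a phase-by-phase defect forecast driven by size. The per-function-point insertion rates are placeholders standing in for real industry benchmarks, and the containment target mirrors the 95% threshold mentioned in the endnotes.

```python
# Illustrative quality-plan forecast: defects inserted per phase, and how many must be
# removed before delivery to hit a containment target. Rates are placeholders, not
# actual industry benchmarks.

PLACEHOLDER_DEFECTS_PER_FP = {
    "requirements": 1.0,
    "design": 1.25,
    "code": 1.75,
    "bad_fixes": 0.4,
}

def defect_forecast(function_points: float, containment_target: float = 0.95) -> dict:
    inserted = {phase: rate * function_points
                for phase, rate in PLACEHOLDER_DEFECTS_PER_FP.items()}
    total = sum(inserted.values())
    return {
        "inserted_by_phase": inserted,
        "total_inserted": total,
        "must_remove_before_delivery": total * containment_target,
        "expected_to_escape": total * (1 - containment_target),
    }

if __name__ == "__main__":
    plan = defect_forecast(function_points=20_000, containment_target=0.95)
    for phase, count in plan["inserted_by_phase"].items():
        print(f"{phase:>12}: ~{count:,.0f} defects inserted")
    print(f"Must remove ~{plan['must_remove_before_delivery']:,.0f} of "
          f"~{plan['total_inserted']:,.0f} before delivery; "
          f"~{plan['expected_to_escape']:,.0f} would still escape.")
```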
EXTERNAL RISK CONTROL
Clearly the insurance industry has been burned by this experience, and equally clearly it had little control over the outcome. Kaplan and Mikes suggest “tail-risk stress tests” as an approach the industry might have used to foresee, and perhaps prepare to respond to, the risk that actually materialized.
These are just some of the risks to think about in your next software project.
Feel free to contact me if you would like to dive deeper into any of this: ggack@process-fusion.net
[i] Robert S. Kaplan and Anette Mikes, “Managing Risks: A New Framework,” Harvard Business Review, June 2012.
[ii] Function Points are defined by an ANSI/ISO standard, ISO/IEC 20926:2009. Function Point counting is labor intensive and is based on known requirements; in practice, size is needed before requirements are fully known. Capers Jones has devised an approach, known as Software Risk Master, that uses an analogy process to determine size very early; see www.namcook.com.
[iii] This generally means using a metric known as “Total Containment Effectiveness” (TCE), broadly the percentage of all defects inserted during development that are found and removed before delivery. Industry data show cost and schedule are minimized when TCE is above 95%.