Why do so many people avoid the topic of failure? In product development and plant asset management, we are surrounded by people who steadfastly do not want to know about or talk about failures.
Failure does happen. We cannot ignore this simple fact.
The Blame Game
Unlike a murder mystery, failure analysis is not a game of whodunit. The knee-jerk response of blaming someone rarely resolves the problem or creates a reliability-minded workplace. If the routine response to a revealed failure is to blame someone, fewer people will reveal failures. If it is clear that we do not want to talk about failures in a civilized manner, well, we'll just not talk about failures.
Failures will still occur, however. In a blame-centric organization, most of the people who could understand and solve problems will simply turn away and avoid ‘seeing’ failures. When friends and colleagues are vilified in order to ‘solve problems’, it's not safe to recognize failures.
Root Cause Analysis
Root cause analysis is just one step in the failure analysis process, yet it is critical to get it right. The basic idea is to understand, at a fundamental level (molecular, physical, chemical, or material property), the circumstances and events leading to failure. We should be able to reproduce the issue at will and to turn off or avoid the failure at will. Only then do we understand the root cause.
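The "reproduce at will, avoid at will" test can be expressed as a simple check over controlled trials. The sketch below is a minimal illustration, not a prescribed method: the `Trial` record and the `root_cause_confirmed` function are hypothetical names, and the all-or-nothing rule (every run with the suspected cause fails, every run without it does not) is a simplifying assumption. Intermittent failures or small sample sizes would call for a statistical treatment instead.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One controlled test run (hypothetical record format)."""
    cause_present: bool  # was the suspected cause deliberately introduced?
    failed: bool         # did the failure occur in this run?


def root_cause_confirmed(trials: list[Trial]) -> bool:
    """True only if the failure can be switched on and off at will.

    Needs at least one run with the suspected cause and one without it;
    every run with the cause must fail and every run without it must not.
    """
    with_cause = [t for t in trials if t.cause_present]
    without_cause = [t for t in trials if not t.cause_present]
    if not with_cause or not without_cause:
        return False  # no evidence in one direction, so not confirmed
    reproduced_at_will = all(t.failed for t in with_cause)
    avoided_at_will = not any(t.failed for t in without_cause)
    return reproduced_at_will and avoided_at_will


if __name__ == "__main__":
    runs = [Trial(True, True), Trial(True, True), Trial(False, False), Trial(False, False)]
    print(root_cause_confirmed(runs))  # True
```

Requiring evidence in both directions matters: a failure that appears when the suspected cause is present, but was never tested with the cause removed, does not yet show that we can avoid it at will.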
Techniques such as 5 Whys provide a framework to ensure we understand the cause of failure. Equipment ranging from magnifying lenses to scanning electron microscopes helps us ‘see’ the physical and chemical clues.
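A 5 Whys session can be captured as a simple chain in which each answer becomes the subject of the next "why". The sketch below assumes a hypothetical `FiveWhys` class and an invented pump-bearing example. "Five" is a rule of thumb, so the chain may stop earlier or run longer once it reaches a fundamental cause, and the deepest answer remains only a candidate until it passes the reproduce-and-avoid test described above.

```python
from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    """Hypothetical record of a 5 Whys chain for one problem statement."""
    problem: str
    answers: list[str] = field(default_factory=list)

    def why(self, answer: str) -> "FiveWhys":
        """Record the answer to 'why did the previous statement happen?'."""
        self.answers.append(answer)
        return self  # returning self allows chained calls

    def candidate_cause(self) -> str:
        """The deepest answer so far is the current root-cause candidate."""
        return self.answers[-1] if self.answers else self.problem

    def report(self) -> str:
        """Render the chain so the team can review each link."""
        lines = [f"Problem: {self.problem}"]
        previous = self.problem
        for i, answer in enumerate(self.answers, start=1):
            lines.append(f"Why {i}: why '{previous}'? -> {answer}")
            previous = answer
        return "\n".join(lines)


if __name__ == "__main__":
    # Invented example for illustration only.
    analysis = (
        FiveWhys("Pump stopped")
        .why("Bearing seized")
        .why("Bearing ran without lubricant")
        .why("Grease line was blocked")
        .why("Grease hardened in the line")
        .why("Grease grade was not rated for the operating temperature")
    )
    print(analysis.report())
```

Writing each link out explicitly makes it easier for the team to spot a step where the answer is a guess rather than a fact supported by physical evidence.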
The Failure Analysis Process
The eight disciplines (8D) method is a common failure analysis process. There are many variations, yet the pattern tends to remain the same. Here’s the basic approach:
0 Gather information, symptoms, and circumstances. This should be done upon initial recognition of a failure. If needed, implement any emergency response required.
1 Form a team. This can be just a couple of people or a formal multidisciplinary team, depending on the magnitude of the failure and its consequences.
2 Describe the problem. Make a list of what is and is not known. The more details and facts compiled here, the better the failure analysis will be.
3 Develop an immediate response and containment plan. Isolate the batch, stop shipments of suspect products, and so on. Limit additional failures wherever possible. If there is an immediate workaround or patch, use it to mitigate and avoid failures. This is not the solution; it is just a stop-gap action.
4 Perform a root cause analysis. This is the sleuthing part, in which we determine what actually happened at a fundamental level. One piece of advice here: do not send suspect components to suppliers or vendors for failure analysis. It takes too long and rarely results in a meaningful root cause analysis. Instead, use internal or contracted failure analysis labs. These may cost more, but the analysis will be quicker and clearer.
5 Take corrective action. Only once you have a fundamental understanding of the root cause should you implement corrective action. This may include a design, material, or process change.
6 Test the solution and verify that it actually works. Monitor as long as necessary to validate whether the solution provides a fundamental resolution.
7 Take preventive measures. Based on what you learned, what can your organization change to avoid similar issues in the future? This is often the most difficult step. Distance yourself from the immediate problem and review the design and production processes that created the conditions for the failure.
This is not the step in which to add more controls and checks; rather, it is the step in which you assess the process and improve your ability to make better decisions in the future. For example, if the root cause of a material defect is the use of an unstable additive, then simply adding that additive to a ‘do not use’ list is shortsighted. Instead, investigate what part of the process should have revealed the faulty material choice. Why was the stability question not asked earlier in the process? Was it a lack of resources or the team's focus on time to market? What system structure kept us from identifying the issue earlier? Treat the failure not only as an issue to resolve but as a chance to learn how to avoid making similar mistakes in the future.
8 Celebrate the success.
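The ordering in the 8D steps, in particular that corrective action follows an understood root cause and that a failed verification sends the team back, can be sketched as a small state tracker. This is a hypothetical illustration, not part of any 8D standard or tool: the `Step` enum, the `EightDTracker` class, and its methods are invented names, and a strictly linear order with explicit rollback is a simplifying assumption about how teams actually iterate.

```python
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    """The nine 8D steps (D0-D8) in the order listed in this article."""
    D0_GATHER_INFO = 0
    D1_FORM_TEAM = 1
    D2_DESCRIBE_PROBLEM = 2
    D3_CONTAIN = 3
    D4_ROOT_CAUSE = 4
    D5_CORRECTIVE_ACTION = 5
    D6_VERIFY_SOLUTION = 6
    D7_PREVENT = 7
    D8_CELEBRATE = 8


class EightDTracker:
    """Hypothetical tracker for one investigation; steps must be done in order."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.completed: list[Step] = []
        self.notes: dict[Step, str] = {}

    @property
    def next_step(self) -> Optional[Step]:
        """The next step to complete, or None once D8 is done."""
        n = len(self.completed)
        return Step(n) if n < len(Step) else None

    def can_complete(self, step: Step) -> bool:
        """Only the next step in sequence may be completed (no skipping ahead)."""
        return step == self.next_step

    def complete(self, step: Step, note: str) -> None:
        """Mark a step done and record a short note about what was found."""
        if not self.can_complete(step):
            raise ValueError(f"{step.name} is out of order")
        self.completed.append(step)
        self.notes[step] = note

    def reopen(self, step: Step) -> None:
        """Roll back to an earlier step, discarding it and every later step."""
        if step not in self.completed:
            raise ValueError(f"{step.name} has not been completed yet")
        idx = self.completed.index(step)
        for later in self.completed[idx:]:
            self.notes.pop(later, None)
        self.completed = self.completed[:idx]


if __name__ == "__main__":
    t = EightDTracker("Seized pump bearing (hypothetical)")
    for s in list(Step)[:7]:  # complete D0 through D6
        t.complete(s, f"{s.name} done")
    # Suppose verification (D6) shows the fix did not hold: go back to root cause.
    t.reopen(Step.D4_ROOT_CAUSE)
    print(t.next_step.name)  # D4_ROOT_CAUSE
```

Refusing out-of-order steps mirrors the advice to implement corrective action only once the root cause is understood, while `reopen` models returning to root cause analysis when the solution does not hold up under test.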
Summary
Every organization has stories about failures, especially organizations that ‘do not talk about failures.’ Failures happen, and when they do, we can learn from them and improve our organization.
Bio:
Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series at Accendo Reliability.