One of the enjoyable parts of reliability engineering work is the constant need to learn. We learn how new materials, designs, applications, and systems work, and how they fail. Sometimes we learn through proactive characterization studies, sometimes via unwanted field failures.
Failures will occur; it is what we learn from them that matters. The ability to gather and remember the lessons learned is a common and ongoing need for every organization. In general, we are not very good at it.
The Motivation to Create an Effective Lessons Learned System
Years ago, Phil, a technical marketing manager, was called into the general manager’s office. The organization had shipped a new product that was experiencing a very high field failure rate. Instead of the expected 2 to 5% failure rate over the first year of use, more than 30% of all units shipped had failed in less than two months.
The engineering teams quickly sorted out the root cause and discovered they had systematically fallen into a common design paradigm. Henry Petroski describes this peril in his book Design Paradigms: when making a design change to solve a problem, we often do not step back to determine what else the change may impact. We assume the change is minor enough to have no adverse impact on other parts of the system.
This latest, and significant, issue was not the first caused by failing to check the impact of a design change. Thus, the conversation with Phil included a request to establish a system to prevent all repeats of already learned lessons.
Phil was asked to do this with no budget, no staff, and the sooner the better. As an aside, this group worked near the region of the 1800s California gold rush.
What would you do?
The Basic Idea and Process for Golden Nuggets
Petroski also describes the problem organizations face of systematically forgetting lessons learned. Phil found that everyone in the organization fully understood the three recent root causes of major field failures, yet did not easily recall other issues from a few years ago. A few folks who were involved with past issues did recall some details, yet that knowledge was often isolated to just a few people in the organization.
So, Phil reviewed the records of past failure reports and made a list. He then condensed the list to the memorable stories, keeping the lessons that encompassed an entire class of potential mistakes due to similar faulty logic, assumptions, or lack of awareness.
For example, the records showed that this organization had forgotten that ceramic capacitors are susceptible to cracking, and thus failing. They had to redesign a circuit board 9 years ago and again 4 years ago for the same basic issue. The organization had forgotten that some parts are fragile, so despite the basic design guidelines on component placement for such devices, the knowledge of why the guidelines existed faded.
It was the decay of knowledge and awareness that led to forgetting lessons learned.
What Phil did was summarize, on one page, 8 past lessons learned. Each brief description included the essence of the lesson learned and a reference to the details of the past episode. The capacitor story was generalized to something like, ‘ceramic is like glass, it breaks easily’. Instead of focusing on one component from one vendor, the lesson focused on what the team had learned that would apply to many similar components and materials.
The second thing Phil realized was that the best person and time for a lessons learned refresher discussion was the person laying out the development plan for a new product or the redesign of an existing product. That person was the one in charge of allocating resources to solve the design and production challenges for the project. That person also set priorities for the team.
Phil spent 30 minutes discussing the shortlist of golden nuggets (key lessons learned) and asked the project manager what they were going to do to avoid forgetting each lesson. He then wrote the answers down and commented that near the end of the development process he would return to ask what had been done.
For example, for the ceramic cracks lesson, a project manager may assign an engineer to review the placement and handling of all ceramic components to determine the risk of cracking. Phil would write down a note that Sara, the assigned engineer, had that task.
The second short meeting, just prior to the decision to launch, was again 30 minutes. It was the check step: did the plan actually occur? Did the team remember and apply all the key lessons, the golden nuggets?
Over the next five years, not one field issue could be traced to repeating a mistake or forgetting a golden nugget. The list of key lessons learned grew to 21 issues as the team continued to ‘learn’ the hard way by making mistakes, as we all do. Yet, they enjoyed a simple process to never forget.
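The two-meeting process described above can be sketched as a small data structure. This is only an illustration under stated assumptions, not a record of the tool Phil actually used (the article describes a one-page list and handwritten notes). The class names, field names, project name, and reference text are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GoldenNugget:
    """One generalized lesson plus a pointer to the original episode."""
    lesson: str
    reference: str  # hypothetical pointer to the past failure report


@dataclass
class Commitment:
    """What the project manager plans to do about one nugget."""
    nugget: GoldenNugget
    action: str
    owner: str
    done: bool = False


@dataclass
class ProjectReview:
    """Notes kept across the kickoff and pre-launch meetings."""
    project: str
    commitments: list[Commitment] = field(default_factory=list)

    def kickoff(self, nugget, action, owner):
        """First meeting: record the planned action for a nugget."""
        commitment = Commitment(nugget, action, owner)
        self.commitments.append(commitment)
        return commitment

    def prelaunch_open_items(self):
        """Second meeting: list commitments that were never carried out."""
        return [c for c in self.commitments if not c.done]


# Example values are placeholders, loosely following the capacitor story.
ceramic = GoldenNugget("Ceramic is like glass, it breaks easily",
                       "failure report archive (placeholder)")
review = ProjectReview("example redesign")
task = review.kickoff(
    ceramic,
    "Review placement and handling of all ceramic components",
    "Sara",
)

# Before the work is done, the check step still shows Sara's task as open.
open_before = [c.owner for c in review.prelaunch_open_items()]

# Once the review is completed, the pre-launch check comes back empty.
task.done = True
open_after = review.prelaunch_open_items()
```

The point of the sketch is the shape of the record: a short lesson, a named owner, and a check that is asked again before launch. A sheet of paper serves the same purpose.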
Why Common Lessons Learned Systems Do Not Work
Over the course of close to 200 reliability program assessments, I’ve seen some great and not-so-great product development organizations. One element I always explore is their lessons learned system.
Most use what I call the ‘legends and lore’ method. Essentially, the process is based on the premise that we’re all smart folks and will recall those essential lessons we learned during our careers. It also relies on the person who does recall a specific lesson being in the room during design reviews to share it.
In Petroski’s book, he describes that the engineers who first design a novel solution for a problem, like the first suspension bridge, are already accomplished, experienced engineers. Their reputation is on the line. They understand what they know and don’t know. Their designs often include sufficient margin to account for the aspects of the design that are still uncertain.
The second set of designs using that new solution typically comes from engineers who learn directly from the experienced engineer who first implemented the design. Over time, as the concept becomes standard practice, the task of implementing the design falls to less and less experienced engineers. It is this third group that has neither the experience nor the understanding of the uncertainties. So, they are likely to build the bridge a foot longer than the materials are able to span, and the bridge (the design) fails spectacularly.
A second method a few organizations use is the ‘record everything’ approach. These organizations have databases full of failure records and root cause analysis reports. One group captured and stored meeting minutes, development drawings, supplier contracts, plus every analysis and experiment the team had completed.
After collecting the data and some information, they would ask the development team to review the lessons learned as they started work on the new project. A few engineers would visit the ‘vault’ of dusty, disorganized stacks of records and poke around a bit. In my discussions with engineers and managers, few spent much time there when faced with the daunting amount of data and information in the ‘vault’. In one organization, a search for ‘capacitor’ returned 40,000 records and did not indicate which of those would be useful to refresh the reader’s memory that ceramic cracks.
A third method is to implement a test to detect the fault or error due to something the organization had previously learned. One such organization had over 50 specific tests expected to be conducted late in the development process, just to make sure they didn’t repeat a mistake. If you recall, you cannot test in quality. This approach also didn’t retain why each test was implemented; the test was simply required, and so it was done.
Keep Golden Nuggets System Simple
The essential element of the golden nuggets system Phil implemented was the distillation of past lessons learned into a description of the generalized lesson. Each lesson did have a poignant story from the organization’s past, yet the takeaway was applicable to a broader type of error that the team should avoid. Instead of remembering that vendor X’s part 3P789UIE5503 failed in 1994, the lesson is ‘ceramic cracks’.
Phil was part of a small team that regularly reviewed field failures and proposed solutions. Such a team is common within organizations, where it prioritizes the resolution of issues and focuses on implementing short and long-term solutions.
Only a small subset of the issues routinely seen by the team would graduate to become a golden nugget. The nugget had to have a fundamental lesson with broad applicability. And, the nugget had to be memorable with just a short description.
The two short meetings, one when the project manager allocated resources and set priorities, and the second to check on what had actually been accomplished, both refreshed awareness of key lessons learned and established accountability to not forget these key lessons.
The golden nuggets process is deceptively simple and takes discipline to implement well. What has your organization forgotten that it should not have?
Bio:
Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series at Accendo Reliability.