#100 – REACTIVE VS. PROACTIVE APPROACH TO RELIABILITY – FRED SCHENKELBERG

Do you let events happen to you, or do events follow your designs and expectations? Are you a spectator or an actor? Do you wonder about your product’s future or do you control your product’s future? Are you reactive or proactive?

Every reliability and maintenance program is a system. Every program has inputs, such as product testing results and field returns. Likewise every reliability program has outputs, such as product design and production. In the most basic terms, a reliability program includes product specifications for functionality including expected durability. The program includes some form of design, verification, production, and field performance. Given this basic lifecycle description, two types of approaches to executing the product lifecycle are possible: reactive or proactive.

Let’s consider the notion that every product will eventually fail. Even the most robust product on Earth will fail when the Sun expires. Well before the collapse of the solar system, most products made today will have completely failed. The failures will result from deterioration of materials, stress conditions (e.g., a lightning strike), or simply misuse. Some products will simply wear out, others will become obsolete and lose compatibility with other systems, while others will simply cease to provide sufficient value.

Another important notion is that, upon product design, there is a finite number of faults in the design. A button has a limited number of actuation cycles before accumulated stress cracks the switch dome. A material has a degradation mechanism (e.g., corrosion or polymer chain scission) that slowly deteriorates the material’s strength. A software bug can disable the equipment temporarily. Further, there are possible defects designed into the product that do not account for production variation, user demand, or environment variations or do not anticipate user expectations. In every case, sooner or later, the design flaw will lead to failure. Nonetheless, given only a finite number of faults, it is possible to find and remove most design errors.

The most common way to approach product reliability is to wait for product failures and then respond with analysis, adjustments, and refinements in an attempt to improve product reliability. The naive team waits for failure reports from customers before taking action. The team’s logic, if even considered, is the following:

We are good designers.
The customer will use the product in unforeseen environments and applications.
If there are customer failures, then we will consider improvements.

For some products, with limited release and ample time to redesign the product, this may be a perfectly feasible approach.

A simple improvement the design team could consider is an estimate of the customer’s use profile and environmental conditions. Armed with this information, the team then evaluates the impact of the conditions on the product’s reliability through standardized testing. Setting testing conditions at or slightly above expected operating environments enables direct evaluation of the design to meet expected conditions. The faults found would be similar to the failures expected to occur in the customer’s hands, and there may be time for a redesign before the product is shipped to customers. However, following this logical path may lead to a broad spectrum of testing that is both expensive and time consuming.

Part of the logic of product testing includes the notion that “if we test in enough ways over the full range of use and environmental conditions, we should find and correct every design fault.” There is often a heavy reliance on industry standards and common test methods for every product.

Further improvements to product reliability can refine this reactive method. These include using simulations and risk analysis and performing early evaluation and testing of subsystems and components. The overall approach is often limited by knowledge of actual use conditions, lack of test samples, and lack of time.

Moving to a proactive approach can reduce the amount of product testing and increase product reliability. Although this may seem similar to the reactive approach, it involves a focus on failure mechanisms instead of test methods. Products fail because they do not have sufficient strength to withstand a single application of high stress (being dropped, static discharge, etc.) or they accumulate damage (wear, corrosion, drift, etc.) with use or over time. Thinking through how a product could fail by considering the materials, design, assembly process, and the same for vendor-supplied elements helps the product team determine a list of possible failure mechanisms.

In this approach, not all the failure mechanisms will be fully understood or characterized. The risk in this case is the decision to launch the product while not understanding the possibility or potential magnitude of product failure. The amount of risk itself is unknown. Therefore, the proactive team proceeds to characterize the design or material under the expected use conditions. The intent is to reduce the uncertainty of the risk.

A second result of the proactive approach to risk assessment is the rank ordering of failure mechanisms by expected rate of occurrence. One way to accomplish this ranking is to evaluate the stress versus strength relationships. Items with the largest overlap of the two distributions (stress and strength) have the highest potential for failure. The solutions may include increasing strength or reducing the variance of the strength.
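As a rough illustration of the ranking described above, the following Python sketch estimates the overlap of the stress and strength distributions for a few failure mechanisms. It assumes both distributions are normal and independent, which is a simplification the article does not require. The function name, the mechanism names, and all numbers are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist


def interference_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """Return P(stress > strength), assuming independent normal distributions."""
    # The safety margin (strength - stress) is itself normal under this assumption.
    margin_mean = strength_mean - stress_mean
    margin_sd = sqrt(stress_sd ** 2 + strength_sd ** 2)
    # Failure occurs when the margin drops below zero.
    return NormalDist(margin_mean, margin_sd).cdf(0.0)


# Hypothetical mechanisms: (stress mean, stress sd, strength mean, strength sd)
mechanisms = {
    "switch dome cracking": (40.0, 5.0, 60.0, 6.0),
    "connector corrosion": (30.0, 4.0, 70.0, 8.0),
    "solder joint fatigue": (50.0, 8.0, 65.0, 10.0),
}

# Rank mechanisms from largest to smallest overlap (highest failure potential first).
ranked = sorted(
    ((name, interference_probability(*params)) for name, params in mechanisms.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, probability in ranked:
    print(f"{name}: {probability:.6f}")
```

Under these assumptions, raising the strength mean or tightening the strength standard deviation both shrink the computed overlap, which matches the two solutions named above. Real stress and strength data are often not normal, so the distribution choice should be checked against measurements.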

A third result of the risk assessment is similar to the stress and strength evaluation and includes the impacts of time or usage on the change in the stress and strength distributions. Either curve may experience changes to the mean or variance over time. This may be due to degradation, wear, or increased expectation of durability by customers.
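To make the effect of time concrete, here is a minimal Python sketch in which the strength distribution drifts with age while the stress distribution stays fixed. The linear loss of mean strength, the linear growth of its standard deviation, the normality assumption, and every parameter value are hypothetical choices for illustration, not a model taken from the article.

```python
from math import sqrt
from statistics import NormalDist


def failure_probability_at(
    t_years,
    stress=(40.0, 5.0),          # hypothetical stress (mean, sd), constant over time
    strength0=(60.0, 6.0),       # hypothetical initial strength (mean, sd)
    mean_loss_per_year=2.0,      # assumed linear degradation of mean strength
    sd_growth_per_year=0.5,      # assumed linear growth in strength spread
):
    """Return P(stress > strength) at age t_years under the assumptions above."""
    stress_mean, stress_sd = stress
    strength_mean = strength0[0] - mean_loss_per_year * t_years
    strength_sd = strength0[1] + sd_growth_per_year * t_years
    margin_sd = sqrt(stress_sd ** 2 + strength_sd ** 2)
    return NormalDist(strength_mean - stress_mean, margin_sd).cdf(0.0)


# Evaluate the overlap every two years over a ten-year span.
timeline = [(t, failure_probability_at(t)) for t in range(0, 11, 2)]

for t, probability in timeline:
    print(f"year {t:2d}: P(failure) = {probability:.4f}")
```

In this sketch the overlap grows every year because the mean strength falls toward the stress mean while the spread widens. A shift in the stress curve, such as customers expecting more durability, could be explored the same way by making the stress parameters depend on time.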

The proactive approach takes more thinking and an understanding of how testing stresses create failures and includes characterization of product designs, materials, and processes and their related failure mechanisms.

In summary, in a reactive approach one creates a design and then waits for field returns or standard product testing failures to prompt product improvements. In a proactive approach one anticipates failure mechanisms, characterizes the response of the design and materials to expected stresses experimentally or via simulation, and then designs accordingly.

There are other aspects that distinguish a reactive from a proactive reliability program. For example, if the only time management discusses product reliability is when a major customer complains about product failures, that is a reactive approach. If the management team regularly inquires about and discusses the risk a particular design presents to reliability performance, that is a proactive approach.

Bio:

Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series Musings on Reliability and Maintenance Topics.
