#109 – FAILING TO GET FEEDBACK ON FIELD FAILURES – FRED SCHENKELBERG

Imagine you are asked to assist a design team in determining how best to improve the reliability of a product. You learn that the organization produces a range of point-of-sale (POS) devices, and you have been invited to a meeting to discuss the product and ways to improve its field reliability.

To help understand the situation, you may have already started to think of a set of questions whose answers would lead to suitable recommendations:

1. What is the current field failure rate?
2. What is the Pareto of field failure mechanisms or modes?
3. What level of field failures is acceptable (i.e., what is the goal)?
4. How is the product designed with respect to reliability (i.e., what are the design for reliability activities)?
5. What is the current estimate for field reliability based on internal measurements and modeling?
6. What happens when the product fails?
7. What do the failure analysis reports say about the possible causes of field failures?
8. Do field failures match internal testing results?

The meeting includes the directors of engineering, manufacturing, quality, and procurement, plus a handful of key engineers from those departments. They each provide a brief introduction to the product line and reiterate the desire to improve field reliability. You start asking the above questions in an attempt to understand the situation.

At first, little is offered in response by anyone on the team. Did you hit upon some trade secret? Were you showing your own ignorance by asking such questions? No; they did not know how many units had failed in the field, or how they failed. They had made some assumptions about use, environment, and what could possibly go wrong, yet they had little evidence of field problems. They had not even talked to anyone about the nature of the field issues.

The most interesting part of the product's design was the security feature that destroyed the memory and custom IC whenever the case was opened or tampering was detected. Destroyed is an accurate description, given the physical damage to the components on the circuit board they showed you. Once the product was assembled and the security system activated, it was nearly impossible to disassemble the unit and conduct a circuit analysis. This made field failures difficult to analyze.

Compounding this "feature" with the relatively low cost of the device led to a replacement rather than repair strategy when addressing field failures. Furthermore, the failed units were destroyed because they were deemed to have no value for further study.

One other piece of information that pertains to this search for reliability improvements: the organization has only one customer. Every unit produced went to that customer, who bundled the POS device with inventory, payroll, building security, cash register, and various other elements a small business may require to operate efficiently. The POS device is only one piece of a larger kit, and the customer's service provides a single point of contact for training, installation, maintenance, and support.

The design team worked closely with other departments to design as robust a product as possible under cost and other design constraints. Team members performed component derating, qualified their vendors, and conducted product testing under a wide range of stresses. They did a decent job of creating a reasonably reliable product.

The problem was that they did not really know whether any of their assumptions and educated guesses were correct. They really did not know the use environment, the range of expected stresses, or even how often the devices were actually used. They did not know how to relate their internal product design and testing to what would occur with actual use.

Since any fielded unit was destroyed before failure analysis could be conducted, they did not even have a count of how many units had failed for any reason, nor did they have the basic information a Pareto of field failures would provide. They were blind to how the product actually performed. Yet this team had been producing POS devices for over five years, and in terms of sales the devices were relatively successful.

Without even a count of failures, how did they know they needed to improve reliability? Was this a part-per-million improvement or a 20% field failure rate problem attributable to a first-year product introduction? No one really knew. They were told to make the product more reliable because it was affecting warranty costs. Warranty costs are something tangible that you can analyze: How much was the company paying in warranty claims? What was the warranty cost per unit shipped? Again, no one had answers to these questions.
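To make the missing analysis concrete, here is a minimal sketch in Python, using entirely hypothetical return records and shipment numbers (none of these figures come from the story), of the two calculations the team could not perform: a Pareto of field failure modes and the warranty cost per unit shipped.

```python
from collections import Counter

# Hypothetical field-return records: (serial number, reported failure mode, warranty cost in dollars).
# These values are illustrative only; no field data was available in the actual situation.
returns = [
    ("SN001", "power supply", 85.0),
    ("SN002", "card reader", 85.0),
    ("SN003", "power supply", 85.0),
    ("SN004", "display", 85.0),
    ("SN005", "power supply", 85.0),
    ("SN006", "card reader", 85.0),
]
units_shipped = 1_000  # hypothetical shipment volume over the same period

# Pareto of failure modes: rank modes by frequency and report the cumulative share.
mode_counts = Counter(mode for _, mode, _ in returns)
total_failures = sum(mode_counts.values())
cumulative = 0
print("Failure-mode Pareto:")
for mode, count in mode_counts.most_common():
    cumulative += count
    print(f"  {mode:15s} {count:3d}  ({100 * cumulative / total_failures:5.1f}% cumulative)")

# Tangible warranty metrics: field failure rate and warranty cost per unit shipped.
failure_rate = total_failures / units_shipped
warranty_cost_per_unit = sum(cost for _, _, cost in returns) / units_shipped
print(f"Field failure rate: {100 * failure_rate:.1f}%")
print(f"Warranty cost per unit shipped: ${warranty_cost_per_unit:.2f}")
```

With those two numbers in hand, the team could have compared the measured failure rate against a stated goal and focused improvement effort on the dominant failure modes; without them, any improvement effort is guesswork.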

The Director of Engineering then spoke up and tried to explain the situation. Once a year the Chief Financial Officer (CFO) and the (only) customer sit down to discuss pricing, warranty, and sales projections. It was the CFO who asked for reliability improvements. It was also the CFO who, to the extent he had the warranty and field failure information, was not sharing it, as he considered it company-sensitive. He did not discuss the magnitude of the field issues with anyone, even within his own office. He provided no information beyond the insistence that they "make it better."

At this point you would likely be rather frustrated and at a loss for what to recommend. Surely no organization should be so blind to how its product is performing. After some thought and further discussion, you and the directors decide on two courses of action. First, you would go talk to the CFO and attempt to understand the field failure situation, explaining to him how important that information is to the rest of the team. Second, the team would conduct a series of highly accelerated life tests to expose the design's weaknesses. In parallel with this testing, the team would attempt to fully characterize the use environment and use profiles through surveys, field observations, and questionnaires; to conduct the tests effectively, you need to know the types of stresses the product will actually experience. Any process operates better when there are clear goals and a measure of performance. It is the comparison of the goal and the measure that provides the feedback that enables design or process improvement.

Bio:

Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series Musings on Reliability and Maintenance Topics.

 
