Complications When Tracking Field Data
Fielded products fail day by day. Customers report these failures, generally seeking a remedy. Collecting failure reports, returned products, and confirmed failures is common practice.
Depending on the product, a simple replacement or exchange may suffice. For other products, repair or a refund may be appropriate. In general, though not always, when a product fails in the hands of a customer, the organization designing, manufacturing, and distributing the product learns of the failure.
A common practice is to count the number of returns per week or month as the items arrive. This monthly tally is then easy to plot as a simple bar chart of returns per month over time. The issue is that as the number of units shipped changes from month to month, so does the number of items that could possibly fail. If we ship twice as many units, the number of failed units could double even though the actual failure rate has not changed.
A Very Simple Example
Let’s look at a very simple example. Let’s say that a new product has a 10% failure rate in its first month and no failures after the first month, and we ship 100 units. In the first month we would receive 10 failed units back. If we ship 100 units in each of the first three months of the year, we would receive 10 units back each month.
Now let’s say that in April another customer orders an additional 100 units, so we ship 200 units. Given the same failure rate, we would receive 20 units back. That doubles the number of returns from March to April, a 100% increase in field returns per month.
In this very simple example, it is obvious that the increase comes from the doubled shipments, not from a change in product reliability. Tracking the failure rate (returns divided by units shipped) would be a more appropriate measure, because what we want to notice is a change in the failure rate. Detecting such a change lets us identify and resolve the factors driving an increase in the failure rate, or sustain the causes of a lower one.
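A minimal sketch of that arithmetic, assuming (as in the example) that each unit has a 10% chance of failing in the month it ships and never fails afterward. The names `FIRST_MONTH_FAILURE_RATE` and `monthly_returns` are illustrative, not from any particular tool.

```python
# Illustrative only: a fixed 10% first-month failure rate, no later failures.
FIRST_MONTH_FAILURE_RATE = 0.10

# Units shipped per month; April's shipment doubles.
shipments = {"Jan": 100, "Feb": 100, "Mar": 100, "Apr": 200}


def monthly_returns(shipments, rate):
    """Expected returns per month when units fail only in their ship month."""
    return {month: round(units * rate) for month, units in shipments.items()}


returns = monthly_returns(shipments, FIRST_MONTH_FAILURE_RATE)
for month, units in shipments.items():
    count = returns[month]
    # The count follows shipments; the rate (count / units) stays at 10%.
    print(f"{month}: {count:3d} returns, failure rate {count / units:.0%}")
```

April's count doubles to 20 while the printed rate stays at 10%, which is why a raw count alone can suggest a problem that is not there.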
Two things complicate this approach: the number of units produced and shipped varies, and the chance of a specific unit failing changes over time.
Shipment Variation
First, the number of units shipped per month is rarely constant. Although the shipment or sales forecast may use nice round numbers per month, actual shipments are often quite variable. If shipments are planned to average 5,000 units per month, the long-term average may indeed work out to 5,000 units per month, yet individual months may differ widely. In the first month only 100 units may ship, because production started just days before the end of the month. In the second month, while the production line is still ramping up, limited capacity may allow only 2,523 units to ship. In the third month, to meet early demand, the team may work overtime and produce 6,467 units. Monthly shipments will continue to vary after that.
Variation in production capacity, availability of necessary components and materials, holidays (during which production is shut down), changes in customer demand, and many other factors affect how many units are actually produced and shipped each month.
Failure Rate Variation
Even a simple product has dozens, if not hundreds or thousands, of ways it can fail. Each failure mechanism has some nonzero probability of occurring on any specific day. It’s a race to see which failure mechanism causes a failure first.
Consider a specific unit that experienced an error during assembly, say a component missing from a specific function, and that somehow shipped to a customer. It may fail immediately on first use. It may lie dormant for months until that function is called into action and then exhibit the failure. Or the missing part could lead to slow degradation of the function over many years, resulting in a reported failure only many years after first use.
The same basic variability applies to each specific failure mechanism. A wear-out mechanism may occur early with aggressive overuse, or only after an exceptionally long period of light, infrequent use. Corrosion-related failure mechanisms may occur quickly or not at all, depending on local humidity conditions. In general, each failure mechanism follows some pattern, yet the timing of individual failures varies.
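The "race" among failure mechanisms can be sketched as a small competing-risks simulation. The three mechanisms and their Weibull scale and shape parameters below are invented for illustration, not measured data. Each simulated unit draws a failure time for every mechanism, and the earliest one determines when, and why, the unit fails. The `analytic_cdf` check relies on the standard result that, for independent mechanisms, a unit survives to time t only if every mechanism does, so the survival probabilities multiply. The sketch uses Python's `random.Random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape.

```python
import math
import random

# Hypothetical failure mechanisms, each modeled (an assumption, not data)
# with a Weibull time to failure in months: name -> (scale, shape).
MECHANISMS = {
    "assembly defect": (2000.0, 0.5),   # shape < 1: early-life failures
    "overstress": (500.0, 1.0),         # shape = 1: constant hazard
    "wear-out": (120.0, 3.0),           # shape > 1: increasing hazard
}


def simulate_unit(rng):
    """Draw a failure time for every mechanism; the earliest one wins."""
    draws = {name: rng.weibullvariate(scale, shape)
             for name, (scale, shape) in MECHANISMS.items()}
    winner = min(draws, key=draws.get)
    return draws[winner], winner


def simulate(n_units, seed=1):
    """Simulate n_units independent units with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_unit(rng) for _ in range(n_units)]


def analytic_cdf(t):
    """P(unit fails by time t) when independent mechanisms compete:
    the unit survives only if every mechanism survives."""
    survival = math.prod(math.exp(-((t / scale) ** shape))
                         for scale, shape in MECHANISMS.values())
    return 1.0 - survival


units = simulate(10_000)
simulated = sum(t <= 12 for t, _ in units) / len(units)
print(f"Failed by month 12: simulated {simulated:.1%}, "
      f"analytic {analytic_cdf(12):.1%}")
```

Changing a shape parameter changes the pattern (early-life, constant, or wear-out), while the random draws show how widely individual failure times spread around that pattern.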
A Slightly More Complex Example
Let’s complicate the simple example described above. Instead of a fixed first-month failure rate of 10%, let’s say we see the following returns from 100 units initially shipped in January:
Month Returns
Jan 1
Feb 5
Mar 4
At the end of three months the cumulative failure rate is 10%, yet after the first month it was only 1%, and by the end of the second month it had jumped to 6%.
Now let’s imagine this organization also ships 100 units in February and another 100 in March, and each month’s production follows the same failure pattern. What would the first three months of production look like if we tracked cumulative shipments and returns each month?
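A short sketch of the arithmetic, using the returns in the table for the 100 units shipped in January. It separates the fraction of units failing within each month from the running (cumulative) total.

```python
from itertools import accumulate

UNITS_SHIPPED = 100
# Returns from the 100 units shipped in January, by month (from the table).
returns_by_month = [1, 5, 4]  # Jan, Feb, Mar

# Fraction of the original 100 units failing within each month...
monthly_rate = [r / UNITS_SHIPPED for r in returns_by_month]
# ...and the running total of failures as a fraction of units shipped.
cumulative_rate = [c / UNITS_SHIPPED for c in accumulate(returns_by_month)]

for month, m, c in zip(["Jan", "Feb", "Mar"], monthly_rate, cumulative_rate):
    print(f"{month}: {m:.0%} in the month, {c:.0%} cumulative")
```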
Month   Cumulative returns   Cumulative shipments
Jan     1                    100
Feb     7                    200
Mar     17                   300
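The table can be reproduced with a short sketch, assuming (as the text states) that every monthly shipment follows the same age pattern of 1, 5, and 4 failures per 100 units in its first, second, and third months. Each calendar month's returns are the sum of contributions from every cohort still in the field, which works out to 1, 6, and 10 returns in January, February, and March. The function name `calendar_returns` is hypothetical.

```python
from itertools import accumulate

# Age pattern from the single-shipment example: fraction of a month's
# shipment that fails in its 1st, 2nd, and 3rd month in the field.
AGE_PATTERN = [0.01, 0.05, 0.04]


def calendar_returns(shipments, age_pattern):
    """Expected returns in each calendar month, summed over all cohorts.

    Units shipped in month s that fail in month m are (m - s) months past
    their ship month, so they contribute shipments[s] * age_pattern[m - s].
    """
    returns = []
    for m in range(len(shipments)):
        total = sum(shipments[s] * age_pattern[m - s]
                    for s in range(m + 1) if m - s < len(age_pattern))
        returns.append(round(total))
    return returns


shipments = [100, 100, 100]                         # Jan, Feb, Mar
monthly = calendar_returns(shipments, AGE_PATTERN)  # [1, 6, 10]
cum_returns = list(accumulate(monthly))             # [1, 7, 17]
cum_shipped = list(accumulate(shipments))           # [100, 200, 300]

for month, r, n in zip(["Jan", "Feb", "Mar"], cum_returns, cum_shipped):
    print(f"{month}: {r}/{n} = {r / n:.1%} cumulative failure rate")
```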
Plotting the number of failures per month alone is not informative. Plotting the failure rate per month accounts for the number of units shipped, yet it is still not very informative. The cumulative failure rates for the three months are 1% (1/100), 3.5% (7/200), and about 5.7% (17/300).
The problem is that after three months, customers will experience a 10% chance of product failure, not 5.7%. Tracking cumulative failure rates using the cumulative numbers of returns and shipments under-reports the failure rate in this case for customers who have the first month’s units, as those units are now three months old. It may take many more months to recognize the underlying pattern of failures based on the age of the individual units.
Tracking and reporting based on the age of the unit is a better approach. Conducting a time-to-failure analysis of the data allows us to consider the probability of failure over time, just as the customer experiences the product.
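A minimal sketch of tracking by age, assuming each return can be traced to the month its unit shipped (for example, through serial numbers or ship dates) and reusing the three monthly cohorts from the example. For each age it pools only the cohorts that have reached that age, which recovers the 1%, 5%, and 4% pattern and a 10% cumulative probability of failure by three months of age. This is a simplified, nonparametric, life-table style estimate that ignores complications such as units sitting in inventory before use; a fuller time-to-failure analysis would often fit a distribution such as a Weibull. The function name `failure_fraction_by_age` is hypothetical.

```python
from itertools import accumulate

# Units shipped in each monthly cohort (Jan, Feb, Mar).
shipments = [100, 100, 100]
# returns[s][a]: returns from cohort s during its (a + 1)-th month in the
# field. Younger cohorts have shorter rows: they have not reached later ages.
returns = [
    [1, 5, 4],  # Jan cohort: ages 1-3 observed
    [1, 5],     # Feb cohort: ages 1-2 observed
    [1],        # Mar cohort: age 1 observed
]


def failure_fraction_by_age(shipments, returns):
    """Fraction of shipped units failing at each age, pooled over cohorts.

    For each age, only cohorts old enough to have reached it count, in
    both the numerator (returns) and the denominator (units shipped).
    """
    max_age = max(len(row) for row in returns)
    fractions = []
    for a in range(max_age):
        failed = sum(row[a] for row in returns if len(row) > a)
        exposed = sum(n for n, row in zip(shipments, returns) if len(row) > a)
        fractions.append(failed / exposed)
    return fractions


by_age = failure_fraction_by_age(shipments, returns)  # about [0.01, 0.05, 0.04]
cumulative = list(accumulate(by_age))                 # about [0.01, 0.06, 0.10]
for age, p in enumerate(cumulative, start=1):
    print(f"Probability of failure by age {age} month(s): {p:.0%}")
```

Unlike the calendar-month cumulative rate, this age-based view reports what a customer holding a three-month-old unit actually faces.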
Bio:
Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series at Accendo Reliability.