As the Danish physicist Niels Bohr once proclaimed, “Prediction is very difficult, especially if it’s about the future.” From the general wondering about the enemy’s next move, to the corporate board members estimating the capabilities of the competition’s next product, to the maintenance manager ordering spare parts, we have many uses for knowing the future.
Lacking a crystal ball, we often look to past performance to provide an indication of the future. For instance, has this mutual fund regularly provided adequate returns? If so, we predict it will do so going forward. However, anyone who has reviewed mutual fund performance also has read the admonishment to not use past performance to estimate future returns. Mutual funds, markets, businesses, and battlefields all change and respond in sometimes unforeseen ways.
Of course, when faced with a decision we often do need to formulate some prediction about future conditions and possible outcomes. Whether investing or ordering spare parts or preparing a design for production, we use predictions about the future to help determine the right course of action.
Reliability Predictions: A Case Study
Let me recount one of my own experiences with prediction. While I was working as a new reliability engineer at corporate headquarters, a senior reliability engineer in one of the divisions called to ask me if I could run a parts count prediction on one of its products, specifically a Bellcore (now Telcordia) prediction on the product’s two circuit boards. I said yes, despite never having done one before and not really knowing what a parts count prediction was or how it was useful.
I quickly learned that the basic parts count prediction used the bill of materials and a database of failure rates to tally the expected failure rate for the circuit board. A multilayer ceramic capacitor had a failure rate of 5 FIT (failures per $10^9$ h), and the analog ASIC was listed with 450 FIT. The software I had helped match the components to their failure rates and did the math, resulting in a final estimate for the expected failure rate of the product when used by customers.
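The tally itself is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not the Bellcore procedure or the software I used: the 5 FIT and 450 FIT values come from the text above, while the quantities, the connector entry and its rate, and the function names are hypothetical.

```python
# Hypothetical bill of materials: (part, quantity, FIT per part).
# Only the 5 FIT and 450 FIT values come from the article; the
# quantities and the connector entry are assumed for illustration.
BOM = [
    ("multilayer ceramic capacitor", 40, 5.0),
    ("analog ASIC", 1, 450.0),
    ("connector (assumed rate)", 2, 100.0),
]


def total_fit(bom):
    """Sum quantity * FIT over every line of the bill of materials."""
    return sum(qty * fit for _, qty, fit in bom)


def fit_to_rate_per_hour(fit):
    """Convert FIT (failures per 1e9 hours) to failures per hour."""
    return fit / 1e9


board_fit = total_fit(BOM)  # 40*5 + 1*450 + 2*100 = 850 FIT
board_rate = fit_to_rate_per_hour(board_fit)

print(f"Board failure rate: {board_fit:.0f} FIT")
print(f"Failures per hour:  {board_rate:.2e}")
```

A real prediction would also scale each part's rate by factors for temperature, derating, and quality level; leaving those out here corresponds to running the software with every setting at its default.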
It took about 2 h to make the prediction, of which half or more of the time was spent learning the software. Not having any information other than the bill of materials, I set all the prediction software values to defaults: nominal temperature, derating, quality level, etc.
Prediction Questions
This prediction process seemed like magic: Pour in a list of parts and after a few milliseconds of computing time we know the future—or do we?
My first check was on the notion that many of our products failed because of power supplies, connectors, and fans. The prediction results listed the power supply and connectors in the top five of expected failure rates, and there was no fan in the system, so this seemed about right. The more complex components were expected to fail more often or sooner than simpler components.
Where did the failure rates listed in the table come from? How did the folks at Bellcore know enough to list the values? With a little reading and a phone call I learned that periodically the team at Bellcore would gather failure rate information from a wide range of sources, including the Government–Industry Data Exchange Program (GIDEP) and major telecommunications companies. They would sort and analyze the data and create historical models of the failure rates, including the effects of temperature, derating, quality, etc. The equipment studied was primarily used in the military and telecommunications infrastructure, being mostly boxes with circuit boards.
The electronics industry changes considerably in five years, yet it was clear that unless we carefully resolved every failure to the component level and knew the use conditions we would be hard pressed to do better than the team at Bellcore. The product for which I made the prediction was similar to products in the telecommunication industry.
Then I wondered about the calculations being done once the software had the bill of materials. Apparently, the approach was rooted in the time prior to computers and used a few simplifying assumptions to make the calculations easy to accomplish with mechanical adders and a slide rule. One of the properties of the exponential function is that multiplying exponentials is the same as adding their exponents. So, if we assume every failure rate is constant over time, we can use the exponential distribution to model each component’s time to failure. Then, for a list of components in which any single failure fails the product, we simply add the failure rates. We can then estimate the reliability R at any time of interest by calculating a single product and a single exponent: $R(t) = e^{-\lambda t}$, where $\lambda$ is the total failure rate and $t$ is time.
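A short numerical check shows why adding failure rates works under the constant failure rate assumption. This is a minimal sketch: the two rates reuse the 5 FIT and 450 FIT figures quoted earlier, and the one-year time span and the function name `reliability` are my own assumptions for illustration.

```python
import math


def reliability(rate_per_hour, hours):
    """R(t) = exp(-lambda * t) for a constant failure rate lambda."""
    return math.exp(-rate_per_hour * hours)


# Component failure rates in failures per hour (5 FIT and 450 FIT).
rates = [5e-9, 450e-9]
t = 8760.0  # assumed time of interest: one year of continuous operation

# Multiply the individual component reliabilities ...
product_of_parts = math.prod(reliability(r, t) for r in rates)

# ... or add the rates first and take a single exponent.
single_exponent = reliability(sum(rates), t)

print(f"Product of component reliabilities: {product_of_parts:.6f}")
print(f"Single exponent on summed rate:     {single_exponent:.6f}")
```

The two results agree to within floating-point rounding, which is the whole appeal of the method: one sum, one product, one exponent. The shortcut holds only as long as each failure rate really is constant over time.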
The underlying premise of this assumption is that components, and therefore products, exhibit a constant failure rate. Despite our knowing from careful qualification and field data analysis that this was not true for any of our products, we made this assumption for the parts count prediction. This cast a serious shadow over the accuracy of the prediction.
There were additional questions that produced inadequate answers, further eroding my acceptance of the results produced by using the parts count prediction. I did not want to send back a report with faulty predictions but did not know how to proceed. Furthermore, I recalled that admonishment included with financial historical data and wondered why we even tried to estimate the future of failure rates.
Value of Predictions
To get a better handle on the value of my predictions, I first called the reliability engineer who had requested the prediction. He agreed that what I did was fine but shared my concerns that the result was not even close to the actual failure rate. He assured me that he and the team would not take the value too seriously—in fact they were not going to use it at all!
So, why did I just spend my entire morning doing this prediction for them? It turned out that the prediction report was requested by a major customer as a condition of the purchase. The customer did not really know what to do with the reported parts count prediction: It was simply a check-off box to be marked for the sale to occur.
Second, as a troubled young engineer, I sought advice from my mentor. He said that we basically understood that any prediction was wrong, just as all models are wrong. Some predictions can be useful, and so some reliability predictions can also be useful. In this case, the value of my two hours was to help secure a multimillion-dollar sale by meeting the customer requirements.
The value of any prediction, whether from a parts count or a physics of failure model, was not in the actual resulting value. The value was in what we did with the result. For reliability engineering work, a parts count prediction, even in its simplest form, encourages using fewer parts and operating at lower temperatures. Both are good for product reliability in general; thus the resulting drive to reduce part counts and temperatures increases product reliability.
We use reliability predictions to estimate a product’s performance. There are many ways to create an estimate, but all of them are most certainly wrong. Yet, there are times when the prediction provides insight or information that allows critical improvements to be made, and at other times it is just a check box. Reliability professionals strive to enable decisions to be made with the appropriate tools and analysis. This is done by matching the approach to the task and the task’s importance. Assumptions, limitations, accuracy, and options must all be disclosed. Lacking a crystal ball, we can still make predictions, but their validity must be understood within the proper context. After all, we are dealing with the future and even Niels Bohr cannot help us.
Bio:
Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series Musings on Reliability and Maintenance Topics.