#435 – BRIEF INTRODUCTION TO HALT – FRED SCHENKELBERG

Highly Accelerated Life Testing (HALT) is a technique to expose weaknesses or faults with a product.

HALT uses individual or combined stresses in a step stress approach to quickly apply sufficient stress to reveal defects.

HALT is not a specific chamber or fixed set of test conditions. It is an exploratory process to reveal weaknesses in a design.

The product development process naturally includes a check step to determine whether the product's functions work as expected.

Some teams then add a measured amount of stress (temperature, vibration, dust, load, etc.) to the product to explore functionality at elevated stress levels.

When the stress levels continue to increase until the product ceases to function, that could be called HALT.

I like to call it discovery. The idea is to determine what fails.

Action Upon Failure During HALT

For HALT to be effective, the two-part response to failures is critical.

  • First, determine the root cause of the failure.
  • Second, improve the design or manufacturing process to eliminate the failure cause.

The symptoms that indicate a product function has failed are only a failure mode.

Determine whether the failure was caused by overcurrent or overheating, a logic error, a timing error, or a technological limit.

The basic steps outlined in the discussion on failure analysis apply here.

In practice, we mitigate or patch the fault in order to continue discovering faults.

One of the tenets of HALT is to find as many faults as possible as quickly as possible. Balance the speed of finding failures with gathering sufficient information to later determine the root causes.

The second step is to take action based on the failure root cause information.

This may take a number of different paths depending on the nature of the failure.

If the failure is due to a technology limit, say a polymer softens at high temperature, then the team must decide if the difference between the expected operating temperatures and the temperature when the material softens is a sufficient margin.

Is there little to no chance that the temperature during use will approach the temperature that causes the failure?

If the polymer softens at 70°C and the expected use temperature is 60°C, that may indicate little margin, compared to a polymer with a softening point of 150°C.

The chance of the product seeing 70°C is higher than the chance of it experiencing 150°C, given an expected operating temperature of 60°C.

In general, the more margin the better.
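
The polymer comparison above comes down to simple arithmetic. Here is a minimal sketch using the temperatures from the example; the function name `thermal_margin_c` is made up for illustration, and how much margin counts as "sufficient" remains a team judgment that the code does not decide.

```python
def thermal_margin_c(limit_c: float, expected_use_c: float) -> float:
    """Degrees C between the temperature that causes failure and the expected use temperature."""
    return limit_c - expected_use_c


expected_use_c = 60.0  # expected operating temperature from the example

# Compare the two polymers from the example: softening at 70 C versus 150 C.
for softening_c in (70.0, 150.0):
    margin = thermal_margin_c(softening_c, expected_use_c)
    print(f"softens at {softening_c:.0f} C: margin of {margin:.0f} C")
```

The first polymer leaves a 10°C margin and the second a 90°C margin, which is why the second is the safer choice for a 60°C application.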

HALT in 3 Phases

Harry W. McLean, in HALT, HASS, and HASA Explained: Accelerated Reliability Techniques, describes the three phases of HALT:

  1. Pre-HALT
  2. HALT
  3. Post-HALT

Pre-HALT is the planning stage.

Discuss what is likely to fail.

You can use an FMEA or risk analysis study to understand specific functions and stresses to explore during HALT. In my experience, each product development engineer has a short list of what is likely to fail first.

This is based on their experience, design tradeoffs, and often little more than hunches.

Document what is expected to fail.

Plan to be able to detect those specific failures. Include the ability to check the general functionality of the product during testing.

For example, if there is a test mode or simulated operation capability, plan to have the supporting equipment available during the HALT.

Being aware of what could fail, and being able to detect failures, increases the capability of HALT to find faults, both expected and unexpected.

Determine which stresses to apply.

In actual operation, the product experiences all stresses simultaneously. In the lab, we have the capability to focus on a single or small set of stresses. In practice, the product should be functioning in normal or ambient conditions.

Then increase the application of one or more stresses in order to determine the extent of the margin for specific stresses.

The selection of stresses to employ includes those the risk analysis expects will have the least margin, plus those that commonly cause failures for the technology and materials involved.

For electronics, temperature and vibration typically cause a majority of failures, yet voltage and current variation, signal or traffic loading, and other stresses may apply.

Fixtures, cabling, stress application, fault detection, and failure analysis capabilities are all part of HALT planning.

Once the prototypes are available, be well prepared to start HALT.

During HALT, start at ambient.

Turn on the product and check that the diagnostics or fault detection process is working as expected.

A common practice is to apply steps of increasing stress for the stress least likely to cause failures.

Establish that the product still works at a reasonably large margin beyond the expected operating conditions, then move to the next stress.

For example, for an electronics product, cold temperatures cause fewer types of failure than high temperature. So, for a product expected to operate as low as 0°C, start there and step down in temperature.

If the product still operates at -40°C, then switch to another stress, say high temperature. If the product still operates at 40°C over the specification, move to another stress.
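
The stepping procedure can be sketched as a small loop. This is only an illustration under stated assumptions: `step_stress` and `simulated_unit` are hypothetical names, the 10°C step size and the -55°C failure point are invented for the demonstration, and in a real HALT the `still_works` check would be the product's diagnostics or functional test run at each step.

```python
from typing import Callable, Optional, Tuple


def step_stress(
    start: int,
    step: int,
    stop: int,
    still_works: Callable[[int], bool],
) -> Tuple[Optional[int], Optional[int]]:
    """Step one stress from start toward stop (inclusive).

    Returns (last level that worked, first level that failed).
    Either element is None if no such level was seen.
    """
    if step == 0:
        raise ValueError("step must be non-zero")
    last_ok: Optional[int] = None
    level = start
    # Walk in the direction given by the sign of step, stopping at the first failure.
    while (level >= stop) if step < 0 else (level <= stop):
        if not still_works(level):
            return last_ok, level
        last_ok = level
        level += step
    return last_ok, None


def simulated_unit(temp_c: int) -> bool:
    """Stand-in for a real functional check; this made-up unit fails below -55 C."""
    return temp_c >= -55


# First pass: confirm margin from the 0 C specification down to -40 C.
print(step_stress(0, -10, -40, simulated_unit))     # (-40, None)

# Later pass: keep stepping down to find the point of failure.
print(step_stress(-40, -10, -100, simulated_unit))  # (-50, -60)
```

The first call only confirms that margin exists, which suits a limited number of prototypes. The second call finds the failure point, bracketed between the last working level and the first failing one.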

Explore the boundary by probing each stress.

Think of it as exploring the size of a room.

Move in one direction, then another to define the open space of a room. The walls are defined by failures.

Given a limited number of prototypes available for HALT, first, explore each stress to determine if there is some margin.

Then revisit each stress and find the point of failure, the wall in the room analogy.

With each failure, gather as much information as possible about the stress conditions, the status of various elements of the product, and anything observable about the failure.

For example, was there flickering prior to the screen going black?

Return the stress levels to nominal and check if the functionality returns. If so, the stress level that caused the failure is an operating limit.

This often is well beyond the product specification operating limit, which is fine. If the product does not recover, we call that a destruct limit.
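
One way to keep the information gathered at each failure together with the recovery check is a simple record. This is a sketch, not a standard format: the `FailureRecord` class and its field names are assumptions for illustration, and the 95°C level in the example is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureRecord:
    stress: str                      # e.g. "temperature"
    level: float                     # stress level at which the failure appeared
    unit: str                        # e.g. "C"
    observations: List[str] = field(default_factory=list)
    recovered_at_nominal: bool = False  # did function return at nominal stress?

    def limit_type(self) -> str:
        """Operating limit if function returned at nominal stress, else destruct limit."""
        return "operating limit" if self.recovered_at_nominal else "destruct limit"


# Example entry with made-up values.
record = FailureRecord(
    stress="temperature",
    level=95.0,
    unit="C",
    observations=["screen flickered before going black"],
    recovered_at_nominal=True,
)
print(f"{record.stress} at {record.level} {record.unit}: {record.limit_type()}")
```

Recording the observations alongside the limit type keeps the details needed for the later root cause analysis attached to each failure.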

Keep testing and exploring.

Patch, repair, replace or isolate elements that cause failures. Continue to explore stresses to find failures.

Add combined stresses, such as temperature and vibration, to continue to reveal faults.

According to Mike Silverman in the paper, “Summary of HALT and HASS results at an accelerated reliability test center,” approximately 20% of unique failures occurred only when using combined stresses.

Post-HALT is the time to complete the root cause analysis and improve the design.

You may need to prioritize which failures to remedy based on the nature of the failure (safety issues, for example) or the amount of margin.
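
A prioritized list of this kind can be sketched as a sort. The ordering rule here (safety issues first, then smallest margin) is one assumed policy rather than a prescribed one, and the `Finding` class, the `margin_pct` field (margin expressed as a percentage of the specification so different stresses can be compared), and the example entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    name: str
    safety_issue: bool
    margin_pct: float  # assumed normalization: margin beyond spec as a percent of spec


def prioritize(findings: List[Finding]) -> List[Finding]:
    """Order findings with safety issues first, then by smallest margin."""
    # False sorts before True, so `not safety_issue` puts safety issues at the front.
    return sorted(findings, key=lambda f: (not f.safety_issue, f.margin_pct))


# Made-up findings for illustration only.
findings = [
    Finding("connector loosens under vibration", safety_issue=False, margin_pct=15.0),
    Finding("battery overheats", safety_issue=True, margin_pct=40.0),
    Finding("display blanks at high temperature", safety_issue=False, margin_pct=5.0),
]

for f in prioritize(findings):
    print(f.name)
```

A team may weigh other factors, such as the cost of the fix or the likelihood of the stress in the field, so treat the sort key as a starting point.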

Improve the design to be robust to the suite of expected stresses, thus making the product less likely to fail when in use.

Summary

There are books by Gregg Hobbs and Harry McLean about HALT. There are many articles and papers, too.

The basic idea of HALT is to explore the functioning of a product using elevated stresses to quickly reveal faults.

Then improve the design to create a robust and reliable product.

BIO:

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.
